Thursday, July 11, 2024

Bug's survival strategy

We recently stumbled across a bug hiding in a place that really made me sigh:

"Sorry, really?"

The customer complained that their documents could not be exported anymore. The export failed with a null pointer exception. Initially, we could not reproduce the problem in our internal test environment until we injected a real production document from the customer and tried that. So far, okay. But we didn't stop there. We wanted to understand what caused the problem. First, we assumed the resolution of the scanned document might be different (we had had some issues in the past pointing in this direction). We were wrong; that was not the problem. The resolution of the document was identical to the documents we already had internally.

Then we started to play with annotations, as there were some weird errors in the back-end log files pointing in that direction. We added a marker, a line, and a text annotation to the document, saved it, and tried again. Same issue. Then we did the same on the second page. When we added an annotation there and marked it as a burned-in annotation, the problem did NOT occur anymore for that imported document. That was an interesting observation.

To be really sure we had hit the jackpot, we removed the burned-in attribute and tried again. Indeed, now it threw the same exception again. Setting the burned-in attribute again and re-checking... yes, now it worked again.

Then we did the same test with annotations on the first page. The issue did not occur there. So it looked like it really mattered where the annotations were placed and which attribute values they carried.

What a wonderful finding and a great hint for our developers. That helped speed up the fix.
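In hindsight, the combination we narrowed down by hand is exactly the kind of matrix a parametrized regression test could pin down automatically. Here is a minimal sketch in Python with pytest, assuming a hypothetical export_document function and annotation helpers; the names are made up for illustration and do not reflect our real API:

import itertools
import pytest

# Hypothetical helpers; the real export API and annotation model look different.
from exporter import export_document
from documents import load_test_document, add_annotation

ANNOTATION_KINDS = ["marker", "line", "text"]
PAGES = [1, 2]
BURNED_IN = [True, False]

@pytest.mark.parametrize(
    "kind, page, burned_in",
    itertools.product(ANNOTATION_KINDS, PAGES, BURNED_IN),
)
def test_export_with_annotation(kind, page, burned_in):
    # Reproduce the customer scenario: one annotation of a given kind,
    # placed on a given page, with or without the burned-in attribute.
    doc = load_test_document("customer_production_sample.tif")
    add_annotation(doc, kind=kind, page=page, burned_in=burned_in)

    # The bug showed up as a null pointer exception in the back end;
    # here we simply assert that the export completes and returns data.
    result = export_document(doc)
    assert result is not None

Just twelve parameter combinations, yet they would have flagged the faulty page/burned-in pairing as soon as it appeared.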

After this analysis, we asked ourselves whether testing can realistically be expected to cover all possible variations: annotations of every kind, on every page number, in different kinds of documents (colored, grey-scale, various DPIs, and so on), and all of that within a time frame that is far from generous.

The fact is, no matter how hard and how accurately you test, software will never be fully free of bugs. The complexity of software has increased dramatically during the last few decades, and the number of test cases needed to reach even a minimum level of coverage has grown so much that it has become impossible to execute all tests and possible variations in the time given.

 "Even a seemingly simple program can have hundreds or thousands of possible input and output combinations. Creating test cases for all these possibilities is impractical. Complete testing of a complex application would take too long and require too many human resources to be economically feasible". [1]

Software development today is still a process of trial and error. I have never seen a programmer develop a software component from scratch without several rounds of failed compiles, fixing, rewriting, and retesting before finally shipping it, then failing at the customer's site and going through the same process again.

 "Software errors are blunders caused by our inability to fully understand the intricacies of these complex products" [2]

Ivars Peterson compares today's technology to black magic: engineers act like wizards who brew their magic potions by mixing ingredients of terminology that only they understand, while the public, dazzled by the many visible achievements of modern technology, often regards engineers as magicians who can solve any problem.

References

[1] Glenford J. Myers, The Art of Software Testing
[2] Ivars Peterson, Fatal Defect

Tuesday, April 16, 2024

AI and a confused elevator

A colleague recently received a letter from the real estate agent stating that several people had reported a malfunction of the new elevator. The reason, as it turned out after an in-depth analysis: the doors had been blocked by people moving into the building while hauling furniture. This particular malfunction detection was claimed to be part of the new elevator system, which is based on artificial intelligence.
The agent kindly asked the residents NOT to block the doors anymore, as it confuses the elevator and would likely cause it to stop working again.

I was thinking..."really"?

I mean... if I hear about AI in elevators, the first thought that crosses my mind is smart traffic management [1]. For example, in our office building most employees go for lunch around noon and call the elevators. A great favor to do them would be to send the elevator back to the floor where people keep pressing the button right after it has delivered the previous group. Or, if several elevators exist, to park them at different positions in the building so nobody ever has to wait too long to get one.
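To make that wish a bit more concrete, here is a toy sketch of the idle-parking heuristic I have in mind, assuming a simple per-floor call counter. It is purely illustrative and certainly not how a real elevator controller is built:

from collections import Counter

class IdleParkingPolicy:
    # Toy heuristic: remember where calls come from and park idle cars there.

    def __init__(self, floors):
        self.calls = Counter()   # call frequency per floor
        self.floors = floors

    def record_call(self, floor):
        self.calls[floor] += 1

    def park_positions(self, idle_cars):
        # Park idle cars at the floors that call most often,
        # instead of letting them all wait at the lobby.
        busiest = [floor for floor, _ in self.calls.most_common(idle_cars)]
        # If there are more idle cars than busy floors, spread the rest
        # over floors that are not covered yet.
        for floor in range(self.floors):
            if len(busiest) >= idle_cars:
                break
            if floor not in busiest:
                busiest.append(floor)
        return busiest

# Around noon the office floors dominate the statistics, so idle cars
# end up parked where the lunch crowd is about to press the button.
policy = IdleParkingPolicy(floors=8)
for floor in [5, 5, 5, 3, 3, 0]:
    policy.record_call(floor)
print(policy.park_positions(idle_cars=2))   # [5, 3]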

But I had never expected an elevator to get irritated and remain confused for several days just because someone temporarily blocked the doors. It surprises me that such an elevator has no built-in facility to reset itself automatically after a while. It's weird that a common use case like temporarily blocked doors wasn't even covered in the technical reference manual and required a technician to come twice because he or she could not resolve the problem the first time.
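Such a defensive self-reset is not rocket science either. A tiny sketch, assuming an entirely made-up controller interface and a timeout value picked out of thin air:

import time

class DoorWatchdog:
    # Toy sketch: if the doors stay blocked too long, record the event and
    # recover automatically instead of staying confused for days.

    BLOCKED_TIMEOUT_S = 120   # assumed threshold, purely illustrative

    def __init__(self):
        self.blocked_since = None

    def on_door_state(self, blocked, now=None):
        now = time.monotonic() if now is None else now
        if not blocked:
            self.blocked_since = None
            return "normal"
        if self.blocked_since is None:
            self.blocked_since = now
            return "normal"
        if now - self.blocked_since > self.BLOCKED_TIMEOUT_S:
            self.blocked_since = None
            return "reset"    # re-initialize the door logic, keep running
        return "normal"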

A few weeks later, I visited my friend in his new apartment and wanted to see that mystic elevator for myself. D'oh! Another brand-new elevator that does not let you undo an incorrect choice after pressing the wrong button. But it does contain an integrated iPad showing today's latest news.
Pardon me? Who needs that in a four-floor building?

I often hear or read about software components that marketing claims are using AI, while in reality the most obvious use cases were not even considered, like that undo button [2] which I'll probably miss in elevators until the end of my days.

References

[1] https://www.linkedin.com/pulse/elevators-artificial-intelligence-ascending-towards-safety-guillemi/

[2] https://simply-the-test.blogspot.com/2018/05/no-undo-in-elevator.html