Thursday, July 11, 2024

Bug's survival strategy

We recently stumbled over a bug hiding in a place that really made me sigh:

"Sorry, really?"

The customer complained that their documents could not be exported anymore. The export ended with a null pointer exception. Initially, we could not reproduce the problem in our internal test environment, until we injected a real production document from the customer and tried again. So far, okay. But we didn't stop there. We wanted to understand what caused the problem. First, we assumed the resolution of the scanned document might be different, as we had seen some issues in this area in the past. We were wrong; that was not the problem. The resolution of the document was identical to what we already had internally.

Then we started to play with annotations, as there were some weird errors in the back-end log files pointing in that direction. We added a marker, a line and a text annotation to the document, saved it and tried again. Same issue. Then we did the same on the second page. When we added an annotation there and marked it as a burned-in annotation, the problem did NOT occur anymore for that imported document. That was an interesting observation.

To be really sure we had hit the jackpot, we removed the burned-in attribute and tried again. Indeed, now it threw the same exception again. Setting the burned-in attribute again and re-checking... yes, now it worked again.

Then we did the same test with annotations on the first page. Not the same issue here. So it looked like it really mattered where the annotations were set and with which attribute value.

What a wonderful finding and a great hint for our developers. It helped speed up the fix.

After this analysis, we asked ourselves whether we can really expect testing to cover all the possible variations: checking whether annotations of all kinds work on all sorts of page numbers, using different kinds of documents such as colored, grey-scale, different DPIs, etc., and all this within a time frame that is far from fair.
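
To get a feeling for how quickly these variations multiply, here is a minimal sketch in Python. The dimensions and their values are purely illustrative assumptions, not the real options of our product, and the function name is made up for this example.

```python
from itertools import product

# Hypothetical test dimensions. The values are illustrative assumptions,
# not the options the real application supports.
ANNOTATION_TYPES = ["marker", "line", "text"]
BURNED_IN = [False, True]
PAGES = [1, 2, 3]
COLOR_MODES = ["color", "grey-scale", "black-and-white"]
DPI_VALUES = [150, 200, 300, 600]


def all_export_test_cases():
    """Return every combination of the dimensions above as a list of tuples."""
    return list(
        product(ANNOTATION_TYPES, BURNED_IN, PAGES, COLOR_MODES, DPI_VALUES)
    )


cases = all_export_test_cases()
print(len(cases))  # 3 * 2 * 3 * 3 * 4 = 216
```

Even this tiny matrix, with a single annotation on a three-page document, already yields 216 export tests. Allowing several annotations per document, or adding one more dimension, multiplies the number again.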

The fact is, no matter how hard and how accurately you test, software will never be fully free of bugs. The complexity of software has increased dramatically during the last few decades, and the number of test cases needed to gain even a minimum level of coverage has grown so much that it has become impossible to execute all tests and possible variations within the time given.

 "Even a seemingly simple program can have hundreds or thousands of possible input and output combinations. Creating test cases for all these possibilities is impractical. Complete testing of a complex application would take too long and require too many human resources to be economically feasible". [1]

Software development today is still a process of trial and error. I have never seen a programmer develop a software component from scratch without several rounds of failed compiling, fixing, rewriting and retesting before finally shipping it, only to see it fail at the customer and go through the same process again.

 "Software errors are blunders caused by our inability to fully understand the intricacies of these complex products" [2]

Ivars Peterson compares today's technology with black magic, where engineers act like wizards who brew their magic potion by mixing various ingredients of terminology that only they understand, while the public, dazzled by the many visible achievements of modern technology, often regards engineers as magicians who can solve any problem.

References

[1] The Art of Software Testing, Glenford J. Myers
[2] Fatal Defect, Ivars Peterson