Saturday, November 9, 2024

The Fosbury Flop & the Dunning-Kruger Effect
When Richard Fosbury surprised the audience with a new Olympic record in the high jump at the 1968 Olympic Games in Mexico City, I was just about 1 year old. Too young to understand what he had achieved. I learnt about him a long time later and was surprised how professionals and experts had looked at him as an outsider with little chance of achieving anything meaningful.

Richard proved them all wrong and demonstrated their cognitive limits. He set a new Olympic record with his derided technique, jumping backwards with a curved body, while all the others used really awkward-looking ways to get over the bar.

Soon after Richard's breakthrough, his technique established itself as THE standard...despite all the Cassandras.

I am sure that such incidents still happen a lot, no matter the field. Many great ideas vanish because of prejudice and limited imagination. Our experience is based on what we have seen working and on other things we have seen go badly.

It's all too natural that we enforce our "formula" for success on future undertakings, while we tend to reject what sounds strange or what reminds us of past bad experiences.

But do we really understand the reasons for our success, and are we clear about all the factors that led to our good or bad experiences?

Many great ideas vanish because of some narrow-minded and misled decision makers ruling their little universe. Some ideas may show up again years or decades later, hailed as the new hype, while the original inventors have long since disappeared into the dark, their efforts in vain.

Some of the knowledge and theories that people apply to their actions are sound and meet with favorable results. Others are wrong-headed or incompetent.

Want some examples?

  1. McArthur Wheeler, who robbed two banks in broad daylight with his face rubbed with lemon juice. He was convinced the videotape cameras could not record him. He was arrested the same night (Fuocco 1996).

  2. Kathrine Switzer, bib number 261 (today honored by the "261 Fearless" foundation), was a courageous woman who proved them all wrong when old grey men claimed "Women can't and must not run a Marathon". At the Boston Marathon, some unreasonable officials tried to push her out of the race while she was running. With her courage, she changed minds and paved the way for women to be officially accredited to run marathons, despite all the hypocritical advice.

  3. Switzerland introduced the right for women to vote in 1971. Why did that take so long? If you watch interviews recorded in Switzerland shortly before the vote, it is shocking to hear what people had in their minds at that time. Believe it or not, Afghanistan allowed women to vote 8 years before Switzerland.

So what?
Don't laugh at crazy ideas. Accept exotic or extraordinary thoughts and suggestions as possible alternatives, even if you don't like them. Don't dismiss them just because they don't fit into your pattern of thinking. Give the team a chance to fail or succeed. If the idea proves wrong, at least you have some facts at hand to argue with. If it succeeds, you have helped an innovative team achieve a breakthrough.

Wednesday, October 23, 2024

Fatal exception - or how to get rid of errors

Dave, a good friend, sent me a WhatsApp message and made fun of a piece of software that, when put under load, did not properly log invalid login attempts.

All he could see was “error:0” or “error:null” in the log file.

This reminded me of a similar situation where we were hunting down the root causes of unhandled exceptions showing up several times per day at customer sites, out of the blue.

The log files didn’t really help. We suspected a third-party software component, and of course the vendor answered that it could not come from their side.

We collected more evidence, and after several days of investigation we could prove with hard facts that the cause was indeed the third-party component. We turned off the suspect configuration and it all worked smoothly after that.

Before we identified the root cause, we were overwhelmed by the many exceptions found in the log files. It was a real challenge to separate errors that made it to a user’s UI from those that remained silent in the background, although something had obviously gone wrong with those errors, too. The architects decided the logs must be free from any noise: catch the exceptions and deal with them properly. Too much noise can hamper the analysis of what is really going wrong.

Two weeks later, the log files were free from errors. Soon after, one of the developers came to me and moaned about a discovery he had just made. He stated that someone had simply re-assigned the log levels of most of the detected errors so that these could no longer make it into the log files. In his words, the errors were just swept under the carpet.
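To illustrate what such sweeping under the carpet looks like, here is a minimal, hypothetical sketch (not our actual logging framework): re-classifying an event below the configured threshold makes it vanish from the log file.

using System;

// Minimal, hypothetical logger, just to illustrate the point.
enum LogLevel { Debug = 0, Info = 1, Error = 2 }

class Logger
{
    // Only messages at or above this level make it into the log file.
    public LogLevel MinLevel { get; set; } = LogLevel.Info;

    public void Log(LogLevel level, string message)
    {
        if (level >= MinLevel)
            Console.WriteLine($"{level}: {message}");
    }
}

class Program
{
    static void Main()
    {
        var log = new Logger();

        // Before the "cleanup": the error is visible in the log file.
        log.Log(LogLevel.Error, "login failed: error:null");

        // After the "cleanup": the same event, re-classified as Debug,
        // silently disappears because Debug is below the Info threshold.
        log.Log(LogLevel.Debug, "login failed: error:null");
    }
}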

I cannot judge whether that was really true or whether it was done with just a few errors where it made sense, but it was funny enough to put a new sketch on paper.

Tuesday, October 8, 2024

Leaving Springfield

For once, not a software-testing related cartoon. Inspired by Donald Trump, who put the ball in my court with his statement about the immigrants in Springfield "eating" their pets.

There is also a second drawing, which I had submitted to the "New Yorker" but which unfortunately wasn't published.


Last but not least, after Trump won the election, I felt I needed to draw another one where all the pets could return to their homes, as there is nothing to fear anymore. This included P'Nut the squirrel, which, in real life, was less lucky. Unfortunately, it could not return home.


Wednesday, October 2, 2024

The Birthday Paradox - and how the use of AI helped resolve a bug

While working on and testing a new web-based solution to replace the current fat client, I was assigned an interesting customer problem report.

The customer claimed that since the deployment of the new application, they had noticed at least 2 incidents per day where the sending of electronic documents contained a corrupt set of attachments. For example, instead of 5 medical reports, only 4 were included in the generated mailing. The local IT administrator observed that the incidents happened due to duplicate keys being set for the attachments from time to time.

But they could not evaluate whether the duplicates were created by the old fat client or the new web-based application. Both products (old and new) were used in parallel: some departments used the new web-based product, while others still stuck to the old one. For compatibility reasons, the new application also inherited antiquated algorithms so as not to mess up the database and to guarantee parallel operation of both products.

One of these relics was a four-character alphanumeric code for each document created in the database. The code had to be unique only within one person’s file of documents. If another person had a document using the same code, that was still OK. The code was used (e.g. at court) to refer to a specific document, so it had to be easy to remember.

At first, it seemed very unlikely that a person’s document could be assigned a duplicate code, and there was a debate between the customer and different stakeholders on our side.
The new web application was claimed to be free of creating duplicates, but I was not so sure about that. The customer ticket was left untouched and the customer lost out, until we found a moment, took the initiative and developed a script to observe all new documents created during our automated tests and also during manual regression testing of other features.

The script was executed once every hour. We never saw any duplicates until, after a week, the Jenkins script suddenly raised an alarm claiming the detection of a duplicate. That moment was like Xmas, and we were so excited to analyze the two documents.

In fact, both documents belonged to the same person. Now we wanted to know who had created them and which scenario had been applied in this case.

Unfortunately, it was impossible to determine who the author of the documents was. My test team claimed not to have done anything with the target person. The name of the person for whom the duplicates were created occurred only once in our test case management tool, but not for a scenario that could have explained the phenomenon. The userid (the author of the documents) belonged to the product owner. He assured us he had not done anything with that person on that day and explained that many other stakeholders could have used the same userid within the test environment where the anomaly was detected.

An appeal in the developer group chat did not help resolve the mystery either. The only theory in place was “it must have happened during creation or copying of a document”. The easiest explanation would have been the latter: the copy procedure.

Our theory was that copying a document could result in assigning the same code to the new instance. But we tested that; copying documents worked as expected. The copied instance received a new unique code that was different from its origin. Too easy anyway.

Encouraged to resolve the mystery, we asked ChatGPT about the likelihood of duplicates happening in our scenario. The response was juicy.

It claimed an almost 2% chance of duplicates if the person already had 200 assigned codes (within his/her documents). That was really surprising, and when we asked ChatGPT further, it turned out the probability climbed to 25% if the person had 2000 distinct codes assigned in her documents.

This result is based on the so-called Birthday Paradox, which states that it takes only 23 random individuals to get a 50% chance of a shared birthday. Wow!

Since I am not a mathematician, I wanted to test the theory with my own experiment. I started to write down the birthdays of 23 random people within my circle of acquaintances. Actually, I could already stop at 18: within this small set, I had identified 4 people who shared the same birthday. Fascinating!
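The arithmetic is also easy to check with a few lines of code. Here is a minimal sketch, assuming birthdays (or codes) are drawn uniformly at random, using the exact product formula for the probability of at least one collision:

using System;

class BirthdayParadox
{
    // Probability that among n uniformly random draws from a space of
    // 'space' possible values at least two are identical:
    // p = 1 - (space/space) * ((space-1)/space) * ... * ((space-n+1)/space)
    static double CollisionProbability(int n, long space)
    {
        double pAllDistinct = 1.0;
        for (int i = 0; i < n; i++)
            pAllDistinct *= (double)(space - i) / space;
        return 1.0 - pAllDistinct;
    }

    static void Main()
    {
        // Classic birthday paradox: 23 people, 365 possible birthdays -> ~50.7%
        Console.WriteLine(CollisionProbability(23, 365));

        // Even 18 people already yield roughly a 1-in-3 chance of a match.
        Console.WriteLine(CollisionProbability(18, 365));
    }
}

The same function can be pointed at our four-character alphanumeric code space simply by swapping 365 for the number of possible codes.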

That egged us on to develop yet another script and flood one exemplary, fresh person record with hundreds of automatically created documents.

The result was revealing:


Number of assigned codes for 1 person      500   1000   1500   2000   2500
Number of identified duplicates (Run 1)      0      0      5      8     11
Number of identified duplicates (Run 2)      0      0      3      4      6

With these 2 test runs, we could prove that the new application produced duplicates, provided enough documents had been assigned to the person upfront.

The resolution could be as simple as this:

  • When assigning the code, check for existing codes and re-generate if needed (could be time-consuming, depending on the number of existing documents); see the sketch below this list.
  • When generating the mailing, the system could check all selected attachments and automatically correct the duplicates by re-generating them, or warn the user so the duplicates can be corrected manually.
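A minimal sketch of the first option (hypothetical names; the persistence layer that loads a person's existing codes is left out):

using System;
using System.Collections.Generic;

class CodeGenerator
{
    const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    readonly Random _rng = new Random();

    // Generates a four-character code that is unique within one person's
    // existing set of codes; on a collision it simply tries again.
    public string NextUniqueCode(ISet<string> existingCodes)
    {
        while (true)
        {
            var chars = new char[4];
            for (int i = 0; i < chars.Length; i++)
                chars[i] = Alphabet[_rng.Next(Alphabet.Length)];

            string code = new string(chars);
            if (!existingCodes.Contains(code))
                return code;
            // Collision: loop and re-generate. This stays cheap as long as
            // the person's document count is far below the total code space.
        }
    }
}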

What followed was a nicely documented internal ticket with all our analysis work. The fix was given highest priority and made it into the next hot-fix.

When people ask me what I find so fascinating about software testing, this story is a perfect example. Yes, sure, often we have to deal with boring regression testing or with repeatedly throwing pieces of code back to the developer because something was obviously wrong. But the really exciting moments for me are puzzles like this one: fuzzy ticket descriptions, false claims, obscure statements, contradictory or incomplete information, accusations, and nobody really finding the time to dig deeper into the "forensics".

That is the moment where I cannot resist and love to jump in. 

But the most interesting finding in this story has not been revealed yet. While looking at the duplicates, we noticed that they all ended with the character Q.


And when looking closer at the other, non-duplicated codes, we noticed that actually ALL codes ended with the character Q. This was even more exciting. This fact reduced the number of possibilities from 1.6 million variants down to only 46,656 and thereby increased the probability of duplicates considerably.
 

In the end, I could also prove that the old legacy program was not 100% free from creating duplicates either, even though they were close to impossible to provoke. The only way to force the old program to create duplicates was to flood one exemplary person with 46,656 records, meaning every possible code was then in use. But that scenario is so unrealistic that it is merely academic.

In the end, as it turned out, the problem was a configuration issue in a third-party vendor's randomizer API. In order to make the randomizer create random values, you had to explicitly tell it to RANDOMIZE. =;O) The title of this blog post could therefore also be "Randomize the Randomizer" =;O)
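I cannot reproduce the vendor's API here, but the pitfall translates to any environment: a pseudo-random generator that is never seeded replays the identical sequence on every start. A small C# analogy (the vendor component itself is not shown):

using System;

class RandomizeTheRandomizer
{
    static void Main()
    {
        // "Forgot to randomize": a fixed seed yields the very same
        // "random" sequence at every application start.
        var notRandomized = new Random(seed: 0);

        // Properly randomized: C#'s Random seeds itself automatically;
        // the vendor API needed an explicit RANDOMIZE call to do the same.
        var randomized = new Random();

        for (int i = 0; i < 5; i++)
            Console.WriteLine($"{notRandomized.Next(1000):D3}  {randomized.Next(1000):D3}");
    }
}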

 

Further notes for testers
Duplicate records can easily be identified in Excel by going to the Data Tools group on the Data tab and choosing "Remove Duplicates".


Or, what was also helpful in my case, was the online tool at https://www.zickty.com/duplicatefinder, where you can paste your full data set and it returns all duplicates.
 

Yet another alternative is to use SQL's GROUP BY, as in my case (adding HAVING COUNT(*) > 1 would return only the duplicates):

SELECT COUNT(*) as num, mycode FROM table WHERE personid=xxx GROUP BY mycode ORDER BY num DESC;

And...here is my simple piece of code in C#, which I used within the automated scripts to raise an alarm when the returned count was higher than 1.
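The original listing is not reproduced here, but in essence it was a check along these lines (a sketch, not the original code; the connection string is a placeholder and the identifiers match the query above):

using System;
using System.Data.SqlClient;

class DuplicateCodeAlarm
{
    // Returns true if any code occurs more than once within one person's documents.
    static bool HasDuplicates(string connectionString, int personId)
    {
        const string query =
            "SELECT COUNT(*) AS num, mycode FROM table WHERE personid = @personId " +
            "GROUP BY mycode ORDER BY num DESC";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(query, connection))
        {
            command.Parameters.AddWithValue("@personId", personId);
            connection.Open();

            using (var reader = command.ExecuteReader())
            {
                // ORDER BY num DESC puts the highest occurrence count first;
                // anything above 1 means at least one duplicate code exists.
                return reader.Read() && reader.GetInt32(0) > 1;
            }
        }
    }

    static void Main()
    {
        if (HasDuplicates("<connection string>", 4711))
            Console.WriteLine("ALARM: duplicate code detected!");
    }
}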


Thursday, July 11, 2024

Bug's survival strategy

We recently stumbled over a bug hiding in a place where I really had to sigh:

"Sorry, really?"

The customer complained that their documents could not be exported anymore. The system ended with a null pointer exception. Initially, we could not reproduce the problem in our internal test environment, until we injected a real production document from the customer and tried that out. So far, okay. But we didn't stop there. We wanted to understand what caused the problem. First, we assumed the resolution of the scanned document might be different (as we had had some issues in the past pointing in this direction). We were wrong; that was not the problem. The resolution of the document was identical to what we already had internally.

Then we started to play with annotations, as there were some weird errors in the back-end log files pointing in that direction. We added a marker, a line and a text annotation to the document, saved it and tried again. Same issue. Then we did the same on the second page. When we added an annotation there and marked it as a burned-in annotation, the problem did NOT occur anymore for that imported document. That was an interesting observation.

To be really sure we had hit the jackpot, we removed the burned-in attribute and tried again. Indeed, now it threw the same exception again. Setting the burned-in attribute again and re-checking...yes, now it worked again.

Then we did the same test with annotations on the first page. The issue did not appear there. So it looked like it really played a significant role where the annotations were placed and which attribute values they carried.

What a wonderful finding and a great hint for our developers. That helped speed up the fix.

After the outcome of this analysis, we asked ourselves whether we can really expect testing to cover all the possible variations: checking whether annotations of all kinds work on all sorts of pages, using different kinds of documents (colored, grey-scale, different DPIs, etc.), and all of this in a time-frame that is far from fair.
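To put a rough number on it, a quick back-of-the-envelope multiplication over the variation dimensions of this very bug is sobering (the counts per dimension are assumptions for illustration, not project data):

using System;

class TestCaseExplosion
{
    static void Main()
    {
        // Rough, assumed counts per variation dimension:
        int annotationKinds = 3;   // marker, line, text
        int burnedInStates  = 2;   // burned-in: yes / no
        int pagePositions   = 5;   // first, second, middle, last, only page
        int colorModes      = 3;   // color, grey-scale, black/white
        int dpiSettings     = 4;   // e.g. 150, 200, 300, 600 DPI
        int documentTypes   = 10;  // different customer document sources

        long combinations = (long)annotationKinds * burnedInStates *
                            pagePositions * colorModes * dpiSettings *
                            documentTypes;

        // 3 * 2 * 5 * 3 * 4 * 10 = 3,600 combinations for one export feature.
        Console.WriteLine($"{combinations} combinations to cover exhaustively");
    }
}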

Fact is, no matter how hard and how accurately your testing is done, software will never be fully free of bugs. The complexity of software has increased dramatically during the last few decades, and the number of test cases needed to gain even a minimum level of coverage has grown so much that it has become impossible to execute all tests and possible variations within the time given.

 "Even a seemingly simple program can have hundreds or thousands of possible input and output combinations. Creating test cases for all these possibilities is impractical. Complete testing of a complex application would take too long and require too many human resources to be economically feasible". [1]

Software development today is still a process of trial and error. I have never seen a programmer develop a software component from scratch without several rounds of failed compilation, fixing, re-writing and re-testing before finally shipping it, then failing at the customer and going through the same process again.

 "Software errors are blunders caused by our inability to fully understand the intricacies of these complex products" [2]

Ivars Peterson compares today's technology with black magic, where engineers act like wizards who brew their magic potion by mixing various ingredients of terminology that only they understand, while the public, dazzled by the many visible achievements of modern technology, often regards engineers as magicians who can solve any problem.

References

[1] The Art of Software Testing, Glenford J. Myers
[2] Fatal Defect, Ivars Peterson

Tuesday, April 16, 2024

AI and a confused elevator

A colleague recently received a letter from the real-estate agency stating that several people had reported a malfunction of the new elevator. The reason, as it turned out after an in-depth analysis: the doors had been blocked by people moving into the building while hauling furniture. This special malfunction detection was claimed to be part of the new elevator system, which is based on artificial intelligence.
The agency kindly asked the residents NOT to block the doors anymore, as it confuses the elevator and makes it likely to stop working again.

I was thinking..."really"?

I mean...if I hear about AI in elevators, the first thought that crosses my mind is smart traffic management [1]. For example, in our office building at around noon, most of the employees go for lunch and call the elevators. An opportunity to do employees a great favor would be to move back to the floor where people press the button right after having delivered the previous batch of passengers. Or, if several elevators exist, make sure they move to different positions in the building so people never have to wait too long to get one.

But I had never expected an elevator to get irritated and distracted for several days just because someone temporarily blocked the doors. It is surprising to me that such an elevator has no built-in facility to automatically reset itself after a while. It's weird that a common use-case like temporarily blocked doors wasn't even covered in the technical reference book and required a technician to come twice, because he or she could not resolve the problem the first time.

A few weeks later, I visited my friend in his new apartment and wanted to see that mystic elevator for myself. D'oh! Another brand-new elevator that does not allow you to undo an incorrect choice after pressing the wrong button. But it contains an integrated iPad showing today's latest news.
Pardon me? Who needs that in a 4-floor building?

Often I hear or read about software components where marketing claims they are using AI, whereas in reality the most obvious use-cases were not even considered, like that undo button [2] which I'll probably miss in elevators till the end of my days.

 References

[1] https://www.linkedin.com/pulse/elevators-artificial-intelligence-ascending-towards-safety-guillemi/

[2] https://simply-the-test.blogspot.com/2018/05/no-undo-in-elevator.html