Tuesday, April 16, 2024

AI and a confused elevator

A collegue recently received a letter from the estate agent, stating that several people reported a malfunction of the new elevator. The reason as it turned out after an in depth-analysis: the doors were blocked by people moving into the building while hauling furniture. This special malfunction detection was claimed to be part of the new elevator systems that is based on artificial intelligence.
The agent kindly asked the residents to NOT block the doors anymore as it confuses the elevator and it is likely for the elevator to stop working again. 

I was thinking..."really"? I mean...if AI is put into an elevator software, then I first expect the elevator to learn at what times the elevator is called on which floors most often and then automatially move to that position when it makes most sense...based on what the elevator learned over time. For example, in our office, at around noon, most of the employees go for lunch. An opportunity to do employees a great favor is to move back to the floor where people press the button right after having delivered the previous ones. But, when an elevator gets irritated due to blocking doors and cannot be settled within several days...then what benefit is such kind of software providing to the people?


Wednesday, November 29, 2023

Software made on Earth

This is a remake of my original cartoon which was published at SDTimes, N.Y. in their newsletter as of April 1, 2008

Sunday, November 5, 2023

Testing under the Hood

 Even when a particular test passed at first glance, there might still be things going wrong. You may just not have noticed it, because the User Interface stays quiet; at least for the moment. Things can go wrong in a black box long after you executed the test; hours, days or even weeks later. The longer such problems remain undetected the more effort it takes to fix the problem and repair the damage it caused, especially if the system is already LIVE in production. See also Cheerful Debugging Messages and its Consequences in this blog.

It is not enough to look only at the front-end of an application. You should also watch carefully what’s going on behind the curtains. Give all testers a facility to query the underlying database. A lot of things can go wrong there and remain undetected for too long. It will start hurting only when such data is shared with or passed to other programs using a corresponding interface to read or exchange data. I have seen a lot of things stored inappropriate only to hurt when such data was later used by another program.

I developed an SQL query tool with some extra facilities like an analyser to compare all tables before and after a triggered action.

 How can testers live without such tools? It opens a whole new universe of potential problems just waiting to get reported.

Wednesday, September 27, 2023

Revise the Test Report

I thought, I'd published this cartoon in 2020 already, but couldn't find it, so I am doing it now, with a 3 years delay...=;O)

Tuesday, August 15, 2023

Mutation Testing and why we don't need it, or do we?

 When our kids were still small, every Easter, it was a tradition to hide chocolate eggs, sweets and small presents in the garden, around the house, at the carport and sometimes also within the house.
While the kids were so excited to find all the little things, we parents watched them equally excited.

When Easter was long over, often, one or the other egg was still found by accident in a corner or somewhere in a plant pot; too old to still be edible. In other words, our kids didn't track down all of them at Easter. Over the years, they got better and better. We had to be more creative finding new extraordinary places to hide the little things from them, so they didn't have an easy catch ("long hanging fruits" how testers would say).

While we never spent a thought about our kids' "mathematical" effectiveness of finding all these little presents, this is exactly what mutation testing is all about.

It is a method to measure the effectiveness of unit tests in detecting anomalies in the code. The idea is to inject bugs by purpose and then verify how many of these are found. That's pretty much the same like hiding chocolate eggs in the garden.

A typical example of injected bugs (mutants) could be the change of a comparison operator from something like (x<y) to (x>y) or a boolean value that is changed from an initial value true to false or vice versa. In case of a calculation engine, the computed value could be fuzzed and made return an incorrect result. The point is, that these bugs are implemented by purpose and - in contrary to our annual tradition at Easter - the tool that modifies the code knows exactly how many mutants were added and where.

When executing a unit test, the mutation test tool compares the number of failed tests with and without the modified code. If the number of failed tests is the same for both scenarios, then this is an indication of inadequate tests.

I am not experienced in automated mutation testing, but I find this topic quite interesting, especially because IT-companies tend to measure just the test coverage but often, have no idea whether their unit-tests are really effective. Test Coverage doesn't tell you anything about the quality of the code. You can have 100% test coverage for one method and still fail miserably with uncaught exceptions by applying other valid inputs to the method.

Although mutation testing is usually done as part of automated tests using corresponding plug-ins, you can do mutation testing also manually. When I was drawing the cartoon, I was more focused on the manual aspect and less about the potential of using it to test existing automated tests.

Let's go a few steps back and look at our today's approach. We have a lot of manual tests (>1000), we also have a lot of unit tests (> 20'000), a very effective API test suite (> 3000 tests) and also a few UI tests (ca.100), following the typical test automation pyramid in terms of distribution of the tests, but we haven't integrated any sort of mutation testing yet.

I get emailed automatically whenever our testers find defects either through manual testing and/or by findings in the automated UI test-scripts and/or automated API tests. Based on the amount of emails received daily, I draw the conclusion that we are an effective test-team finding many defects. But, of course, it would be more interesting to learn whether we could even do any better. Are there even more bugs around to catch? Honestly, with the current amount of anomalies reported by my testers, my first reaction was rather defensive. Why I should inject any additional bugs by purpose? We have already enough to do while analyzing all the findings that slipped into the code unintentionally. This was also the original idea behind the cartoon, but..here is my mistake:

We have no facts at hand but simply a certain amount of defects we raise every week. 

Mutation Testing could help us collect more facts. Mutation Testing can not only be applied through tools, it can also be done manually. For example, if you want to understand how long it takes to find out a certain (obvious) bug introduced by purpose, just add it and let's see. You don't even need to inject code, you can also change a configuration that leads to a different (unexpected) behavior.

For example, one of my tested software creates documents with inquiries to doctors. A configuration allows the documents to be fit with a data-matrix code on pages the doctors have to fill out and return. When the letters are returned with the data-matrix code on it, a software-component can automatically identify the original request and related patient, then map it to the answer received. This enables quick access to both, original request-letter and response. 

The configuration could be turned off (by purpose), causing the created letters being sent out without a data-matrix code. How long do you think will it take until our testers notice the missing data-matrix code on the letters? 

I am pretty sure, it won't take long, because such a test is well documented in the regression test suite. But, what if we challenge them more - like making the letters print a hard-coded data-matrix code that is the same for all letters? 

It takes more efforts for a tester to find the problem. 

If the test is not documented, it is likely for the testers to miss the bug. If it is documented, it may still depend on the priority set for the test case whether the test is executed at all. If testers are all too confident that this piece is likely not to fail, they won't test it either.

If you inject such mutants, you need to be clear on your goal. Do you want to test the efficiency of the testers, the accuracy of the test cases or the effectiveness of automated tests?

Saturday, May 13, 2023

License Expired

 In an amusing short video from CNN[1], Alexei Navalny, a Russian opposition leader and anti-corruption activist, explained the meme MOSCOW4. It is representative for the stupidity of Putin’s command structure which - according to Navalny - consists of an array of complete morons. He underlined the statement by naming an example with one of them who was hacked his email passwords several times in sequence. The first password was “Moscow1”, then “Moscow2”, etc.  

After we ourselves managed several times to forget updating expiring license keys for our customers, I remembered this story. We are not any better and I thought, it was about time to honour our repetitive mishap with a corresponding cartoon. For the dinosaurs, I was experimenting with a different kind of filling grey; something Gary Larson had in his cartoons, too.

[1] https://edition.cnn.com/videos/world/2022/04/19/navalny-moscow-4-origseriesfilms-3.cnn

Sunday, March 12, 2023

Lindy's Law in Test Automation

by T. J. Zelger, March 12, 2023

When I developed my first "robot" 20 years ago with the goal to automatically test the software so that my team didn't have to do the same manual tests every day, there were at most a handful of products enabling it. There were tools from well-known companies like IBM and Mercury (now HP), and these were extremely expensive. You didn't have much choice. Once you made a decision for a tool, it was almost impossible to revert that decision and go for another. It inevitably resulted in enormous extra costs.

A little later, a few interesting and cheaper alternatives emerged, such as RANOREX, an Austrian company that soon taught the big ones to fear because of their quality and an attractive value for money.

We also experimented with other products that we used for specific tasks and later replaced with other newer/better ones. Among these, I had a Canadian product on my focus which provided us with valuable services when testing a vehicle valuation calculation engine. As far as I know, the product no longer exists today and my memories are patchy, but I believe it was a forerunner of one of the open source systems that are widely used today or it may have been something similar.

20 years later, you will find a flood of even cheaper or free offers. A closer look reveals that most products are based on a few identical core modules. Selenium is currently one of the most popular "engines" on which most tools are based on. I also use Selenium, and because it's actually just an "engine", you have to develop additional methods and modules on top of it to make it a stable and easy to maintain test automation suite.

Nowadays, when building your own framework, most stick to the page-object model. However, we used to have a different approach. Our test data and instructions (action words) were kept in Excel. The idea was to enable testers with no programming skills to write automed tests. At the beginning we even thought, we could convince our business-analyts to maintain their own tests.

The idea to keep data and keywords out of the code was not new. It already existed at the time I was still working with IBM Rational Robot. For example, the SAFS Framework by Carl Nagle [1] was the first framework I learnt about following a similar approach or take TestFrame which is an implementation of Hans Buwalda's so called "Action Words" [2].

Our Excel based framework was quite a success within our headquarters in Switzerland and Germany and our plan to have testers write their own scripts without programming skills went well, but the maintenance of our framework wasn't quite that easy and it needed an expert to maintain all the UI locators and required extensions. This was sometimes a little too tricky for the non-techies. And, we never managed the business analysts to go for it.

Later (in a different company), I used the same approach but realized that it didn't have the same effect if you have testers WITH programming skills. Excel isn't seen cool enough to write automated tests and if you sell such approach to techies, they will raise their eye brows.

So, we removed the Excel part and integrated everything into NUNIT. That was easier to debug also.

And now? A newcomer called Cypress enters the market [3]. As I don't want to get stuck in sweet idleness, we are now starting a new adventure to see what it has to offer. We still keep our Selenium scripts, but they are going into a maintenance mode right now.

But, do we really have to follow each new fashion trend? Who guarantees that new stuff and ideas are really better than what we already have in place?

Fortunately, in my position as QA Manager, I can mostly set the goals myself in this area. If things are going well, you have the dilemma between "don't touch a running system" and ensuring we are not missing something. 

Today, we use Selenium/C# with NUNIT for automated UI tests triggered daily by Jenkins. And, we have an automated test suite that fires requests on an interface level (below the UI) following the Test Automation Pyramid approach [4].

The problem: Everything works smooth since years! Why should I spend time to investigate alternatives?

I am thinking here of the Lindy's Law [5]. If something has proven itself for a long time, there is a high probability that it will continue to prove itself in the future. In my case, this applies in particular to these automated API tests, which are based on a framework we developed ourselves with the aim to keep the test scripts at the highest possible level of abstraction. The technical details such as authentication communication with the backend remain hidden. Also, instead of just having JSON based input/output, we are dealing with deserialized business objects. 

We have at least 3000 automated tests that have identified bugs that were not caught in the deveoper's unit-tests. Simply said, these interface tests are a success story and I don't spend a second thinking about replacing it with a standard product. Why should we scan the market for "better" stuff that maybe isn't?

Because the examination of alternatives does not necessarily lead to the replacement of a tried and tested system. It can become a supplier of interesting and new ideas and extensions for the existing solution. Being open minded also helps recognizing the limits of one's own system and thus to check whether the current version can last not only in the near, but also far future and/or can be supplemented with one or the other useful feature.

The only constraint I am dealing with in this regard is my available time. But that's another story.


[1] SAFS, Carl Nagle

[2] "Action Words" by Hans Buwalda, Software Test Automation (Fewster/Graham), Addison-Wesley

[3] https://www.testautomatisierung.org/testautomatisierung-cypress-vs-selenium/

[4] Test Automation Pyramid, Fowler, https://martinfowler.com/articles/practical-test-pyramid.html

[5] Lindy's Law by Albert Goldman, 1964, https://www.sciencedirect.com/science/article/abs/pii/S0378437117305964 and "Das Magazin", Nr. 10, 10-11. März 2023

Saturday, February 4, 2023

AI Adventures in Babysitting

I recently stumbled over an article about Microsoft's chatbot Tay which - after only 24 hours "training" turned into a more than questionable little "monster". The article was the inspiration for this one cartoon.

Saturday, January 14, 2023

Time to illuminate

A crisis is a productive state. You simply need to get rid of its aftertaste of catastrophe.
Max Frisch.

Thursday, December 23, 2021

Cheerful debugging messages and its consequences

Over a year ago, we tested the automated printing of a clerk’s signature on letters being sent to doctors, lawyers, etc. There were some issues where the signature was missing, that’s why we marked the defective signature-template with a debugging text. The idea was to test whether the template is processed at all or whether the problem is within the signature itself.

I don't know what came over me when I used "meow-meow" as the debugging text. Probably, Luna is to blame for it. Luna was our 16 year old cat which died shortly before. She's now eternalized in this (less testing related) cartoon.

We quickly found the cause for the missing signature, we fixed the bug and shipped it to the customer along with an updated template. Unfortunately, I forgot to remove the debugging text in the template. The customer found the problem during their internal BETA test. We fixed the template, shipped it, no big deal, over.

But, about half a year later, the customer reported that he'd seen a letter in the productive document management system containing the complimentary close "meow-meow" right below the printed signature.

I could feel my face becoming soaked with blood. How the heck could this happen again? Even though we don't execute all regression tests each time we ship a minor release or hotfix, the print-out is part of all smoke tests. Therefore, my thoughts were "this simply can't be true", 'cause we've never seen any such text being printed on any of the letters that ended on our printers.

A little investigation and it became soon clear what happened. Even though we had originally sent the correct template, there was still a copy of the old "meow-meow" template around. When someone decided to include the template into the git-repository, he took the wrong one. As a result, with the next minor software-version shipped, the old signature-template was deployed again. 

While this explains how the bug was re-introduced, it does not explain, why testers did not find the issue while testing the print-out. 

Further investigation revealed, even though ALL letters processed the "meow-meow" signature-template, NOT all letters were printing the text below the signature. We found out, it depended on several things such as the size of the signature and the configuration of the target letter being sent out. If there was a fixed size between signature and complimentary close, then the text "meow-meow" simply didn't make it in between and wasn't printed. If instead, the letter was configured to dynamically grow with the size of the signature, then also "meow-meow" made it to the printer.

Without going into too much details, only one particular type of letter was affected and fortunately, it was a letter that customers sent only to internal addresses (like an internal email) and not to stakeholders outside the company. Lucky me! 

Well, not really...because...

..one of the customers told me that - despite this letter is sent only internally - it may still be used as an enclosure for other letters going out to lawyers and doctors. The company feared they'd lose credibility when such letters made it to their customers. It was therefore important to remove all identified documents in production before they pay a high price for this slip.

What are the lessons learnt?
First of all, it could have been worse. I mean, if you get a letter that ends with "meow-meow", it's likely to put a smile on your face. In my 20 years career as a tester and developer, I've seen comments and debugging texts that are much worse and sometimes below-the line. My friends told me several stories about similar happenings. 

A colleague told me, they had added an exhilarant test-text for their ATM. The text was triggered when the card was about to get disabled. When in production - their boss wanted to withdraw money from the machine, the card got blocked and showed the text "You are a bad guy. That's why we take your card away". I only know, he wasn't amused about it. 

Looks like, we are not alone.

But, as we have just learnt, it depends on who is affected and who will read it. I can only suggest to never ever type anything like that in any of your tests because you never know where it's going to show up.

Instead of "meow-meow" a simple DOT or DASH would have done the same trick, and didn't raise the same kind of alarm in case such debugging messages make it to the bottom of a letter. 

This was the last cartoon and blog entry for 2021. There are more to come next year. Please excuse, this cartoon here wasn't a pure testing related cartoon, but the story still is.
I wish you all a merry XMAS and a happy new year. I hope you enjoy the blog and continue to regularly visit me here.


Friday, October 22, 2021

Test Patterns out of control

 In my blog entry Apostrophe, Ampersand, Exclamation mark & Co. early this year, I highlighted how important the use of special characters has become for our team during testing. A closer look at the article demonstrates that these characters aren't really exceptionnel in real life. They are commonly used; actually more common than what most of us initially thought. It's therefore even more important, we don't treat these letters as a special cases used by testers only.

I smirked when I recently noticed how many fake persons were added into our reference test data containing German umlauts or apostrophes, etc. Looks like a lot of testers are crazy to find maniac but still realistic data. It has become much harder to find vanilla data. Mission accomplished.


Sunday, August 15, 2021

Accurate Test Reports

One of the main challenges I faced during my 20++ yrs. career as a software tester and test manager is to resist project and release delivery pressure.

When deadlines are close - and the software-under-test is still not ready to ship - testing automatically becomes a "virtual" bottleneck.

Resisting the temptation of short-cutting testing and trying to keep thoughts out the way such as  “it’s not going to be that bad” can be extremely stressful.

Deploying a release in time but of poor quality will almost certainly fall back on QA. Critical questions are usually addressed first to the testers.

On the other hand, shifting a release-date may move customers into an uncomfortable situation when they booked resources for their final acceptance tests and/or training. Plus, it will trigger questions like “why have the testers not found these issues earlier”.

 To be honest, I had given in sometimes on the temptation to perform short-cutting and in most (but not all) of these cases it surprisingly worked out, even though these were tense moments. Shortcutting testing can work well if one sticks to the high priority test cases first, and those test cases you assume beeing affected by the latest changes.

But, accurate prioritization of test cases is only possible if you know the software and your customers well. Prioritizing test cases works if you have a good sense where the SUT is likely to break/survive depending on the change applied. This includes knowing how customers use your program. Some bad experience in the past might help here. Don't forget to visit your customers from time to time.You can get valuable insights when you see the customers working with your system.

Speed is an important element in testing.
High prio bugs are found earlier if high prio test cases are executed first. The longer QA needs to prove the software isn't worth shipping, the higher the risk people look at QA as perfectionists - trying to find/report all possible bugs. If QA gets such labels, their reports are interpreted similarly. At the worst, testers are considered a thwarting element in the delivery process.

When in doubt (and available time does not allow digging deeper into analysis of doubtful areas) - add these concerns to the report. Be as accurate as possible...and, for God's sake...don't let management make the test manager providing the final GO on the release. Instead, invite all important stakeholders, put all the facts on the table. This includes what you know and what you don't know. Then let the "going-live" decision be owned by the team.

It has worked for me...sometimes  =;O)



Saturday, May 1, 2021

Apostrophe, Ampersand, Exclamation mark & Co.

Since I am in the testing business, I am "preaching" the use of special characters and umlauts wherever and whatever we test. 

Umlauts (ä,ö,ü) and chars like é,â, etc. are very common in Europe. In the US and elsewhere, you often see last names with a single quote such as O'Neill, O'Connor or, think about company names like Macy's and McDonald's.

Sometimes, we forget to include these characters in our tests and often we promptly pay the price in the form of bugs reported in the field.

But, awareness is increasing in our team. I am really delighted when I see a colleague posting notes into the group-chat yelling: 

"Hey Buddy, there were no special chars in your demo.Will it work with an exclamation mark, too?"

  I love that, but I also recommend to first test the happy case using simple inputs. There is no means to attack an application with special chars if it can't even deal with the basics. Once I've convinced myself that the happy case works, I go over to feeding the program with more interesting input.

 It's always exciting to watch how a system deals with a combination of chars like
"{äöü},[áàâ]/|éèê|\ë!:;+(&)'%@". But, if such a test fails, developers and/or business analysts will likely give you the hairy eyeball when they see a defect documented with such example input. They probably close the issue unfixed, assuming such inputs are unrealistic and won't be seen in production.

  What has worked better for me: Use these characters in a more meaningful context so it becomes obvious to all stakeholders that the problem you spotted is worth getting fixed.

For example, I've made it a habit to use M&M's Worldstore Inc. whenever I use or create a company address. That's not because I love chocolate so much but rather because it has a mix of special characters that all caused headaches in my past and current career as a software tester: the ampersand, the apostrophe and the dot.

Same approach I apply for street-names and ZIP codes. It is a common misbelief that ZIP codes need to be numeric. Go visit the UK and check how their ZIP codes look alike. Street names can contain slashes, dashes and single quotes. Street numbers can be alphanumeric separated by a slash.

A blank in a name isn't exotic. Look at Leonardo Di Caprio or Carla Del Ponte. I've seen programs that cut away the second part, leaving the famous actor with the short name "Di" or "Del".

Below find a few examples of names I use regularly in testing:

  • M&M's Worldstore
  • Mr. & Mrs. O'Leary
  • Léonardo Di Caprio 
  • Renée-Louise Silberschneider-Kärcher
  • Praxisgemeinschaft D'Amico & Ägerter GmbH

Example street names:

  • 29, Queen's Gardens
  • 4711 Crandon Blvd., Appartment F#1000
  • Rue du Général-Dufour 100-102
  • 55b, Rue de l'Université 
  • Elftausendjungferngässlein 12b

and ZIP-Codes:

  • 4142 Münchenstein 1 (Switzerland)
  • 33149 Key Biscayne, FL (USA)
  • EH4 2DA Edinburgh (Scotland)

   Special characters are interesting wherever you can submit text.

If you post a message like "meet you at Lexington Av/59 St" and, if that text is stored/exported in an XML file without properly escaping the slash or embedding the input in a CDATA tag, you may find interesting bugs.

Backslashes are used in many programs to escape characters. Quotes and single quotes are used to terminate text elements. The latter is a common way for hackers to manipulate SQL statements so the command fired forces the back-end to reveal information not intended for the normal user. Semicolons can confuse web client logic or cause problems when information is exported to CSV files. Question marks and ampersands are both used in URL links, curly braces are common markers in REST payloads, etc. 

I've never really understood what's so funny using the pangram the quick brown fox jumps over the lazy dog in testing or the filler "Lore Ipsum". Both do not contain any interesting characters. When testing German language based programs, I prefer to use  Victor jagt zwölf Boxkämpfer quer über den großen Sylter Deich because it contains umlauts and this weird ß character which - in Switzerland - is unknown, even in the German speaking part.

Most often, I use my own pangram which I have further adapted over time:

Chloë Valérie O'Loughlin-Bäcker runs the "Deacon Brodie" (Scottish Pub & Largest Whisky Collection) at Saint-Louis du Ha!Ha!, a town in Québec Canada. Her car is a Jaguar X and runs a max. of > 200 km/h. Isn't this amazingly fast?

It contains all letters of the alphabet, plus umlauts and a selection of interesting characters that challenge text processors. But, it has not helped me later find a bug in our system where an exception was thrown when the annotation text contained a capital Ü. Other Umlauts were no problem and even the lowercase ü didn’t cause any harm. The system really failed only with the capital version of it. Crazy!

One of my most funny treasures is the Canadian town St. Louis du Ha! Ha! This is the weirdest town-name I've ever heard of, because it has two exclamation marks. Okay, the likelihood of such a city ever being entered by our customers is close to zero. Canadian citizens living near the area of Quebec probably disagree. Testing and test input is a question of context. Westward Ho! is a town in the UK which also houses an exclamation mark.


Null, True, All, Test and other funny bugs related to people's names

Besides the recommendation of using special characters, it is also worth to sneak a peek at reported stories related to people's names like Jennifer Null and Rachel True. Their names were processed and misinterpreted as NULL [BBC16] or - in the case of Rachel - a boolean value [9TO5] causing a problem using her iCloud account. I've not experienced either of the two cases myself in any of my tests but we found a similar issue where submitting the search term "Alli" returned documents titled "Alligator" and "Allister". All fine, but submitting "All" ended in an exception. 

Stephen O, a Korean native living in the US had been hassled by credit card companies, because his last name was too short. When he applied the workaround by adding an extra "O" to end up with "OO", it didn't take long until he had a meeting with the government because he provided false information [NYT91].
Graham-Cumming could not register his name on a web-site because the hyphen was considered an invalid character [GRA10].

There is also Natalie Weiner who could not register either because here name was considered offensive [PAN18].

Yet another example is the story of William and Katie Test who were both unable to book airplane tickets simply because the system scanned the names for the term "test". A match triggered a different procedure and disallowed the process of booking tickets in production [COY17].

Wooha! That's exactly what one of my clients had implemented, too! I was surprised there were never any reported issues with it. So I investigated a little and found out, in Switzerland there are only around 50-60 people carrying the last name "Tester". Second, my investigation also revealed the system did NOT scan the last name but rather the first name for the term "test" to avoid operations executed in production; Doozy! How clever!


Further similar reading: