Wednesday, January 27, 2021

Interpreting the Regression Test Activity Graph

We just completed a 1.5-week intensive manual regression test phase in which we executed almost the complete set of several hundred test cases. We are in a lucky situation: our documented test cases represent nearly 100% of all implemented features. If we achieve 70-80% test coverage, we get a really good picture of the overall quality of the product increment. That means that, aside from the many automated tests, it's worth doing some manual end-to-end regression testing from time to time.

While tracking the regression testing progress in a cloud-based test case management tool, we were looking at the activity graph and it made us smile. It's exactly what we expected.

At the beginning, testers focus on executing those test cases that are well documented, have clear instructions, and rely on well-prepared test data. I mean objects that are already in specific states, where testers can execute just the transition from one state to the next without worrying about laborious setup steps.
Then, testers switch to more complex test cases which take a little more time to understand and test. This is when the progress curve reaches its peak and progress starts to slow down.
Of course, we also find anomalies. Bugs can slow you down because analyzing and understanding where and when defects were introduced takes additional time. After a few days, the first bugfixes are delivered, too. Developers require your attention to test their fixes, which interrupts testers from working on their suite. The rate of passed tests decreases, but in a constant and expected way.
In parallel, developers are already working on the next generation of the product, meaning their user stories get shipped and require testing too. The tester's brain is now busy with a lot of context switching; clearly more than at the beginning of the sprint.
Now that we are more than halfway through, we switch to the monster test cases. I call them that because they do not consist of simple steps; they contain several tests expressed in tables of inputs and expected outputs. That's why I think it's nonsense to talk about the number of test cases. One test case can be atomic and executed in seconds, while another can keep you busy for half an hour or more.
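As an aside, such table-style test cases are what table-driven tests look like in code: one "test case", many input/expected-output rows, each row a check of its own. A minimal sketch (function name and data are purely illustrative):

```python
def to_upper(s):
    # Stand-in for the real function-under-test.
    return s.upper()

rows = [
    # (input, expected output) -- one row per check
    ("abc", "ABC"),
    ("Abc", "ABC"),
    ("", ""),
]

def run_table(func, table):
    """Run every row and collect failures instead of stopping at the first one."""
    return [(inp, exp, func(inp)) for inp, exp in table if func(inp) != exp]

# One "test case", three checks:
# run_table(to_upper, rows) → []
```

Counting this as "one test case" says very little about the effort behind it.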

Some test cases may be poorly documented and require maintenance or correction. Some require the help of a domain expert. The additional information gained should be documented in the test suite, so we don't run into the same questions next time. These are all activities running in parallel.

Last but not least, the weekend is getting closer. The first enthusiasm is gone, and you're starting to get bored.
You hear music from your neighbour. The cafeteria gets louder. The sound of clinking glasses reaches your desk. It's time for a break, time to reboot your brain. TGIF! And now it's weekend time!
And then, Monday is back! It's time for a final boost and time to say thank you. Great progress.
We made it Yogi! 
....and I like that graph.


Monday, January 11, 2021

About Bias and Critical Thinking


I recently stumbled over a small article about the "Semmelweis reflex". It was published in a Swiss magazine and I found it quite interesting. I translated and summarized it because I found some analogy to software testing:

In 1846, the Hungarian gynecologist Ignaz Semmelweis realized in a delivery unit that the rate of mothers dying in one department was 4 percent, while in another department of the same hospital the rate was 10 percent.

While analyzing the reason, he identified that the department with the higher death rate was mainly operated by doctors who were also involved in performing post-mortem examinations. Right after an autopsy, they went to help mothers without properly disinfecting their hands.

The other department, with the lower death rate, was run mainly by midwives who were not involved in any such autopsies. Based on this observation, Semmelweis advised all employees to disinfect their hands with chlorinated lime. The result: the death rate decreased remarkably.

Despite the clear evidence, the employees' reaction remained sceptical; some were even hostile. Traditional beliefs were stronger than empiricism.

Even though this happened more than 150 years ago, people haven't changed much since then. We still bias ourselves a lot. The tendency to reject new arguments that do not match our current beliefs is still common today; it is known as the "Semmelweis reflex". We all have our own experience and convictions. This is all fine, but it is important to understand that these are personal convictions that cannot simply be turned into a general truth.

How can we fight such bias? Be curious. If you spontaneously react with antipathy to something new, force yourself to find pieces in the presenter's arguments that could still be interesting, despite your overall disbelief.

Second, make it a common practice to question yourself by saying "I might be wrong". This attitude helps you overcome prejudice by allowing new information to be considered and processed.

Source: text translated to English and summarized from the original article "Warum wir die Wahrheit nicht hören wollen" ("Why we don't want to hear the truth"), by Krogerus & Tschäppeler, Magazin, Switzerland, March 20, 2021

Back to testing:
What I learn from this article: in case we didn't do it before, we should start questioning ourselves more often. We should learn to listen and not hide behind dismissive arguments simply because what we are told doesn't match our current view of the world.

But this story has two sides. If I can be biased, then others may be biased, too. Not all information reaching my desk must be right. The presenter's view of the world may be equally limited.

My personal credo therefore is NOT ONLY to question myself, but also to question statements made by others, even if these confirm my current view of the world and even if it all sounds reasonable.
They may have the intention to achieve something without having to answer critical questions. They may want to avoid being challenged in discussions that would uncover their own "sloppy" analysis.

Call me a doubting Thomas. I don't believe anything until I've seen it.

  • If someone tells you "a user will never do that", question it.
  • If someone tells you "users typically do it this way", question it.
  • If someone tells you "this can't happen", question it.
  • If someone tells you "we didn't change anything", question it.
  • ...

I trust in facts only. This is the result of 20 years testing software.

Come on, I don't even trust my own piece of code. I must be some kind of maniac =;O)

Tuesday, January 5, 2021

No test is the same, use your brain

 About a decade ago, a manager wondered why I asked questions about how the software-under-test was supposed to work. He responded: "You don't need to know all that stuff, all you need to do is test".

This happened so many years back that I can't remember whether he really meant it seriously or was just making fun.

I mention this episode here because, even if you are looking at software from a black-box or business-oriented perspective, you should still be interested in how things work from a technical point of view.

If you don't know what's going on under the hood, you are going to miss important test cases. The more you know, the more targeted and effective your tests become.

James Bach (Satisfice) once provided a great example in one of his speeches, asking the audience "how many test cases do we need" while showing a diagram that consisted of a few simple rectangles and diamonds. Various numbers were called out, most of them wrong or inappropriate. When James Bach revealed the details behind this simplified diagram, the scales fell from the audience's eyes: there was much more behind this graph. Where people had guessed numbers like 2, 3 or 5 tests, it became clear that this was just the start and the real number of required tests would be many times higher than their first guess.

But I also have my own story to share with you, to demonstrate why it is important to ask more questions about the software-under-test.

A quick intro
Every passenger vehicle is registered using a 17-character vehicle identification number (VIN) which is unique worldwide. The first 3 characters, the World Manufacturer Identifier (WMI), identify the manufacturer.

For example:

  • WVWZZZ6NZTY099964 representing a 1994 VW Polo
  • WF0WXXGBBW6268343 representing a 2006 Ford Mondeo V6
  • WBAPX71050C101516 representing a 2007 BMW 5series

In order to retrieve the details of each car's equipment, such as paint color, 2-door or 4-door, tinted glass, sunroof, etc., we submitted a VIN as input and received the complete vehicle information as XML output.
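Extracting the manufacturer from the first three VIN characters (the World Manufacturer Identifier) takes only a few lines. A minimal sketch; the three WMI codes below match the example VINs above, and everything else (the function name, the "unknown" fallback) is illustrative:

```python
# WMI: the first 3 characters of the 17-character VIN identify the manufacturer.
WMI_TO_MANUFACTURER = {
    "WVW": "Volkswagen",
    "WF0": "Ford (Germany)",
    "WBA": "BMW",
}

def manufacturer_of(vin: str) -> str:
    """Return the manufacturer behind a VIN, based on its WMI prefix."""
    if len(vin) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    return WMI_TO_MANUFACTURER.get(vin[:3], "unknown")

# manufacturer_of("WBAPX71050C101516") → "BMW"
```

A real service would of course map far more WMI codes and validate the VIN check digit as well.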

This service was an important part of a cloud-based solution we developed and sold as part of a bigger product to insurance companies and body shops. The customers used it to estimate the total repair cost of damaged cars. Frankly speaking, in order to perform an accurate estimation, one needed at least the part prices and the repair steps or workload. Our company was great in this business, since it had all that data from almost all car manufacturers. I have to admit, this is a simplified explanation of a more complex piece of software.

Back to testing
Our job was to make sure the vehicle identification service returned the correct car-related data. To test that, we took our own and our friends' cars as a reference, because it was the easiest way for us to verify that the data made sense.

What we didn't know yet: our company wasn't the owner of all the data. Depending on the manufacturer of the queried vehicle, our system either retrieved data from our own database or had to call a third-party web service to request the required information. This was the case for BMW and PSA (Peugeot, Citroen, Opel, Vauxhall), just to name two examples.

For the end-user, this wasn't visible. All the user saw was a 17-character VIN field with a button to trigger the search, and then they waited until we returned the complete vehicle information along with the equipment data.

What does that mean for the development of test cases?

Knowing that the system - in some cases - communicates with third-party components to get the information is essential when testing this service. Simply submitting one VIN to see that the service responds correctly is not enough.
We needed to know which data we owned and which data was gathered by calling external services. Unfortunately, we never managed to get a comprehensive list to shed a little light on the innards of the software. Therefore, the only weapon for us testers was to test the service using a broad variety of exemplary VINs covering as many different manufacturers as possible.

But this is just half the story. When a request was fired, our system cached the generated XML response for an unspecified period of time.

That means that if you submit a particular BMW VIN for the first time and later call the service again with the exact same VIN, the second call no longer triggers the same code paths.

We should also know for how long the data is cached. Unfortunately, the owners of the product didn't reveal that secret to the testers. I could now start a new article on testability and rave about the necessity of extra functions for testers to flush the cache when needed, but that goes beyond the scope of this article.
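To make the two code paths concrete, the caching behavior described above can be sketched roughly like this. Everything here is hypothetical (class name, TTL default, backend callable); the real system's cache duration was exactly what we never learned:

```python
import time

class CachingVinService:
    """Sketch of a VIN lookup that caches backend responses for a limited time."""

    def __init__(self, backend, ttl_seconds=86400):
        self.backend = backend        # own DB lookup or third-party web service
        self.ttl = ttl_seconds        # the value testers would love to know
        self._cache = {}              # vin -> (timestamp, xml response)
        self.backend_calls = 0        # instrumentation for this sketch

    def lookup(self, vin):
        now = time.time()
        entry = self._cache.get(vin)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]           # cached path: backend is NOT exercised
        self.backend_calls += 1
        result = self.backend(vin)    # fresh path: real lookup happens here
        self._cache[vin] = (now, result)
        return result
```

The point for testers: the first and the second submission of the same VIN are two different tests, and without knowing the TTL (or having a cache-flush hook) you cannot reliably choose which path you are exercising.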

What else?

Now that we have two different kinds of tests, such as...

- consider different brand names when testing VIN
- consider submitting the same VIN a second time to test the cache

...we should start thinking about what happens when one of the external services is unavailable.
How does our system react in such a scenario, especially if the response isn't cached yet?
How would we even know that it's the external service and not our own code that broke?

One approach was to add extra tests examining the external services in isolation from our implementation.
Yet another approach - and this is what we did (because we were not given the details) - was to have sets of different VINs that belong to the same manufacturer: for example, a set of 5 BMWs, another set of 5 Peugeots, 5 Fords, etc.

If an external service is down, then a whole set of such test cases fails. If instead only one test within a set fails, then the root cause is probably somewhere else.
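That diagnosis rule can be expressed in a few lines. A sketch (the function name and the result layout are my own; pass/fail results per manufacturer set are whatever your test runner reports):

```python
def suspect_external_outages(results_by_manufacturer):
    """Given pass/fail results per manufacturer VIN set, e.g.
    {"BMW": [False, False, False, False, False], "Ford": [True, ...]},
    return the manufacturers whose entire set failed - there, the
    external service is the prime suspect. A single failure within a
    set points to a different root cause."""
    return sorted(
        manufacturer
        for manufacturer, passed in results_by_manufacturer.items()
        if passed and not any(passed)
    )

# suspect_external_outages({"BMW": [False]*5, "Ford": [True]*4 + [False]})
# → ["BMW"]
```

The value of grouping the VINs this way is exactly this cheap triage: one glance at the failure pattern tells you where to start digging.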

The circles below demonstrate this visually. At the beginning, one knows little to nothing about the system- or component-under-test (1). By questioning how things work, one discovers new areas that may or may not be relevant while developing new test cases (2). Knowing and understanding these areas enlarges the circle of awareness and minimizes the risk of skipping important test cases.

Closing words
Developing test cases is not a monotonous job. On the contrary: it requires asking questions, sometimes critical questions, and having a good understanding of which test scenarios are meaningful to cover and which can be skipped without accepting high risks. Sometimes you need to be persistent when interviewing techies to get the details, especially when asking for testability hooks.
Although testing activities face a lot of repeating patterns of software defects, developing appropriate test cases is a task that involves a lot of thinking. That's why I say: no test is the same, use your brain!

BTW, I forgot to add: not all external service providers allowed us to cache their data, even if only for one day. That's an additional test to think about, and I am sure I forgot to mention many more tests we actually developed at that time.

 Below, find a simplified flowchart of the vehicle identification service.