Wednesday, January 27, 2021

Interpreting the Regression Test Activity Graph

We just completed a 1.5-week intensive manual regression test phase in which we executed almost the complete set of our several hundred test cases. We are in a lucky situation: our documented test cases represent nearly 100% of all implemented features. If we achieve 70-80% test coverage, we get a really good picture of the overall quality of the product increment. That means that, aside from the many automated tests, it's worth doing some manual end-to-end regression testing from time to time.

While tracking the regression testing progress in a cloud-based test case management tool, we looked at the activity graph and it made us smile. It's exactly what we expected.


 
At the beginning, testers focus on executing those test cases that are well documented, have clear instructions and rely on previously well-prepared test data. I mean objects that are already in specific states, so testers can execute just the transition from one state to the next without worrying about the laborious setup steps.
 
Then, testers switch to more complex test cases which take a little more time to understand and test. This is when the progress curve reaches its peak and progress starts to slow down.
 
Of course, we also find anomalies. Bugs can slow you down because analyzing and understanding where and when defects were introduced takes additional time. After a few days, the first bugfixes are delivered, too. Developers require your attention to test their fixes, which interrupts testers from working on their suite. The rate of passed tests decreases, but still in a constant and expected way.
 
In parallel, developers are already working on the next generation of the product, meaning their user stories get shipped and require testing too. The tester's brain is now busy with a lot of context switching; clearly more than at the beginning of the sprint.
 
Now that we are more than halfway through, we switch to the monster test cases. I call them that because they do not consist of simple steps; they contain several tests expressed in tables of inputs and expected outputs. That's why I think it's nonsense to talk about the number of test cases. One test case can be atomic and executed in seconds, while another can keep you busy for half an hour or more.
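
To make this concrete, here is a minimal sketch of a table-driven test (assuming pytest is available). The discount function and its rules are purely hypothetical; the point is simply that one "test case" can hide a whole table of inputs and expected outputs, each row being a test of its own.

import pytest

def discount(order_total, is_member):
    # Hypothetical function under test: members get 10% off,
    # and orders of 500 or more get an additional 5% off.
    rate = 0.10 if is_member else 0.0
    if order_total >= 500:
        rate += 0.05
    return round(order_total * (1 - rate), 2)

# One "test case", many rows: (order_total, is_member, expected_price)
CASES = [
    (100.0, False, 100.0),
    (100.0, True,   90.0),
    (500.0, False, 475.0),
    (500.0, True,  425.0),
]

@pytest.mark.parametrize("total,member,expected", CASES)
def test_discount_table(total, member, expected):
    assert discount(total, member) == expected
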

Some of the test cases may be poorly documented and require maintenance or correction. Some test cases require the help of a domain expert. The additional information gained should be documented in the test suite, so we don't have to ask the same questions next time. These are all activities running in parallel.

Last but not least, the weekend is getting closer. The first enthusiasm is gone, you're starting to get bored.
You hear music from your neighbour. The cafeteria gets louder. The sound of clinking glasses reaches your desk. It's time for a break, time to reboot your brain. TGIF! And now it's weekend time!
 
And then, Monday is back! It's time for one final boost and time to say thank you. Great progress.
 
We made it, Yogi! 
 
...and I like that graph.
 
 
 

 


Tuesday, January 5, 2021

No test is the same, use your brain

 About a decade ago, a manager wondered why I asked questions about how the software-under-test was supposed to work. He responded: "You don't need to know all that stuff, all you need to do is test".

This happened so many years back that I can't remember whether he really meant it seriously or was just making fun.

I mention this episode here because, even if you are looking at software from a black-box or business-oriented perspective, you should still be interested in how things work from a technical point of view.

If you don't know what's going on under the hood, you are going to miss important test cases. The more you know, the more targeted and effective your tests become.

James Bach (Satisfice) once provided a great example in one of his speeches when he asked the audience "how many test cases do we need" while showing a diagram that consisted of a few simple rectangles and diamonds. Various numbers were called out towards the speaker's desk, most of them wrong or inappropriate. When James Bach revealed the details behind this simplified diagram, the scales fell from their eyes: there was much more behind this graph. Where people had guessed numbers like 2, 3 or 5 tests, it became clear this was just the start and the real number of required tests would be many times higher than their first guess.

But I also have my own story to share with you that demonstrates why it is important to ask more questions about the software-under-test.

A quick intro
Every passenger vehicle is registered using a 17-character vehicle identification number (VIN) which is unique worldwide. The first 3 characters identify the manufacturer. 

For example:

  • WVWZZZ6NZTY099964 representing a 1994 VW Polo
  • WF0WXXGBBW6268343 representing a 2006 Ford Mondeo V6
  • WBAPX71050C101516 representing a 2007 BMW 5series

In order to retrieve the details of each car's equipment such as paint color, 2-door or 4-door, tinted glass, sunroof, etc., we sent a VIN as input and received the complete vehicle information as an XML output.
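
As a small illustration of how the manufacturer prefix can be used: the first three characters of a VIN form the so-called world manufacturer identifier. The following Python sketch is purely illustrative; the lookup table is a tiny, hand-picked excerpt and the function name is made up.

WMI_TO_MANUFACTURER = {
    "WVW": "Volkswagen",
    "WF0": "Ford (Germany)",
    "WBA": "BMW",
}

def manufacturer_from_vin(vin):
    # Basic sanity check, then map the 3-character prefix to a manufacturer.
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError("expected a 17-character VIN, got %d characters" % len(vin))
    return WMI_TO_MANUFACTURER.get(vin[:3], "unknown")

print(manufacturer_from_vin("WBAPX71050C101516"))  # -> BMW
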

This service was an important part of a cloud-based service we developed and sold as part of a bigger solution to insurance companies and body shops. The customers used it to estimate the total repair cost of damaged cars. Frankly speaking, in order to perform an accurate estimation, one needed at least the part prices and the repair steps or workload. Our company was great in this business, since it had all that data from almost all car manufacturers. I have to admit, this is a simplified explanation of a more complex piece of software.

Back to testing
Our job was to make sure the vehicle identification service returns the correct car-related data. To test that, we took our own and our friends' cars as a reference, because that was the easiest way for us to verify that the data made sense.

What we didn't know yet: our company wasn't the owner of all the data. Depending on the manufacturer of the queried vehicle, our system either retrieved data from our own database or had to call a third-party web service to request the required information. This was the case for BMW and PSA (Peugeot, Citroen, Opel, Vauxhall), just to name two examples.
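
Conceptually, the routing we only understood later looked roughly like the sketch below. All names, the provider list and the XML snippets are invented for illustration; this is not our actual implementation.

# Manufacturers whose data came from an external provider (hypothetical list).
EXTERNAL_PROVIDERS = {"BMW", "Peugeot", "Citroen", "Opel", "Vauxhall"}

def call_third_party_service(vin):
    # Stand-in for a real HTTP call to the manufacturer's web service.
    return "<vehicle vin='%s' source='external'/>" % vin

def query_own_database(vin):
    # Stand-in for a lookup in our own vehicle database.
    return "<vehicle vin='%s' source='internal'/>" % vin

def get_vehicle_data(vin, manufacturer):
    if manufacturer in EXTERNAL_PROVIDERS:
        return call_third_party_service(vin)  # network dependency, can fail
    return query_own_database(vin)            # fully under our own control

print(get_vehicle_data("WBAPX71050C101516", "BMW"))
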

For the end-user, this wasn't visible. All the user saw was a 17-character VIN field with a button to trigger the search, and then they waited until we returned the complete vehicle information along with the equipment data.

What does that mean for the development of test cases?

Knowing that the system, in some cases, communicates with third-party components to get the information is essential when testing this service. Simply submitting one VIN to see that the service responds correctly is not enough.
We needed to know what data we own and what data is gathered by calling external services. Unfortunately, we never managed to get a comprehensive list to shed a little light on the innards of the software. Therefore, the only weapon for us testers was to test the service using a broad variety of exemplary VINs covering as many different manufacturers as possible.

But this is just half the story. When a request was fired, our system cached the generated XML response for an unspecified period of time.

That means that if you submit a particular BMW VIN for the first time and later call the service again using the exact same VIN, the second call would no longer trigger the same code paths.

Also, we should know for how long the data is cached. Unfortunately, the owners of the product didn't reveal that secret to the testers. I could now start a new article on testability and rave about the necessity of extra functions for testers to flush the cache when needed, but this goes beyond the scope of this article.
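
Just to make the idea concrete, here is a minimal sketch of the kind of check we would have liked to automate: submit the same VIN twice and count how often the upstream lookup is actually invoked. Everything here is hypothetical; in the real product we had no such hook, which is exactly the testability gap just described.

class VehicleService:
    # Toy model of the caching behaviour described above.
    def __init__(self):
        self._cache = {}
        self.upstream_calls = 0

    def lookup(self, vin):
        if vin not in self._cache:
            self.upstream_calls += 1  # in reality: a call to the data source
            self._cache[vin] = "<vehicle vin='%s'/>" % vin
        return self._cache[vin]

def test_second_identical_request_is_served_from_cache():
    service = VehicleService()
    first = service.lookup("WBAPX71050C101516")
    second = service.lookup("WBAPX71050C101516")
    assert first == second
    assert service.upstream_calls == 1  # the second call never left our system

test_second_identical_request_is_served_from_cache()
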

What else?

Now that we have two different kinds of tests, such as...

- testing VINs from different manufacturers
- submitting the same VIN a second time to test the cache

...we should start thinking about what happens when one of the external services is unavailable.
How does our system react in such a scenario, especially if the response isn't cached yet?
How would we even know that it's the external service and not our own code that got broken?

One approach was to add extra tests examining the external services in isolation from our implementation.
Yet another approach - and this is what we did (because we were not given the details) - was to have sets of different VINs that belong to the same manufacturer. For example, a set of 5 BMWs, another set of 5 Peugeots, 5 Fords, etc.

If an external service is down, a whole set of such test cases would fail. If instead only one test within a set failed, the root cause was probably somewhere else.
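
A rough sketch of this diagnostic idea, with made-up VINs and a placeholder check that simulates an outage of the BMW provider: group the test VINs by manufacturer and look at how many tests fail per group.

from collections import defaultdict

# Hypothetical sets of test VINs, grouped per manufacturer.
VIN_SETS = {
    "BMW":     ["WBA...1", "WBA...2", "WBA...3", "WBA...4", "WBA...5"],
    "Peugeot": ["VF3...1", "VF3...2", "VF3...3", "VF3...4", "VF3...5"],
    "Ford":    ["WF0...1", "WF0...2", "WF0...3", "WF0...4", "WF0...5"],
}

def vehicle_data_looks_correct(vin):
    # Placeholder for the real check against expected vehicle data.
    # Here we simply pretend the external BMW service is down.
    return not vin.startswith("WBA")

def summarize_failures():
    failures = defaultdict(list)
    for manufacturer, vins in VIN_SETS.items():
        for vin in vins:
            if not vehicle_data_looks_correct(vin):
                failures[manufacturer].append(vin)
    for manufacturer, failed in failures.items():
        if len(failed) == len(VIN_SETS[manufacturer]):
            print(manufacturer + ": whole set failed, suspect the external service")
        else:
            print(manufacturer + ": isolated failure(s), the root cause is probably elsewhere")

summarize_failures()
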

Visualized
The circles below demonstrate it visually. At the beginning, one knows little to nothing about the system- or component-under-test (1). By questioning how things work, one detects new areas that may or may not be relevant while developing new test cases (2). Knowing and understanding these areas enlarges the circle of awareness and minimizes the risk of skipping important test cases.

Closing rate
Developing test cases is not a monotonous job. On the contrary, it requires asking questions, sometimes critical questions, and having a good understanding of which test scenarios are meaningful to cover and which can be skipped without accepting high risks. Sometimes you need to be insistent when it comes to interviewing techies to get the details, especially when asking for testability hooks.
Although testing activities face a lot of recurring patterns of software defects, developing appropriate test cases is a task that involves a lot of thinking. That's why I say: no test is the same, use your brain!

BTW, I forgot to add... not all external service providers allowed us to cache data, even if it was cached for just one day. That's an additional test to think about, and I am sure I forgot to mention many more tests we actually developed at that time.

 Below, find a simplified flowchart of the vehicle identification service.



Saturday, October 10, 2020

Follow the Swarm

During my holidays, I read two fantastic books about problem solving and focusing on the important stuff. "Range" by David Epstein fascinated me with its many examples demonstrating how people without specialized knowledge in a particular area could find solutions to problems where the best experts got stuck. The book rejects the common belief that one has to start and specialize early in order to really get good at something. It lists plenty of famous people who have demonstrated the contrary, such as Roger Federer, Vincent van Gogh, Dave Brubeck, etc.

Experimenting in your career and trying out different stuff broadens the skill of looking at problems from different angles and finding solutions that are much more difficult to identify if you can't get out of your box. It also explains the success story of Nintendo, which was once a small company and not very attractive to highly talented graduates.

The other book "Simplicity" by Benedikt Weibel, former CEO of Swiss Railway Corporation (SBB"), goes into a similar direction with other and less detailed examples. He analyzes how the best chess players think in terms of patterns and how to focus on the essential stuff. Less is more. Weibel encourages to make more use of checklists and he also makes a heretical remark when he says that "without a great checklist, Sullenberger had not managed to bring down the Airbus on the Hudson savely" (that's not really my opinion. I think it was a mixture of all, great experience, courage, and a little bit of luck). 

Big Data is an interesting tool, but it is not solving our problems and it is not free of failure (there are examples in the book). I am not advertising, just sharing my thoughts on two great books full of valuable hints and references, although I know it will be difficult not to fall back into old habits.

Interesting related reading:

  • The Carter Racing Case Study: https://www.academia.edu/20358932/Carter_racing_case
  • The missing bullet: https://onebiteblog.com/finding-the-missing-bullet-holes/

Thursday, October 1, 2020

Tag submarines before they shoot

In the retrospective meeting, it was claimed that we could probably have raised some hot issues earlier. The late addressing of these items put people under pressure shortly before roll-out. Although it was correct that we had quite a few late change requests, which hurt not only devs but also testers, most of the issues were not late discoveries but rather a late awareness of their importance.

The tickets had been raised a long time back but, because of release pressure, were parked in the backlog and assigned low priority. We called it the "parking place". At that time, a restrictive process of prioritizing tickets was needed to manage the deadlines. The limitations were accepted for a while. But over time, things changed. What was once rated low priority for release N all of a sudden became more important in the next release, and at a bad time. For many stakeholders, including me, these tickets came out of nowhere like submarines; and of course, all at the same time.

As a lesson learnt, we as testers have decided to get better at this by regularly reviewing parked tickets. This should help everyone in the team become aware earlier of potentially dangerous new "submarines" intending to surface soon. There would still be enough time then to tag them before they have a chance to shoot.



Sunday, September 27, 2020

Sometimes you have to shock the customer

I struggled with some of our new features when looking at them from an end-to-end workflow view. I was convinced we hadn't taken the users' unexpressed requirements into account enough to not just get a user's job done, but to get it done quickly.

To find out whether we were right, we prepared a plan.

The next time the customers were invited to our offices to test our new features, for once we didn't prepare the typical test case checklist. This time, instead, we prepared just one simple and realistic end-to-end scenario that combined the features of the old version with the new features. Don't get me wrong, I love checklists; they are a perfect tool to guarantee we don't forget anything, but for this particular case we needed to jump out of the standard operating procedure. Instead of ticking off an atomic feature list, this time we took into account each feature's place in a real-life scenario.

We were really surprised by the reaction of the domain experts. This approach revealed the "pilikia": all experts agreed that this process required improvement.

As a result, initially low-rated internal tickets received higher attention and unfortunately had to be implemented as part of late change requests in the middle of the stabilization phase. My bad. The timing was a disaster. It was so bad, it was almost good. These late changes paid off. The implemented improvement was significant and worth it. The customer appreciated the correction and our ability to respond quickly to their concerns.

They probably did because we "shocked" them with what they were getting next. Okay, it worked this time, but I guess we shouldn't do this too often.





Sunday, September 6, 2020

Late change requests

I wish that sometimes it could be as easy as that...


Sunday, August 30, 2020

Digging in the mud

As a software tester, I often feel like an archaeologist. I analyze bits and pieces that are found in requirements, use cases, user stories, emails, phone calls, balcony talks, meetings, defects, etc. I collect all these widespread pieces of information and try to put them back together as accurately as possible. Unlike the two men in the excavation, I like doing this kind of job. When the connected pieces turn into a nice picture in my head, I glue them together in the form of well-documented test cases or - if it is a really large "dinosaur" - in a final report that contains all the information needed. I do it mainly for me, but I also do it for other stakeholders so they don't need to go through the same laborious process when someone has the same questions.

The other analogy for a tester is the explorer who collects facts and numbers, painstakingly noting observed behavior or figures gained from measurements. The explorer then groups the data and analyzes the collected material, trying to find patterns that allow for interesting new findings which either confirm or refute assumptions made upfront. These conclusions are then (hopefully) used to make accurate decisions elsewhere (and also at your own desk).

Yet another analogy is that of a fed. Testers are regularly flooded with fake information. Of course, we also get accurate information, but it's all mixed in one bucket and we need to sort it out. Unlike a criminal who tries to protect himself using lies, we may get information that has either not been accurately analyzed or was invented to stop others from asking and investigating further. We, the testers, get information that is based on assumptions, and it is our job to question everything. Some nice guy once stated that "testers question everything, and this includes authority".

There is also some analogy to medical practitioners. The patient gets interviewed by the doctor, whose goal is to identify the root cause of the suffering. The software-under-test is the patient, while the tester is the doctor who examines the sick patient. When successful, the doctor identifies the root cause and solves the problem by prescribing the appropriate medicine for treatment. The tester does similar things: she raises a defect and describes the symptoms as accurately as possible. If the examination reveals a problem that is not within the doctor's specialized field, she may delegate additional examinations to someone who is more specialized in the area where she suspects the problem. The tester assigns the ticket to a developer or DevOps engineer for further investigation.

In case you have more examples of such analogies, simply post a comment to this blog or drop me a message on LinkedIn or whatever channel you like.