Most of the really "cool" bugs I find don't show up during automated testing. They don't show up during manual regression testing either. Of course, both approaches are useful within their well-known limitations. But the in-depth analysis of an already reported, hard-to-reproduce bug often generates new findings: anomalies I didn't expect in that place at all.
To give an example: my goal was to find the root cause of a particular error that more and more customers ran into. It became high priority when suddenly 100 customers were affected by it.
When analyzing real customer defects, my mind works differently than during normal testing activities. I think differently and ask questions like "Which scenarios are candidates for driving the application into the reported behavior?"
While the functional testing techniques worked well without a customer in mind, this new approach puts the customer in the foreground. I now wear a different hat. To understand what happened, I need to know how a customer uses the system in an end-to-end environment.
The first approach is usually to check the logs that the help-desk provided and hope to find the cause quickly. The help-desk usually provides a log of only one error at a specific point in time. Sometimes you need more information: how did the user get the object he was unsuccessfully working on, who originally sent the object to him, who accepted it and when, who forwarded it at what time, who added data to it, and so on.
The log from support gives you the necessary base information. From there I dig deeper to find out more about the object's "life cycle". Besides the client application log and the web server log files, we have something like an Event-Log attached to each object.
The Event-Log records which action was executed by which user and in which state the object ended up after that operation. That sounds great, but I had always had this strange feeling that something was wrong with this log. There were too many inconsistencies, and I feared I would lose time digging into this other area.
This time I had no choice: I had to understand each action and each state the object ended up in, so I could reproduce the same scenario in our test environment. Once again I couldn't quite follow the weird order of the log entries, and the time-stamps didn't really make sense to me either. It was time to get into this new area of potential defects. I started by creating an object from scratch, then immediately checked the event log and noted down which actions and states it recorded. I appended data to the object, sent it to someone else, pushed it through several web services, and for each action took a screenshot of the event log. Step by step I tried out variations with the single goal of understanding the patterns in the logs.
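To give a rough idea of what I was checking by hand, here is a minimal sketch of those consistency rules as a small script. The field names, actions, states and allowed transitions are made up for illustration; our real Event-Log schema looks different, but the two checks are the same ones I was doing with my screenshots: the time-stamps must never run backwards, and every action must leave the object in a state that action can actually produce.

```python
from datetime import datetime

# Hypothetical allowed transitions: which states an action may leave the object in.
# (Illustrative only -- the real application has different actions and states.)
ALLOWED_TRANSITIONS = {
    "create":  {"Draft"},
    "append":  {"Draft", "InProgress"},
    "send":    {"Sent"},
    "accept":  {"Accepted"},
    "forward": {"Forwarded"},
}

def check_event_log(entries):
    """Report entries whose time-stamps go backwards or whose resulting
    state does not match the action that was logged."""
    findings = []
    previous_time = None
    for i, entry in enumerate(entries):
        timestamp = datetime.fromisoformat(entry["timestamp"])
        if previous_time is not None and timestamp < previous_time:
            findings.append(f"entry {i}: time-stamp {timestamp} is earlier than the previous entry")
        previous_time = timestamp

        allowed = ALLOWED_TRANSITIONS.get(entry["action"])
        if allowed is not None and entry["state"] not in allowed:
            findings.append(
                f"entry {i}: action '{entry['action']}' ended in state "
                f"'{entry['state']}', expected one of {sorted(allowed)}"
            )
    return findings

# Made-up example: the second entry is both out of order and in an impossible state.
log = [
    {"timestamp": "2011-05-02T09:15:00", "user": "alice", "action": "create", "state": "Draft"},
    {"timestamp": "2011-05-02T09:10:00", "user": "alice", "action": "send",   "state": "Draft"},
    {"timestamp": "2011-05-02T09:20:00", "user": "bob",   "action": "accept", "state": "Accepted"},
]

for finding in check_event_log(log):
    print(finding)
```

Anything a check like this flags is exactly the kind of entry I was circling on my screenshots.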
What I found was astonishing. My original assumptions were confirmed: these entries were incorrect. To make it short, they didn't tell you the truth. When I asked our support people whether they had ever noticed the clutter in the event log sequence and time-stamps, they confirmed that they didn't like to read these logs for the same reason I hadn't looked at them: they simply could not follow the entries. I then talked to the architect, who, to my surprise, confirmed that there are some known issues with the event log. Strengthened by this experience, I raised a bunch of new defects related to this Event-Log. This was just one example; the truth is that I usually run into more such "side" effects.
And now, what often happens while I am testing happened again: I drifted away. I was out to find the root cause of a completely different defect and found a bunch of new, interesting anomalies that had nothing to do with the original one.
BTW, what was the problem I was originally out to test?