Rolf Østergaard (@rolfostergaard) suggested on Twitter, when I posted my previous blog, that instead of counting defects and tests we take a look at test coverage. Certainly!
Mathematically, coverage relates the size of an area fully contained in another area, relative to the size of that other area. We could calculate the water coverage of the Earth or even how much of a floor a carpet could cover. Coverage can be expressed as a percentage.
But coverage is also a qualitative term. For example a book can cover a subject, or a piece of clothing can give proper (or improper!) body coverage.
So what is test coverage? Well, the term is often used to somehow describe how much of a system’s functionality is covered by testing.
Numbers are powerful and popular with some people, so a quantified coverage number would be nice to have. One such number is code coverage, which is calculated by dividing the number of code lines that have been executed at least once by the total number of code lines in the program.
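To make the calculation concrete, here is a tiny, purely hypothetical Java example: a method with two branches and a single “test” that exercises only one of them, so a line coverage tool would report most of the lines as executed even though one branch is never tested at all.

```java
// Hypothetical example: line coverage counts executed lines, not tested behaviour.
public class Discount {

    // Two branches; the "test" in main only exercises one of them.
    static double apply(double price, boolean isMember) {
        if (isMember) {
            return price * 0.9;   // executed by the test below -> counted as covered
        }
        return price;             // never executed -> counted as uncovered
    }

    public static void main(String[] args) {
        // A coverage tool would report most lines of apply() as executed,
        // yet the non-member branch is completely untested.
        System.out.println("Member price: " + apply(100.0, true));
    }
}
```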
Another measurement relies on the business requirements for the system being registered and numbered, and tests being mapped to the requirements which they test. A suite of tests can then be said to cover a certain number of requirements.
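Requirements coverage is an equally simple ratio: the number of requirements with at least one mapped test divided by the total number of requirements. A minimal sketch – the requirement IDs and test names are made up:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of requirements coverage: the ratio of requirements
// that have at least one test mapped to them. IDs and test names are made up.
public class RequirementsCoverage {
    public static void main(String[] args) {
        Map<String, List<String>> testsPerRequirement = Map.of(
                "REQ-1", List.of("testLogin", "testLogout"),
                "REQ-2", List.of("testTransfer"),
                "REQ-3", List.of());                 // no tests mapped yet
        long covered = testsPerRequirement.values().stream()
                .filter(tests -> !tests.isEmpty())
                .count();
        double coverage = 100.0 * covered / testsPerRequirement.size();
        System.out.printf("Requirements coverage: %.0f%%%n", coverage);  // prints 67%
    }
}
```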
Numbers can hint at something interesting. For example, if your unit tests exercise only 10% of the code and it tends to be the same 10% every time, chances are that something important is missing from the unit tests. Or you could even have a lot of dead legacy code. The situation would be similar if you found that you had actually only tested functionality covered by a few of the documented business requirements: could the uncovered requirements be just noise?
No matter what, a coverage number can only give hints. It cannot give certainty.
Let’s imagine we can make a drawing of the functionality of a system, like a map. Everything on the map would be intended functionality; everything outside would be unacceptable. Let’s make another simplification and imagine for the moment that the map is the system, not just an image of it. Here is an example of such a simple system:
The blue area is the system. The red spots are checks carried out as part of testing. Some of the checks are within the system, others are outside it. The ones within are expected to pass, the ones outside are expected to fail.
Note that there is no way to calculate the test coverage of this imaginary system. Firstly, because the area outside the system is infinite, and we can’t calculate coverage against an infinite area. Secondly, because the checks don’t have an area – they are merely points – so any coverage we calculate will be vanishingly small; strictly speaking zero, since points have no area.
Ah, you may argue, my tests aren’t composed of points but are scripts: They are linear!
Actually, a script is not a linear entity, it’s just a connected sequence of verification points. But even if it were linear, it wouldn’t have an area: lines are one-dimensional.
But my system is not a continuous entity, it is discrete and consists only of the features listed in the requirements document.
Well that’s an interesting point.
The problem is that considering only documented requirements will never cover all functionality. Think about the 2.2250738585072012e-308 problem in Java string-to-floating-point conversion. I’m certain no requirements document for a system implemented in Java ever listed this particular number as a specifically valid (or invalid) entry in input fields or on external integrations. The documents probably just said the system should accept floating-point numbers for certain fields. However, a program which stops responding because it enters an infinite loop is obviously not acceptable.
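The problematic call itself is utterly mundane. Here is a minimal sketch; the hang only occurred on JVMs that predate the early-2011 fix, so treat it as an illustration rather than something to run on a production box:

```java
public class DoubleParseHang {
    public static void main(String[] args) {
        // On unpatched JVMs from early 2011, this call never returned: the
        // conversion loop oscillated between two adjacent candidate values forever.
        double d = Double.parseDouble("2.2250738585072012e-308");
        // A fixed JVM simply returns a nearby double value.
        System.out.println("Parsed: " + d);
    }
}
```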
A requirement document is always incomplete. It describes how you hope the system will work, yet there’s more to a system than can be explicitly described by a requirements document.
Thus any testing relying solely on documented requirements cannot be complete – nor can its coverage be correctly calculated.
My message to Rolf Østergaard is this: If a tester makes a coverage analysis of what he has done, remember that no matter how the coverage is measured, any quantified value will only give hints about the testing. And if he reports 100% coverage and looks satisfied, I strongly suggest you start looking into what kind of testing he has actually done. It will probably be flawed.
Intelligent testing assists those who are responsible for quality in finding out how a system is actually working, it doesn’t assure quality.
Thanks to Darren McMillan for helpful review of this post.
Michael posted the following two comments on Twitter shortly after I published this post:
There’s nothing wrong with using numbers to add colour or warrant to a story. Problems start when numbers *become* the story.
Just as the map is not the territory, the numbers are not the story. I don’t think we are in opposition there.
I agree, we’re not in opposition. Consider this post an elaboration of a different perspective – inspired by Michael’s tweets.
Michael Bolton posted some thought-provoking tweets over the last few days:
Trying to measure quality into a product is like measuring height into a basketball player.
Counting yesterday’s passing test cases is as relevant to the project as counting yesterday’s good weather is to the picnic
Counting test cases is like counting stories in today’s newspaper: the number tells you *nothing* you need to know.
Michael is a Tester with a capital T, and he is correct. But in this blog post, I’ll be in opposition to Michael. Not to prove that he’s wrong, not out of disrespect, but to make the point that while counting will not make us happy (or good testers), it can be a useful activity.
Numbers illustrate things about reality. They can also illustrate something about the state of a project.
A number can be a very bold statement with a lot of impact. The following (made up) statement illustrates this: The test team executed 50 test cases and reported 40 defects. The defect reporting trend did not go down over time. We estimate there’s an 80% probability that there are still unfound critical defects in the system.
80%? Where did that come from? And what are critical bugs?
Actually, the exact number is not that important. Stated probabilities are often not accurate at all, but people have learnt to relate the word “probability” to a certain meaning, telling us something about a possible future (200 years ago it had a static meaning, by the way, but that’s another story).
But that’s okay: If this statement represents my gut feeling as a tester, then it’s my obligation to communicate it to my manager so he can use it to make an informed decision about whether it’s safe to release the product to production now.
After all, my manager depends on me as a tester to help him take these decisions. If he disagrees with me and says “oh, but only a few of the defects you found are really critical”, then that’s fine with me – he may have a much better view of what’s important with this product than I have as a test consultant – and in any case: he’s taking the responsibility. And if he rejects the statement, we can go through the testing and the issues we found together. I’ll be happy to do so. But often managers are too busy to do that.
Communicating test results in detail is usually easy, but assisting a project manager in making a quality assessment is really difficult. The fundamental problem is that, as testers, by the time we’ve finished our testing we have only turned known unknowns into known knowns. The as yet unknown unknowns are still left for future discovery.
Test leadership is to a large extent about leading testers into the unknown, mapping it as we go along, discovering as much of it as possible. Testers find previously unknown knowledge. A talented “information digger” can also contribute by turning “forgotten unknowns” into known unknowns. (I’ll get around to defining “forgotten unknowns” in a forthcoming blog entry; for now you’ll have to believe that it’s something real.)
Counting won’t help much there. In fact it could lead us off the discovery path and into a state of false comfort, which will lead to missed discoveries.
But when I have an important message which I need to communicate rapidly, I count!
Death by Virtual Memory
Every PC user knows that PCs become slower and slower over time, until the point where they are almost unusable. This is where upgrading the RAM will usually help – until eventually you have to buy a new PC. Apparently that’s just the way PCs wear out.
Actually, PCs don’t wear out – readers of my blog probably know that – but users (knowingly and unknowingly) add software which consumes resources, of which memory is the most important one.
RAM used to be very expensive and therefore a scarce resource. Programmers used to do all sorts of tricks to fit their increasingly complex programs in memory. To help them focus on the programming task and not worry too much about resource scarcity, operating system designers invented something called virtual memory or swap memory.
Swap memory allowed the operating system to remove running processes from the (expensive and therefore scarce) RAM and store the state of the process on disk (‘swap’ it out – hence the name), from where it could later be restored into RAM and resume running on the CPU. The technique is still employed by all modern operating systems, and while the amount of RAM has grown considerably, to a level where lack of it is usually not a problem, virtual memory techniques are still useful for long-running processes that only need to run once in a while, and where it is not a problem if the initial response time is a second or more – and when they’re not running, the RAM can be used for caching file system data and other important things.
But what happens if load increases, e.g. if the number of users grows or the system otherwise becomes loaded and the processes running on the system start competing for memory? The good news is that functionally nothing changes: virtual memory is transparent to the process, so the code will execute the same as it did before. But the bad news is that execution time increases rapidly when real memory becomes exhausted and the OS has to start using virtual memory. If this only happens during nighttime or at other times when users or external systems aren’t depending on the system, all is probably okay, but if not, you can be in real trouble. In fact, the problem can be so bad that the system becomes useless.
In fact, with much more RAM and larger programs in today’s computers, the relative performance penalty is much higher than it used to be. This is because when the OS starts swapping, the amount of data that needs to be transferred to and from the hard disk(s) is probably a factor of 10 higher than it would have been, say, 10 years ago. During that time, however, hard disk access speeds have only doubled, so overall the damage you risk when hitting the virtual memory “wall” is much higher now than it used to be.
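If you want to see the effect for yourself, a rough sketch like the following can provoke it. It assumes you start the JVM with a heap limit (-Xmx) larger than the physical RAM in the machine, and that you accept the machine may become very sluggish while it runs:

```java
import java.util.ArrayList;
import java.util.List;

// Rough demo of the paging penalty: allocate more memory than the machine
// has physical RAM, then time how long it takes to touch every page.
// Run with e.g. -Xmx32g on a machine with less RAM than that.
// Warning: this can make the machine very unresponsive while it runs.
public class PagingDemo {
    public static void main(String[] args) {
        int chunkBytes = 64 * 1024 * 1024;                            // 64 MB per chunk
        int chunks = args.length > 0 ? Integer.parseInt(args[0]) : 200; // 200 chunks = ~12.8 GB
        List<byte[]> memory = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            memory.add(new byte[chunkBytes]);
        }
        // Touch one byte per 4 KB page and time it; once the allocation
        // exceeds physical RAM, this loop slows down dramatically.
        long start = System.nanoTime();
        for (byte[] chunk : memory) {
            for (int offset = 0; offset < chunk.length; offset += 4096) {
                chunk[offset]++;
            }
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Touched " + chunks * 64L + " MB in " + millis + " ms");
    }
}
```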
An interesting factor I have found worth looking for is that antivirus software installed on your servers often makes the virtual memory problem worse. It does so because it installs hooks into the applications running on the system, monitoring all I/O. This monitoring performs well as long as the antivirus system can keep its database and code in memory, but when memory starvation sets in, it can turn into a really bad situation. How can we detect that situation (other than by performance dropping)?
I’m not aware of any really useful tools that can sit in the background automatically detecting (or better: predicting) memory starvation problems on running servers or test systems. But there are ways to look for it: on Windows, I’ll look at the running processes, particularly focusing on the Page Faults Delta column, looking for processes consistently showing high numbers there.
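Task Manager only gives a momentary snapshot, so when watching a server over time I would rather log the standard Windows memory counters. Here is a rough sketch using the built-in typeperf tool – verify the counter names and sampling options on your own servers before relying on it:

```java
import java.io.IOException;

// Rough sketch of logging Windows paging activity over time by driving the
// built-in typeperf tool. Counter names and options are standard on Windows,
// but check them on your own system before relying on this.
public class PageFaultWatch {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "typeperf",
                "\\Memory\\Pages/sec",         // pages read/written to resolve hard page faults
                "\\Memory\\Available MBytes",  // physical memory still available
                "-si", "5",                     // sample every 5 seconds
                "-sc", "120");                  // stop after 120 samples (10 minutes)
        pb.inheritIO();                         // print the CSV samples to the console
        Process p = pb.start();
        p.waitFor();
    }
}
```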
This is an important performance testing subject. And one which is too often overlooked.