Acceptance testing is a key method in Agile. One way of defining acceptance tests is Gojko Adzic's “Specification by Example” paradigm, which has gained quite a bit of momentum lately. I personally found it both refreshing and appealing when I heard him present it at Agile Testing Days 2009, and I also found his book Bridging the Communication Gap a nice read.
I’m sceptical of the concept of acceptance testing. Not because verification of agreed functionality is not a good thing, but because it tends to shift attention to verification instead of exploration.
This will shift attention from testing to problem prevention. Is that bad, you ask? Isn’t it better to prevent problems than to discover them?
Well, most people think “why didn't I prevent this from happening?” when problems do happen. Feelings of regret are natural in that situation, and they can lead you into thinking you should improve your problem prevention. And maybe you should, but more examples aren't going to do it!
Real testing is still necessary.
To explain why, I'll consult one of the great early 20th century mathematical philosophers, Kurt Gödel, and in particular his first incompleteness theorem. It says that no consistent system of axioms whose theorems can be listed by an “effective procedure” is capable of proving all facts about the natural numbers.
What does this mean to us?
It means that we will never be able to list all things that can be done with this particular set of data.
A specification is a kind of listing of “valid things to do” with data, so Gödel's theorem teaches us that there is infinitely more to a system than any list of requirements, however long. This also applies when the requirements are listed as examples.
If you're in the business of delivering products of only “agreed quality” to a customer, you may be fine verifying only the things which are explicitly agreed. If something goes wrong you can always claim: “It wasn't in the specification!”
But if you’re striving for quality in a broader sense, verifying that the system works according to specifications is never going to be enough.
Gojko has made a good contribution to agile. Examples can be useful and efficient communication tools, and used correctly they can help make users and other interested parties more aware of what's going on on the other side. His contribution can help bridge a communication gap. It can also produce excellent input for automated unit tests.
Just don’t let it consume your precious testing time. The real testing goes far beyond verification of documented requirements!
If you want to learn more about this, I recommend you sign up for one of the Rapid Software Testing courses offered by James Bach and Michael Bolton.
Covering test coverage
Rolf Østergaard (@rolfostergaard) suggested on Twitter, when I posted my previous blog post, that instead of counting defects and tests we take a look at test coverage. Certainly!
Mathematically, coverage relates the size of an area fully contained in another area to the size of that containing area. We could calculate the water coverage of the Earth, or even how much of a floor a carpet could cover. Coverage can be expressed as a percentage.
But coverage is also a qualitative term. For example a book can cover a subject, or a piece of clothing can give proper (or improper!) body coverage.
So what is test coverage? Well, the term is often used to somehow describe how much of a system’s functionality is covered by testing.
Numbers are powerful and popular with some people, so a quantified coverage number would be nice to have. One such number is code coverage, which is calculated by dividing the number of code lines that have been executed at least once by the total number of code lines in a program.
Another measurement relies on the business requirements for the system being registered and numbered, and on tests being mapped to the requirements they exercise. A suite of tests can then be said to cover a certain number of requirements.
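To see the arithmetic behind such numbers spelled out, here is a minimal Java sketch. The class name and all the figures are invented for illustration only; real coverage tools gather the counts by instrumenting the code or by querying a test management tool:

    public class CoverageNumbers {
        // Coverage as a percentage: covered items divided by total items.
        static double percentage(int covered, int total) {
            return 100.0 * covered / total;
        }

        public static void main(String[] args) {
            // Code coverage: lines executed at least once vs. total lines.
            int executedLines = 8400;
            int totalLines = 12000;
            System.out.printf("Code coverage: %.1f%%%n",
                    percentage(executedLines, totalLines));

            // Requirements coverage: requirements touched by at least one test.
            int coveredRequirements = 45;
            int totalRequirements = 60;
            System.out.printf("Requirements coverage: %.1f%%%n",
                    percentage(coveredRequirements, totalRequirements));
        }
    }

Both numbers are trivial to compute and convenient to report, which is exactly what makes them so tempting.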
Numbers can hint at something interesting. For example, if your unit tests exercise only 10% of the code, and it tends to be the same 10% every time, chances are that something important is missing from the unit tests. Or you could even have a lot of dead legacy code. The same goes if you found that you actually only test functionality in a few of the documented business requirements: could the uncovered requirements be just noise?
No matter what, a coverage number can only give hints. It cannot give certainty.
Let's imagine we can make a drawing of the functionality of a system, like a map. Everything on the map would be intended functionality, everything outside would be unintended. Let's make another simplification and imagine for the moment that the map is the system, not just an image of it. Here is an example of such a simple system:
The blue area is the system. The red spots are checks carried out as part of testing. Some of the checks are within the system, others are outside it. The ones within are expected to pass, the ones outside are expected to fail.
Note that there is no way to calculate the test coverage of this imaginary system. Firstly, because the area outside the system is infinite, and we can't calculate the coverage of an infinite area. Secondly, because the checks don't have an area – they are merely points – so any coverage calculation will come out infinitesimally small.
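Put slightly more formally (my own formalisation, not essential to the argument): a finite set of isolated check points has zero area, so the ratio that a coverage percentage would have to be based on is

    \text{coverage} = \frac{\text{area touched by the checks}}{\text{area of the system}} = \frac{0}{\text{area of the system}} = 0

and that is before we even try to account for the infinite area outside the system.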
Ah, you may argue, my tests aren’t composed of points but are scripts: They are linear!
Actually, a script is not a linear entity, it's just a connected sequence of verification points. But even if it were linear, it wouldn't have an area: lines are one-dimensional.
But my system is not a continuous entity, it is discrete and consists only of the features listed in the requirements document.
Well that’s an interesting point.
The problem is that considering only documented requirements will never cover all functionality. Think about the 2.2250738585072012e-308 problem in Java's string-to-floating-point conversion. I'm certain that no requirement document for a system implemented in Java ever listed this number as a specifically valid (or invalid) entry in input fields or on external integrations. The documents probably just said the system should accept floating-point numbers in certain fields. Yet a program which stops responding because it enters an infinite loop is obviously not acceptable.
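To make the example concrete, here is a minimal Java sketch of the conversion in question (the class name is my own invention, and the snippet is only an illustration). On runtimes that predate the fix released in early 2011, a call like this could hang in an infinite loop; patched runtimes parse the string without trouble:

    public class ParseDoubleHang {
        public static void main(String[] args) {
            // The value discussed above: just below the smallest normal double.
            String input = "2.2250738585072012e-308";
            // On unpatched runtimes this conversion never returned;
            // on fixed ones it simply yields the nearest representable double.
            double value = Double.parseDouble(input);
            System.out.println("Parsed: " + value);
        }
    }

No requirement document would have told a tester to try exactly this string; it took exploration to find it.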
A requirement document is always incomplete. It describes how you hope the system will work, yet there’s more to a system than can be explicitly described by a requirements document.
Thus any testing relying solely on documented requirements cannot be complete – nor can its coverage be correctly calculated.
My message to Rolf Østergaard is this: If a tester makes a coverage analysis of what he has done, remember that no matter how the coverage is measured, any quantified value will only give hints about the testing. And if he reports 100% coverage and looks satisfied, I strongly suggest you start looking into what kind of testing he has actually done. It will probably be flawed.
Intelligent testing assists those who are responsible for quality in finding out how a system is actually working, it doesn’t assure quality.
Thanks to Darren McMillan for helpful review of this post.
I had the opportunity of being the teacher for my son Frederik's 7th grade class today at Kvikmarken. The school had decided to send the “real” teachers to a workshop together, so the job of teaching the children was left to volunteering parents. I'm enthusiastic about learning, so I felt obliged to volunteer.
I gave them a crash course in software testing.
These boys and girls are really smart. They quickly grasped what software testing is about: exploring and learning. Remember, they have been brought up with laptops, mobile phones, and the internet, so they've learned to cope with all the shortcomings and bugs that come with today's technology. I could have expected them to be blind to bugs. But no, they do see them – and they are splendid explorers and learners.
I started my lesson with a flashback to 40-50 years ago, when computers were as big as a classroom and software was written on note paper and stored on punched tape. I compared this with today's state-of-the-art commodity computer: an iPad, smaller than a school book – and a lot more powerful than the computers of that time.
Complexity has increased by a factor of at least one million – and it shows! (Though I should add that the evolution of software development tools and methods has also improved a good deal.)
They now know what a bug is, why there are bugs in software, what testing is about and how testing is a learning and discovery process. They also have an idea of why we’re testing – they particularly liked this one: We test because it feels good to break stuff!
Finally, they had a chance to prove their collective exploration abilities in a testing exercise. They did splendidly!
The last slide of my presentation contained a quote from James Bach on Twitter yesterday, words of a true tester on bug hunt using his senses:
Say “it looks bad” and I hear you.
Say “it smells bad” and I taste BLOOD.
I'm happy that Frederik's classmates liked my lesson: “Great presentation, thanks!”, “Wow, testing is fun!”, “You've got a really cool job!”
I think there's good reason to expect software engineering to improve a lot when these boys and girls become the ones responsible for it!
Bohr on testing
When Niels Bohr and his team of geniuses at his institute in Copenhagen developed quantum physics, they fundamentally changed science by proving wrong an assumption dating very far back in science: That everything happens for a reason. In science, building and verifying models to predict events based on knowledge of variables is an important activity, and while this is still meaningful in certain situations, quantum mechanics proved that on the microscopic level of atoms and particles, things don’t happen for a reason. Put in another way: You can have effects without cause. In fact, effects don’t have causes as such at this level.
This means that in particle physics, you cannot predict precisely what’s going to happen, even if you know all variables of a system. Well in fact you can, but only if you’re prepared to give up knowing anything about when and where the effect will happen.
This is counterintuitive to our daily understanding of how the world works. But there's more: according to quantum physics, it is impossible to separate knowledge of the variables of a system from the system itself. The observation of the system is always part of the system, and thus changes the system in an unpredictable way.
If you find this to be confusing, don’t be embarrassed. Even Einstein never accepted this lack of causality.
Bohr was a great scientist, but he was also a great philosopher. He did a lot of thinking about what this lack of causality and the inseparability of observation from events could teach us about our understanding of nature. On several occasions he pointed out that even on the macroscopic level, we cannot ignore what is happening on the atomic and particle level. First of all because quantum physics did away with causality as a fundamental principle, but also because quantum effects are in fact visible to us in our daily macroscopic life: he used the example of the eye being able to react to stimuli as small as a single photon, and argued that it is very likely that the entire organism contains other such amplification systems where microscopic events have macroscopic effects. In some of his philosophical essays he points out how psychology and quantum mechanics follow similar patterns of logic.
So does testing. In software testing we are working to find out how a computer system works. Computers are algorithmic machines designed in such a way that randomness is eliminated and data can be measured (read) without affecting it, but the programs are written by humans and used by humans, so the system in which the computer program is used is both complex and inherently unpredictable.
We’re also affecting what we’re testing. Not by affecting the computer system itself, but by affecting the development of the software by discovering facts about the system and how it works in relation to other systems and the users.
In some schools of software testing, the activity is reduced to a predictable one: some advocate having “a single point of truth” about what is going to be developed in an iteration, and that tests should verify that the implementation is correct – nothing more. They believe that it is possible to assemble “all information” about a system before development starts, and that any information not present is not a requirement and as such should not be part of the delivery.
That is an incorrect approach to software engineering, and to testing in particular. Testing is much more than verification of implementation, and the results of testing are as unpredictable as the development process itself. We must also remember that it is fundamentally impossible to collect all requirements for a product: we can increase the probability of getting a well-working system by collecting as much information as possible about how the system should work and how it actually works (by testing), and comparing the two, but the information will always be fundamentally incomplete.
Fortunately, that's not because we're stupid: it is consistent with quantum physics.
Studying the fundamental mechanisms of nature can lead to a better understanding of what we are working with as software engineers and as software testers in particular.
Finding the perfects
Friend and tester colleague Jesper Ottosen participated in what appeared to be a great event and discussion at EuroStar 2010: The rebel alliance night (link to Shmuel Gershon’s blog with video recordings of the talks), where he spoke about whether we as testers can start looking for more than defects. What if we started looking for the perfects?
I like the idea: is testing really only about finding problems? It can be depressing to always be the one who tells the bad news (especially when there is a lot of it, or when it is not really welcome). Do we testers really have to be worried all the time? If we start communicating perfects too, won't our careers get both better and more successful?
I see a problem, though. Looking for good things conflicts with the very mindset of testing. Programming is a creative process in which the programmer creates something new and unique. He does it to solve a problem, and he does it on the assumption that it will solve the problem. If he starts out assuming that it won't work, he will be psychologically blocking his creativity and will probably not perform well.
As a tester, I look at software with the reverse assumption: I assume that it will not work. This assumption stimulates my creativity to find the bugs, because I get ideas about where they're hiding.
With that assumption, I just can’t be successful looking for good things!
That said, however, I do believe that we sometimes need to be positive, especially to satisfy some managers and programmers. They're used to hearing bad news from us, and some people can't take that. Switching for a moment to looking for “perfects” might actually work very well in this respect. Just don't forget that we're doing it for them, not because it is our job.
And don't forget that it can only be for a while: we have to think negatively to be successful. We make a difference when we find the obvious problems with the product: the problems that will cause severe dissatisfaction among users and managers if they slip into the product. We're a great help to our clients because we prevent bugs by finding them before the users do!
Here’s Jesper at EuroStar 2010:
http://www.youtube.com/watch?v=wB_N-TZPde8
Based on evidence…
I'm very interested in school education, and I'm chairing the board of the local school. My particular field of interest is education for children with learning disabilities. This is because two of my boys have ADHD with reading disabilities.
We’re looking for new ways to educate children with special needs here in Denmark, and the trend is to include them in the normal educational environment – not put them in special schools. There are two reasons to do so: One is that it’s cheaper, the other that the effect of special education is statistically not very good.
I’m regularly discussing the dilemmas of special education with local politician and social sciences professor Bent Greve, and lately he and I had a short e-mail discussion about whether education should be evidence based. This is also a public debate, similar in nature to the debate taking place in the testing community.
Bent Greve explained to me why he, as a politician, requires education to be evidence based: he needs to know that the practices being applied actually work. I suppose it is about supporting his decision making, but he pointed out that it is also in the interest of those they're trying to help. It can be done in health care, so he believes it can be done in other areas too. However, it should never become an excuse that relieves the individual professional of responsibility for what he or she is doing – the evidence should support them in making the right decisions.
This point of view is easy to sympathise with if you're engaged in politics or managing a company: you want your workers to do work that works. Not work that doesn't work. Right?
On the other hand, there is the point of view of the professional, who uses his talent and creativity to research and find solutions. Teachers generally argue against the focus on evidence, finding that it limits their freedom and makes them less good teachers. I also argued with Bent that sometimes the “right decision” turns out to be the wrong decision. For example: if you base your teaching on evidence of what works in general, we can be certain that somewhere between 5 and 20% of the pupils will not learn anything. So what should we do for them?
Bent Greve responded intelligently: evaluation and evidence cannot stand alone. Continuous experimentation and development is needed, as nothing can be said to be a final truth.
I agree.
As a testing craftsman I follow patterns in my work which work for me most of the time, but sometimes I find that they don't. If I test and don't find any bugs, I feel dissatisfied and confused. I think I may have used the wrong pattern, but I'm in trouble and don't know what to do, since my pattern failed. Eventually I may have to give up, or I may discover a pattern that works within the time I have to test.
Managers don't find bugs, testers do. Politicians don't educate pupils, teachers do. So who should we trust? I'm not going to answer this question, since it's absurd: there would not have been schools if there hadn't been politicians, and there would not have been any software companies if there were no managers. But I think everyone can agree that at the end of the day, what really matters is that we find as many of those annoying bugs as possible before the product is released, and that the pupils learn as much as possible from attending school. Right?
Correct. But sometimes we find that the patterns we're following in our work life (whether that's teaching or testing) are not working. Then what? We have to stop and think. Something is wrong. How do we progress from here? This is where the most important rule applies: don't apply the same pattern again – try something different!
PS: I was not at Eurostar 2010 today, so I did not hear Stuart Reid's keynote. I did, however, follow some of the noise it caused on Twitter!
Usability testing is done for roughly the same reason as other kinds of testing: To discover new knowledge about the system under test. In this case knowledge about how users work with the system.
But Usability Testing is a very different discipline from ordinary system testing.
In my opinion, the one thing that makes usability testing different from system testing is that it never discovers absolute facts about the system.
Instead, a usability test will only say something about how the system works in relation to someone – and this someone is a person – or persons. And as you have probably experienced several times in your life, real persons aren’t absolute, predictable, or static – they’re quite dynamic and you never really know what to expect. Usability testing is a bit like integration testing against a remote system, which keeps changing, even while you test it!
Another aspect is that it's much more important, yet also more difficult, to describe the context in which the system is to be used and the test is executed.
I’ll illustrate that with a basic example: Many corporations implement systems for internal use which are not really user friendly, but since the employees get paid for using them, they don’t mind this – and once they’ve used the system for a few weeks, the system may have become an efficient work tool. An opposite example is most computer games, which are deliberately designed to be inefficient, but are usually easy to learn and use. These two systems or applications may be used on the same day by the same person, but in different situations, of course.
Setting the context is not always possible, though. For example, most people will only respond to things which they understand: chances are that if you had tested Facebook on users in 1995, they wouldn't have liked it at all, because they would not have understood what it was supposed to be useful for.
Essentially, I feel that the tester must be much more conscious about what he is doing and how it is affecting the test results. I actually believe that the description of the context, and the way you describe it, influences the results you get as much as the system and the users themselves.
Yes, testing usability can be very challenging!
WordPress used to have a tagline saying “code is poetry”; apparently Microsoft has used it too. I don't know who came up with it first, but I agree. I wrote my first piece of BASIC code almost 30 years ago, my first machine code program more than 25 years ago, and I also learned C about 25 years ago. Code can be as meaningful and meaningless as poetry, so the analogy is correct. To me at least.
Now, many years later, I'm a professional tester. I find problems with other people's business processes, architecture, software designs, and implementations. I don't write code any more (except when I'm using it to test something).
But where’s the poetry? Is testing essentially a non-poetic activity?
I don’t think so. Yet, it doesn’t make sense to say “tests are poetry”. They aren’t. Then, what are they?