Nancy Leveson is professor of Aeronautics and astronautics at MIT. She is one of the worlds’ leading researchers on safety, a very serious researcher. I’m using some of the techniques she has developed analyzing complex systems for safety. Her papers are often interesting, but the title of her latest paper blew my mind when I read it: An Engineering Perspective on Avoiding Inadvertent Nuclear War.
I was born in 1969 and grew up during the cold war. One of the dangers we feared was that a mistake would happen, a bomb would detonate over Russia, Europe or the US, and uncontrolled retaliation would end the world. Dr. Strangelove, the movie, immortalized this scenario.
Take a deep breath if you watched the trailer above before reading on.
Leveson is not fearful. She has produced the paper for a workshop on systems and strategy stability and she looks back at why this horror scenario didn’t occur:
“The most successful complex systems in the past were simple and used rigorous, straightforward processes. Prevention of accidental detonation of nuclear bombs, for example, used a brilliant approach involving three positive measures […] and reliance on simple mechanical systems that could provide ultra-high assurance. Although there were a few incidents over a long period […] inadvertent detonation did not occur in those cases.”
The question she raises in her paper is whether we are still safe? Well, things are changing:
“The more recently introduced software-intensive systems have been much less reliable. […] More recently, our ability to provide highly trustworthy systems has been compromised by gratuitous complexity in their design and inadequate development and maintenance processes. For example, arguments are commonly made for using development approaches like X-treme Programming and Agile that eschew the specification of requirements before design begins.”
Yes, Leveson is a critic of the popular, modern development paradigms most of us has learned to love. Have we mistakenly stopped worrying?
I met her at a conference on STAMP/STPA in Iceland in 2017, and during a conversation which I was so fortunate to have with her in the lobby of the Iceland University she made herself very clear about her skepticisms towards Agile. But Agile is not the only problem:
“Providing high security is even more problematic. Again, only the most basic security techniques, such as providing an air gap to isolate critical systems, have been highly successful. The number of intrusions in today’s systems is appalling and unacceptable. Clearly what we are doing is not working.”
Leveson suggests a paradigm shift and suggests what the shift can look like. In the paper she discusses systems theory, and how approaches like those she describes in her 2010 book Engineering a Safer World can be useful.
The article can be downloaded in PDF from the MIT website Partnership for Systems Approaches to Safety and Security (PSASS).
I highly recommend anyone interested in software systems safety to read it and reflect on what dr. Leveson has to say
Dr. Leveson at Iceland University with myself and an Icelandic researcher on volcanic safety.
As testers we need to better understand and be explicit about problems in testing that don’t have known, clear, or obvious solutions. Cynefin can help by transforming the way we, our teams, and our stakeholders think about testing problems. Ben Kelly and James Christie has written very good blogs about Cynefin and testing. Liz Keogh was one of the first to write about Cynefin in development. At the bottom of this post, I have included a video with David Snowden and a link to an article I found interesting when I read it.
With this blog post I’m sharing elements of my own understanding of Cynefin and why I think it’s important. I think of Cynefin itself as a conceptual framework useful for comprehending dynamic and complex systems, but it is also a multi faceted “tool” which can help create context dependent conceptual frameworks, both tacit and explicit, so that we can better solve problems.
But before diving into that (and in particular explain what a conceptual framework is), I’d like to share something about my background.
Product design and the historic mistakes of software development
I used to study product design in university in the early 90’s. Creating new and innovative products does not follow obvious processes. Most engineering classes taught us methods and tools, but product design classes were different.
We were taught to get into the field, study real users in their real contexts, develop understandings of their problems, come up with prototypes and models of product ideas, and then try out these prototypes with the users.
Discussing an early draft of this post with James Christie, he mentioned that one of the historic mistakes of software development has been the assumption that it is a manufacturing process, whereas in reality it is far more like research and development. He finds it odd that we called it development, while at the same time refusing to believe that it really was a development activity.
SAFe, “the new black” in software delivery, is a good example of how even new methodologies in our industry are still based on paradigms rooted in knowledge about organizing manufacturing. “The Phoenix Project”, a popular novel about DevOps states on the back cover that managing IT is similar to factory management.
What I was taught back in the 90’s still help me when I try to understand why many problems remain unsolved despite hard work and many attempts on the opposite. I find that sometimes the wrong types of solutions are applied, solutions which don’t take into consideration the true nature of the issues we are trying to get rid of, or the innovations we’re trying to make.
Knight Capital Group, a testing failure
The case of Knight Capital Group is interesting from both innovation, risk and software testing perspectives, and I think it exemplifies the types of problems we get when we miss the complexity of our contexts. Knight Capital Group was one of the more aggressive investment companies in Wall Street. In 2012 they developed a new trading algorithm. The algorithm was tested using a simulation engine, I assume to ensure to that stakeholders that the new algorithm would generate great revenues.
The testing of the algorithm was not enough to ensure revenues, however. In fact, the outcome of deploying to algorithm to production was enormous losses and the eventual bankruptcy of the company after only 45 minues of trading. What went wrong? SEC, Securities and Exchange Commission of the U.S.A.:
[…] Knight did not have a system of risk management controls and supervisory procedures reasonably designed to manage the financial, regulatory, and other risks of market access […] Knight’s failures resulted in it accumulating an unintended multi-billion dollar portfolio of securities in approximately forty-five minutes on August 1 and, ultimately, Knight lost more than $460 million […]
But let’s assume a testing perspective.
It think it’s interesting that the technical root cause of the accident was that a component designed to be used to test the algorithm by generating artificial data was deployed into production along with the algorithm itself.
This test component created a stream of random data and was of course not supposed to run in production since it was designed to generate a stream of random data about worthless stock.
I find it strangely fascinating that the technical component that caused the accident was designed for testing.
Why didn’t someone ensure that the deployment scripts excluded the testing components?
Was it software testing that failed? It is not uncommon that software testing is entirely focused on obvious, functional and isolated performance perspectives of the system under test.
The component did it’s job: Helped test the new product. The testing strategy (probably undocumented) however, obviously did not consider possible side effects of the component.
I think Cynefin could have helped.
Cynefin transforms thinking
Let’s imagine we’re test managers at Knight and that we choose to use Cynefin to help us develop the testing strategy for the new algorithm.
David Snowden talks about Cynefin as a ‘sensemaking tool’ and if you engage Knights’ management, financial, IT-operations, and development people in a facilitated session with a focus on risks and testing,
I’m pretty sure the outcome would be the identification of the type of risk that ended up causing the bankruptcy of the company, and either prevented it by explicitly testing the deployment process, or made sure operations and finace put the necessary “risk management controls and supervisory procedures” in place.
I think so because I have observed how Cynefin sessions with their brainstormings are great for forming strategies to deal with the problems, issues, challenges, opportunities etc that we are facing. It helps people talking seriously about the nature of problems and issues, transforming them into smaller chunks that we can work with, and to help escalate things that require escalation.
Cynefin seems to be efficient breaking the traditional domination of boxed, linear and causal thinking that prevent problem solving of anything but the simplest problems.
My interpretation of what is happening is that Cynefin helps extend the language of those participating in sessions.
Decision makers a Knight Capital did not think about possible negative outcomes of the testing software. They had a simplistic view of their business risks. Cynefin could have helped them by extending their ‘sensemaking’ to more complex risks than those they were focusing on.
In the following I’ll dive a bit more into why I understand the sensemaking part of Cynefin to be a language-extending tool.
Language and Conceptual Frameworks
Language is an every-day thing that we don’t think much about.
Yet it is the very framework which contains our thinking.
While we can know things we cannot express (tacit knowledge), we cannot actively think outside the frame language creates.
Many philosophers have thought about this, but here I’d like to refer to physicist Niels Bohr (1885-1962) who in several of his lectures, articles, and personal letters talks about the importance of language in science.
Science is in a way about sensemaking through knowledge gathering and poetically (paraphrasing from my memory) Bohr describes language as the string that suspends our knowledge above a void of endless amounts of experiences.
In “The Unity of Science”, a lecture given at Columbia University, New York in 1954, Bohr introduce language as a “conceptual framework”:
“[it] is important […] to realize that all knowledge is originally represented within a conceptual framework adapted to account for previous experience, and that any such frame may prove too narrow to comprehend new experiences.”
“When speaking of a conceptual framework, we merely refer to an unambiguous logical representation of relations between experience.”
Bohr was the father of quantum physics, which is more than new laws about nature. It introduced new and complimentary concepts like uncertainty, and non-deterministic relations between events. The extension was made for quite practical purposes, namely the comprehension of observations, but has turned out to be quite useful:
“By means of the quantum mechanical formalism, a detailed account of an immense amount of experimental evidence regarding the physical and chemical properties of matter has been achieved.”
The rest is history, so to speak.
This is relevant to software testing and Cynefin because I think that the conceptual frameworks based on the thinking developed during industrialism are far from capable of explaining what is going on in software development and therefore also in testing.
Further, Cynefin seems to be an efficient enabler to create extensions to the old thinking frameworks in the particular contexts in which we use it.
Cynefin and software testing
Software development is generally not following simple processes. Development is obviously a human, creative activity. Good software development seems to me to be much more like a series of innovations with the intention to enable someone doing things in better ways.
Testing should follows that.
But if language limits us to different types of linear and causal thinking, we will always be missing that there is generally no simple, algorithmic or even causal connection between the stages of (1) understanding a new testing problem, (2) coming up with ideas, and (3) choosing solutions which are effective, socially acceptable, possible to perform, and safe and useful.
Experienced testers know this, but knowledge is often not enough.
James Christie added in his comments to the early draft mentioned above that as testers, with Cynefin we can better justify our skepticisms about inappropriate and simplistic approaches. Cynefin can make it less likely that we will be accused of applying subjective personal judgment.
I would like to add that the extended conceptual framework which Cynefin enables with us and our teams and stakeholders further more allow us to discover new and better approaches to problem solving
David Snowden on Cynefin
This video is a very good, quick introduction to Cynefin. Listen to David Snowden himself explain it:
The agile manifesto explicitly values ”working software over comprehensive documentation.” In test, this means that actual testing is valued over test documentation. I would have put it this way: Focus is on the quality of the product, not on the quality of the documentation of the product.
We can probably all agree that it’s more fun to test and explore software than writing documentation. But it would be too radical, if we skipped writing documentation at all!
I think, however, that we testers should be more critical about the documentation we actually do produce, and that we should look for ways to improve the documentation we’re making.
The problem with documentation is not that too much time is spent writing it instead of actually testing the product. The problem is that the quality of the documentation is often just not good enough.
Most organizations have standards for test documentation and require their testers to write specific types of documents based on mandatory templates.
Templates are generally a good thing, but they can be problematic if you limit your writing process to ”filling in the gaps.” A good document contains useful information to the reader, so the most important step in the writing process is finding out what information is actually needed.
Not all sections of a template can be equally useful in all contexts (or useful at all), and very often you need to document things which there has not been left space for in the template.
But, finding out what to document is not trivial. Basically, it requires that you know the context of the project you are working on. Determining context can be difficult if you are new on a project, but asking questions can give you answers which will help you define the context.
Here are some questions, which I find useful:
Who will be reading the document? How will they be reading it? The challenge is to include content which the actual readers will find useful and to structure the content in a way so that they can find that information.
What information are they looking for? Try to get people to answer in concrete terms rather than using abstract vocabulary. The problem with written documentation is that stake holders will often not read it – most testers seem to prefer to explore systems on their own rather than reading documents, and managers will often not have time to read everything, but will only check the headline and certain details. But if readers get what they look for, chances are that they will read the document.
What kind of analysis do I need to carry out on the system to better understand it and test it? Writing a document about the system can often help me understanding it. I will discover missing knowledge and knowledge which I didn’t see initially. The writing process is part of the analysis.
Are there regulatory requirements which specify that testing should be documented to a certain detail and in a particular way? In this case, the test documentation is actually a product of the project.
Should the document assist knowledge transfer once I’m no longer on the project? In that case, the big question is what kind of knowledge should be ”transferred.”
I checked the section about documentation in Kaner, Bach and Pettichord’s Lessons Learned in Software Testing a few days ago. They have a better and longer list of context-free questions to ask about documentation which will help you find out what kind of documentation you need. They also list a few references to other lists of useful questions, so I suggest you take a look at that too.
Are there any specific types of documentation which is particularly useful? Indeed, there is:
Mind maps are becoming increasingly popular with testers, but ‘old style’ software diagrams like swimlane diagrams, state diagrams, and flow charts etc are still working well. The problem with mind maps is that they are often personal to the author of the map and not very easy to read.
Test scripts can be very useful in initial stages of knowledge transfer: A script can help another tester finding out how a certain procedure is performed in the system. However, a script will by itself tell the tester anything about the context of the script, and this is something which is often missed: Knowledge about a system under test is much more than knowing how to do things.
Check lists are actually much more useful than the name implies: A check list will list things to verify, but unlike a script will not specify in detail how to verify them. That information has to be available elsewhere e.g. in user manuals.
I always look for a document describing the system context in a top-down manner: What is the system doing, for who, and how? If it isn’t there, I don’t mind writing that document myself.
A catalogue of tools used in testing is often also very useful. Test tools are often not well documented (or not documented at all), and that can be a hurdle for new testers when they come aboard a project. A well written ”tips and tricks for testing system X” will get them up to speed faster and can act as a platform for sharing knowledge about testing specific parts of the system. I like Word documents for their self-contained’ness, but a Wiki could be better in many situations – the important thing is that such a document is actively maintained.
Acceptance testing is a key method in Agile. One way of defining acceptance tests are Gojko Adzic‘s ”Specification by example” paradigm which has gained quite a bit of momentum lately. I personally found it to be both refreshing and nice when I heard him present it at Agile Testing Days 2009, and I also found his book Bridging the Communication Gap a nice read.
I’m sceptical of the concept of acceptance testing. Not because verification of agreed functionality is not a good thing, but because it tends to shift attention to verification instead of exploration.
This will shift attention from testing to problem prevention. Is that bad, you ask? Isn’t it better to prevent problems than to discover them?
Well, most people think ”why didn’t I prevent this from happening?” when problems do happen. Feelings of regret are natural in that situation and that feeling can lead you into thinking you should improve your problem prevention. And maybe you should, but more examples aren’t going to do it!
Real testing is still necessary.
To explain why, I’ll consult one of the great early 20’th century mathematical philosophers: Kurt Gödel. In particular his first incompleteness theorem. It says that no consistent system of axioms whose theorems can be listed by an “effective procedure” is capable of proving all facts about the natural numbers.
What does this mean to us?
It means that we will never be able to list all things that can be done with this particular set of data.
A specification is a kind of listing of ”valid things to do” with data, thus Gödel’s theorem teaches us that there are infinitely more things to a system than any long list of requirements. This also applies when the requirements are listed as examples.
If you’re in the business to deliver products of only ”agreed quality” to a customer, you can be all right only verifying things which are explicity agreed. If something goes wrong you can always claim: ”It wasn’t in the specification!”
But if you’re striving for quality in a broader sense, verifying that the system works according to specifications is never going to be enough.
Gojko has made a good contribution to agile. Examples can be useful and efficient communication tools, and if they are used correctly they can help making users and other interested parties better aware of what’s going on on the other side. His contribution can help bridge a communication gap. It can also produce excellent input for automated unit tests.
Just don’t let it consume your precious testing time. The real testing goes far beyond verification of documented requirements!
If you want to learn more about this, I recommend you sign up for one of the Rapid Software Testing courses offered by James Bach and Michael Bolton.
Rolf Østergaard@rolfostergaard suggested on twitter when I posted my previous blog that instead of counting defects and tests we take a look on test coverage. Certainly!
Mathematically, coverage relates the size of an area fully contained in another area, relative to the size of that other area. We could calculate the water coverage of the Earth or even how much of a floor a carpet could cover. Coverage can be expressed as a percentage.
But coverage is also a qualitative term. For example a book can cover a subject, or a piece of clothing can give proper (or improper!) body coverage.
So what is test coverage? Well, the term is often used to somehow describe how much of a system’s functionality is covered by testing.
Numbers are powerful and popular with some people, so a quantified coverage number would be nice to have. One such number is code coverage, which is calculated by dividing the number of code lines which have been executed at least once by to the total number of code lines in a program.
Another measurement relies on business requirements for the system being registered and numbered, and tests mapped to the requirements which they test. A suite of tests can then be said to cover a certain amount of requirements.
Numbers can hint something interesting. E.g. if your unit tests exercise only 10% of the code and it tends to be the same 10% on all of them, the chances are that something important will be missing from the unit tests. Or you could even have a lot of dead legacy code. This would be similar if you found that you actually only tested functionality in a few of the documented business requirements: Could the not-covered requirements be just noise?
No matter what, a coverage number can only give hints. It cannot give certanity.
Let’s imagine we can make a drawing of the functionality of a system; like a map. Everything on the map would be intended functionality, everything outside would be unaccepted. Let’s make another simplification and imagine for the moment that the map is the system, not just an image of it. Here is an example of such a simple system:
The blue area is the system. The red spots are checks carried out as part of testing. Some of the checks are within the system, others are outside it. The ones within are expected to pass, the ones outside are expected to fail.
Note that there is no way to calculate the test coverage of this imaginative system. Firstly, because the area outside the system is infinite and we can’t calculate the coverage of an infinite area. Secondly, because the checks don’t have an area – they are merely points – so any coverage calculation will be infinitesimal. Ah, you may argue, my tests aren’t composed of points but are scripts: They are linear!
Actually, a script is not a linear entity, it’s just a connected sequence of verification points, but even if it was linear, it wouldnt’ have an area: Lines are one-dimensional. But my system is not a continous entity, it is quantified and consists only of the features listed in the requirement document.
Well that’s an interesting point.
The problem is that considering only documented requirements will never consider all functionality. Think about the 2.2250738585072012e-308 problem in Java string to float conversion. I’m certain there are no requirement documents on systems implemented in Java, which actually listed this number as being a specifically valid (or invalid) entry in input fields or on external integrations. The documents probably just said the system should accept floats for certain fields. However a program which stops responding because it enters an infinite loop is obviously not acceptible.
A requirement document is always incomplete. It describes how you hope the system will work, yet there’s more to a system than can be explicitly described by a requirements document.
Thus any testing relying explicitly on documented requirements cannot be complete – or have a correctly calculated coverage.
My message to Rolf Østergaard is this: If a tester makes a coverage analysis of what he has done, remember that no matter how the coverage is measured, any quantified value will only give hints about the testing. And if he reports 100% coverage and looks satisfied, I strongly suggest you start looking into what kind of testing he has actually done. It will probably be flawed.
Intelligent testing assists those who are responsible for quality in finding out how a system is actually working, it doesn’t assure quality. Thanks to Darren McMillan for helpful review of this post.
Michael posted the following two comments on Twitter shortly after I published this post:
There’s nothing wrong with using numbers to add colour or warrant to a story. Problems start when numbers *become* the story.
Just as the map is not the territory, the numbers are not the story. I don’t think we are in opposition there.
I agree, we’re not in opposition. Consider this post an elaboration of a different perspective – inspired by Michaels tweets.
Trying to measure quality into a product is like measuring height into a basket ball player.
Counting yesterday’s passing test cases is as relevant to the project as counting yesterday’s good weather is to the picnic
Counting test cases is like counting stories in today’s newspaper: the number tells you *nothing* you need to know.
Michael is a Tester with capital T and he is correct. But in this blog post, I’ll be in opposition to Michael. Not to prove that he’s wrong, not out of disrespect, but to make a point that while counting will not make us happy (or good testers), it can be a useful activity.
Numbers illustrate things about reality. They can also illustrate something about the state of a project.
A number can be a very bold statement with a lot of impact. The following (made up) statement illustrates this: The test team executed 50 test cases and reported 40 defects. Defect reporting trend did not lower over time. We estimate there’s 80% probablity that there are still unfound critical defects in the system.
80%? Where did that come from. And what are critical bugs?
Actually, the exact number is not that important. Probabilities are often not correct at all, but people have learnt to relate the word “probability” to a certain meaning telling us something about a possible future (200 years ago it had a static meaning, by the way, but that’s another story).
But that’s okay: If this statement represents my gut feeling as a tester, then it’s my obligation to communicate it to my manager so he can use it to make an informed decision about whether it’s safe to release the product to production now.
After all, my manger depends on me as a tester to take these decisions. If he disagrees with me and says ”oh, but only few of the defects you found are really critical”, then fine with me – he may have a much better view of what’s important with this product than I have as a test consultant – and in any case: he’s taking the resonsbility. And if he’s rejecting the statement, we can go through the testing and issues we found together. I’ll be happy to do so. But often managers are too busy to do that.
Communicating test results in detail is usually easy, but assisting a project manager making a quality assessment is really difficult. The fundamental problem is that as testers, by the time we’ve finished our testing, we have only turned known unknowns into known knowns. The yet unknown unknowns are still left for future discovery.
Test leadership is to a large extent about leading testers into the unknown, mapping it as we go along, discovering as much of it as possible. Testers find previously unknown knowledge. A talented ”information digger” can also contribute by turning ”forgotten unknowns” into known unknowns. (I’ll get along to defining ”forgotten unknowns” in a forthcoming blog entry, for now you’ll have to beleive that it’s something real.)
Counting won’t help much there. In fact it could lead us off the discovery path and into a state of false comfort, which will lead to missing discoveries.
But when I have an important message which I need to communicate rapidly, I count!