I did a talk at TestBash Germany last week that sparked a lot of positive response, but also some critique. The critique is fair: it was a 30-minute inspirational talk in which I wanted to explain why Immanuel Kant's work Critique of Pure Reason matters to testers. Quite a few people found me afterwards, asked me questions, and offered critique. Job well done (I'm patting my own shoulder here).
The opening keynote at Pipeline Conference, the yearly, non-profit, continuous delivery conference that took place in London on March 20th, 2018 was given by Elisabeth Hendrickson. Her words to us are still resonating:
There is no failure, only learning – an awful lot of learning
– Elisabeth Hendrickson
Where is Testing in CD?
Continuous Delivery is about integrating code and moving to production continuously. It’s a core principle in Agile.
Some 15 years ago, the idea of moving code to production fortnightly or weekly was pretty cool. Today, deploying to production daily, even several times a day, is the norm. To make it happen (a minimal sketch of such a flow follows the list below):
- Projects avoid branches and releases
- Instead, development and delivery happen in a flow
- Regression testing is largely automated to support the flow
- Production is monitored to detect problems before they happen
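To make the list above a bit more concrete, here is a minimal, hypothetical sketch in Python of what such a flow-supporting gate could look like: the automated regression suite must pass before deployment, and production health is checked right after. The test path, deploy command, and health URL are invented placeholders, not a recommendation of any particular tooling.

```python
# A minimal sketch of a delivery-flow gate. The commands, paths, and URL
# below are hypothetical placeholders for illustration only.
import subprocess
import sys
import time
import urllib.request


def regression_suite_passes() -> bool:
    """Run the automated regression suite; the flow stops if it fails."""
    result = subprocess.run(["pytest", "tests/regression"])  # hypothetical test location
    return result.returncode == 0


def production_is_healthy(url: str, attempts: int = 5) -> bool:
    """Poll a health endpoint after deploying, so problems surface early."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass
        time.sleep(10)
    return False


if __name__ == "__main__":
    if not regression_suite_passes():
        sys.exit("Regression checks failed - not deploying.")
    subprocess.run(["./deploy.sh", "production"], check=True)  # hypothetical deploy step
    if not production_is_healthy("https://example.com/health"):
        sys.exit("Production deploy looks unhealthy - consider rolling back.")
```

Real pipelines are of course richer than this, but the principle is the same: the flow only moves forward when the automated checks and the monitoring agree it is safe to do so.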
The question is: where does that leave the job of exploring a product to learn about it?
I found an answer in London, but not just at the conference.
Quality is an illusion. That may seem like a bold statement, but there is a deeper truth to it that I will discuss in this blog. I’ll also discuss how we can approach the real.
We can think of testers as doctors, scientists, or researchers whose job is to research, explore, or examine some software, gather, analyze, and communicate factual knowledge based on observations.
But science teaches us that when we research and observe things, including software, what we “see” is not reality. At TED 2017, University of Sussex neuroscience professor Anil Seth called what we see “hallucinations”.
This gives the hopefully scientific tester some severe epistemological challenges: As she is a person, and is hallucinating, how can she (or we) trust her observations?
The problem for her is that the images she experiences as real are a synthesis, an intuitive product of her observations based on a minimal amount of sensory data. The critical mindset is important in testing, but doesn't help by itself.
Fortunately philosophy has a solution for her (and us). Before I explain it, let me share a daily life story about intuitive illusions and assumptions.
Walking on Black Ice
I was out walking my poodle Terry a few days ago. A car came towards us, but as we were on the sidewalk and the car on the road, the situation was intuitively safe.
Unfortunately, my intuition turned out to be wrong, as only a moment later my foot slipped on the sidewalk and I realized that the seemingly wet road was not wet: both the road and the sidewalk were covered in black ice.
When another car approached I was aware of the danger, and made sure to keep myself and my dog safe.
There could be a moral in this story about always being cautious about cars and roads, but it might end up in over-cautiousness of the type that grandmothers sometimes impose on their grandchildren.
Instead I consider it a reminder that we don’t see things as they are: The road was wet until my foot slipped and I realized it was icy.
The Stoic philosophers in Rome had already figured this out 2,000 years ago.
Immanuel Kant’s Model of Mind
In 1781 the German philosopher Immanuel Kant published his mammoth work Critique of Pure Reason in which a key concept is the transcendental, which can be thought of as a bridge between the real and the hallucination.
Let me explain: Something that is only realized by intuition, dreams, myths etc, and which doesn’t link to experience, is transcendent. Something realized by pairing sensing and experience is transcendental.
Kant’s model is simple and straightforward, as Kant was pedantic, but it still needs some explanation:
Outside us are, of course, the objects which we sense. Kant calls them "the things in themselves". It could be the road I was walking on with my dog.
Kant thinks of us as rational beings who act on the basis of the thing in itself, and that has caused much debate. Skepticism will claim that the thing in itself is not available, and that there is only the subjective experience. Logical positivism will claim that the thing in itself doesn’t exist at all. Realism will doubt the subjective. We can probably all appreciate that the always rational human doesn’t exist.
But Kant’s bridge is interesting. What he says is that even though “the thing in itself” is not available to us, we can still say rational things about what we see.
So the mind is connected to the real in a way that lets us gain and share experience. Does it sound weird? In a way it is, but Kant's arguments have generally stood the test of time and critical philosophers – and even neuroscience.
So let me tie this to testing.
Belief as a Premise
There are different ways to test: In exploratory testing, we do it interactively by operating and observing the software. In automated testing we "outsource" the actions to another piece of software, and our task is then reduced to making sense of data from the tests and suggesting and possibly implementing changes to the test software. Scripted and chartered testing sits somewhere in between the two "extremes".
However, no matter how we practice testing, we need to make sense of what is observed. And since observing is subject to sensing, the only thing we have available is our intuitive image of the thing we are testing.
James Bach is quoted as saying “Belief is a sin for testers.” I like the quote as it is an important reminder to be careful what we think: It’s not reality. The road might not only be wet. The software probably doesn’t always do what it did this time. I probably missed something. My mind is hallucinating.
So with a bit of wordplay in Kant's home language, German, I'll say that "die Sinne sind die Sünde."
Our senses are the sinner, but as they are also our only hope of seeing some things about reality, belief is not an option. It's a premise.
But since we know this, we can establish the transcendental: think rationally about the real by testing our beliefs.
In other words: The realist approach to testing is to test the product. The transcendental approach is to test beliefs.
On Common Terms
There is something missing in the above as so far I’ve only talked about sensing, imagining, and experiencing. The brilliant part of Kant’s philosophy is that he explains how we can collect experiences.
Kant develops four categories of terms that we think by, and argues how they are given to us a priori, i.e. before we experience anything. He argues how they come from the existence of time and space. Back in his time Newton had just published his theories. Today, we’ve progressed, and it probably makes better sense to think of the terms as a result of the experience of space and time.
But what’s important is that although our experiences vary, we’re on common terms, so to speak.
This is important as it means we can think and express our knowledge about experiences generally.
Let me give some examples: I told you about the black ice on the road above, and while you cannot be certain that what I said is true, you can understand my experience. I can also share a testing problem, and we can imagine solutions together. I can try them out afterwards and share my experiences with you. We can even talk about testing in general, and imagine solutions to certain testing problems in general.
In other words: The terms allow us to relate, connect, discuss, collaborate, learn, reflect, prospect etc.
This makes the transcendental model of experience complete: We can sense, imagine, think, and express our thoughts into words and actions that we can share with others, who can do the same.
The Two Things I Want to Say
So what do I want to say with all this? I want to say two things:
The first is that yes, we are trapped in hallucinating minds. We might theoretically be able to escape them if we subject our testing to strict scripted procedures, make sure what we do is repetitively accurate, and only communicate what we can verifiably record and therefore objectively observe. But we would essentially be turning ourselves into machines and missing intuitive and tacit knowledge. And one way or another, we're still stuck in a mess where any judgement or decision made will be based on hallucinations.
But we're not lost, as we can explore the product and our intuitive ideas about it transcendentally, i.e. by realizing that both are in play when we test. Although we can't get access to the "thing as it is", we can experience it. Our experiences do not have to be transcendent, i.e. disconnected from the real, but can be transcendental.
And this is the second thing I'll say: Since we are not alone in the transcendental, our roles as testers become clearer.
People are different, but I think a fundamental, perhaps even genetically coded, qualification for testers is to be sensitive people with intuitions which are easily disturbed by reality. On top of that, great testers need critical thinking skills, i.e. courage to doubt intuitive illusions, and creativity to come up with test ideas useful in the context. The rest is about interaction and communication with teams and stakeholders so that the good hallucinations about the software that we develop through our testing are shared.
Testing Transcendentally
In the spirit of Anil Seth, the neuroscience professor, let's be honest: Software quality is a hallucination.
We can’t escape our minds and the apparent mess created by the hallucinations we think of as real. But we can experience quality transcendentally by testing.
To me testing is not so much an exploration of a product.
I see testing first and foremost as the transcendental practice of exploring our own, our team colleagues', and our stakeholders' hallucinations about the product.
References
- Anil Seth: How your brain hallucinates your conscious reality. TED2017
- Internet Encyclopedia of Philosophy: Stoic Philosophy of Mind
- Immanuel Kant: Critique of Pure Reason, 1781/1787
Recently I discovered that there is a relation between Cynefin's domains and the Greek Square, a square formed by the four fundamental human values: the true, the just, the beautiful, and the good.
This became clear to me when I was thinking about values governing and shaping our actions in the domains.
In the obvious domain, truth is the governor. What else could shape action in that domain but a desire for truth, facts, and sticking to those facts?
In the complicated, justice shapes actions, as this is where we ask others for help and seek knowledge, which always needs justification in the social. It is okay to let solutions to complicated problems rely on knowledge bases, past solutions to similar problems, and expertise.
The value that shapes my actions in complexity seems to be beauty. Dijkstra said, “beauty is our business” when he described programming. Creative and aesthetic leadership are tightly connected. Some philosophers have described the sense of beauty as a taste. In that case, the thing that keeps me going is the hope for good taste. And good taste is not just good, it is something with aesthetic value.
In chaos, we need to stay grounded, but act on our toes. A desire to do good is the only thing capable of grounding us in chaos, and this is where ultimately gut feelings (gut etymologically has the same root as good, and even God), and intuition are what I can rely on.
(I put freedom in the middle in my sketch below. This was inspired by Ole Fogh Kirkeby, who connects the four fundamental human values with human freedom. Whether it fits Cynefin, I’m not sure.)
More to come…
At the core of innovation in IT is someone getting the idea of connecting existing services and data in new ways to create new and better services. The old wisdom behind it is this:
The Whole is Greater than the Sum of its parts
– Aristotle
There is a flipside to this type of innovation, in that the opposite is also true: The whole can become more problematic than the sum of all the known risks.
My experience as a tester and test manager is that projects generally manage risks in individual subsystems and components quite well.
But I have on occasion found that we have difficulty imagining and properly taking care of things that might go wrong when a new system is connected to the infrastructure, subjected to real production data and actual business processes, and exposed to the dynamics of real users and the environment.
Safety, Accidents and Software Testing
Some years ago, I came across the works of Dr. Nancy Leveson and found them very interesting. She approaches the problem of making complex systems safe in a different way than most.
Leveson is professor of aeronautical engineering at MIT and author of Safeware (1994) and Engineering A Safer World (2011).
In the 2011 book, she describes her Systems-Theoretic Accident Model and Process – STAMP. STAMP gives up the idea that accidents are causal events and instead perceives safety as an emergent property of a system.
I read the book a while ago, but have only recently managed to begin translating her ideas into software testing.
It actually took a tutorial and some conversations with both Dr. Leveson and her colleague Dr. John Thomas at the 5th European STAMP/STPA workshop in Reykjavik, Iceland in September to completely wrap my head around these ideas.
I’m now working on an actual case and an article, but have decided to write this blog as a teaser for other testers to look into Leveson’s work. There are quality resources freely available which can help testers (I list them at the end of this blog).
The part of STAMP I’m looking at is the STPA technique for hazard analysis.
According to Leveson, hazard analysis can be described as “investigating an accident before it occurs”. Hazards can be thought of as a specific type of bug, one with potentially hazardous consequences.
STPA is interesting to me as a tester for a few reasons:
- As an analysis technique, STPA helps identify potential causes of complex problems before business, human, and societal assets are damaged.
- One can analyze a system and figure out how individual parts need to behave for the whole system to be safe.
- This means that we can test parts for total systems safety.
- It works top-down and does not require access to knowledge of all implementation details.
- Rather, it can even work on incomplete models of a system that’s in the process of being built.
To work, STPA requires a few assumptions to be made:
- The complete system of human and automated processes can be modeled as a “control model”.
- A control model consists of interconnected processes that issue control actions and receive feedback/input.
- Safety is an emergent property of the actual system, including users and operators; it is not something that is "hardwired" into the system.
I'd like to talk a bit about the processes and the control model. In IT we might think of the elements in the control model as user stories consisting of descriptions of actors controlling or triggering "something" which in turn produces some kind of output. The output is fed as input either to other processes or back to the actor.
The actual implementation details should be left out initially. The control structure is mainly a model of the interconnections between user stories.
Given a sufficiently developed control model, the STPA analysis itself is a two-step activity where one iterates through each user story in the control structure to figure out exactly what is required of them individually to make the whole system safe. I won't go into details here about how it works, but I can say that it's actually surprisingly simple – once you get the hang of it.
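To make the control-model idea a bit more tangible, here is a minimal Python sketch of how a control structure and the iteration over control actions could be represented. The class, the example control loops, and the exact wording of the unsafe-control-action categories are my own illustrative choices rather than Leveson's notation; the four categories are paraphrased from memory of the STPA material.

```python
# Illustrative sketch only: names and example loops are hypothetical,
# not taken from Leveson's or Thomas's materials.
from dataclasses import dataclass, field


@dataclass
class ControlLoop:
    """One 'user story': a controller acting on a controlled process."""
    controller: str                      # a user, operator, or automated process
    controlled_process: str              # the "something" being controlled or triggered
    control_actions: list                # commands the controller can issue
    feedback: list = field(default_factory=list)  # what flows back to the controller


# The standard STPA questions asked of every control action (paraphrased):
UNSAFE_CONTROL_ACTION_TYPES = [
    "is not provided when it is needed",
    "is provided when it leads to a hazard",
    "is provided too early, too late, or in the wrong order",
    "is stopped too soon or applied too long",
]


def candidate_unsafe_control_actions(loops):
    """Step through every control action and ask how it could be hazardous."""
    for loop in loops:
        for action in loop.control_actions:
            for uca_type in UNSAFE_CONTROL_ACTION_TYPES:
                yield f"{loop.controller} -> {loop.controlled_process}: '{action}' {uca_type}"


if __name__ == "__main__":
    # A hypothetical fragment of a control model for a trading system,
    # loosely inspired by the Knight Capital case discussed below.
    model = [
        ControlLoop("operations engineer", "deployment pipeline",
                    ["deploy release to production"], ["deployment report"]),
        ControlLoop("trading algorithm", "order gateway",
                    ["submit order"], ["execution confirmations", "market data"]),
    ]
    for candidate in candidate_unsafe_control_actions(model):
        print(candidate)
```

Each generated line is only a candidate: the analysis itself consists of judging which of them are actually hazardous in context and turning those into safety requirements, or, for testers, test ideas.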
Safety in IT
I have mentioned Knight Capital Group’s new trading algorithm on this blog before as it’s a good example of a “black swan project” (thanks to Bernie Berger for facilitating the discussion about it at the first WOTBLACK workshop).
Knight was one of the more aggressive investment companies on Wall Street. In 2012 they developed a new trading algorithm which was tested using a simulation engine. However, the deployment of the algorithm to the production environment turned out to be unsafe: Although it was only meant to be used in testing, the simulation engine was deployed and started in production, resulting in fake data being fed to the trading algorithm. After 45 minutes of running this system on the market (without any kind of monitoring), Knight Capital Group was bankrupt. Although no persons were harmed, the losses were massive.
Commonly only some IT systems are considered “safety critical” because they have potential to cause harm to someone or something. Cases like that of Knight Capital indicate to me that we need to expand this perspective and consider safety a property of all systems that are considered critical to a business, society, the environment or individuals.
Safety is relevant to consider whenever there are risks that significant business, environmental, human, personal, or societal assets can be damaged by actions performed by a system.
STAMP/STPA and the Future of Testing
So, STPA offers a way to analyze systems. Let’s get this back to testing.
Software testing relies fundamentally on testers’ critical thinking abilities to imagine scenarios and generate test ideas using systematic and exploratory approaches.
This type of testing is challenged at the moment by
- Growing complexity of systems
- Limited time to test
- Problems performing in-depth, good coverage end-to-end testing
DevOps and CD (continuous delivery) attempt to address these issues, but they also amplify the challenges.
I find that, as professional testers, we more and more often find ourselves trapped in frustrating "races against the clock" because of the innovation of new and more complex designs.
Rapid Software Testing seems to be the only sustainable testing methodology out there that can deal with this, but we still need to get a good grip on the complexity of the systems we're testing.
Cynefin is a set of theories which are already helping testers embrace new levels of complexity in both projects and products. I’m actively using Cynefin myself.
STAMP is another set of theories that I think are worth looking at closely. Compared to Cynefin, STAMP embraces a systems-theoretical perspective and offers processes for analyzing systems and identifying component-level requirements that are necessary for safety. If phrased appropriately, these requirements are direct equivalents of test ideas.
STAMP/STPA has been around for more than a decade and is already in wide use in engineering. It is solid material from one of the world's leading engineering universities.
At the Vrije Universiteit in Amsterdam, the Netherlands, STPA is being taught to students of software testing.
The automobile industry is adopting STPA rapidly to manage the huge complexity of interconnected systems with millions of lines of code.
And there are many other cases.
If you are curious to know more, I suggest you take a look at the resources below. If you wish to discuss this or collaborate with me on it, please write me on twitter @andersdinsen or by e-mail, or join me at the second WOTBLACK workshop in New York on December 3rd, where we might find a good time to talk about this and other emerging ideas.
Resources
- Leveson, Nancy G. and Thomas, John: An STPA Primer [Online] // MIT Partnership for a Systems Approach to Safety (PSAS) 2013 https://psas.scripts.mit.edu/home/home/stpa-primer/
- Leveson, Nancy G.: Engineering a Safer World [Book]. – Boston : Massachusetts Institute of Technology, 2011. Downloadable as PDF for free on https://mitpress.mit.edu/books/engineering-safer-world
- MIT Partnership for a Systems Approach to Safety web site
Thanks to John Thomas and Jess Ingrassellino for reviewing drafts of this blog post. Errors you may find are mine, though.
As testers we need to better understand and be explicit about problems in testing that don’t have known, clear, or obvious solutions. Cynefin can help by transforming the way we, our teams, and our stakeholders think about testing problems.
Ben Kelly and James Christie have written very good blogs about Cynefin and testing. Liz Keogh was one of the first to write about Cynefin in development. At the bottom of this post, I have included a video with David Snowden and a link to an article I found interesting when I read it.
With this blog post I'm sharing elements of my own understanding of Cynefin and why I think it's important. I think of Cynefin itself as a conceptual framework useful for comprehending dynamic and complex systems, but it is also a multi-faceted "tool" which can help create context-dependent conceptual frameworks, both tacit and explicit, so that we can better solve problems.
But before diving into that (and in particular explaining what a conceptual framework is), I'd like to share something about my background.
Product design and the historic mistakes of software development
I studied product design at university in the early 90s. Creating new and innovative products does not follow obvious processes. Most engineering classes taught us methods and tools, but product design classes were different.
We were taught to get into the field, study real users in their real contexts, develop understandings of their problems, come up with prototypes and models of product ideas, and then try out these prototypes with the users.
Discussing an early draft of this post with James Christie, he mentioned that one of the historic mistakes of software development has been the assumption that it is a manufacturing process, whereas in reality it is far more like research and development. He finds it odd that we called it development, while at the same time refusing to believe that it really was a development activity.
SAFe, "the new black" in software delivery, is a good example of how even new methodologies in our industry are still based on paradigms rooted in knowledge about organizing manufacturing. "The Phoenix Project", a popular novel about DevOps, states on the back cover that managing IT is similar to factory management.
What I was taught back in the 90s still helps me when I try to understand why many problems remain unsolved despite hard work and many attempts to solve them. I find that sometimes the wrong types of solutions are applied: solutions which don't take into consideration the true nature of the issues we are trying to get rid of, or the innovations we're trying to make.
Knight Capital Group, a testing failure
The case of Knight Capital Group is interesting from innovation, risk, and software testing perspectives, and I think it exemplifies the types of problems we get when we miss the complexity of our contexts.
Knight Capital Group was one of the more aggressive investment companies on Wall Street. In 2012 they developed a new trading algorithm. The algorithm was tested using a simulation engine, I assume to assure stakeholders that the new algorithm would generate great revenue.
The testing of the algorithm was not enough to ensure revenue, however. In fact, the outcome of deploying the algorithm to production was enormous losses and the eventual bankruptcy of the company after only 45 minutes of trading. What went wrong?
The SEC, the Securities and Exchange Commission of the U.S.A., wrote:
[…] Knight did not have a system of risk management controls and supervisory procedures reasonably designed to manage the financial, regulatory, and other risks of market access […] Knight’s failures resulted in it accumulating an unintended multi-billion dollar portfolio of securities in approximately forty-five minutes on August 1 and, ultimately, Knight lost more than $460 million […]
But let's take a testing perspective.
I think it's interesting that the technical root cause of the accident was that a component designed to test the algorithm by generating artificial data was deployed into production along with the algorithm itself.
This test component was of course not supposed to run in production, since it was designed to generate a stream of random data about worthless stock.
I find it strangely fascinating that the technical component that caused the accident was designed for testing.
Why didn’t someone ensure that the deployment scripts excluded the testing components?
Was it software testing that failed? It is not uncommon that software testing is entirely focused on obvious, functional and isolated performance perspectives of the system under test.
The component did its job: it helped test the new product. The testing strategy (probably undocumented), however, obviously did not consider possible side effects of the component.
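Purely as an illustration of what such a consideration could have led to (the file names and marker convention here are entirely hypothetical, not how Knight's systems actually worked), a pre-deployment guard against shipping test-only components might be as simple as this:

```python
# Hypothetical sketch of a pre-deployment guard that refuses to ship
# anything that looks like a test-only component. Paths and naming
# conventions are invented for illustration.
from pathlib import Path

TEST_ONLY_MARKERS = ("simulator", "simulation", "test_only", "fake_feed")


def test_only_artifacts(release_dir):
    """Return any files in the release that look like test-only components."""
    return [
        path
        for path in Path(release_dir).rglob("*")
        if path.is_file() and any(marker in path.name.lower() for marker in TEST_ONLY_MARKERS)
    ]


if __name__ == "__main__":
    offenders = test_only_artifacts("build/release")  # hypothetical release directory
    if offenders:
        raise SystemExit(
            "Refusing to deploy - test-only components found: "
            + ", ".join(str(p) for p in offenders)
        )
    print("No test-only components found; the release can proceed.")
```

A check like this only gets written, of course, if someone has first imagined that the test component could end up in production at all, and that is a question of sensemaking rather than of scripting.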
I think Cynefin could have helped.
Cynefin transforms thinking
Let’s imagine we’re test managers at Knight and that we choose to use Cynefin to help us develop the testing strategy for the new algorithm.
David Snowden talks about Cynefin as a 'sensemaking tool', and if you had engaged Knight's management, financial, IT-operations, and development people in a facilitated session with a focus on risks and testing, I'm pretty sure the outcome would have been the identification of the type of risk that ended up causing the bankruptcy of the company. Either it would have been prevented by explicitly testing the deployment process, or operations and finance would have put the necessary "risk management controls and supervisory procedures" in place.
I think so because I have observed how Cynefin sessions, with their brainstorming, are great for forming strategies to deal with the problems, issues, challenges, opportunities, etc. that we are facing. They help people talk seriously about the nature of problems and issues, transform them into smaller chunks that we can work with, and escalate things that require escalation.
Cynefin seems to be efficient at breaking the traditional domination of boxed, linear, and causal thinking that prevents the solving of anything but the simplest problems.
My interpretation of what is happening is that Cynefin helps extend the language of those participating in sessions.
Decision makers at Knight Capital did not think about possible negative outcomes of the testing software. They had a simplistic view of their business risks. Cynefin could have helped them by extending their 'sensemaking' to more complex risks than those they were focusing on.
In the following I’ll dive a bit more into why I understand the sensemaking part of Cynefin to be a language-extending tool.
Language and Conceptual Frameworks
Language is an every-day thing that we don’t think much about.
Yet it is the very framework which contains our thinking.
While we can know things we cannot express (tacit knowledge), we cannot actively think outside the frame language creates.
Many philosophers have thought about this, but here I’d like to refer to physicist Niels Bohr (1885-1962) who in several of his lectures, articles, and personal letters talks about the importance of language in science.
Science is in a way about sensemaking through knowledge gathering, and Bohr poetically (I'm paraphrasing from memory) describes language as the string that suspends our knowledge above a void of endless experiences.
In "The Unity of Science", a lecture given at Columbia University, New York in 1954, Bohr introduces language as a "conceptual framework":
“[it] is important […] to realize that all knowledge is originally represented within a conceptual framework adapted to account for previous experience, and that any such frame may prove too narrow to comprehend new experiences.”
And:
“When speaking of a conceptual framework, we merely refer to an unambiguous logical representation of relations between experience.”
Bohr was the father of quantum physics, which is more than new laws about nature. It introduced new and complementary concepts like uncertainty, and non-deterministic relations between events. The extension was made for quite practical purposes, namely the comprehension of observations, but has turned out to be quite useful:
“By means of the quantum mechanical formalism, a detailed account of an immense amount of experimental evidence regarding the physical and chemical properties of matter has been achieved.”
The rest is history, so to speak.
This is relevant to software testing and Cynefin because I think that the conceptual frameworks based on the thinking developed during industrialism are far from capable of explaining what is going on in software development and therefore also in testing.
Further, Cynefin seems to be an efficient enabler to create extensions to the old thinking frameworks in the particular contexts in which we use it.
Cynefin and software testing
Software development does not generally follow simple processes. Development is obviously a human, creative activity. Good software development seems to me to be much more like a series of innovations intended to enable someone to do things in better ways.
Testing should follow that.
But if language limits us to different types of linear and causal thinking, we will always miss the fact that there is generally no simple, algorithmic, or even causal connection between the stages of (1) understanding a new testing problem, (2) coming up with ideas, and (3) choosing solutions which are effective, socially acceptable, possible to perform, safe, and useful.
Experienced testers know this, but knowledge is often not enough.
James Christie added in his comments to the early draft mentioned above that, as testers, with Cynefin we can better justify our skepticism about inappropriate and simplistic approaches. Cynefin can make it less likely that we will be accused of applying subjective personal judgment.
I would like to add that the extended conceptual framework which Cynefin enables with us, our teams, and our stakeholders furthermore allows us to discover new and better approaches to problem solving.
David Snowden on Cynefin
This video is a very good, quick introduction to Cynefin. Listen to David Snowden himself explain it:
I personally found this article from 2003 a very good introduction to Cynefin:
The new dynamics of strategy: Sense-making in a complex and complicated world (the linked page contains a link to download the article)
The Art of Doubting
As a software tester, it is my job to question things. Questioning involves doubt, but is that doubt of a certain kind? Perhaps; let’s call it ‘good doubt’.
Monday May 15th 2017, I facilitated a philosophical, protreptic salon in Copenhagen about the art of doubting. The protreptic is a dialogue or conversation which has the objective of making us aware and connecting us to personal and shared values.
Doubt is interesting for many reasons. Self-doubt is probably something we all have and can relate to. But there seems to be value in a different kind of doubt than that with which we doubt ourselves.
Doubt is related to certainty. Confidence can be calculated statistically, and that seems to be the opposite of doubt.
Science almost depends on doubt: Even the firmest scientific knowledge is rooted in someone formulating a hypothesis, doubting it, and attempting to prove it wrong.
Even religion, faith, seems to be related to doubt.
It is always interesting to examine the origins of a word. The Danish and German words "tvivl" and "Zweifel" have the same meaning as the English doubt, and all relate to the number two: duo, two, zwei, to.
That appears to indicate that when we doubt, we can be in "two minds", so to speak.
So is doubt a special type of reflection, “System-2”, or slow thinking?
The protreptic is always about the general in terms of the personal. We examine our relations to doubt.
"What is it that our doubt wants or desires for us?" was one of my protreptic questions during the salon.
We circled a lot around that particular question. Finding an answer was difficult and we came back to self-doubt, which can be difficult to live with. Self-doubt can even harm our images, both the external ones and those that are internal to ourselves.
Leaders are usually expected not to have self-doubt: A prime minister risks losing the next election if he doubts his own decisions and qualities. A CEO who doubts her own actions will drive the share value of the company down.
But there is a good doubt, and good doubt seems to be of a helpful nature.
Good leadership requires having the courage to doubt. It seems to help us act wisely and based on our experiences.
During the salon, my personal image of doubt changed. In the beginning I thought of doubt as a kind of cognitive function, perhaps a process I had to go through. Doubting could even be an event.
But at the end of the salon, my image of doubt had changed into that of a good friend walking with me through life, continuously present if I want him.
With that image we found an answer to the question: Doubt is my friend. A friend who wants my actions to be driven not only by my instincts or simple gut feelings. A friend who helps me shape my actions by my values.
Summary: Last year in September, I spoke at Anna Royzman’s New Testing Conference “Reinventing Testers Week” in New York about testing in Black Swan domains. The title of the talk refers to Nassim Taleb’s book “Black Swan” and concerned testing in contexts where improbable risks can have disproportionate effects. This blog contains an invitation to a peer conference in New York on the subject Sunday April 30th.
Sometimes things happen which appear to be beyond “the possible”.
This awareness haunts us in testing: We aim to get those important bugs to materialize. We want to qualify real and serious risks. Yet, our stakeholders have to accept that no matter how much testing is done, we cannot cover everything.
Logically, testing has to be focused on what is important to the business, and what might go wrong in reality. To me, that is at the core of risk based testing.
But saying that is one thing; doing it in reality is quite another. Certain risks seem almost impossible to qualify through testing. How should our stakeholders interpret the absence of clear testing results, for example, when we are trying our best to dig out solid information about quality? Could there be a serious problem lurking? The thought may seem paranoid, but experience shows it is not.
Years ago, I read Nassim Taleb’s “The Black Swan – The Impact of the Highly Improbable”. The book blew my mind and set me on a path to find out what we can do in testing about what he writes about.
The book is about "the random events that underlie our lives, from bestsellers to world disasters. Their impact is huge; they're nearly impossible to predict; yet after they happen we always try to rationalize them." (from the back cover of the 2010 paperback edition)
As an engineer and a human, I think testers and test managers should not give up and leave it to product owners, project managers, or developers to interpret testing results and take care of potential Black Swans. As a tester, I wish to embrace the possibility of Black Swans and do quality testing with the aim of qualifying them.
I think, however, that we need new models in testing. The problem is that most of our techniques and heuristics tend to support us best on the functional testing level.
Accident Focused Testing?
The first part of solving a problem is accepting it. It sounds basic, but acceptance implies understanding what we are dealing with. Reading Taleb's book convinced me that we have to accept the fact that really bad things can happen in the world. Knowing what I do about information technology, I appreciate that his philosophy can be applied to technology. I also believe that functional testing will not help us much.
Mentally examining what I do as a tester, I understood that the idea of Black Swans is fundamental to the very nature of what we do and the systems we work with.
So much for acceptance.
The problem is that in some contexts – banking, healthcare, industrial plant management, safety systems, the public sector, transportation, etc. – accidents and Black Swans could be of a nature where they cause irrecoverable losses, put lives at stake, or are otherwise fundamentally unacceptable.
Let me give an example:
I recently came across a story of an interesting IT breakdown at a hospital in Sweden. It concerned something most people do on a regular basis: applying the newest updates to our PCs.
As updates were rolled out in the hospital, the performance of PCs started degrading. During the rollout the problems became worse, and before the rollout could be stopped, all computers in the hospital had become useless.
Once the computers stopped working, undoing the rollout became extremely difficult and had to be carried out manually, one PC at a time.
In the end it took IT-operations several days to get everything back to normal. Meanwhile the hospital had to be run “on paper”.
The hospital used an uncommon Windows network configuration, not recommended by Microsoft, which in combination with the update triggered a problem in the network. What is interesting here is not the root cause, however: The outcome of a seemingly trivial update in a complex system turned out very badly.
It is easy to imagine how the stress experienced by doctors and nurses due to this situation could have affected patients. Someone could have been hurt.
We can shrug and blame Microsoft or the hospital IT operations. However, as skilled testers, I think we need to be able to provide some kind of answer as to how we can constructively contribute to hospital safety by qualifying even Black Swan-types of risks.
Systemic risks
Before moving on, let me dive into the subject of risk. Risk is something we all talk about, but do we really know what it means? I’m not sure. Risk is a common thing to talk about, but the concept of risk is in no way simple.
There seem to be at least three "risk domains" in software projects:
- Some risks concern plans and schedules. Will the project be done on time and budget? That’s what we usually call “project risks”.
- Other risks concern the product or system under development: Will it do what it is conceived to do? Will it do it correctly? These are called “product risks”.
- Then there is a third class, a class of risks of a different nature: systemic risks. They arise from combinations of systems, users, data, and environments.
Black Swans lurk in all three: Even simple products or components can sometimes fail in strange ways with huge impact. Just think of the defective Galaxy Note 7 battery: it was only a manufacturing problem with the battery, but one which caused a lot of harm to Samsung.
Black Swans are sometimes annoyingly simple.
But those kinds of Black Swans can be prevented by stricter quality control and similar traditional measures. Project and product risks are usually relatively easy to deal with using appropriate care in the context.
Systemic risks are different. They seem much more troublesome – and in some ways more interesting.
From simple to complex
Back in the early days of computing, I think systemic risks used to be rather uninteresting. Systems were simply… simple. Developing a new product, we would sometimes work to make sure usability was good, or that the machine which the computer system was designed to control would work as a whole.
But that was it. Interfaces and interactions with other systems and contexts could be counted on one hand, and there were usually very few connections to other computer interfaces etc.
If you have been interested in risk in software, you may have read about the Therac-25 accident. If not, let me summarize: A difficult-to-find multitasking bug in the control software of a radiation therapy machine turned out to be the root cause of apparently random radiation burns of cancer patients placed in the machine for treatment. Some of these burns were fatal.
Obviously a Black Swan: A difficult-to-find bug in a deficient design.
The system was simple, however, as there were only four components in it: the user, the user interface software, the machine control software, and the machine itself. Of course there were also the patients, the victims of the accidents, but they were only victims, receivers of the problem. (Some victims attempted to provide feedback, though.)
The issue turned out to be a simple multitasking problem, where experienced operators who were fast on the keyboard used to control the machine could cause the software to enter illegal states. In other words: a software engineering problem.
Today, however, complexity is increasing. To me at least, it seems our industry has crossed a boundary: The number of components that work together in complex ways to realize important business functionality has grown significantly. While counting can never tell the whole truth, it is worrying that modern systems can comprise tens, even hundreds of components that are assumed to work seamlessly together on sunny days. Often no one knows what will happen when the sun goes away and the rain comes, so to speak.
Systemic risk in IT systems is no longer something that can be excluded from risk analysis and managed technologically.
So why are we not spending more time testing based on systemic risk analyses?
Explore the whole picture
Some readers might think of the Cynefin framework, and yes, I think it certainly appears promising as Cynefin provides a thought framework for understanding complex and complicated systems.
I went by a different path, however, when I explored the situation: I looked at safety engineering and mechanical safety analysis. I can recommend two books in particular:
- Normal Accidents by sociologist Charles Perrow
- Human Error by psychologist James Reason
(In a later blog post, I'll come back to what I found in these two books, but you can get a peek at it in the presentation recording at the bottom of this blog. I'll certainly also be coming back to Cynefin, as it seems promising.)
But there might be a bigger problem to address too as it seems there is a management problem worsening the situation: Testers very often do not receive sufficient freedom to test the “big picture”.
When did you last hear of a tester tasked with testing a product in complete integration with real users for a long time? I'd like to hear about examples of it, as very often, when I talk to people, I hear about product owners, project managers, or C-level managers deciding and tightly controlling what should be tested.
And risk reporting to the rest of the organization is filtered through these levels.
Focus is too often only on going live on time, on schedule, no matter what. Too seldom on qualifying complex or systemic risks.
I think testers should be tasked to explore the dynamics of the product in contexts resembling the real world.
Speaking about testing in a Black Swan Domain
I spoke about this for the first time at the first Let's Test conference in 2012 in Stockholm (slides – PDF) and for the second time in September 2016 at the New Testing Conference during "Reinventing Testers Week" in New York. Scroll down to see a recording of the latter presentation.
The feedback I received at those two events has confirmed to me that this is a subject that needs exploration. Our craft can be advanced to go below the functional, performance, or usability perspectives. New models in testing, heuristics, and even new types of testing strategies can be developed, I think.
Going alone can be difficult, and I’m therefore extremely grateful to have received moral backing from both Michael Bolton and Fiona Charles. Additionally, Anna Royzman has agreed to co-host a peer workshop on the subject in New York with me in connection with her May conference.
I find New York an interesting place for a few reasons:
- It is where I talked about the subject last time.
- Nassim Taleb lives in New York.
- It is a very big city, so big that it's even difficult to comprehend for someone like me who comes from a little country with less than half its population. New York seems a complex system beyond imagination.
- It is the world’s financial centre, and some of the systems running that are extremely complex. I try not to think about what types of systemic risk they manage on a daily basis.
If you are interested, feel you have something to contribute, have the time, etc., it would be great to see you at the first WOTBLACK: Workshop on Testing in Black Swan Domains on Sunday April 30th in New York.
The objective?
Advance the testing craft by co-developing and sharing models, heuristics, and strategies.
Write me an e-mail if you’re interested in participating, or ping me on twitter if you feel you have something to share now or wish to start a discussion about the subject.
Christmas is almost over and while I am still having holiday with the family, I’m beginning to think a bit about testing again.
I am passionate about software testing.
There is a lot of talk about passion, but do we know what passion is?
The word shares roots with the Greek 'pathos', which is one of the three key components of persuasion in rhetoric. The other two are ethos and logos.
Good communication should be fact based (logos) and serve a common greater good (ethos), but passion adds something important to communication.
The passionate lecturer
I remember two math lecturers from university. One taught analytical algebra, the other graph theory and combinatorics.
Both were personalities of the type you would notice if you saw them in the street, but if someone then whispered to you: "He is an associate professor in mathematics", you would exclaim "ah!" and understand exactly what you were seeing 🙂
Their style of lecturing was very different, however.
Every lecture in graph theory and combinatorics was unique. It seemed the lecturer literally reinvented what he was lecturing on while he was doing it. He was not particularly organised in his teaching; sometimes he would even forget the subject and divert off on a wrong 'graph' (sic!). But he had passion for the subjects, and that showed. The lectures were often very engaging and fascinating.
The other lecturer prepared his lectures to perfection: He always started on the exact minute, putting his chalk to the board in the top left corner of the first of the six large blackboards in the auditorium, and by the end of the 90th minute, he would finish writing a formula in the last available spot of the lower right corner of the last board. He repeated that time after time. A fascinating performance. But there was a problem, as he had obviously lost passion for the subject he was teaching. I felt bored to death during his lectures, and I am not sure I ever passed that exam.
Some testers are passionate about what they do, others try to be perfect. I always prefer passion over perfection.
Suffering by Passion
Passion is one of those tacit capabilities we know by heart, but will probably never be able to code, teach to a neural network, or explain to someone who has never experienced it.
The word has an interesting record in the Douglas Harper online etymology dictionary. Apparently, passion used to be a kind of suffering:
Passion: late 12c., “sufferings of Christ on the Cross,” from Old French passion “Christ’s passion, physical suffering” (10c.), from Late Latin passionem (nominative passio) “suffering, enduring,” from past participle stem of Latin pati “to suffer, endure,” possibly from PIE root *pe(i)- “to hurt” (see fiend).
The article even goes on to link passion to the sufferings of martyrs.
Let me confess now: While I am very passionate about good testing, I am not going to become a testing martyr.
Words change meaning over time and passion is certainly a word that has become more of a daily language term than it probably was back in the late 12th century.
Today, linking passion to suffering, even physical suffering, may seem out of context.
However, it reminds us that passion does involve trading away some things that I like too: staying relaxed, calm, and cool, for example.
I am none of those things when I am feeling passionate.
Passion seems to be a kind of double-edged sword.
Passion-Fatigue
I am always more tired after working passionately on a testing problem than when I’m doing more trivial things in my job: E.g. diligently replying to e-mails, writing factual test reports, checking out plans and schedules.
Could there be something called passion-fatigue? I think so, and when passion is a driver in daily work life, relaxation and recharging are important to stay healthy, sane, and well in the longer run.
The need for Hygge
Now that Christmas has just passed, but I am still enjoying days of holiday with the family, it seems right to mention ‘hygge’ (pronounced “hyk-ge”).
Hygge is Danish for relaxing with others, with a good book, or in other nice ways.
Hygge is difficult to define. In that way it's similar to passion, except it's the opposite: relaxing, calming, and mentally soothing.
A day with hygge could be so relaxing and good that it deserves to be finished off with a good tequila, scotch, or another good drink of your preference 🙂
What’s interesting here is that hygge seems to be a good cure for passion-fatigue. Hygge creates space for passion.
And this is exactly what ‘Julehygge’ is about: Getting away from daily life, relaxing with family and friends, and recharging.
Is “hygge” becoming a global fashion trend? The New York Times had an article on the fashion of hygge a few days ago: Move Over, Marie Kondo: Make Room for the Hygge Hordes
Playful Software Testing
I met Jessica Ingrassellino in New York back in September and enjoyed a very good conversation with her. Jessica presented a workshop on playful testing during the Reinventing Testers Week (I presented at the conference about "Testing in a Black Swan Domain", which, unfortunately, I have not had time to write about yet).
We talked mostly about philosophy.
Jessica is quite a multi-talent: she plays the violin like a virtuoso, is a trained music teacher, has switched careers to testing, taught herself Python, authored a book on Python programming for kids, and is teaching Python classes at a local community college, as well as music classes.
She has a vision of making testing playful and fun.
Structured work governs testing in professional settings, work which has nothing to do with play. So why is play important?
Jessica puts it this way:
When the power of play is unleashed in software testing, interesting things happen: The quality of the testing performance becomes noticeably better, and the outcomes of it too. This results in better software systems, higher product quality.
I have a product engineering background and play is important for me too. Engineers have methods, calculations, and procedures, but great engineers know that good solutions to problems are not found by orderly, rational processes. Good solutions depend on creativity and play.
Friday December 9th, I met with Mathias Poulsen in Copenhagen. Mathias is the founder of CounterPlay, a yearly conference and festival on serious play in Aarhus, the second largest city in Denmark.
About three years ago, Mathias got the idea for the conference.
In the first year, 2014, it was an immediate success with more than 20 talks and workshops in 3 tracks on “Playful Culture, Playful Learning, and Playful Business”, and more than 150 participants. This year (2016), the conference had 50 scheduled sessions: keynotes, talks, workshops, mini-concerts and open sessions.
Mathias explains (about 0:30 into the video):
Counterplay is basically an attempt to explore play and being playful across all kinds of domains and areas in society. We are trying to build a community of playful people around the world to figure out, what does it mean to be playful and why do we think it is beneficial?
Professional IT has so far not been represented at the conference, Mathias told me. I found that a bit surprising, as at the moment almost everything in IT seems to be buzzing with concepts promising joy and fun – play.
Sometimes, however, there is an undertone to all the joy. Agile and DevOps have become popular concepts even in large corporations, and to me, both strive to combine productivity with playfulness. That is good.
But is the switch to Agile always done in order to pass power to developers and testers, allowing them to playfully perform, build and test better solutions? No, not always.
Play facilitates change and the breaking of unhelpful patterns, but sometimes play is mostly a cover for micromanagement. There is a word for this: In a recent blog post, Mathias talks about playwashing:
Playwashing describes the situation where a company or organization spends more time and money claiming to be “playful” through advertising and marketing than actually implementing strategies and business practices that cultivate a playful culture in said organization.
A question is therefore how we genuinely support play. Are there methods or processes that better accommodate playfulness at work?
I believe there are. Processes need to leave space for exploring context, sharing knowledge, and actually interacting with customers, stakeholders, and team members.
But processes or methods will not do the job alone. In fact, putting play under the examination of psychology or the cognitive sciences will never let us grasp what play really is.
Play is more like music and poetry, where ideas based on assumptions about order, rational choice, and intention cannot explain anything.
Philosophy, and especially the dialectical exploration of what it means to be a playful human, is much better at embracing what play means to us and how to support it.
Jessica and I are working on a workshop about playful and artful testing. It will combine ideas of playful testing with philosophy.
We are certain that breaking out of patterns will help testers, and that breaking out of our own patterns by participating in a conference fully devoted to play will teach us a lot.