– What are you doing?
– I’m turning up the heat. To see if anything catches fire. It’s an old CID trick.
Detective Chief Inspector Jack Frost in the TV series “A Touch of Frost”
In the follow up post after my presentation at Let’s Test, I concluded that the next step in my work on black swan testing should be on operationalisation. This post introduces a a testing heuristic which I’ve successfully used myself. I call it “Turning up the Heat”
The basic idea is that odd things happen when a system is put under pressure, and that we can learn stuff about the system by doing so.
There are basically three ways to do it:
- Load testing, i.e. putting the system under exceptionally heavy load for a period
- Soak testing, i.e. loading the system lightly for a long time
- Destructive testing, e.g. crippling subsystems by disabling or forcing malfunctioning
They can of course be combined.
In the quote above, heat is a metaphor for the psychological pressure Jack Frost is subjecting suspected murderers to, and here ‘load’ is also a metaphor for any environment changes that can have an impact on the way the system works. It could be a high data load, e.g. 10 or 100 times the normal rate of requests to a service, but it could also be something quite different, e.g. a 10 degrees higher than normal temperature in the server room. The black swan domain is the systems domain, so any component in the complete system is a valid target for putting load on.
Likewise, ‘crippling’ is a metaphor here for doing something to a subsystem that will cause the subsystem to work differently from normal: It could be as simple as just removing a component, or bugs could be deliberately introduced. In fact the target doesn’t have to be one single subsystem: Changing the same thing in several subsystems, e.g. compile all software modules with a buggy version of some widely used library, can be a simple and efficient apporach.
As testers, we often don’t have direct access to the tools needed to load and cripple subsystems and complete systems. I find that in order to practically “turn up the heat”, I often have to rely on the help of others, e.g. developers and system administrators. This leaves me with a communication and cooperation challenge which should not be taken lightly.
There are no right or wrong approaces in testing, but there’s a risk of wasting time: I.e. spending time on preparations and never getting down to the actual testing – thereby not learning. That’s one of the reasons I usually prefer simple techniques rather than planned approaches.
In one project I’ve worked on, the testing tool we used had a simple load testing function. I managed to crash the test environment completely by just running the tool off my own pc, and this eventually gave us some important information about a vulnerable subsystem (the root cause of the crash was not what anyone expected when the system stopped responeding). I spent less than an hour on this test – though getting the problem diagnosed and the environment recovered involved somewhat more work afterwards by the system admins, I’m afraid.
This actually points me to another point related to cooperation: While load testing or soak testing is normally non-destructive, only do it if the project can afford repairing what might be affected by possible malfunctions of the system. This could include other testers not being able to complete other testing activities!