Categories
Blog posts in English

Skype's first Black Swan

Skype was down for about 24 hours just before Christmas. Skype management is embarrassed and promises this will not happen again, which of course is true. The particular situation is now prevented. However, the question is: Will Skype never be down again?
Skype’s CIO explains what went wrong in this post mortem of what I’d call Skype’s first Black Swan.
To summarise, a high load on Skype’s infrastructure triggered a bug in a certain version of the Windows client for Skype, which in turn increased the load on the infrastructure, thereby rapidly taking the entire network down and making it almost impossible to get it up again.
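The feedback loop described here can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `simulate_cascade`, its parameters and all the numbers are hypothetical and are not taken from Skype’s post mortem. It assumes that whenever load exceeds capacity, the misbehaving clients add extra traffic in proportion to the overload.

```python
def simulate_cascade(demand, capacity, spike, amplification, steps=5):
    """Toy model of an overload feedback loop; returns the load at each step.

    All parameters are hypothetical illustration values, not Skype data.
    """
    loads = [demand + spike]  # step 0: normal demand plus a one-off spike
    for _ in range(steps):
        overload = max(0, loads[-1] - capacity)
        # Assumption: overload triggers the client bug, and the affected
        # clients add extra traffic in proportion to the overload.
        extra = amplification * overload
        loads.append(demand + extra)
    return loads


if __name__ == "__main__":
    print("no client bug:  ", simulate_cascade(90, 100, 20, amplification=0))
    print("with client bug:", simulate_cascade(90, 100, 20, amplification=3))
```

With an amplification of 0 the load falls back to normal demand as soon as the spike has passed. With an amplification of 3 the same spike makes the load grow at every step, which is the kind of runaway that makes a network hard to bring up again.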
The bug was always there of course, and it was probably already known internally at Skype. It is also possible that the risk of server overloading and service degradation had been identified, but obviously not in the context of making a complete system crash a likely possibility (if so, they would have prevented it). Further, I’m quite certain that the risk of the client bug affecting the server load had not been identified. Humans are positive thinkers, as Taleb documents in his book: The Black Swan: The Impact of the Highly Improbable.
So Skype’s challenge now is to prevent outages in general by identifying and preventing Black Swans in general. This will involve a cross-organisational backwards thinking process, which the innovation-driven company has probably not been focusing on at all until now. (I may actually be wrong here: Janus Friis, one of the founders of Skype, used to work in a support function of an ISP, so he may have been involved in preventing problems. But generally, startups think very positively, and even if Skype has millions of users, it’s still a very young company – a startup.)
One may think that this is going to be extremely expensive for Skype since they will have to predict every possible way their system can go wrong. It does not have to be that expensive, although it will cost money.
When securing a nuclear facility, engineers don’t have to analyze every possible way a disaster can happen. Instead they think: How can we prevent failure at every level? This is what I mean by “backwards thinking” – start by assuming something is failing, then work backwards, identifying ways to prevent it from becoming worse.
This is done on multiple levels. On the component level, we ask: What can go wrong here, and how can we prevent a bug or incident from affecting the rest of the system? And on the system level: Assuming that disaster is happening, how can we prevent it from developing further?
I assume that’s what Skype is doing now.
Wishing everyone a happy 2011!


Bohr on testing

When Niels Bohr and his team of geniuses at his institute in Copenhagen developed quantum physics, they fundamentally changed science by proving wrong an assumption dating very far back in science: That everything happens for a reason. In science, building and verifying models to predict events based on knowledge of variables is an important activity, and while this is still meaningful in certain situations, quantum mechanics proved that on the microscopic level of atoms and particles, things don’t happen for a reason. Put another way: You can have effects without cause. In fact, effects don’t have causes as such at this level.
This means that in particle physics, you cannot predict precisely what’s going to happen, even if you know all variables of a system. Well in fact you can, but only if you’re prepared to give up knowing anything about when and where the effect will happen.
This is counterintuitive to our daily understanding of how the world works. But there’s more: According to quantum physics, it is impossible to separate knowledge of variables of a system from the system itself. The observation of the system is always part of the system, and thus changes the system in an unpredictable way.
If you find this to be confusing, don’t be embarrassed. Even Einstein never accepted this lack of causality.
Bohr was a great scientist, but he was also a great philosopher. He did a lot of thinking about what this lack of causality and the inseparability of observation from events would teach us about our understanding of nature. On several occasions he pointed out that even on the macroscopic level, we cannot ignore what is happening on the atomic and particle level. First of all because quantum physics did away with causality as a fundamental principle, but also because quantum effects are in fact visible to us in our daily macroscopic life: He used the example of the eye being able to react to stimuli as small as those of a single photon and argued that it is very likely that the entire organism contains other such amplification systems where microscopic events have macroscopic effects. In some of his philosophical essays he points out how psychology and quantum mechanics follow similar patterns of logic.
So does testing. In software testing we are working to find out how a computer system is working. Computers are algorithmic machines designed in such a way that randomness is eliminated and data can be measured (read) without affecting the data, but the programs are written by humans and are used by humans, so the system in which the computer program is used is both complex and inherently unpredictable.
We’re also affecting what we’re testing. Not by affecting the computer system itself, but by affecting the development of the software by discovering facts about the system and how it works in relation to other systems and the users.
In some schools of software testing, the activity is reduced to a predictable one: Some advocate having “a single point of truth” about what is going to be developed in an iteration, and that tests should verify that implementation is correct – nothing more. They believe that it is possible to assemble “all information” about a system before development starts, and that any information not present is not a requirement and as such should not be part of the delivery.
That is an incorrect approach to software engineering and to testing in particular. Testing is much more than verification of implementation, and the results of testing are as unpredictable as the development process itself is. We must also remember that it is fundamentally impossible to collect all requirements about a product: We can increase the probability of getting a well working system by collecting as much information as possible about how the system should work and how it is actually working (by testing), and comparing the two, but the information will always be fundamentally incomplete.
Fortunately we’re not stupid. It is consistent with quantum physics.
Studying the fundamental mechanisms of nature can lead to a better understanding of what we are working with as software engineers and as software testers in particular.

My son Jens at Tisvilde beach, where Niels Bohr spent a lot of time with friends, family and physicists


Finding the perfects

Friend and tester colleague Jesper Ottosen participated in what appeared to be a great event and discussion at EuroStar 2010: The rebel alliance night (link to Shmuel Gershon’s blog with video recordings of the talks), where he spoke about whether we as testers can start looking for more than defects. What if we started looking for the perfects?
I like the idea: Is testing really only about finding problems? It can be depressing to be the one who always tells the bad news (especially when there is a lot of bad news or the bad news is not really welcome). Do we testers really have to be worried all the time? If we start communicating perfects too, will our careers not get both better and more successful?
I see a problem, though. Looking for good things will be in conflict with the very mindset of testing. Programming is a creative process where the programmer creates something new and unique. He does it to solve a problem and he does it in the assumption that it will solve the problem. If he starts out assuming that it won’t work, he will be psychologically blocking his creativity and he will probably not perform well.
As a tester, I look at software with the reverse assumption: I assume that it will not work. This assumption stimulates my creativity to find the bugs because I get ideas of where they’re hiding.
With that assumption, I just can’t be successful looking for good things!
That said, however, I do believe that we sometimes need to be positive, especially to satisfy some managers and programmers. They’re used to hearing bad news from us and some people can’t take that. Switching for a moment to looking for “perfects” might actually work very well in this respect. Just don’t forget that we’re doing it for them, not to do our job.
And don’t forget that it can only be for a while: We have to think negatively to be successful. We make a difference when we find the obvious problems with the product: The problems that will cause severe dissatisfaction among users and managers if they slip into the product. We’re a great help to our clients because we prevent bugs by finding them before the users do!
Here’s Jesper at EuroStar 2010:
[youtube=http://www.youtube.com/watch?v=wB_N-TZPde8&fs=1&hl=da_DK&color1=0x5d1719&color2=0xcd311b]


Based on evidence…

I’m very interested in school education and I’m chairing the board of the local school. My particular field of interest is education for children with learning disabilities. This is because two of my boys have ADHD with reading disabilities.
We’re looking for new ways to educate children with special needs here in Denmark, and the trend is to include them in the normal educational environment – not put them in special schools. There are two reasons to do so: One is that it’s cheaper, the other that the effect of special education is statistically not very good.
I’m regularly discussing the dilemmas of special education with local politician and social sciences professor Bent Greve, and lately he and I had a short e-mail discussion about whether education should be evidence based. This is also a public debate, similar in nature to the debate taking place in the testing community.
Bent Greve explained to me why he, as a politician, requires education to be evidence based: He needs to know that the practices which are being applied actually work. I suppose it is about supporting his decision making, but he pointed out that it is also in the interest of those they’re trying to help. It can be done in health care, so he believes it can be done in other areas too. However, it should never excuse individual professionals from their responsibility for what they are doing – the evidence should support them in making the right decisions.
This point of view is sympathetic if you’re engaged in politics or managing a company: You want your workers to do work that works. Not work that doesn’t work. Right?
On the other hand is the point of view of the professional, who uses his talent and creativity to research for and find solutions. Teachers generally argue against the focus on evidence and find that it limits their freedom and makes them less good teachers. I also argued with Bent that sometimes the “right decision” turns out to be the wrong decision. For example: If you base your teaching on evidence of what works in general, we can be certain that there will be between 5 and 20% of the pupils who will not learn anything. So what should we do for them?
Bent Greve responded intelligently: Evaluation and evidence cannot be left alone. Continuous experimentation and development are needed, as nothing can be said to be the final truth.
I agree.
As a tester craftsman I follow patterns in my work which work for me most of the time, but sometimes I find that they don’t. If I test and don’t find any bugs, I feel dissatisfied and confused. I think I may have used the wrong pattern, but I’m in jeopardy and I don’t know what to do since my pattern failed. Eventually I may have to give up, or I may discover a pattern that works within the time that I have to test.
Managers don’t find bugs, testers do. Politicians don’t educate pupils, teachers do. So who should we trust? I’m not going to answer this question since it’s absurd: There would not have been schools if there hadn’t been politicians and there would not have been any software companies if there were no managers. But I think everyone can agree that at the end of the day, the thing that really matters is that we find as many of those annoying bugs as possible before the product is released, and that the pupils learn the most from attending school. Right?
Correct. But sometimes we find that the patterns we’re following in our work life (whether that’s teaching or testing) are not working. Then what? We have to stop and think. Something is wrong. How do we progress from here? This is where the most important rule is: Don’t apply the same pattern again – try something different!
PS: I was not at EuroStar 2010 today so did not hear Stuart Reid’s keynote. I did, however, follow some of the noise it caused on Twitter!

Copyright (C) Anders Dinsen
6 year old Troels learning by playing