{"id":736,"date":"2015-05-21T21:09:51","date_gmt":"2015-05-21T19:09:51","guid":{"rendered":"http:\/\/blog.asym.dk\/?p=736"},"modified":"2015-05-21T21:09:51","modified_gmt":"2015-05-21T19:09:51","slug":"are-you-playing-the-russian-roulette-learning-from-failure","status":"publish","type":"post","link":"https:\/\/www.asym.dk\/index.php\/2015\/05\/21\/are-you-playing-the-russian-roulette-learning-from-failure\/","title":{"rendered":"Are you playing the Russian roulette? Learning from failure"},"content":{"rendered":"<p>I think most (if not all?) testers have witnessed situations like this: A new feature of the system put into production, only to crash weeks, days or just hours later.<br \/>\n\u201dWhy didn&#8217;t anybody think of that?!&#8221;<br \/>\nTruth is, quite often, somebody did actually think about the problem, but the issue was not realised, communicated or accepted.<br \/>\nBelow is the story about the space shuttle Challenger accident in 1986.<br \/>\n<b>Disaster&#8230;<\/b><br \/>\nTwenty-nine years ago, space shuttle Challenger broke apart 73 seconds into its flight, killing the seven astronauts aboard.<br \/>\nTheoretical physicist Richard Feynman was a member of the accident commission. During the hearings he commented that the whole decision making in the shuttle project was \u201da kind of Russian roulette\u201d.<br \/>\nThe analogy is striking. Russian roulette is only played by someone willing to risk dying.<br \/>\nI don\u2019t know anyone who deliberately wants to play Russian roulette, so why did they play that game?<br \/>\nFeynman explains: <i>[The Shuttle] flies [with O-ring erosion] and nothing happens. Then it is suggested, therefore, that the risk is no longer so high for the next flights. We can lower our standards a little bit because we got away with it last time&#8230;. 
You got away with it but it shouldn&#8217;t be done over and over again like that.<\/i><br \/>\nThe problem that caused the explosion was traced to leaking seals in one of the booster rockets. On this particular launch ambient temperatures were lower than usual and for that reason the seals all failed. The failed seals allowed very hot exhaust gases to leak out of the rocket combustion chamber, and eventually, these hot gases ignited the many thousand litres of highly explosive rocket fuel.<br \/>\nChallenger blew up in a split second. The seven astronauts probably didn&#8217;t realise they were dying before their bodies were torn to pieces.<br \/>\nIt was a horrible tragedy.<br \/>\n<a href=\"http:\/\/science.ksc.nasa.gov\/shuttle\/missions\/51-l\/docs\/rogers-commission\/Chapter-6.txt\" target=\"_blank\" rel=\"noopener noreferrer\">Chapter 6 of the official investigation report<\/a> is titled: <b>\u201dAn accident rooted in history.\u201d<\/b><br \/>\nThe accident was made possible because of consistent misjudgements and systematically ignored issues, poor post-flight investigations, and ignored technical reports. The accident was caused because three seals failed on this particular launch, but the problem <strong>was<\/strong> known and the failure was made possible because it was <strong>systematically ignored<\/strong>.<br \/>\n<b>The tester&#8217;s fundamental responsibilities<\/b><br \/>\nAs a tester, I have three fundamental responsibilities:<\/p>\n<ol>\n<li>Perform the best possible testing in the context<\/li>\n<li>Do the best possible evaluation of what I&#8217;ve found and learnt during testing. Identify and qualify bugs and product risks.<\/li>\n<li>Do my best to communicate and advocate these bugs and product risks in the organisation.<\/li>\n<\/ol>\n<p>The Challenger accident was not caused by a single individual who failed to detect or report a problem.<br \/>\nThe accident was made possible by systemic factors, i.e. 
factors outside the control of any individual in the programme. Eventually, everyone fell into the trap of relying on what seemed to be &#8220;good experience&#8221;. The facts should have been taken seriously.<br \/>\nA root cause analysis should never identify only individual and concrete factors; it should also identify the systemic factors which enabled the problem to survive into production.<br \/>\n<b>Chapter 6 of the Challenger report reminds me that, when something goes wrong in production, performing a root cause analysis is a bigger task than just finding out the chain of events that led to the problem.<\/b><br \/>\n<em>Many thanks to Chi Lieu <a href=\"http:\/\/twitter.com\/SomnaRev\" target=\"_blank\" rel=\"noopener noreferrer\">@SomnaRev<\/a> for taking time to comment on early drafts of this post.<\/em><br \/>\n<figure id=\"attachment_737\" aria-describedby=\"caption-attachment-737\" style=\"width: 660px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/spaceflight.nasa.gov\/gallery\/images\/shuttle\/sts-51l\/html\/51l-10181.html\"><img loading=\"lazy\" class=\"wp-image-737 size-large\" src=\"https:\/\/asymaps.files.wordpress.com\/2015\/05\/51l-10181.jpg?w=660\" alt=\"Photo of the space shuttle Challenger accident Jan. 28, 1986. Photo credit: NASA\" width=\"660\" height=\"824\" \/><\/a><figcaption id=\"caption-attachment-737\" class=\"wp-caption-text\">Photo of the space shuttle Challenger accident Jan. 28, 1986. Photo credit: NASA<\/figcaption><\/figure><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I think most (if not all?) testers have witnessed situations like this: A new feature of the system put into production, only to crash weeks, days or just hours later. 
\u201dWhy didn&#8217;t anybody think of that?!&#8221; Truth is, quite often, somebody did actually think about the problem, but the issue was not realised, communicated or [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[14,26,63],"_links":{"self":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/posts\/736"}],"collection":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/comments?post=736"}],"version-history":[{"count":0,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/posts\/736\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/media?parent=736"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/categories?post=736"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/tags?post=736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}