{"id":160,"date":"2010-12-31T08:43:06","date_gmt":"2010-12-31T06:43:06","guid":{"rendered":"http:\/\/asymaps.wordpress.com\/?p=160"},"modified":"2010-12-31T08:43:06","modified_gmt":"2010-12-31T06:43:06","slug":"skypes-first-black-swan","status":"publish","type":"post","link":"https:\/\/www.asym.dk\/index.php\/2010\/12\/31\/skypes-first-black-swan\/","title":{"rendered":"Skype&#039;s first Black Swan"},"content":{"rendered":"<p>Skype went out for about 24 hours just before Christmas. Skype management is embarrassed and promises this will not happen again, which of course is true. The particular situation is now prevented. However, the question is: Will Skype never be out again?<br \/>\nSkype&#8217;s CIO explains what went wrong in this <a title=\"Post mortem on the Skype outage\" href=\"http:\/\/blogs.skype.com\/en\/2010\/12\/cio_update.html?cm_mmc=PXBL\">post mortem<\/a>\u00a0of what I&#8217;d\u00a0call Skype&#8217;s first Black Swan.<br \/>\nTo summarise, a high load on Skype&#8217;s infrastructure triggered a bug in a certain version of the Skype client for Windows, which in turn increased the load on the infrastructure, rapidly taking the entire network down and making it almost impossible to bring it back up.<br \/>\nThe bug was always there, of course, and it was probably already known internally at Skype. It is also possible that the risk of server overloading\u00a0and\u00a0service degradation\u00a0had been identified, but obviously not as something that could make a complete system crash likely (if it had been, they would have prevented it). Further, I&#8217;m quite certain that the risk of the client bug affecting the server load had not been identified. 
Humans are positive thinkers, as <a href=\"http:\/\/en.wikipedia.org\/wiki\/Nassim_Nicholas_Taleb\">Taleb<\/a> documents in his book <a href=\"http:\/\/en.wikipedia.org\/wiki\/The_Black_Swan:_The_Impact_of_the_Highly_Improbable\">The Black Swan: The Impact of the Highly Improbable<\/a>.<br \/>\nSo Skype&#8217;s challenge now is to prevent outages in general by identifying and guarding against Black Swans. This will involve a cross-organisational backwards-thinking process, which the innovation-driven company has probably not been focusing on at all until now. (I may actually be wrong here: Janus Friis, one of the\u00a0founders of Skype,\u00a0used to work in a support function at an ISP, so he may have been involved in preventing problems. But generally, startups think very positively, and even though Skype has millions of users, it&#8217;s still a very young company &#8211; a startup.)<br \/>\nOne may think that this is going to be extremely expensive for Skype since they will have to predict every possible way their system can go wrong. It does not have to be that expensive, although it will cost money.<br \/>\nWhen securing a nuclear facility, engineers don&#8217;t have to analyze every possible way a disaster can happen. Instead,\u00a0they ask: How can we prevent failure at every level? This is what I mean by &#8220;backwards thinking&#8221; &#8211; start by assuming something is failing, then work backwards, identifying ways to prevent it from getting worse.<br \/>\nThis is done on multiple levels: At component level, asking what can go wrong here and how we can prevent a bug or incident\u00a0from affecting the rest of the system. And at system level, assuming that disaster is happening, asking how we can prevent it from developing further.<br \/>\nI assume that&#8217;s what Skype is doing now.<br \/>\n<em>Wishing everyone a happy 2011!<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Skype went out for about 24 hours just before Christmas. 
Skype management is embarrassed and promises this will not happen again, which of course is true. The particular situation is now prevented. However, the question is: Will Skype never be out again? Skype&#8217;s CIO explains what went wrong in this post mortem\u00a0of what I&#8217;d\u00a0call Skype&#8217;s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[15,68,89],"_links":{"self":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/posts\/160"}],"collection":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/comments?post=160"}],"version-history":[{"count":0,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/posts\/160\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/media?parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/categories?post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.asym.dk\/index.php\/wp-json\/wp\/v2\/tags?post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}