Categories
Blog posts in English

Why you should do your job better than you have to

Software testers evaluate quality in order to help others make descisions to improve quality. But it is not up to us to assure quality.
Projects need a culture in which people care for quality and worry about risk, i.e. threats to quality.
Astronaut and first man on the moon Neil Armstrong talked about reliability of components in the space craft in the same interview I quoted from in my last post:

Each of the components of our hardware were designed to certain reliability specifications, and far the majority, to my recollection, had a reliability requirement of 0.99996, which means that you have four failures in 100,000 operations. I’ve been told that if every component met its reliability specifications precisely, that a typical Apollo flight would have about 1,000 separate identifiable failures. In fact, we had more like 150 failures per flight, substantially better than statistical methods would tell you that you might have.

Neil Armstrong not only made it to the moon, he even made it back to Earth. The whole Apollo programme had to deal very carefully with the chance that things would not work as intended in order to make that happen.
In hardware design, avoiding failure depends on hardware not failing. To manage the risk of failure, engineers work with reliability requirements, e.g. in the form of a required MTBF – mean time between failure – for individual components. Components are tested to estimate their reliability in the real system, and a key part of reliability management is then to tediously add all the estimated relibility figures together to get an indication of the reliability of the whole system: In this case a rocket and space craft designed to fly men to the moon and bring them safely back.
But no matter how carefully the calculations and estimations are done, it will always end out with an estimate. There will be surprises.
The Apollo programme turned out to perform better than expected. Why?
When you build a system, it could be an it-system or a space craft, then how do you ensure that things work as intended? Following good engineering practices is always a good idea, but relying on them is not enough. It takes more.
Armstrong goes on in the interview (emphasized by me):

I can only attribute that to the fact that every guy in the project, every guy at the bench building something, every assembler, every inspector, every guy that’s setting up the tests, cranking the torque wrench, and so on, is saying, man or woman, “If anything goes wrong here, it’s not going to be my fault, because my part is going to be better than I have to make it.” And when you have hundreds of thousands of people all doing their job a little better than they have to, you get an improvement in performance. And that’s the only reason we could have pulled this whole thing off.