Friday, January 30, 2009

Control Risk Through Exposure

We've been working on a new hardware platform.

This is awesome in a number of ways:
  • New hardware can help us increase our performance
  • Decreased end-of-life worries
  • New... and shiny!... toys are always fun.
This is scary in a number of ways:
  • Checking that the hardware actually works (reboots consistently, doesn't drop massive numbers of packets, doesn't freeze, etc.) takes weeks and can't really be shortened
  • Confirming we can get enough hardware to keep up with our development and testing needs (funny, no matter how many we get, we can always effectively use more!)
  • If something isn't right, it won't be fixed quickly (weeks or months rather than hours or days), and this will blow the whole schedule out of the water
So, we're in a situation where we have a lot of promise and a lot of risk.

Don't hit that panic button just yet.

Let's do some risk analysis. And let's write it down. It doesn't have to be much. Just let your imagination run loose and write down all the things you can think of that could go wrong. The sky's the limit: everything from "it might be too heavy and the floor in the lab might break" to "there may be some horrible subtle disk firmware bug that causes systems to randomly freeze".

And then for each one, write down what you're going to do if it comes true. What software will you write? What processes will you put in place? What checks will you do? What backup vendor will you contact?

And then for each one, write down how you are going to prevent it. What tests will you run? What agreements will you make with a vendor? What stress tools will you use?
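
If a text file or a whiteboard feels too loose, the same list can live in a few lines of code. What follows is only a rough sketch of one way to do it, not a description of what we actually use: the Risk class, its field names, the uncovered() helper, and the example responses and preventions are all made up for illustration. The point is just that every risk should end up with both a "what we do if it happens" and a "how we prevent it" entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Risk:
    """One thing that could go wrong (hypothetical structure for illustration)."""
    description: str
    # What we will do if the risk comes true.
    response: List[str] = field(default_factory=list)
    # What we will do to keep it from happening.
    prevention: List[str] = field(default_factory=list)

    def is_covered(self) -> bool:
        # A risk counts as covered only when both lists have at least one entry.
        return bool(self.response) and bool(self.prevention)


def uncovered(risks: List[Risk]) -> List[str]:
    """Return the descriptions of risks that still lack a response or a prevention."""
    return [r.description for r in risks if not r.is_covered()]


# Example entries; the responses and preventions are invented placeholders.
risks = [
    Risk(
        "Subtle disk firmware bug causes systems to randomly freeze",
        response=["Contact the backup vendor"],
        prevention=["Run a multi-week stress test before committing"],
    ),
    Risk("Hardware is too heavy and the lab floor might break"),
]

for description in uncovered(risks):
    print("Still needs a plan:", description)
```

Anything that uncovered() still reports is a risk you have imagined but not yet conquered on paper.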

Sometimes the best way to stave off the risk is to simply write it all down, and then write down exactly how you're going to conquer it.

You win.
