Monday, July 13, 2009

When It Messes Up

Messing around with infrastructure is sometimes a bit scary. Making a config change to the defect tracking system, updating (or patching) the mail server, changing the way a feature works in the core of your test infrastructure: all of these become very public failures if you don't do them right.

Of course, you try it beforehand. You create a test system, or you upgrade a backup mail server first. But it's still a bit scary. After all, you might mess up. And here's a hint: if you work on this kind of thing, one day it will go wrong. It may not even be your fault, but it will go wrong.

How you deal with it after it's messed up is what's important.

So there are two things you need to do:
  1. Have a backup plan before you start.
  2. Communicate well.
First of all, notice that neither of those is "don't try." That's deliberate: you need to change and you need to try. A system that stagnates will eventually stop meeting your needs, whether it's a mail server or a test infrastructure.

So, then what do you do?

The backup plan is very simple: the show must go on, so have a way to retreat if you need to. Make sure you know how to back out your changes. Install a backup mail server and migrate all the data to it before you attempt changes on your primary mail server. Back up your config files so you can return the system to the way it was before. That way, if you get into real trouble, you can back out.
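As one concrete illustration of the "back up your config files" step, here is a minimal sketch in Python. It assumes the change touches a single text config file on the local machine; the helper names `backup_config` and `restore_config`, and the `mail.conf` file in the usage below, are hypothetical and not part of any particular tool.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(path: str) -> Path:
    """Hypothetical helper: copy a config file to a timestamped backup beside it."""
    src = Path(path)
    # Microsecond timestamp so repeated backups don't overwrite each other
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    backup = src.with_name(f"{src.name}.{stamp}.bak")
    # copy2 copies the contents and preserves metadata such as modification time
    shutil.copy2(src, backup)
    return backup


def restore_config(path: str, backup: Path) -> None:
    """Hypothetical helper: back out a change by copying the saved backup over the live file."""
    shutil.copy2(backup, Path(path))


if __name__ == "__main__":
    # Example: back up a (hypothetical) mail.conf before editing it
    saved = backup_config("mail.conf")
    print(f"Backup written to {saved}")
```

Call `backup_config` before editing and keep the returned path; if the change goes wrong, `restore_config` puts the old file back. A real service usually also needs to be reloaded or restarted after a restore, which this sketch does not attempt.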

Second, and perhaps more important, is communicating what's going on. Before you start, make sure you tell anyone who might be affected that you're going to make a change. This could be a downtime notification, or just a "heads up" that you're changing X, Y, and Z to improve A, B, and C. Then, if it goes wrong, tell people. Don't hide it and leave them wondering why the downtime keeps extending. Acknowledge that there are some issues with the change and that you're working on them. Keep the updates frequent until the problem is resolved (either because you've finished or because you've backed out).

So please, make infrastructure changes: keep your systems up to date, make them better, do what you need to do. Just remember that when you make a change, you need a plan for it going wrong. And talk about it. It'll be better in the end than if you hadn't done it at all.
