Friday, October 24, 2008

Wabbit Hunting

There are two kinds of escalations that come in from support: those where the issue is still ongoing, and those where the issue has stopped but we'd like to understand what happened so we can fix whatever caused the problem in the first place.

I like to think of these as issues that are alive and issues that are dead (and just need a postmortem).

Issues that are dead are, in their own way, simpler. You take the logs, the issue description, and any other information that has been gathered, apply your 5 whys or another form of analysis, and state what you believe occurred. Since the issue isn't ongoing, proof is hard to come by; you're looking for the most likely cause and for what you can do to prevent that most likely scenario from recurring.

Issues that are alive (still ongoing) are different. Now we're wabbit hunting.

The issue is still occurring. Either it hasn't been fixed at all, or recovery has been attempted and the problem has happened again. Your goal here is different; it's not about finding the ultimate cause now. It's about getting the customer running again.

To be sure, a lot of your analysis techniques still apply, but don't be afraid to start fixing. This isn't the time for a leisurely analysis; it's a time to balance analysis with action. Got a problem? Great, get that problem to stop. Then see if there's another problem. As long as you're nondestructive and you're actually looking for the cause of the problem rather than simply hiding it, getting the customer up trumps creating an elegant theory.

"What could we have done better?" is a question for a dead issue. 
"What can we do now?" is a question for a live issue.

To stretch the analogy: Shoot the wabbit. THEN figure out how it got into your garden.
