Friday, May 29, 2009

Work Around the Unreproducible

It happens to the best of us. Some issue comes up, and we see it once or twice, but darned if we can pin it down. Maybe it's hard to reproduce; maybe it's a case of "that shouldn't happen!" When a release is coming up, the question is likely to arise:

What do we do about this issue we don't understand?

Well, you've got choices:
  • Delay the release until you do understand the issue. This could be days, weeks, months. Generally this is untenable in practice.
  • Ship without fixing it. We call this the "hope" method!
  • Deal with the effects of it. Find a workaround or a way to handle the effects so that the issue is there but has less effect on the customer. This can be in code or in policy.
Let's look at an example (this is made up, by the way): users who don't have passwords can't use a new "remote logon" feature because that feature depends on SSH, and passwordless SSH isn't working for some reason. We don't know why it's not working.

So step back, think of another angle. What is the effect of this bug? Well, some users don't get to use a new feature. There are a few things we could do here: (1) we could just say, "okay, sorry, create a password if you want to use the feature"; or (2) we could force all users to set passwords on their first login after upgrade. Neither of these fixes passwordless ssh. Both of them work around the problem (and don't cause account corruption or anything nasty). In other words, we are working around the unreproducible.
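To make the two workarounds concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `User` record, the function names, the message text); it only illustrates that neither option touches the SSH code at all.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class User:
    # Hypothetical user record, just enough for the example.
    name: str
    has_password: bool


def remote_logon_gate(user: User) -> Tuple[bool, str]:
    """Option (1): only let users with a password use remote logon."""
    if user.has_password:
        return True, ""
    # The passwordless SSH bug is still there; we just steer users around it.
    return False, "Remote logon requires a password. Please create one to use this feature."


def must_set_password_on_login(user: User, first_login_after_upgrade: bool) -> bool:
    """Option (2): force a password on the first login after upgrade."""
    return first_login_after_upgrade and not user.has_password
```

Option (1) lives at the entry point of the feature; option (2) lives in the login path. Either way the fix is in code we understand, not in the part we can't reproduce.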

Sometimes the answer to a bug is not to change that portion of code. Keep in mind that your ultimate goal is to make the problem go away without negative repercussions. It's okay to be creative in how you do that, as long as you're sure the workaround is complete.
