Monday, January 24, 2011

Failure Presentation

There are several phases in a bug's life. First, it is introduced. Then it is identified. Then it's fixed. Then verified. It all sounds nice and clean and simple.

It's not. We all know it's a bit messy in the middle steps. Bug identification, in particular, can be quirky.

For example, I have a test we'll call "TestFoo". This is a piece of code that I run every night and that spits out a log file with some statistics and some checks. Over the past week, it has failed in the following ways:
  • One night it couldn't SSH to one of the machines involved, starting about 80% of the way through its workflow.
  • One night it expected a value X and got a value X-50.
  • One night it expected a value X and got a value X-48, this time on a different branch of code.
So we have a bug! Or possibly two bugs, or maybe even three. It's hard to tell from the information we currently have.

What we have are failure presentations.

A failure presentation is the thing you see that is a problem. It is a presentation by the software of a state you don't expect. Often it indicates a bug is somewhere underneath there, but the failure itself is not the bug. Instead, the failure is a result of a bug. One bug may have many failure presentations. Several bugs may have the same failure presentation.

For example, the SSH failure I saw is a failure presentation. The underlying cause could be that the machine's hard drive died (that is, not really a bug in our software at all). Or it could be that our software crashed and caused a kernel panic, so the machine is no longer responding to SSH requests. This is one failure presentation with several possible underlying causes.

The incorrect values I saw (one off by 50, one off by 48) are each failure presentations. They might be the same underlying bug with two slightly different presentations. Or they might be two different bugs.
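One way to keep that separation honest is to record what was seen apart from why it happened. The following is a minimal sketch, not a description of any real tool: the `FailurePresentation` class, its field names, and the `undiagnosed` helper are all made up for illustration. It stores this week's three TestFoo failures, with the candidate causes attached only where some have been proposed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailurePresentation:
    """An observed symptom. It records what was seen, not why (hypothetical type)."""
    test: str
    symptom: str
    candidate_causes: List[str] = field(default_factory=list)
    confirmed_bug: Optional[int] = None  # stays None until the cause is understood


# The three failures from this week's nightly runs.
week = [
    FailurePresentation(
        "TestFoo",
        "could not SSH to one of the machines",
        candidate_causes=[
            "machine's hard drive died",
            "our software crashed and caused a kernel panic",
        ],
    ),
    FailurePresentation("TestFoo", "expected X, got X-50"),
    FailurePresentation("TestFoo", "expected X, got X-48 (different branch)"),
]


def undiagnosed(presentations):
    """Return the presentations not yet tied to a confirmed bug."""
    return [p for p in presentations if p.confirmed_bug is None]


for p in undiagnosed(week):
    causes = ", ".join(p.candidate_causes) or "no candidates yet"
    print(f"{p.test}: {p.symptom} [{causes}]")
```

Nothing here claims the two wrong-value failures share a cause. They stay as separate records until someone sets `confirmed_bug` on each of them.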

The point is that until you understand the bug itself, and all the nasty things it might do, then all you can do is notice the way the failures present. The failure presentation can lead you to the bug, but don't be fooled into thinking you're there yet.

I find it particularly useful to distinguish failure presentations from bugs when working with development. I can log a bug that says, for example, "This is behavior blah in TestFoo. It looks like the failure in bug 123: that one is in TestBar, but it has the same general behavior." By doing that, I have pointed out a possible link and helped narrow the bug, but I haven't assumed that they truly are the same thing. They simply present in a similar way. It's a relatively small but very useful distinction.
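In a bug tracker, that distinction might look like keeping "presents like" separate from "duplicate of". This is only a sketch under assumptions: `BugReport`, `resembles`, `duplicate_of`, and `describe` are hypothetical names, not fields from any particular tracker.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BugReport:
    """A logged failure presentation (hypothetical structure)."""
    test: str
    behavior: str
    resembles: List[int] = field(default_factory=list)  # similar presentation, cause unverified
    duplicate_of: Optional[int] = None  # set only once the cause is confirmed to be the same


def describe(report):
    """Render the report text without overstating what is known."""
    lines = [f"{report.test}: {report.behavior}"]
    for bug_id in report.resembles:
        lines.append(f"Presents like bug {bug_id}; not confirmed to be the same bug.")
    if report.duplicate_of is not None:
        lines.append(f"Confirmed duplicate of bug {report.duplicate_of}.")
    return "\n".join(lines)


report = BugReport(test="TestFoo", behavior="behavior blah", resembles=[123])
print(describe(report))
```

The link to bug 123 is recorded, so development can follow it, but `duplicate_of` stays empty until someone actually understands the bug.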
