Friday, July 11, 2008

How Much Is Enough?

Certain kinds of tests are non-deterministic. That is, you can't pin down a single expected result; you can only say whether the outcome falls within acceptable levels. Performance tests are a good example of this: 10.5 MB/sec and 10.6 MB/sec may both be valid results if anything over 10 MB/sec is acceptable. Race conditions are another good example. If you don't see the problem, how do you know whether you've fixed it or whether you just haven't triggered it yet?

In both of these cases, there is no simple assert true. Instead, you have to assert that a result falls within a range, or outside of one. There are two interesting things to consider about this kind of test:
  • What is the acceptable variance in results?
  • How do you assert that the probability of something happening is vanishingly small?
And here's where statistics meets QA.  Let's take these two example problems one at a time:

Proving That Performance Is Better Than X
Assume we've already done our background work, and we know that on a given system, under given conditions, with known data patterns, 10 MB/sec is the minimum acceptable throughput. Anything over that is good.

So we write a performance test. It sets up the system, writes in that data pattern, and checks throughput. If throughput is greater than 10 MB/sec, we pass! Except... passing once isn't good enough. There are a lot of things that could change: network utilization, pre-existing disk usage, etc. Our performance over time is going to resemble a bell curve, and we need essentially the entire bell curve to sit above 10 MB/sec. So, we have to run our test enough times to show that the bell curve clears our minimum performance requirement.
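
Here's a minimal sketch of what that can look like in Python. The measurement function, the run count, and the three-sigma cutoff are all assumptions you'd replace or tune for your own system:

    import random
    import statistics

    MIN_MB_PER_SEC = 10.0   # minimum acceptable throughput
    RUNS = 100              # enough samples to see the shape of the curve

    def measure_throughput_mb_per_sec():
        # Stand-in for the real measurement: set up the system, write
        # the known data pattern, and time it. Simulated here with
        # made-up numbers.
        return random.gauss(10.55, 0.1)

    samples = [measure_throughput_mb_per_sec() for _ in range(RUNS)]
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)

    # Don't just check the average: require the low tail of the curve
    # to clear the minimum. Mean minus three standard deviations covers
    # about 99.7% of a normal distribution.
    assert mean - 3 * stdev > MIN_MB_PER_SEC, (
        "throughput %.2f +/- %.2f MB/sec is too close to the minimum"
        % (mean, stdev))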

Hold that thought for a minute while we talk about our other example.

Proving That a Race Condition Is Fixed
We had a race condition (oops!). Now we've fixed it! We can't just run a test once; if we didn't see it, then we don't know whether it was fixed or whether we simply didn't hit the condition. So, how do we prove to our satisfaction that it's fixed?

Again, we're looking at a situation where we need to write a test that exercises the race. We run it in a loop on the old code to figure out how often the race fires, and then run it on the new code enough times that, at the old failure rate, we would almost certainly have seen at least one failure. Basically, we can't say we've proved the fix, but we can run the test enough times that the race has only a statistically tiny chance of still being present. And now we've looped back to our bell curve.
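
A rough sketch of that loop, assuming (purely for illustration) that the race fired in about 10% of runs on the old code, and with run_race_test standing in for your actual regression test:

    import math

    def run_race_test():
        # Stand-in for the real test: exercise the racy code path once
        # and return True if no race was observed.
        return True

    def runs_needed(p_fail, confidence):
        # Clean runs required so that, if the race were still present
        # and firing with probability p_fail per run, seeing zero
        # failures would have probability below (1 - confidence).
        return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))

    # Measured on the old code: the race fired in ~10% of runs. Then 44
    # consecutive passes happen by pure luck less than 1% of the time,
    # since 0.9 ** 44 is about 0.0096.
    n = runs_needed(p_fail=0.10, confidence=0.99)

    for i in range(n):
        assert run_race_test(), "race reproduced on run %d" % (i + 1)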

The Underlying Method
The underlying method is the same in both examples. You know that your results in these cases will have some variance, and that their distribution is likely to be a fairly standard bell curve.

So, you create a test that you can run enough times to build your bell curve. Then you plot your results and show that the unacceptable outcome (hitting the race condition, or falling below minimum performance) is far enough out on the tail of the bell curve that you're satisfied.
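
In Python, that last check might look something like this (statistics.NormalDist needs Python 3.8 or later; the sample numbers and the 0.1% cutoff are made up for illustration):

    from statistics import NormalDist, mean, stdev

    def tail_probability(samples, threshold):
        # Fit a normal curve to the measured results and return the
        # fraction of that curve falling on the wrong side of the
        # threshold.
        curve = NormalDist(mean(samples), stdev(samples))
        return curve.cdf(threshold)   # mass below the threshold

    # Hypothetical throughput measurements in MB/sec.
    samples = [10.5, 10.6, 10.4, 10.7, 10.5, 10.6, 10.5, 10.4, 10.6, 10.5]

    # Satisfied if fewer than 0.1% of runs would dip under 10 MB/sec.
    assert tail_probability(samples, 10.0) < 0.001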

The good news about this method is that it's fairly straightforward to tweak. If your tolerance for risk is higher, you can run the test fewer times - you'll just end up at a lower confidence level. If your tolerance for risk is very low, you can run the test more - and that will increase your confidence level.
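
For instance, take the race example above, where (hypothetically) the bug fired in 10% of runs: 22 consecutive clean runs buy you roughly 90% confidence (0.9^22 ≈ 0.098), 44 buy you 99% (0.9^44 ≈ 0.0096), and 66 buy you 99.9% (0.9^66 ≈ 0.00095). The test is the same; only the loop count changes.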

So, while assert true is a simple and comforting thing, when you're faced with a non-deterministic test, embrace the bell curve!
