Monday, July 28, 2008

Dig As Far As You Should

We've been running a lot of different tests with a lot of different third-party programs lately. One of these is fairly straightforward: you set up the client program, set up your system, run the right Windows task, and off it goes. It comes back about 14 hours later with a report.

We've done this test a couple of times, and everything's going swimmingly except for one thing: performance is lower than we'd expect. Oh dear. So we start digging in, and we find something. The first time, the test ran at a time when the network it was running on was absolutely slammed. Other performance tests run under these conditions are 3x slower than they are when the network is less busy.

Hooray! We've found the problem!

We reran the test at a different time, when network utilization was much lower. And... it was still too slow. Oh dear. So we start digging in, and we find something. This time, about two hours into the test, one of the machines in our system had a hardware error and failed.

Hooray! We've found the problem!

Wait a minute. This isn't a particularly cheap test. We can really only run it once every day or two. So let's keep digging and just check that there's nothing else going on. Lo and behold, we found that even after discounting that piece of hardware, it was still too slow. Once we started digging again, we found some more things that needed tweaking (more RAM on the client box to keep it going, in this case).

There are two morals to this story: 

First, if a test is not cheap, look beyond the first thing you find before you go trying it again. You'll still come out ahead, timewise. The longer the test, the more time you can (and should!) spend really understanding what went on between test runs.
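To put rough numbers on "coming out ahead", here's a back-of-the-envelope sketch. The 14-hour run and the three problems come from the story above; the two hours of digging per problem and the total_hours helper are assumptions made up purely for illustration.

```python
def total_hours(run_hours, issues, issues_found_per_cycle, dig_hours_per_issue):
    """Hours spent on test runs plus digging until every issue has been found.

    Each cycle is one full test run followed by some digging through the results.
    """
    cycles = -(-issues // issues_found_per_cycle)  # ceiling division
    return cycles * run_hours + issues * dig_hours_per_issue


RUN_HOURS = 14  # from the story: a run takes about 14 hours
ISSUES = 3      # from the story: busy network, failed hardware, too little RAM
DIG_HOURS = 2   # assumption: hours of digging needed to find each issue

# Stop at the first problem found, fix it, and rerun: one run per issue.
one_at_a_time = total_hours(RUN_HOURS, ISSUES, 1, DIG_HOURS)

# Keep digging after the first problem: all issues found from a single run.
all_at_once = total_hours(RUN_HOURS, ISSUES, ISSUES, DIG_HOURS)

print(one_at_a_time, all_at_once)
```

The digging takes the same time either way. What changes is how many 14-hour runs you pay for along the way, and that gap only grows as the test gets longer.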

Second, don't ever assume that the first thing you find is the right thing to find, or the only thing you will find. It's just the first place you looked.
