We have a suite of automated tests. They basically do the following:
- reserve some machines
- engage in prep work (configuring the system, creating a volume, writing some data)
- perform an action (this is the test itself)
- check the assertion (or time the action, or whatever we're trying to look for here)
- tear down
- release the machines
These run overnight, and in the morning we have a list of tests and their results.
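For concreteness, here's a minimal sketch of that lifecycle as a harness might drive it. Every name here (reserve_machines, Test, and so on) is a hypothetical stand-in, not our actual infrastructure:

```python
# Hypothetical sketch of the nightly test lifecycle described above.

def reserve_machines(count):
    return [f"machine-{i}" for i in range(count)]

def release_machines(machines):
    machines.clear()

class Test:
    machine_count = 2
    def setup(self, machines):    pass          # prep work: configure, create volume, write data
    def run(self):                return 42     # the action under test
    def check(self, result):      assert result == 42   # the assertion (or timing, etc.)
    def teardown(self, machines): pass

def run_test(test):
    machines = reserve_machines(test.machine_count)  # reserve some machines
    try:
        test.setup(machines)
        result = test.run()
        test.check(result)
        return "PASS"
    except Exception as exc:
        return f"FAIL: {exc}"
    finally:
        test.teardown(machines)                      # tear down
        release_machines(machines)                   # release the machines
```

The important bit is that setup and teardown run for every test, which is exactly why a bug there has such a wide blast radius.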
Most of the time, this works great. We walk in, take a look at the failures, and go on our merry way (logging bugs, cheering our successes, etc).
Some mornings it's different. Every once in a while, someone breaks a really fundamental thing and we walk in to hundreds or thousands of test failures. This is a really rare event, but we're all humans here, and it happens.
What's happened in those cases is that someone broke something, typically in setup or teardown, and it affected a whole lot of tests. For example, as part of setup, almost all of our tests create virtual interfaces. When someone breaks the virtual interface utility, every single test is going to fail. One of those failures is a direct test of virtual interface creation (yes, we test our test infrastructure, so there really is a test called test_virtualInterfaceUtil). The rest of the failures are innocent victims.
It really sucks going through all the innocent victims.
I call this a proposal because I haven't actually tried it.
I would like to make a "dependency tree" for the tests we run. Basically, this is something that says "if test X fails, test Y is going to fail, so don't bother to run it". The idea is that we would run tests that produce real failures and not create innocent victims.
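One way to sketch this: walk the dependency tree before running each test, and if any (transitive) dependency failed or was skipped, skip the test instead of running it. The dependency map and test names below are made-up examples, and this assumes the map is hand-maintained:

```python
# Sketch of dependency-aware scheduling: skip tests whose dependencies failed.
# deps maps each test to the tests it depends on (hypothetical names).
deps = {
    "test_virtualInterfaceUtil": [],
    "test_volume_create":  ["test_virtualInterfaceUtil"],
    "test_volume_write":   ["test_volume_create"],
    "test_unrelated_cli":  [],
}

def schedule(tests, deps, run):
    results = {}
    def visit(name):
        if name in results:
            return results[name]
        for d in deps.get(name, []):
            if visit(d) != "PASS":
                # Dependency failed (or was itself skipped): don't run this test.
                results[name] = f"SKIP (depends on {d})"
                return results[name]
        results[name] = "PASS" if run(name) else "FAIL"
        return results[name]
    for t in tests:
        visit(t)
    return results

# Simulate the virtual interface utility being broken:
run = lambda name: name != "test_virtualInterfaceUtil"
results = schedule(list(deps), deps, run)
```

With the utility broken, this produces one real failure (test_virtualInterfaceUtil), two skips instead of innocent victims, and one unaffected pass.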
There are several gotchas with this:
- You may mask other failures. A skipped test might have failed for a reason entirely unrelated to the broken dependency, and now you won't run it at all. A missed chance!
- Dependency detection needs to be managed. I'm not sure how to detect dependencies automatically, and manual detection is sort of a pain to maintain. I think there's probably something here by checking calls to failed libraries, but I haven't completely thought it through.
- Urgency is implied. When 1,000 out of 20,000 tests fail in a night, it's really obvious that something needs to be fixed right quick. When 1 out of 24 tests fails in a night, it's a little less obvious. After all, there was only one failure! (And it's easy to not look at the bottom number.) I think this one is pretty easy to overcome just by making each bug also note, "because of this bug, X tests did not run," and triggering our automatic notifications.
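That last mitigation is easy to sketch: tally the skips attributable to each failure and put the count in the failure's report. The result data below is a made-up example:

```python
# Sketch of the reporting idea: each failure also reports how many tests
# were skipped because of it, so a "1 failure" morning still shows the
# blast radius. Test names and results are hypothetical.
from collections import Counter

# status plus, for skips, which failed test caused the skip
results = {
    "test_virtualInterfaceUtil": ("FAIL", None),
    "test_volume_create":        ("SKIP", "test_virtualInterfaceUtil"),
    "test_volume_write":         ("SKIP", "test_virtualInterfaceUtil"),
    "test_unrelated_cli":        ("PASS", None),
}

skipped_because = Counter(
    cause for status, cause in results.values() if status == "SKIP"
)

reports = [
    f"{name} FAILED; because of this bug, {skipped_because[name]} tests did not run"
    for name, (status, _) in results.items()
    if status == "FAIL"
]
```

Here reports contains a single line noting that 2 tests did not run, which is the hook you'd wire into the automatic notifications.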
This is still in the brainstorming stages, but it's something I'd like to keep poking at. If anyone's doing anything similar, I'd love to hear the war stories!