Basically, the issue is that humans tend to be optimistic (how's that for a huge generalization!). So when I'm sitting down to estimate a test, I break it down and I look at all the things I'm going to have to do. Usually this includes things like data generation, boundary analysis, component interaction analysis, test code modification and creation, actually running the tests, writing up results, etc. Then I just add it all up. Hooray! Test estimate complete!
Except that estimate is wrong, because I haven't accounted for something. I don't know what it is (or I would have accounted for it), but I've definitely missed something. Maybe on one story it's that the data generation takes a lot longer than I thought. Maybe on another story it's some subtlety about how two components interact that I simply never anticipated. Maybe it's a problem in my translation from work time to calendar time (who'd have thought it took a week to find 4 hours for doing this test?). Whatever the cause, I find that these things usually take me longer than I estimated.
So I use history as my guide.
We have a record of all the stories we've done, and of how long we actually spent on them. It's all right there in our wiki (or your Jira instance, or your Test Director instance, or whatever). So we can use it. Let's do a little data mining.
Here's how we get some information:
- Estimate your current stories. Do this with whatever model you like, just get to the "this will take X" point.
- Go through past stories and group them into "big," "medium," and "small". This is a grouping of test effort, and it reflects how hard the story looked to test. Your gut feel applies here, and you're welcome to include other metrics (e.g., "that team does a ton of refactoring, so their stories are always medium or larger"). Be sure to do this over as long a period of time as possible, so any really weird circumstances get diluted.
- For each story, determine how long it actually took you to test. Use calendar time here: from the day you started until the day you stopped. If you had it in test multiple times (test it, failed, retest it), count them all up together.
- Compute a calendar-to-estimate ratio for each story. Let's say your calendar time was 8 days, and your estimate was 8 hours. Congratulations, one of your estimated hours is a real-world day in this case. Express it in the form 1-estimated-hour-to-X (e.g., 1 hour is 1 day).
- Average the ratios within each bucket. At this point you have a list of calendar-to-estimate ratios. They probably look something like this: 1h:2h, 1h:1day, 1h:3days, 1h:4h. Now we simply average them: convert everything to the same unit (say, hours) first, then add 'em all up and divide. If you're feeling fancy, throw out the biggest and smallest outliers. The result is a single ratio per bucket: 1 hour of estimate time is X amount of calendar time. This is your real multiplier.
- Adjust your estimates. Now, go back to your current story estimates. If, for example, on medium stories, your average ratio is 1 hour to 4 hours, and your current estimate was 2 hours, then your new estimate is 8 hours.
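The steps above can be sketched in a few lines of code. This is just an illustration, not a tool the post describes: the story data, the 8-hours-per-day conversion, and all the function names are my own assumptions; your real history would come from your wiki or Jira records.

```python
# A sketch of the history-based adjustment described above.
# All story data here is invented for illustration.

from statistics import mean

# Assumption: one "calendar day" counts as 8 working hours.
HOURS_PER_DAY = 8

# Past stories: (bucket, estimated hours, actual calendar hours).
past_stories = [
    ("medium", 1, 2),                  # 1h estimate -> 2h calendar
    ("medium", 1, 1 * HOURS_PER_DAY),  # 1h -> 1 day
    ("medium", 1, 3 * HOURS_PER_DAY),  # 1h -> 3 days
    ("medium", 1, 4),                  # 1h -> 4h
]

def bucket_ratio(stories, bucket, trim_outliers=False):
    """Average calendar-to-estimate ratio for one bucket."""
    ratios = sorted(actual / est for b, est, actual in stories if b == bucket)
    if trim_outliers and len(ratios) > 2:
        ratios = ratios[1:-1]  # throw out the biggest and smallest
    return mean(ratios)

def adjust(estimate_hours, ratio):
    """Scale a raw estimate by the historical ratio for its bucket."""
    return estimate_hours * ratio

ratio = bucket_ratio(past_stories, "medium")
print(f"1 estimated hour ~= {ratio:.1f} calendar hours")   # 9.5 here
print(f"2h estimate -> {adjust(2, ratio):.0f}h adjusted")  # 19h here
```

Note that everything is converted to hours before averaging; mixing "2 hours" and "3 days" without a common unit is the easiest way to get this wrong.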
Is this precise? No. We're playing the law of averages here. We're probably going to get each individual story a bit wrong; over the course of a test cycle with a number of stories, though, the idea is that the little disparities will wash out and our efforts will approach that average. It's a way to bake risk and slippage into the numbers implicitly, without having to account for them explicitly.
Give it a shot - let's see how test estimates go. Over time, hopefully we'll see ourselves improve.