Doing test estimates is hard. In particular, test estimates often come with hedges like, "well, if the code's not too buggy", or "this long each time we have to try it, and we don't know how many times it will have to go back to dev". However, we've come up with a bit of a workaround.
The key to it all is that we don't estimate stories (dev or QA) until there are defined acceptance criteria. (I should note that we use a variant of Extreme Programming as our development process, and one of the things we're very strict about is describing acceptance criteria as we're describing requirements.) Then estimation is done by both development and QA. The development estimate accounts for building the feature and building the tests that prove the feature works; dev is "done" when the automated acceptance tests pass. The QA estimate accounts for any manual tests involved and integration testing (based on the size of the feature and its relationship to the rest of the product).
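The rule above has two parts: no estimate without acceptance criteria, and a split between a dev estimate (feature plus automated acceptance tests) and a QA estimate (manual plus integration testing). A minimal Python sketch of that structure follows. The class and function names (`Story`, `Estimate`, `estimate_story`) are hypothetical illustrations, not part of our actual tooling, and the hours are whatever the dev and QA estimators supply.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Story:
    title: str
    # Acceptance criteria are written alongside the requirements.
    acceptance_criteria: list[str] = field(default_factory=list)


@dataclass
class Estimate:
    feature_hours: float           # dev: building the feature
    acceptance_test_hours: float   # dev: automated tests proving it works
    manual_test_hours: float       # QA: manual testing
    integration_test_hours: float  # QA: integration testing

    @property
    def dev_hours(self) -> float:
        # Dev is "done" when the automated acceptance tests pass.
        return self.feature_hours + self.acceptance_test_hours

    @property
    def qa_hours(self) -> float:
        return self.manual_test_hours + self.integration_test_hours

    @property
    def total_hours(self) -> float:
        return self.dev_hours + self.qa_hours


def estimate_story(story: Story, feature_hours: float, acceptance_test_hours: float,
                   manual_test_hours: float, integration_test_hours: float) -> Estimate:
    """Refuse to estimate a story that has no defined acceptance criteria."""
    if not story.acceptance_criteria:
        raise ValueError(f"{story.title!r}: define acceptance criteria before estimating")
    return Estimate(feature_hours, acceptance_test_hours,
                    manual_test_hours, integration_test_hours)
```

For example, `estimate_story(Story("Import CSV", ["rejects rows with missing IDs"]), 8, 4, 2, 3)` gives a dev estimate of 12 hours and a QA estimate of 5 hours, while a story with an empty criteria list raises `ValueError` instead of getting a number.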
The biggest problem with this is that part of the estimate still comes down to "gut feel", based on two questions: (1) how buggy is the feature going to be, and (2) how much integration testing is necessary before we are comfortable with the feature?
We handle the "how buggy is this feature" question by folding it into the development portion of the estimate. There may still be plenty of bugs in other areas of the code (whoops!), but that feature at least won't spend much of its "testing" time waiting for development to fix bugs. This hinges on having thorough acceptance criteria, but with practice you get good at writing them.
Handling the "how much integration testing is necessary" question is much more of a dark art, and it's one we could use more practice with. Our general approach is to weight our basic estimate by the size of the module, its interaction with external programs, and its interaction with other modules of the overall product; together these tell us how likely the code is to wind up back in dev for a fix and how long it's likely to stay there. So a UI change that is a single text item on one screen, and that interacts with neither external programs nor other modules in the system, is considered to have a near-zero chance of going back to dev, and it would be back in dev for a few minutes at most. A change in the core data-parsing module that depends on a specific external client's data pattern and modifies the main data parser (a large module) is considered to have a high chance of going back to dev, for up to 30% of the original implementation time. But this one still boils down to gut feel, and we're definitely looking for a better way.
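One way to make the weighting above more explicit is to turn the three risk factors into an expected rework cost: the chance of going back to dev multiplied by how long it would stay there. The Python sketch below assumes a hypothetical lookup table from "number of risk factors present" to (chance, rework fraction). Only the factors themselves and the 30% ceiling come from our practice; every other number is an illustrative placeholder, not a calibrated value.

```python
from dataclasses import dataclass


@dataclass
class Change:
    dev_hours: float         # original implementation estimate
    large_module: bool       # touches a large module (e.g. the main data parser)
    external_programs: bool  # depends on external programs or a client's data pattern
    other_modules: bool      # interacts with other modules of the product


# Hypothetical table: risk-factor count -> (chance of going back to dev,
# rework time as a fraction of dev_hours). Only the 0.30 ceiling is from the post.
RISK_TABLE = {
    0: (0.05, 0.01),
    1: (0.25, 0.10),
    2: (0.50, 0.20),
    3: (0.80, 0.30),
}


def expected_rework_hours(change: Change) -> float:
    """Expected dev time spent on bounce-backs: chance * fraction * dev_hours."""
    score = sum([change.large_module, change.external_programs, change.other_modules])
    chance, fraction = RISK_TABLE[score]
    return chance * fraction * change.dev_hours


def weighted_estimate(change: Change) -> float:
    """Basic dev estimate plus the expected rework from integration problems."""
    return change.dev_hours + expected_rework_hours(change)
```

With these placeholder numbers, a one-hour UI text change with no risk factors adds only a few seconds of expected rework, while a 40-hour parser change with all three factors adds about 9.6 hours. The table still encodes gut feel; the benefit is that it is written down, so it can be compared against how often changes actually bounce back and adjusted over time.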