Monday, December 31, 2007

Logging Bugs Politely

I've been testing a portion of our software lately that has a lot of potential error messages - that's right, I've been testing a GUI. One of the things I've been looking for is consistency and quality of error messages.

I found a bunch of error messages that look roughly like this:
CommandFailedException: Could not connect to server The username or password was not correct.

The actual errors were usually a mis-entered (or required but missing) field. It's a bug, certainly. But you need to be a little careful about how you log it. Usability issues, when you're not doing formal usability analysis (or you're doing it after the fact), are a bit of a touchy subject because they can come close to subjective opinion.

Make sure your bug report does this:
  • Be clear about what the error message should say. Consult the product manager or designer if necessary.
  • Be considerate about the structure of the system. Don't ask for information that the system doesn't have at that point. 
Make sure your bug report doesn't do this:
  • Be rude, even if it's funny. "Incomprehensible error" will not make you any friends.
  • Point out in excruciating detail what's wrong. Just say briefly what it doesn't do and move on to what it should do.

Friday, December 28, 2007

Always Touch It

Let me state up front that I'm a huge fan of automated tests. I love the idea that while I'm fast asleep every night the computers in my lab are humming away doing all sorts of things to the code and generating a nice little log file for me showing what passed and what failed.


That said, I always test the system as a user would see it.

Automated tests are great and wonderful, but your users aren't automatons. They're humans, or other systems, and as such they have their vagaries. Your job is to see a feature as your customer (human or system) will see it. If you don't try it at least once as a user would see it, how do you know what they'll really experience?

I use these "user proxy" tests to look for:
  • Usability issues. Is the time to accomplish a task just too long? Do I find that I'm clicking through to another screen a lot for information? Is three clicks really too many for this common action?
  • Perceived performance issues. What does performance feel like to the user? Sure, it may start rendering in 2 seconds, but if it takes another 20 seconds to fully render in my browser on my computer, is that really okay? Note that perceived performance may be different from what my performance measurements gather.
  • Context. Does this feature make sense with all the other features? Does it hang together when it's being used, no matter how good the screenshot is? Can I get at the feature where I expect to?
  • Inconsistencies. Does this feature feel like an integral part of the system? Does it have the same UI metaphors? Do the messages - no matter how correct - match up with the messages other parts of the system display?
I'm certainly not advocating avoiding test automation. I'm simply advocating living with a feature for a while. Just like you don't really know a house until you've lived in it for a bit, you don't really know a feature until you've used it for a bit.

When I first wrote this entry I wrote it as "manual testing" versus "automated testing", but I believe these terms are imprecise. It's more "testing as an end user" versus "suites (scripts and other code) that test the system". I haven't come up with really good shorthand terms for this.

Thursday, December 27, 2007

Official Handbook

I talk about process a fair amount, particularly as it relates to testing and as I've seen it used. To a certain extent, process doesn't matter: you don't ship process; you ship a product. However, the process you use (and the mere fact of using a process at all) can greatly affect the product you ship.
Most processes I've worked with are based on standard processes that we've all heard of - RUP, XP, SCRUM, etc. - but they're pretty much all adapted to fit the needs of the company, and the processes themselves pretty much allow for that. For example, we say we're doing XP here, but we don't actually work directly with customers all the time. Instead, we work with a product manager who speaks for the customer. Does this mean we're not doing XP? Sure, in the strictest sense we're not doing XP. But if I tell someone we are doing XP, they will understand 95% of what we do and, more importantly, how we approach problems.
That got me thinking, though. I've seen variations on most processes I've worked with. So what's the official handbook for each of these processes?
  • RUP
  • SCRUM (there are a lot of links on this one, but I think I've included the most official). I find SCRUM to be one of the more adaptive and adaptable processes, and I think that's reflected in the very active community around it.
  • XP. This is another very active community with a lot of proponents and a lot of people evolving the process.
If you're going to deviate from the specifically defined process, whatever it is, that's okay by me. Just make sure you know what the specifically defined process is and why you're deviating from it. If it's for good pragmatic (not lazy!) reasons, then I say do what works for you.
Don't let process adherence keep you from shipping good product.

Wednesday, December 26, 2007

Watch Your Spacing

Remember: with some systems, spacing counts.

foo (bar)

is not the same as

foo(bar)

This is fine in some circumstances, and can be useful when you're using space-delimited options (say, in a command line interface). However, be aware that at some point your users will get this wrong. So if you're going to be picky, you need to provide feedback to users right where they're entering the option.

If it's in a config file, throw an error message when the config is read. If it's on a command line, throw an invalid argument error. If it's in a GUI, put up a nice error message next to the field that is affected.

True example:
Today I created an NFS share and put in an access control list with an option. It looked like this:

Access: (my_option)

There was a space in it.

The end result is that the share wouldn't export. It took us 45 minutes to track it down to that space.

So do your users a favor and throw a nice error when the syntax is that picky. Spacing rules are both picky and hard to spot in documentation.
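A minimal sketch of that kind of up-front check, in plain POSIX shell (so it also runs in bash). The NAME(OPTION) syntax and the `validate_option` function are hypothetical, made up for illustration; the point is only that the check runs when the value is read and the message names the exact problem.

```sh
#!/bin/sh
# Hypothetical option syntax for illustration: NAME(OPTION), no whitespace.
# Reject bad values at the moment they are read, and say exactly what is wrong.
validate_option() {
  local value=$1
  if printf '%s\n' "$value" | grep -q '[[:space:]]'; then
    # Spaces are easy to type and hard to see, so call them out by name.
    echo "error: '$value' contains whitespace; write it as NAME(OPTION) with no spaces" >&2
    return 1
  fi
  if ! printf '%s\n' "$value" | grep -Eq '^[A-Za-z0-9_]+\([A-Za-z0-9_]+\)$'; then
    echo "error: '$value' is not of the form NAME(OPTION)" >&2
    return 1
  fi
  return 0
}

validate_option 'foo(bar)' && echo "accepted"    # prints "accepted"
validate_option 'foo (bar)' || echo "rejected"   # error on stderr, then "rejected"
```

Called on a value like `Access: (my_option)`, the whitespace check fires first and the message says so directly, instead of leaving someone to hunt for the stray space.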

Friday, December 21, 2007

Heuristic for Verifying Automated Tests

I'm in search of a heuristic for verifying automated tests.

The Back Story:
We run a number of automated tests:
  • Build Verification Tests. These run after every build. The continuous integration system will not spit out the build (ISOs and debs, in our case) unless these tests pass.
  • Nightly Tests. These run every night. They're mostly xUnit-based (JUnit, PerlUnit, etc.).
  • Weekly Tests. These are the tests that simply take too long to run every night. They're functionally identical to the nightly tests, but each one takes between 6 and 24 hours to run, so we only run them once a week (and the full set takes about 5 days to run).
The Bugs:
Sometimes there are bugs in these tests. That's fine; we log them and dev fixes them (hooray!).

The Question:
How many runs does an automated test have to pass before it can be marked as verified? 

We do a code review, and the developer runs the test before he checks in, but we still want to see the tests run in BVT/nightly/weekly. The question is, how many good runs do we need to have confidence in the verification?

My Answer:
I don't think there's a formula for this; there are simply too many subtleties based on the frequency and type of failure, the test structure, the risk of it failing either later in the tests or in the field, etc. So I'm really just looking for a rule of thumb. 

The best I've found so far is as follows:

A test must run successfully for the longest interval between failures, plus one. (By "interval" I mean the number of passing runs between two failures.)

So, if a test fails every time, then it needs to run once successfully (0 passing runs between failures, plus 1). 

If a test passes five times between each failure, then it needs to run six times successfully (5 passing runs between failures, plus 1). 

If a test usually fails every third or fourth run, but once passed ten times in a row between failures, then it needs to run eleven times successfully (10 is the longest interval between failures, plus 1).
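Here's a small sketch of that rule of thumb, assuming a test's history is recorded as a string of P (pass) and F (fail), oldest run first, and that an interval is the number of passing runs between two failures. The `required_passes` name and the P/F format are hypothetical; awk does the counting so it runs in any POSIX shell.

```sh
#!/bin/sh
# Rule of thumb: required passes = longest interval between failures + 1.
# Assumed input: a P/F history string, oldest run first. Only streaks of
# passes that sit between two failures count as intervals.
required_passes() {
  printf '%s\n' "$1" | awk '{
    longest = 0; current = 0; seen_fail = 0
    for (i = 1; i <= length($0); i++) {
      if (substr($0, i, 1) == "F") {
        # A failure closes the streak of passes since the previous failure.
        if (seen_fail && current > longest) longest = current
        seen_fail = 1
        current = 0
      } else {
        current++
      }
    }
    print longest + 1
  }'
}

required_passes FFFF                 # prints 1
required_passes PPPPPFPPPPPFPPPPPF   # prints 6
```

Passing streaks before the first failure or after the last one are ignored, since they aren't between failures; that's a judgment call you might make differently.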


Thursday, December 20, 2007

See Around the Problem

I was talking to a developer today about how I approach a system.

Don't just see the problem. See around the problem.

When I see a problem, I have to reproduce it. But often what makes a problem happen isn't what you see as the problem; it's what happened to get you to the problem. For example, a user can't log in. The problem may actually be all the way back at user creation, and the exception that was thrown then.

So it's important to see the context of the problem. Find what else was occurring in the system. I look for different things:
  • Was there an exception earlier that didn't seem to have an effect?
  • What else is the system doing?
  • Is there another system involved?
  • What other processes are running?
  • What was I doing right before this happened, even if it was in another area of the system?
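For the first question on that list, a quick log search can help. This is a hypothetical helper, assuming a plain-text log in which problems contain "Exception" or "ERROR"; it lists everything that went wrong before the line where the visible failure appears. The function name and the match pattern are placeholders to adapt to your own logs.

```sh
#!/bin/sh
# Hypothetical helper: list errors logged BEFORE the line where the visible
# failure shows up (e.g. an exception at user creation, long before a failed
# login). Assumes problems in the log contain "Exception" or "ERROR".
earlier_errors() {
  local log=$1 failure_line=$2
  # Keep only the lines that precede the failure, then search them.
  # grep -n line numbers match the original file because head starts at line 1.
  head -n "$((failure_line - 1))" "$log" | grep -nE 'Exception|ERROR'
}

# Example (path and line number are made up):
#   earlier_errors /var/log/myapp.log 5120
```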
I don't know how to teach this yet. Any ideas?

Wednesday, December 19, 2007

Testing and XP Generalism

One of the things that XP encourages is generalists rather than specialists. Anyone on the team should be able to do any of the team's activities. This is a bit slower to start, but over time it amplifies the team's effectiveness by reducing bottlenecks and raising the team's truck number (how many people you could lose before the project stalls).

For a tester coming into an XP team, this can prove to be a challenge. I possess a different set of skills than most of the other team members, so how do I fit into a generalist team? Testing isn't the only discipline with this conundrum. The other role that XP calls out separately from the rest of the team is the customer. So, in a generalist team, we now have two specialized roles: customer and tester.

A tester is really just one person playing multiple roles - in a way, I'm a generalist. Sometimes, I act like a developer. One morning, I pair with a developer and we write unit tests. Then I pair with another developer and we write automated acceptance tests for a completely different module of the software. Late in the afternoon, I sit with the product manager (our "customer") and we work on a new story. It's not exactly what I think most XP types call a generalist, but it's a start.

In some ways, though, I'm a specialist. I exist to provide information. I support development with testing and feedback. I support the business with information about risk and the current state of the system. In general, I find that this ends up making the standard XP test tasks better. My exploratory testing on a new story finds issues that we simply didn't anticipate. Then we add automated tests for a lot of it.

So yes, I'm something of a specialist when I join a team. I don't think this is a bad thing, although it does seem to go against some XP principles. We're not following XP to the letter. I happen to think that this is fine; I'm not dogmatic about the processes I follow. Where it doesn't work for our situation, we'll change it until it does.*

* I know this rubs some process zealots a very wrong way. If you have a real-world solution where you've been able to follow any process 100%, I'd love to hear about it. In the meantime, I will continue to make my customers happy, even if it means that some parts of XP get modified a bit to help us do that.

Tuesday, December 18, 2007

Not Everything's a Nail

"He who is good with a hammer tends to think everything is a nail." -- Abraham Maslow

I've been to a lot of holiday parties lately, and when geeks party, well, apparently we talk about software development process!* It's been really interesting getting people together who come from various schools of thought. The party attendees have ranged from a SCRUM zealot to an XP believer, to someone who thinks waterfall processes in general have gotten an unfairly bad rap.

That got me thinking. When I think about each process from a test perspective**, none of the software processes I'm aware of adequately addresses every problem and every situation.
  • SCRUM: This one really doesn't address the timing of test very well. Whether you have QA on the team or as a separate team, test tends to get squished in at the end of iterations or (worse) leak beyond iterations.
  • XP: There is a dogma here that manual testing is universally bad because it's not repeatable, can't be run constantly by anyone, etc. I'm comforted to see this changing in the XP community, which is starting to recognize the value of manual testing for usability and as a first step of acceptance. This community really seems to be starting to address the problem and figure out how to effectively harness the power of human testers (rather than just coded tests).
  • RUP: This tends to emphasize pre-defined tests at the expense of exploratory testing. There is also a tendency to start testing later in the project than I'd like.
  • No Defined Process: This one is catch-as-catch-can for every phase of the process, not just test. It mostly doesn't scale beyond a very few people who are very closely aligned in their goals.
All those risks in all those processes are why I believe that pragmatism will trump dogmatic adherence to process every time. No hammer I know of can turn all problems into nails, and I'd rather solve problems than wield a hammer.

In the end, my company doesn't succeed because we follow any software development process. My company succeeds because we give our customers what they want.

*Disclaimer: I swear, we're not normally a dull bunch! And yes, we talk about other things, too, at least some of the time.
** I think about most things from a test perspective. I'm not a developer or a product manager or a sales person, and I wouldn't presume to understand fully how these various processes apply to them. I can make guesses, but I'm most knowledgeable in the area of software testing. Just FYI.

Monday, December 17, 2007

Livin' the XP Life

I've now been at my current employer for exactly two months. For those of you who are counting, that's:
  • 4 iterations
  • 8 XP Customer team meetings
  • 40 standups
  • 40 pairing sessions (we pair once a day)
Milestones like this are a good chance to go back and reflect on what it's been like, living the XP life.
XP is fabulous for recruiting. It's a very quick and easy way to express that this is a company built by engineers and for engineers. It's a company that embraces new techniques, and that's exciting for the kind of people I want to have working with me.
Pairing is fair to middling. When we don't pair, everything gets code reviewed before checkin. For complex things, the end result is usually pairing after the fact to tighten up the code. In QA, we pair on things like test planning and test infrastructure coding. Pairing works best for us when we're creating acceptance tests for stories: we get better tests and better knowledge of the system. It does take more time in the end, but it's a good way to train new people.
Stories are one of the elements of XP that are not new to me; other methodologies use stories or similar concepts. The best parts of stories are the forced thinking and the fact that you wind up with a lot of documentation (ours are all kept on a Wiki). The worst part about stories is that there's little to no information about the system as a whole and how each story fits in. This has resulted in a lot of inconsistencies in the system.
Net Net
In the end, working in an XP shop is a mixed bag. But it's been a fun experiment, and I'm looking forward to helping it continue.

Friday, December 14, 2007

Managing From the Bottom Up

I did a phone screen yesterday, and the candidate asked me "What do you think of as your management style?" I gave my standard answer, but later on I sat down and really thought about it.

As your manager, my job is NOT to:
  • Tell you how to build something
  • Define the structure of your tests
  • Describe for you what you should be doing in any given moment
  • Mediate every technical dispute
All these things are about being the "parent" of the group. They're micromanagement at its best. For me, management is really more about getting all the obstacles out of your way.

As your manager, my job is to:
  • Get the resources our team needs, including the money to buy those resources
  • Point us all in the same direction so we're all marching toward the same goal
  • Hire really smart people who each think about problems a little differently
  • Help remove obstacles
  • Be a source of ideas and thoughts about what to do and how to do it
  • Provide the tools to resolve disputes on their merits
I finally decided that my job is a little different than I had thought. I don't actually manage my team. I manage my team's environment.

In the end, this helps me manage a much larger team effectively. I am not the decider (apologies to George Bush). I'm merely the one who makes sure that everyone can be the decider. I don't have to manage every detail; I just have to build a team that lets me know what details are important. 

I'm most effective when my team is telling me what to do, not the other way around.

Thursday, December 13, 2007

Personal: Snow Day

Today the first real storm of the winter hit Boston. Since I get positively giddy around snow, I'm taking a day off the blog and I'm going to QA a few snowballs!

This is what I see outside.

Enjoy, everyone. Drive safe and have fun!

Wednesday, December 12, 2007

Human Anti-Patterns

In software there are patterns - known good solutions to common design or code problems. There are also anti-patterns - things you really, really shouldn't do because they may look great at the start but they lead down a bad path. More generally, anti-patterns are simply patterns to avoid. Patterns and anti-patterns are most commonly discussed as part of software design, but they also apply to QA. Common QA patterns include techniques like pairwise testing.

I like to think about patterns and anti-patterns in software development process, too. These are human patterns and human anti-patterns.

Examples of human anti-patterns are:
  • Manually comparing large datasets. This looks fine, but humans are fallible and easily bored, and they will miss something doing this kind of tedious comparison by hand.
  • Allowing an infrastructure problem to persist. See my post from yesterday for an example. Today I went in and we traced the problems and fixed them; not having a reliable build is not something you want hanging around!
  • Testing only right before release. Not testing as you build seems okay, even good. Hey, it doesn't waste developer time testing partially implemented features. But it will extend your release cycle because it leaves a lot of changing code that all has to come together perfectly - not likely.
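For the first anti-pattern, the fix is to let the machine do the comparison. A minimal sketch, assuming line-oriented text datasets where order doesn't matter; `compare_datasets` is a made-up name wrapping the standard `sort` and `comm` utilities, and its non-zero return on any difference is a design choice so it can gate a script or CI job.

```sh
#!/bin/sh
# Report lines found in only one of two datasets (order ignored).
# Output follows comm's format: untabbed lines are only in EXPECTED,
# tab-indented lines are only in ACTUAL. Returns 1 if anything differs.
compare_datasets() {
  local expected=$1 actual=$2 tmp diffs
  tmp=$(mktemp -d) || return 2
  sort "$expected" > "$tmp/expected"   # comm requires sorted input
  sort "$actual" > "$tmp/actual"
  diffs=$(comm -3 "$tmp/expected" "$tmp/actual")
  rm -rf "$tmp"
  if [ -n "$diffs" ]; then
    printf '%s\n' "$diffs"
    return 1
  fi
  return 0
}
```

Unlike a human, this doesn't get bored on line 40,000.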
So when you think about processes, think about your patterns - the repeatable solutions to common problems that you want. Also think about your anti-patterns - repeatable things that you don't want.

Just like designs, humans have anti-patterns, too!

Tuesday, December 11, 2007

QA Is for Optimists

I love my job. And today was not a good day!

Today was one of those days when nothing goes right. I walked in this morning, asked a developer about a feature, and we quickly determined that the code was in the branch but not in the build. You know, the build I had been testing against for about 36 hours. That's 36 hours down the drain because I can't trust that build. Darn.

Then I turned to the automated tests that ran last night. The failure rate was roughly 6 times normal. Almost all of it was traceable to the same root cause, but we had to go through each test log to determine that. Two hours later, we actually were able to publish the results (this usually takes 20-30 min). Darn.

Then I went to my big system that needed upgrading. No problem. I had a newer build (that actually had the right code in it!), so I placed it on the machine and upgraded. It proceeded to fail. Miserably. For unknown reasons. Darn.

By the time 6pm rolled around, if it could have gone wrong, it probably had.

Now, normally I don't talk much about my day unless there's a thought or a lesson in there. So what's the moral of the story?

QA is for optimists.

At the end of the day, when I was retrying the upgrade for the fourth time, there was nothing left to do but laugh at everything that had gone wrong, and talk to the screen (since asking politely sometimes gets a program to do what you want!). And that's what it takes to do QA. Your job is to find things wrong all day, to seek out problems so they can be fixed. Don't get into this business if you can't do that cheerfully.

* Side note: I walk to work, which is good. I shudder to think what my odds of a car accident would have been on a day like this one!

Monday, December 10, 2007

Just Pick One!

One of the things that characterizes effective workers is the ability to simply make a decision. In many cases it's unclear who should decide what to do about something. A really effective worker (manager, engineer, etc) will make a decision and go with it*. An ineffective worker will be unwilling or unable to decide.

A quick story:

On Friday I went to pick up a friend from his office. He wasn't done yet, so I waited for about an hour. Now, this is an open plan office, so while I was there I was listening to various project managers talking about a new document they were writing and the template for it. The entire topic of conversation for that hour was about the template for that document and who they needed to get to sign off on it (and they weren't done when I left!).

So I started calculating. The average salary for a project manager in a small company in Boston is $85K.  This is approximately $42.50 an hour. So that hour, for 5 project managers, cost the company $212.50, and nothing got accomplished. The moral of the story?

Just make a decision.

* Disclaimer: If the decision is not yours to make and you know who should make it, then get the right person to make it. This applies only to decisions that don't have a clear owner.
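The back-of-the-envelope arithmetic from the story, as a tiny shell function. The $42.50 hourly figure corresponds to a 2,000-hour work year (40 hours x 50 weeks), so that's the assumption baked in here; `meeting_cost` is a hypothetical name, it counts base salary only (no benefits or overhead), and it uses integer math in cents, rounding down.

```sh
#!/bin/sh
# Cost of a meeting = hourly rate x people x time.
# Assumption: 2,000 working hours a year, so $85K/year is about $42.50/hour.
# Arguments: annual salary in dollars, number of people, meeting minutes.
meeting_cost() {
  local salary=$1 people=$2 minutes=$3 cents
  cents=$(( salary * 100 * people * minutes / (2000 * 60) ))
  printf '$%d.%02d\n' $((cents / 100)) $((cents % 100))
}

meeting_cost 85000 5 60   # prints $212.50
```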

Friday, December 7, 2007

It's Like Wearing Your Skinny Pants

Our company, like many companies doing XP, SCRUM, or some other type of agile development, uses two week iterations*. We develop from stories, and the stories are placed in a queue. There's only one problem:

Our stories don't always fit in an iteration.

This means we don't actually finish a story every iteration. Sometimes a story is half done. Note that this does not mean that the software is broken or unusable; that's never an acceptable state at the end of an iteration. What it does mean is that sometimes there's code in there that simply isn't usable by an end user - it's code that will be part of some feature X, but feature X isn't done and therefore isn't enabled. Think incomplete, not broken.

In practice, this works out all right from a code stability standpoint. Our automated tests make sure we haven't broken anything that we've said is done. However, from a process and a product planning standpoint it's actually a big problem.

Let's say I've signed up for a four-week (two-iteration) story. Well, that means I should finish 50% of it during the first iteration. The problem is, until I've finished the whole thing, I don't really know what 50% is. Of the multi-iteration stories we've done since I've been at this job, all but one involved a last-minute scramble because more than one iteration's worth of work was left going into the last planned iteration. Remember the old cliché:

"The first 90% of the code accounts for the first 90% of development time. The remaining 10% of the code accounts for the other 90% of the development time."**

So, my rule for QA is very simple:

All stories must fit in one iteration. 

If a story can't fit in an iteration, we trim it and split it and rework it so it does. Often this is hard to reconcile with the requirement that stories provide some user benefit, but we've always managed to make it happen.

* I'm not sure what's magic about two weeks. Iterations certainly could be just about any length, but two weeks is very popular.

** I don't know who said this, and Google turns up a lot of sources.

Thursday, December 6, 2007

Be Nice: Phrases to Heal All Wounds

Of course it's important for QA and developers to get along. Despite the bugs logged and the "unreproducible" resolutions, in the end, you all want the same thing: to ship a good product.

So what's the fastest way to make friends with a developer and remind him that we're all on the same side?

"I'm just here to prove you're perfect."

The developer will laugh, but the message will come through. You're not there to cast blame; you're there to provide information about the code and how good it really is.

Wednesday, December 5, 2007

In the Zone

I work for a storage company. We spend a lot of time writing data onto systems for testing, so I'm always in search of new tools.

Running along the spectrum of sophistication, we've done the following:
  • Hand-copy files. This one is really easy and gives you really static data. Just don't expect it to be hugely scalable. You also run out of files to copy very quickly. Oh, and don't copy that private employee information accidentally! If you really want to do this just open up a bash terminal (or ksh or tcsh or whatever) and type: 
cp -r my_source_dir/* my_dest_dir/
  • Copy junk data. Enter the dd command (we're on Linux). The problem with this is that you wind up with a lot of files of the same size. It is better than copying files, because you can do it as much as you like, you get unique files, and you can name the files. You can't easily vary file size, however, and you can't measure performance of your reads and writes without a lot more scripting. It is useful as a quick-and-dirty data generator, however. Just do the following (change the loop count to get the desired number of files):
# Write 10 files of 8K of random data each (count=1 stops dd after one block).
for i in $(seq 1 10); do
  dd if=/dev/urandom of=my_dest_dir/file$i bs=8K count=1
done
  • Use IOzone. This tool is actually intended to measure the performance of a disk or a filesystem (basically block and file read, write, etc.). It will write files of varying sizes to disk, read them off, rewrite them, etc. It also has options to leave the data on disk, so you can use it to fill drives. And it will automatically calculate the performance of each operation and output it in an Excel-compatible (space-delimited) format. Try it out!
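If the same-size files from the dd approach are a problem, a few more lines of scripting can vary the size. A sketch, assuming Linux with /dev/urandom and od available; `make_junk_files`, the 1 to 64 KiB range, and the directory name are placeholders.

```sh
#!/bin/sh
# Like the dd loop, but each file gets a random size from 1 to 64 KiB
# so the data set is not all one size.
make_junk_files() {
  local dest=$1 count=$2 i=1 kib
  mkdir -p "$dest"
  while [ "$i" -le "$count" ]; do
    # One random byte (0-255), folded into 1..64 (256 divides evenly by 64).
    kib=$(( $(od -A n -N 1 -t u1 /dev/urandom) % 64 + 1 ))
    dd if=/dev/urandom of="$dest/file$i" bs=1024 count="$kib" 2>/dev/null
    i=$((i + 1))
  done
}
```

For example, `make_junk_files my_dest_dir 10` writes file1 through file10 into my_dest_dir. It still doesn't measure read or write performance; that's where IOzone comes in.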
Good luck, and happy data creation!

Tuesday, December 4, 2007

Keep the Balls in the Air

I was looking at queues today. There are a lot of different areas that QA needs to touch. As a basic example, my team needs to keep on top of all these work queues:
  • team 1 bugs ready for verification
  • team 2 bugs ready for verification
  • team 3 bugs ready for verification
  • team 1 story queue
  • team 2 story queue
  • team 3 story queue
  • nightly test runs
  • weekly test runs
  • customer support escalation
  • QA story queue
  • story stub queue
So how do we keep all these balls in the air?

I've tried two ways:
  • Touch everything every day. Keep on top of your queues and make sure you work them all every day. The goal is to keep any one queue from getting out of control.
  • Pick a queue a day. Avoid context-switching, which is wasteful of your time. Pick one queue each day and get all the way through it, doing everything you possibly can. Sure, each queue will be longer, since you're not touching it as often, but you'll be more efficient about working that queue.
In practice, my team winds up touching every queue every day. We need to touch multiple queues anyway, either because we're blocked on something, or because someone has a question, or because there's an urgent need. So we go with it: pick the highest priority thing across all queues at any point in time.
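The "highest priority thing across all queues" rule can be sketched as a merge-and-sort, assuming (hypothetically) that each queue can be exported to a text file with one PRIORITY<TAB>ITEM line per item, lower numbers more urgent. Real bug trackers won't export in exactly this shape; `next_item` is made up for illustration.

```sh
#!/bin/sh
# Merge every queue, tag each item with its queue's file name, and print the
# single most urgent item as PRIORITY<TAB>QUEUE<TAB>ITEM.
next_item() {
  local tab f
  tab=$(printf '\t')
  for f in "$@"; do
    awk -F '\t' -v q="$(basename "$f")" '{ print $1 "\t" q "\t" $2 }' "$f"
  done | sort -t "$tab" -k1,1n | head -n 1
}
```

Usage would look like `next_item team1_bugs team2_bugs nightly_runs` (file names made up). Re-running it after every task is the "touch everything every day" approach, driven by priority rather than by queue.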

How do you handle all your queues?

Monday, December 3, 2007

Customers Aren't Testers (Without Your Help)

Imagine if you turned on the TV and you no longer had channels. Instead, you just had lists and lists of programs you could watch. Some of them are available right now. Some of them aren't available yet. Some of them have really detailed descriptions.  Some of them just say "comedy, 30 min". Some of them happened in the past and probably won't come on again, but who can really say? It's just a huge flood of information.

This is roughly what it's like to define tests in an XP environment. The focus of stories is very short-term and rather isolated to each story. Stories are also defined by the customer (or customer proxy) rather than a trained engineer or tester. Lastly, stories are very positive-outcome focused; they describe what will happen, not what shouldn't happen. What does this mean for the tester?

Your job, as the tester, is to help the customer write good acceptance criteria. You also need to make sure that the negative aspects of the story are accounted for; that all the things that could go wrong react appropriately. In short, you have to help the customer make sure that the story reflects what he wants, not just what he asked for.

I've come up with a simple framework for helping customers write acceptance tests. The goals of this framework are to:
  • Make testing accessible. If we bring in complexities, even just in terms like "boundary value analysis" or "threat modeling" or "equivalence class partitioning", we'll intimidate and ultimately lose our customer. Even if we're doing these things, we need to make them seem accessible.
  • Address the external first. Describe your acceptance tests in terms of what the user will see or experience, then in terms of how external influencers on the system are affected. Avoid describing how the internals of the system will react, because in XP those could change at any minute. Do keep in mind that external influencers might be other components of the same system. If I'm describing a change in my application, my database might be an external influencer.
  • Describe context. No story works in isolation, and unit tests should handle the story itself as it's being written. Acceptance tests need to not only confirm that the story does what it says, but also call out areas of integration and other modules with affected behavior. This is the context in which your tests run, the environment in which they exist. This could be as simple as a dropped network connection mid-test or as complex as a multi-part system failure with several different processes occurring in the background.
  • Be precise. Don't test "bad passwords". Test "blank password", "old password", "another user's password", "too long password". Often this alone will help better describe the feature and ultimately influence the implementation of the story.
Now that we've figured out what we want to do, and that we don't want to intimidate our customers, what are some things we can do? What kind of very light framework can we build?

I find that the things customers struggle with most are (1) getting precise and (2) understanding context. So let's address those:
  • Getting precise. This is easiest when there is a user interface. Ask the user to draw the interface on a whiteboard, then walk through it. Ask lots of "what if I..." questions. Repeat this often enough and you wind up with the user asking those questions themselves.
  • Context. I actually use (real) buckets for this. I put big labels on each bucket describing typical ways that the system might get exercised. Then I put good test cases in each bucket. For example, I might have a bucket labeled "HA failover" for a high availability system. Then all we have to ask ourselves is "if an HA failover happens during this story, will it change anything?". If the story is "new icon", then it won't, and we put the bucket away. If the story is "I can upload a 100MB file", then it will, and we run the test cases in that bucket. I'll talk more about the buckets in another post.
Having pre-made test cases tends to help people riff, and walking through things as the user will helps people feel a story before it's done. All this, and you walk out not only with acceptance criteria, but with a stronger story. Oh, and you haven't scared off your customers with intimidating test jargon.


Sunday, December 2, 2007

Down With Selenium (Core)

I spent a chunk of this weekend trying out some test cases using Selenium. Selenium, for those who aren't familiar with it, is a tool for testing websites.

It was pretty much a disaster.

Let's talk a bit about the application I was testing. It's a browser-based management application for a storage archive. There are approximately 15 pages, with auto-refresh capabilities but no AJAX or other partial page refresh. There is a fair amount of JavaScript, particularly for input validation.

So, what happened?
  • Breaking out of iframes. Selenium depends on putting your app in an iframe and then running all the JavaScript commands through it. All the "_top" methods in our app completely broke the test.
  • Username & Password Problems on IE. The site I was testing displays a dialog asking for the username and password. Selenium's recommendation for getting around this is to put the username and password in the URL. This is disallowed in IE. Although there is a registry setting you can use to get around it, I generally believe that modifying the registry of your test client taints that client and may interfere with other results. So, I find the workaround unsatisfactory.
  • Non-parallelizable. Selenium uses the actual browser, which is great in that it shows true browser behavior. However, this means I can only run one test at once. To scale up (more tests), I have to scale out (more machines). I will have several hundred tests running in continuous integration when this is done, so parallelization is important.
  • Modification of my test server. If I want to run Selenium Core, I have to install things on my test server. This is a HUGE no-no in my world; the sanctity of the test system is inviolate. You don't get to deploy code onto a test server that didn't come from the build system (right down to the OS and drivers). Test tools are no exception. My other options are Selenium IDE (Firefox only) or Selenium Remote Control (runs through a proxy server, which may be a possibility).
My biggest problems with Selenium are really philosophical. The tool demands that I modify both my test client and the server on which my system is deployed in order to allow it to run. I prefer a tool that doesn't force me to compromise the integrity of my system under test.

I will be trying Selenium Remote Control and Canoo WebTest, as I think those are better fits for my testing philosophy.

I much prefer to test things as they will actually be deployed.*

* Yes, we really do this, right down to the same hardware and network configuration.