Friday, August 29, 2008

Reputation

You have a reputation - on your team, in your company, with everyone you've worked with. You have nothing more valuable than that reputation.

Your reputation is basically you... as you exist in the minds of others.

So if you're not even there (and you're not actually in someone's mind), why do you care about your reputation and what can you do about it?

What Is My Reputation?
Your reputation is the composite of many things:
  • what you've done
  • what you haven't done (sometimes this can really make you look good! Choosing not to do something can have great value and be the harder choice.)
  • what other people say about you
  • what you said, or more precisely what other people heard you say
  • what you've written (Ever Google yourself? Do it now; you won't be the first person to Google you)
  • who you know
Why Do I Care?
Your reputation is your calling card. It colors every interaction other people have with you. If you have a reputation for competence, people will forgive mistakes you make as aberrations. If you have a reputation for incompetence, people will take every mistake you make as further proof that you just can't do anything right. (And we all know that we will make mistakes.) It's a hugely different reaction, just because of your reputation.

How Do I Control My Reputation?
A lot of us have a lot of control. We're well-paid professionals who work effectively and control many decisions throughout each day. That's great. But your reputation is something you don't get to control directly.

Instead, reputation is something you control indirectly. However, you can still control it, or more precisely, you can influence it.

A reputation is built on some fundamental principles:
  • Consistency. It's easier for people to remember repetitive things. It's harder for people to reconcile conflicting information. Make life easy - act consistently and you'll color your reputation toward those acts.
  • You're Always Being Watched. Your reputation is built on what people see, even when you don't know they're looking. And everything you do leaves a trail, so don't assume private is really private.
  • Honesty. It's just too exhausting to be dishonest. Please don't bother. It'll catch up to you, and then your reputation becomes only about your dishonesty - not really what you're going to want your reputation to be.
  • People Talk. Any community is ultimately fairly small and chatty. People will talk about you, and someone from your past can determine your future success - with contracts, jobs, etc. You never know when you'll run into someone who knows someone - and they've met your reputation before they've met you.
So whatever your reputation is, be aware of what it is now and of what you want it to be. Use your behavior, your writings, your actions to influence it in the ways you want. Your reputation will affect your life; make sure the you that exists in the minds of others is helping the you that exists in the real world.

Thursday, August 28, 2008

Seeing Everything That Matters

Imagine you could see absolutely everything about your system.

Every message.
Every button.
Every process.
Every state.

How cool would that be? 

Except that it wouldn't be cool at all. We'd never get anything done. 

I've written several times about the dangers of NOT seeing things - how when you look at a screen certain things just don't jump out, and how testing in the same way can cause you to only see what you expect to see. There's an equal danger in seeing everything. 

Even the simplest commercial system is not small. Imagine all the things you can see:
  • System logs
  • GUI
  • OS logs
  • Processes
  • Files in and files out
  • Network traffic - to and from your box and broadcast traffic
  • User actions and reactions
  • Mouse movements
  • Keystrokes

And now we're overwhelmed - too much information. This is actually a standard human state, not something unique to us hardworking engineers!

When asked to remember cars and faces, car lovers couldn't do both effectively. When asked to make a choice between 6 kinds of jam, consumers picked one. When asked to make a choice between 24 kinds of jam, consumers simply walked away. Full study is here (pdf).

Human perception is about filtering. The difference between an effective engineer and an ineffective engineer is the ability to filter appropriately. Eliminate what's unimportant and see what's important. 

Our goal isn't to see everything. Our goal is to see everything that matters.

Of course, this is far easier said than done. So how do we build our filters and learn what's important and what's not?
  • Look through old issues. What were the clues that led to the real answer?
  • Spend time with the system. Getting a sense of what the system does when it's behaving normally will help you understand what is an anomaly and what's not. Look in a different area every time - logs one day, process table one day, GUI one day.
  • Mix it up. Work in different areas of the product so you don't become overly familiar with any one area and start to miss things. You want to keep a sense of slight surprise about each area of the product.
  • Explain what's going on. Pair with someone and walk them through what's happening to the system. Make sure you can explain the entire work flow, start to finish.
It takes time to learn what matters and what doesn't. Once you have, though, you'll find and fix issues more quickly, track down defects faster, and your "gut" will get really good. So take the time and learn to see your system with good filters.

Wednesday, August 27, 2008

Good Dogma?

In general I'm pretty anti-dogma. I find the inflexibility in thinking frustrating. Adherence to principles in the face of reality is laudable, but only to a point. Beyond that point is what I've been calling dogma.

In software, I've mostly seen this happen when the line between process and dogma gets blurred.

But then I read an article in a totally non-software related forum that asserts that dogma can be good. It's a short read, but to paraphrase, the article states that there are two types of dogma. The first type is pure resistance to change; this one is pretty much always bad. The second type, though, is creating an idea and sticking to it long enough to see if it succeeds. This one can have power.

I think there's a valid point there. I don't know that I would have called it dogmatism, but there's something to the notion that you have to stick with an idea at least long enough to see whether it's worth anything. And sometimes that means sticking with the idea even in the face of current reality - if it works out, we call that innovation.

So where's the line between innovation and dogma?

Tuesday, August 26, 2008

Stepping Semi-Away

There must be a zillion articles out there that say you should step away from your job for a while to refresh and really think deeply about problems you're facing at work. This is your big picture moment.

Here's where I get confused.

I'm supposed to get away from the office, from email, from phone calls. I'm with it so far.
And I'm supposed to be doing something not related to my job. Okay, I'm probably in a museum or in a park watching the ducks.
And this is supposed to help me think about my job. Uh... you just lost me. I'm thinking about ducks. Or possibly art.

I think what is really needed is to step semi-away. Take the vacation day, but don't go do something else. Make thinking about your job - or at least the bigger picture of your job - something you do explicitly. That makes a whole lot more sense to me.

Monday, August 25, 2008

Hope Really Really Hard

There's a method of verification I've come across (and even used occasionally) that I call the "hope really really hard" method.

What Is It?
The Hope Really Really Hard verification method is where you verify fixes by deploying them and doing your own personal method of hoping. I cross my fingers and say "come on, you can do it" to the computer a lot. You're free to choose your own method.

When Do I Use It?
Use this when you have a potential fix for an issue that you don't totally understand, or that is going into an environment you don't really understand. It's ideal when:
  • You couldn't reproduce the problem
  • You can't reproduce the environment into which the fix is being deployed
  • The fix is so urgent it's being done directly in the production environment
  • The fix is done, but "we can't wait that long to show it works. You're confident in it, right?"
When Do I Not Use It?
Don't use this method when you have something better, anything better.

The Hope Really Really Hard verification method is the highest-risk method you can use, particularly when you combine it with an account rep who desperately wants you to say this will fix everything, or will provide an X% improvement. (P.S. Don't fall into that trap. The line between guesses and facts gets fuzzy when people get stressed.)


Come to think of it, avoid the Hope Really Really Hard verification method if at all possible.

Friday, August 22, 2008

Talismans

There are days that are tailor-made to be hard. Big meeting, angry coworkers, interview, whatever it is. 

Those days I need my talisman:


Yup, great shoes.

I can handle ANYTHING as long as I'm wearing great shoes.

Thursday, August 21, 2008

The 12 Hour Rule

I've mentioned the 12 hour rule a couple of times, and thought it was worth explaining.

First, the rule:
Nothing done after 12 straight hours can be trusted.

Not me!
Yes, you. Everyone from nurses to truck drivers to factory workers to yes, even engineers. This is a general labor problem. After 12 hours, your brain is simply fatigued.

I've done some data mining of our defect tracking system, and this has been true over my last 3 or 4 jobs:
  • Our bug rates climb close to 3x for developers working during "crunch times" - precisely when they're pulling 12-16 hour days. 
  • Our duplicate bug rate, unreproducible bug rate, and "wow what the heck does that mean?!" bug rate all increase 2-5x during QA crunch times.
  • Our reopen rate goes up during the same crunch periods, as does the number of erroneous reopens.
But why?
In the case of software development, there are probably a couple of reasons. First is simply error rates as the brain gets tired. The second is that these long work hours are almost always in a crunch period when you're rushing something out the door. Adding fatigue to a sense of hurry is a recipe for cutting corners. Cutting corners increases mistakes.

What can we do?
First and most important, send people home. Keeping them working for more than 12 hours straight will only increase your problems and reduce your overall productivity. Don't fall into the trap.

Second, and this follows from the first point, try to avoid excessive crunches. Crunch times will happen - that can't always be helped - so be sure people are well-rested going into them. And when they're done, ease off for a while.

Third, rotate people so your entire team isn't crunched all at the same time. Be careful here that the entire team has bought into this so you don't get friction from someone leaving at 5 when another person has many more hours of work to do.

We're all under a lot of pressure, and "yesterday" is usually the desired delivery date, but don't violate the 12 hour rule, or you'll just pay for it on the back end.

Wednesday, August 20, 2008

Break the Comfort Zone

It's really easy to get into a testing comfort zone. You test the same things over and over, and you expect the same results. And then one day...... you've missed something.

DARN!

So what happened? Well, we hit our ruts again.

Humans are predisposed to inattentional blindness, and this can easily kill testing. Something is wrong, and you just don't see it. This becomes worse when you're looking at the same things every day - and that happens to all of us. There's "the guy who knows the upgrader", and he tests the upgrader in every release, etc.

So how do we break out of our ruts? How do we reduce inattentional blindness?
  • Switch it up. Every so often - at least once a week or so - switch to looking at a component you haven't looked at in a while.
  • Play Games. This one can be effective when you are in a test case rut. Think soap opera tests, or headline tests. Anything that is not normally how you develop test cases.
  • Go away for a while. Stop testing for a bit. Go work on something else - that new script you needed, defect aging analysis, whatever.
  • Don't forget the 12 hour rule. After 12 hours, I don't trust your results any more. Your error rate will go through the roof. This goes for everyone, not just testers.

Tuesday, August 19, 2008

Clean Up This Joint

It's going to happen. The role will be different. The name will change. But one day, you're going to be faced with someone who was brought in to clean up this place.

Ladies and gentlemen, meet your new sheriff. This is going to be a bit of a rough period, so let's think before something too final happens.

What is this guy doing here?
Usually the sheriff comes in with a mandate from someone higher up - the CEO, the VP Engineering, sometimes even the board. This is where a lot of the really antagonistic metaphors come in. This guy is here to "kick a** and take names", or to "whip things into shape", or to "knock some heads together". 

Hidden in all the metaphors is the sheriff's purpose: this person is here because that someone higher up believes there is a problem and the current team can't or won't solve that problem.

Let's talk motivation
This is someone whose entire job is cleaning up the problem, whatever it is. His success resides in producing changed behavior, or at least the perception of changed behavior.

The sheriff is NOT here to:
  • Learn how things are done
  • Mollycoddle (aka take note of anyone's feelings or pride)
  • "Tweak" processes or procedures
There may be a lot of talk about results, but results are a direct outcome of behavior. So the things you can expect to change are behaviors - policies, roles, processes, procedures.

What do I do now?
There are a lot of ways you can choose to react to the sheriff. Some of them are effective, others will leave you looking, well, more like this:


The choice is up to you.

The first prerequisite to any profitable reaction is to acknowledge that there is a problem. The problem may be only the perception of those higher up, but that is in itself a problem.

After that, you can:
  • choose to fight the problem and stay with your old way
  • put your head down and do what the sheriff wants
  • understand the problem and help fix it
What is your goal?
The ultimate goal is to fix the problem. The first thing you have to remember is that you and the sheriff are on the same team. The second thing you have to remember is that you can influence how this goes.

Remember:
  • Attitude is paramount. Of primary importance is that you appear willing to help. Resistance will likely be crushed, so don't act offended, resistant, pouty, etc.
  • Be confident in what you know. Sure, there's a problem, but it's likely there are some good things, too. Make sure the baby doesn't get thrown out with the bath water.
  • Change is okay. Some of the changes will probably be good. Chances are, if there's a problem of this severity, it's made your life miserable in some way. A change has a chance of fixing it.
  • Don't take this personally. The work you do is separate from who you are as a person. Furthermore, you don't even know that your work is the source of the problem. At least wait until you've been criticized to react! And then remember, it's your work, not you.
  • Evaluate yourself. Take this as a chance to step back and think about your work and how you can make it better. Maybe you're part of the problem, maybe you're not, but there's always something you can do better.
Good luck, and remember, you can choose how you handle the sheriff.

Monday, August 18, 2008

Feeeeeelings....

Feelings are great. Having a "gut feel" about a product and following it up is part of what distinguishes a good QA engineer from a great QA engineer.

But....

Feelings are no good if you don't follow them up with facts.

An analogy:
Imagine you're a general, and you're in your command post. A soldier comes running up from the front lines and says, "We're losing! We're losing!". There's your gut feel.

What should the general do?

There are a lot of options - order a retreat, send reinforcements, wait it out - but any one of them could be correct based on the information you have. Your gut feel is a good indicator that you need to look closely at your battle, but it doesn't tell you how to fix it. You need facts. You need to know why that feeling is there.

So you ask your running soldier why he thinks we're losing. You probably send out a scout to see what casualties really are. And once you know that the terrain is harder for artillery than you thought, you bring in more planes and infantry (or whatever the problem really is).

And back to the real world:
A gut feel is a good indicator that something is going on. The best thing you can do next is gather information. Run tests, look at bugs, do a code review. But a gut feeling by itself isn't actionable; you can't really solve the underlying problem until you've taken that feeling and added facts.


THINK based on your gut.
DO based on fact.

Friday, August 15, 2008

Pain-Driven Features

Often we find that there is not one problem in something, but a cluster of problems. This doesn't necessarily point to a problem in the code specifically, but often to a design weakness, or a process issue. In this case, it can be difficult to really define the boundaries of the problem and therefore the change required to fix the underlying issue.

Basically, a cluster of problems needs to be resolved, and that often means adding a new feature (to the product or to the process). So we treat it like a feature, and brainstorm what we really want it to do. After all, brainstorming is a fairly common requirements development technique.

Where it gets hard is that this feature isn't about some future problem; this is a feature you're creating because you're hurting today. So brainstorming is great, but keep in mind that you're feeling the pain right now, and speed matters. Dreams that run more than an hour or two are dangerous - after a few hours you're not solving your pain point, you're designing something that will take a while to build, and that means your pain isn't going away any time soon.

So when you're brainstorming how to handle a feature, keep in mind that your most important priority is not the feature itself. Your most important priority is alleviating the pain it's causing today. 

Think quick hits. Think doing it right. And most importantly, don't overthink. Just make the pain go away.

Thursday, August 14, 2008

Automation Cost Benefit Analysis

We've been talking lately about whether to automate certain tests. One of the driving factors is that automating to the level we're discussing is unusually expensive in time and resources. We're talking about a non-trivial expansion of our test infrastructure, plus writing the tests themselves, so the question has come up: is this worth automating in our usual fashion?

First, some background. As I've mentioned before, I work in an Extreme Programming shop, so when we write a feature, we write tests to expose that feature. Usually, these tests are automated*, and the default assumption is that a test will be written in code and run automatically every night or every week. Only if something is incredibly onerous do we start by assuming it's manual. We always manually test new features as well, looking for usability, things missed by the test code, etc.

The particular feature we're discussing here is to "support third-party software WidgetX" (of course it's not WidgetX, although that would be really awesome. The specific software doesn't matter). Now, let's assume for the moment that we use manual exploration to learn enough about WidgetX to create some intelligent tests. There are two paths we could take:
  • Manually re-certify WidgetX every release
  • Write the automated tests and run them, probably weekly.
The advantage to automating the tests is that we get feedback more often as we code that we haven't broken anything. However, writing the tests costs us money and time, and running the tests also isn't free.

How many times does a test have to run to make the effort of automating it "worth it"?

In other words, under what circumstances have we recouped our investment? On the benefits side of the automation balance sheet we have:
  • the likelihood of regression - pretty high, since XP involves a lot of fearless refactoring
  • the importance of the software working
  • the time saved by not having QA run the same old tests manually
  • not creating an exception to our development process
On the costs side, we have the cost in man hours and in resource overhead.

I don't know where we're going to end up yet, but we have a framework to figure that out.
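To make that framework concrete, here's a minimal break-even sketch in Ruby. Every number in it is a made-up placeholder rather than our real data, and it only captures the labor side - the regression-risk and faster-feedback benefits still have to be weighed on top of it.

# All numbers below are hypothetical placeholders - plug in your own estimates.
cost_to_automate  = 80.0  # person-hours: infrastructure expansion plus writing the tests
auto_run_overhead = 0.5   # person-hours per automated run (triaging failures, machine upkeep)
runs_per_release  = 12    # roughly weekly runs in a release cycle
manual_cert_cost  = 16.0  # person-hours to manually re-certify WidgetX once per release

net_saving_per_release = manual_cert_cost - (runs_per_release * auto_run_overhead)

if net_saving_per_release <= 0
  puts "Automation never pays back on labor alone; justify it (or not) on regression risk and feedback speed."
else
  puts "Break-even after roughly #{(cost_to_automate / net_saving_per_release).ceil} releases."
end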





* By "these tests are automated" I mean that code is written to set up an environment, perform some action, and assert on the resulting state. This code is then put into our test framework and run automatically every night or every week. Human intervention is required only to diagnose failures.

Wednesday, August 13, 2008

Independent Verification

Sometimes we see behavior in the logs indicating that a client writing to us is doing things a certain way. Now, the first thing you have to know is that we log every request as it comes into the system. We write down the NFS operation, the volume, and the filehandle (you don't have to know exactly what these are - basically what the client wants us to do, the location of the file, and the name of the file to do whatever-it-is on) - and then we process it. We use these logs a lot when debugging, and we often find interesting things. For example, we may find that a client is doing three setattr operations where most clients do one. Or we may find that the client "just doing writes" immediately reads back everything it wrote.

This is great internally, and it's really helpful with client profiling. Sometimes it even finds a problem on the client side - consistently trying to create files before creating their containing directories, for example.

And there's the rub. Everyone's happy to believe our logs when they point out things in our system. When the problem is on the system that is a client to us, all of a sudden we find that our logs are insufficient. The support team for the client software doesn't believe the logs haven't been manipulated in some way (don't get me started on politics!). I can't really blame them; they don't know our system. So to provide independent verification, we turn to WireShark. A quick trace shows our logs are accurate, and then we move on to solving the real problem.
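(If you're wondering what "a quick trace" means in practice, it can be as simple as capturing the NFS traffic on the server and opening the capture in Wireshark - something like tcpdump -i eth0 -w trace.pcap port 2049, where the interface and file name are placeholders and 2049 is the standard NFS port.)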


The moral of the story is:

Just because you know something doesn't mean everyone knows it. Be prepared to provide some form of independent verification.

Tuesday, August 12, 2008

Reformat to Rethink

So I'm working on a new theory.

You should change your test plan format every few releases.

Let's step back and think about this for a minute.

Over time, we stop finding issues with the test plan we're using. We're wearing the same old pathways through the system.

We need a way to shake things up. Regression tests are great, but there are always new things to find. Looking at things the same way encourages us to think about them the same way.

Consequently, we change our test plan format every couple of releases. It's the same stuff, just formatted differently.  The theory is that looking in a new way at the same tests helps us see the whole system differently.

The downside to this, of course, is that it makes it a lot harder to compare results across releases.

Anyone else trying anything like this? How's it working for you?

Monday, August 11, 2008

Did You Test This?

The theme of this (young) week is "Did you test this?" All of a sudden, for reasons I'm not quite sure of, QA has been asked a lot whether we tested certain scenarios.

The trick to these requests is that there's almost always a story behind them.

Question: "Did you test replication between type X and type Y systems?"
Translation: "Marking convinced a client to upgrade from type X to type Y and tell a journalist all about it, and I really really need the data transfer to go smoothly."

Question: "Did you test upgrading with Active Directory configured?"
Translation: "Umm.... my Active Directory settings seem to be lost, and all I did was upgrade and tweak a few things..."

Your goal is to find that story.

The story is what you really need to solve. The question is just how the other person is trying to describe the story. Understand the story and you can answer the question the person intended, not just the question the person asked.

Friday, August 8, 2008

Evidence and Proof

One of the responsibilities of QA is to look at things that support can't figure out, and today one of them came up. It looked fairly simple on the surface. We have a multi-machine system, and there is a log collection feature. Basically, it goes out, grabs all the logs from all the machines, tars them up, and presents them for download - very convenient. In this case, some of the logs were missing.

So I dug in.

After tracing it through code inspection and log file inspection, I tracked down the real problem. Basically, it tried to bundle up the logs on each of the machines and then transfer them for inclusion in the full log bundle - this is normal. On three of the machines, that process failed, and on five of the machines it succeeded.

Now what?

I traced it a bit further on those machines specifically and tracked it down to a problem area. Here's roughly what the code does (a shell-flavored sketch with placeholder paths, not the real script):

for m in $MACHINES; do
    mount -t nfs "$m:/logs" /mnt/logcollect                    # nfs mount a drive on the machine
    cp /mnt/logcollect/logs.tar "/var/logbundle/logs-$m.tar"   # grab the log tar file
    umount /mnt/logcollect                                     # unmount
done

Now, on the five machines that worked, this worked fine. On the three machines that failed, this step didn't complete successfully, but it didn't throw any error messages at the log collection script level. Furthermore, I found evidence in the auth.log that the mount and unmount succeeded. So now I've narrowed it down to that log tar file. My theory is that the tar file simply didn't exist or wasn't complete (it varied by machine) - earlier on, when the script was building the tar file, that step failed or didn't finish.

Here's the problem.

All the evidence points to my theory being correct. I have no way to prove this, though. All I see in the logs is a mount, a copy timeout, and an unmount. There's no log in our script that we successfully created this tar file or didn't. There's no evidence in Linux syslogs, auth logs, kernel logs, message logs, anywhere, that a tar file got created (after all, creating a tar file is simply creating a file, and that's not really interesting enough to log at the OS level!).

So what do you do next?

I eventually found my proof - an error message buried in one of our other logs, having to do with the duration of the thread that was doing the tar. But it made me think:

When you can't prove something, how strong does your evidence have to be to convince you?

We don't live in a world of black and white. It's simply not possible to prove every assertion we make, particularly when we're outside of our (semi-controlled) test environment. I don't really have an answer for this, so I'll throw it to the world out there.

How do you handle these situations?

Thursday, August 7, 2008

How You Know You're Thriving

I'm on a mailing list that recently had a discussion sparked by someone looking to demonstrate that the field of testing is thriving by showing there are controversies. The thread rapidly went elsewhere, mainly to misconceptions about how testers are perceived (or rather, about how testers perceive that they are perceived... say that 10 times fast!), but I was stuck back at the beginning of the thread, and I found myself wondering....

Is the existence of an argument really evidence that a group is thriving?

My gut says no. There are a lot of reasons that members of a group might be arguing or have different views on things:
  • Maybe the group is really just starting and seeking definition
  • Maybe the group is falling apart at the seams and this is evidence of an impending split
  • Maybe the group just enjoys conflict and doesn't seek resolution (I worked at a place where this was the case - "no fun unless there's a fire" - it was exhausting!)
  • Maybe the group really is thriving and this is a way of assimilating new ideas, new approaches, new challenges, etc.
So if the existence of arguments or controversies isn't evidence that a group is thriving, what is? How do I know if this group I'm looking at is functional and improving?

Let's start by defining what we mean when we say a group is thriving. A group is thriving if it is able to successfully respond to changes in its environment - new ideas, new challenges, new groups. Further, in order to be considered thriving, the group has to be able to both attract and retain members.

Awesome, I have a group and a definition. What do I look for to indicate that the group is actually in this wonderful thriving state? I don't think there's one answer for this, but I do think there are things you can look for:
  • Group membership is stable or slightly increasing over time, and turnover exists. Sure a group may occasionally downsize, but generally in a thriving group new people are joining and people who are there are staying for a while. A thriving group also needs people to be moving on. Not all of them, and not incredibly quickly, but some.
  • The group is aware of new ideas in their field. Ask the group about some concept, technique or tool that is fairly new. Reservations and controversy are okay, but if there's no awareness, that's not good.
  • Issues are resolved. It's all very well to have people with differing views in a group; to a large extent that's healthy. However, conflict for conflict's sake is a real problem. If the group is more interested in arguing than in figuring out a solution, the group is not thriving.
  • There is a common vocabulary.  No group can thrive unless the members of the group can communicate with each other. If you have a group that uses different words for the same concept, or the same word to mean two different things, then there's a problem.
I'm sure there's more that I haven't thought of. When you're looking at a group, what do you use to figure out if it's thriving?

Wednesday, August 6, 2008

Selenium Grid: Configuring Machines

I posted a couple of days ago about trying to get Selenium Grid to work. I've made some progress and wanted to get it down in case there's someone else out there trying to do what I'm trying to do. Most of this is sourced from elsewhere on the web - thanks to everyone who wrote a blog post, instruction set, document!

Goal
I have several Selenium tests that currently run in a single environment on a single browser. My application is Ruby on Rails. I'd like to make them run on all the browsers the product claims to support: Windows (IE, Firefox) and OS X (Safari, Firefox). Once this is basically functional we'll add the nice to haves: running in parallel, running multiple projects (each of which has its own tests), and some form of unified reporting.

Setup
I'm developing on OS X (Leopard). The build system is Linux. I'm using VMWare to run Windows clients. If you're on other platforms, well, I hope this at least points you in the right direction!

I'm assuming you've got Selenium tests running locally already.

Grid Pieces
Selenium Grid consists of several pieces:
  • Remote Controls: These are the things that actually run the tests. If you're currently using Selenium RC, you're using this.
  • Hub: This is the thing that is aware of the available remote controls and picks one to run a given test. The Hub knows a few things about the remote controls: the type (*chrome, *iehta, "Firefox on Windows"), the port, and the URL. It picks which one to use based on type.
  • Selenium tests: These are the tests themselves. They're not actually part of the Grid, but it's useful to consider them here.
Configuring The Hub
First things first. You have to configure your Hub. To do this:
  1. Go get Selenium Grid.
  2. Install Selenium Grid. I put it in /Applications/selenium-grid.
  3. Configure the grid and make sure it builds. I used these instructions. Note that you need to have Java and Ant.
  4. Launch the hub by cd-ing into /Applications/selenium-grid and running "ant launch-hub".
At this point, you should be able to point your browser to http://localhost:4444/console and see the Hub running. You'll see a list of available environments. That's controlled from a configuration file: /Applications/selenium-grid/grid_configuration.yml. Feel free to modify the file to add new environments or delete environments you don't need. Just restart the Hub when you're done.

Be careful here. At the time I wrote this, Firefox 3 on Windows just didn't work, and Safari 3 didn't work reliably.

Configuring Each Remote Control
Each remote control is essentially standalone. I think of these as basically standing around and waiting for the Hub to give them something to do. Repeat this for each remote control you want to configure:
  1. Download Selenium RC
  2. Download Selenium Grid. I'm not completely convinced this is necessary, but I didn't get it working without it, and if it just sits there, well, no harm.
  3. Install Selenium RC. I used the instructions here.
  4. Configure Selenium RC to use the default port (3000). Be sure it's actually running.
Pointing the Hub At a Remote Control
Once you have a remote control ready and running you can point your Hub at it.
  1. On the machine running the Hub, leave the Hub running.
  2. Open a terminal, cd into /Applications/selenium-grid
  3. Start the remote control from the hub with: ant -Dport=5556 -Denvironment="*chrome" -Dhost=10.0.1.197 -DhubURL=http://10.0.1.206:4444 launch-remote-control.
For each remote control, set the port to something different; you can't have two remote controls using the same port. (A couple of example launches follow the flag descriptions below.)
  • -Dport is the port that the Hub will use to talk to the remote control. This must be unique.
  • -Denvironment is the environment you'll be using. Set it to something in the available environments list (check the Hub console in the browser).
  • -Dhost is the IP address (or name) of the machine that is running the remote control.
  • -DhubURL is the full URL (including http:// and the port) of the machine that's running the Hub
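For example, launching two more remote controls for the Windows browsers might look like this. The IP addresses and ports here are made-up placeholders, and the environment names must appear in the Hub's available-environments list:

ant -Dport=5557 -Denvironment="*iehta" -Dhost=10.0.1.201 -DhubURL=http://10.0.1.206:4444 launch-remote-control
ant -Dport=5558 -Denvironment="Firefox on Windows" -Dhost=10.0.1.202 -DhubURL=http://10.0.1.206:4444 launch-remote-control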
Run the Test
Now we get to the part where we actually use this thing we've created. We're in Rails, so we're going to do this as a rake task. Here's the actual rake task file that we'll put in lib/tasks:
================================
require(File.join(File.dirname(__FILE__), '../..', 'config', 'boot'))
require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'
require 'tasks/rails'

@osx_browsers = ["*firefox", "*chrome"] # should be Safari instead of Firefox, but Safari doesn't work in the grid yet
@win_browsers = []
@platforms = {"win" => "Windows", "osx" => "OS X", "linux" => "Linux"}

namespace :test do

  task :run_browsers do
    unless ENV.include?("platform") && @platforms.has_key?(ENV['platform'])
      raise "A platform is required! usage: rake test:run_browsers platform=[win|osx]"
    else
      platform = ENV['platform']
      case platform
      when 'win' then
        browsers = @win_browsers
      when 'osx' then
        browsers = @osx_browsers
      end

      puts "Running tests for the #{@platforms[platform]} platform"
      browsers.each do |browser|
        puts "Running tests in the #{browser} browser"
        ENV["Browser"] = browser
        ENV["TEST_DIR"] = 'test/selenium'
        # Rake runs a task at most once per process, so re-enable it before each
        # browser; then run all the selenium tests within the grid context.
        Rake::Task["test:selenium"].reenable
        Rake::Task["test:selenium"].invoke
      end
    end
  end

  task :sanity_check do
    verify "You must install RubyGems to run this example : http://rubygems.org" do
      require "rubygems"
    end
  end

end

def verify(error_message)
  begin
    yield
  rescue Exception
    STDERR.puts <<-EOS
*******************************************************************
#{error_message}
*******************************************************************
    EOS
    raise
  end
end

==============================

To call the rake task, just:
  1. start your Rails server (script/server) 
  2. Run rake test:run_browsers platform=osx
If you've done everything right, then you'll see your tests start to run on each of the browsers you've defined.

Let's look at what this rakefile does:
  • Defines the browsers for each platform (OS type). Each browser name needs to match a name you put in -Denvironment. Make sure you have a remote control set up for each environment.
  • The sanity check probably isn't strictly necessary, but why take chances?
  • You'll notice that I'm calling another rake task. This is whatever task you've defined to run your Selenium tests. I find this easier than calling them separately; you can keep up with any special config you're doing that way.
Conclusion
And that's basically it. I hope this helps!

Tuesday, August 5, 2008

Cautionary Tale

This article has been around a while, but go read this case study in software development mistakes. It's long-ish, but worth a read.

The article is intended to point out a lot of the "classic" mistakes that get made over and over again in software development:
  • Lack of visibility into progress, which ends up in surprise when the schedule slips
  • Isolationist team members, which ends up with a game of "is this huge chunk of code any good?"
  • Over-optimistic scheduling, resulting in slips when reality intrudes
  • Feature creep
  • Unclear definitions of done
There were a couple of other things I noticed in the article that I wanted to point out - more classic mistakes in a software project:
  • QA is the canary in the coal mine
  • Work harder to make up time

QA is the canary in the coal mine.
This one is interesting. Consistently in the story, the first warning that things were not kosher came from QA. That's bad. If you're surprised by what QA tells you, then the dev team is not collectively aware of your code. There's nothing wrong with sending something buggy to QA... as long as you know it's buggy. Surprise says you aren't evaluating (or can't evaluate) your own code. The end result is that it's almost certainly not as good as you think it is.

Work harder to make up time.
In the article, schedule slippages are countered in part by forcing people to work 10 hours a day, then 12 hours a day.  After 12 hours, they may be at work, but they're not exactly working. They're checking email, surfing the web, and generally being non-productive. Working harder won't make up time, it'll just wear people down.

A corollary to this is one of my personal guidelines: "don't trust tests done or code checked in after 12 hours straight".


We all make mistakes. Let's just try not to make the same ones over and over again.

Monday, August 4, 2008

Not Self Contained

I've been starting to work with Selenium Grid, and I'm still in that frustrating "doesn't quite work the way I want it to" stage.

I can live with lack of documentation. (Thanks, forums for helping alleviate that!)

I can live with obscure and misleading error messages. (Thanks again, forums!)

There are two things that really get my goat, though:
  • lots of prerequisites
  • too many browsers just don't work
Regarding prerequisites, here's what I have to do to get the tests running:
  1. start the rails server (script/server)
  2. start the Selenium hub (ant launch-hub)
  3. start the remote control machines (ant -Dport=5556 -Denvironment="Firefox on OS X" launch-remote-control)
  4. repeat step 3 for each machine
  5. run the test
I know I've gotten spoiled by just having test::unit do a lot of this for me, and I can certainly write a Capistrano recipe to do these, but it seems a bit ridiculous to have so much that the test framework can't do. At the least you'd think you could do something like register remote controls so the hub could try to autostart them on launch.

Regarding browsers, I mostly got unlucky. I happened to start working with a remote machine running Firefox 3 on Windows. Too bad that browser isn't actually supported yet. Also on the doesn't really work well list is Safari 3. 

It's not all bad news. The people working on the project are very helpful. I just don't think Selenium Grid is quite as mature as I'd like yet.

More updates as I get this working.

Friday, August 1, 2008

Don't Be Part of the Problem

I was reading blogs, as one is wont to do on a Friday morning, and ran across this entry about "The Test Test". The gist of the article is that there are certain things that you should look for in an organization to determine if your test group has a chance of success. There's a lot of good stuff in there, but in the end it boils down to two things:
  1. Does this organization consider shipping a good product important?
  2. Is there respect for the testing organization?
First of all, I completely agree that an organization that is predisposed to consider testers valuable is going to be more fun to work in and more likely to ship a product that testers (and everyone else!) can be proud of. However, I get frustrated sometimes because I watch testers set themselves up for disrespect and failure.

Sometimes, we are the architects of our own bad situations.

So yes, please look for an organization that passes The Test Test. But also ask yourself if your attitude is part of the problem. 

DO NOT BE A TESTER WHO:
  • Seeks to feel inferior
  • Carries a chip on your shoulder larger than a baseball bat
  • Makes derogatory comments about developers who only see the "happy path"
  • Gives vague test cycle estimates without reasons why
  • Cannot explain testing goals and what they mean
In other words, if you wander around muttering about how poorly you're treated, or you walk into a new job expecting to have to claw your way to equality, then you will be treated poorly, and you will be treated as inferior. Approach a situation as if respecting test is the only option, and the rest of the organization is much more likely to simply fall in line. Keep in mind that your role is to educate, to help people understand the value you bring and continue to bring. Allowing an organization to remain ignorant of what test provides is your own fault. So fix it!

Like almost anything, creating and maintaining an organization that will enable testers to be successful requires actions from both sides:

The Organization's Responsibilities:
  • See The Test Test. I can't put it better.
Your Responsibilities:
  • Education: Help people understand, in their terms, what test provides. For the sales guys, speak in terms of money. For the developers, speak in terms of fewer fires to fight in the field.
  • Cordiality: Be polite and pleasant to be around, just like you want people to be polite to you. I don't care what your role is, there's no cause to be rude!
  • Justify Yourself: Be prepared to explain why your estimates are what they are, or why you test what you're testing.
  • Provide Value: This part I hope we can all do!
  • Show Results: Now that you're providing value, make it public. Show the number of support calls decreasing, show the number of patches in the field dropping.
So remember, there are two sides to this problem. Look for an organization that's going to create prerequisites for success. Look to yourself for the same thing. And then let's move on, all of us, better than before.