Friday, May 30, 2008

Focus and Flailing Are Both Okay

Bear with me, context matters for this one:

We had a problem at work, with some behavior we couldn't identify. The behavior itself was fairly clear - performance for a certain file size and certain write pattern degraded over time. The problem was getting that to happen in a test environment so we could poke at it and fix it! At this point, understanding and codifying the behavior is important, as is replicating the problem environment as closely as possible. The plan here gets quite simple: make your environment more and more like the problem environment, and eventually you will see the problem.

This is a very focused state. What you're doing here is actually straightforward. Find some way in which your test environment differs from the problem environment, and change it. Start with the core of your system and work your way out, getting more and more similar along the way.

Now we have a different problem. We can make the behavior happen (hooray!). We have no idea what's going on. Without knowing what's actually important we just start stressing the different factors. There are a plethora of possibilities, so we take our best guess at what might be a factor and start changing.

In this phase, documentation is incredibly important. You need a record of what you've tried and what effect it had. In general, changing one thing at a time is important, and only when that fails do you start changing things in groups.

This is the flailing state. Without a lot of guidance from the problem, you'll be trying lots of different ideas in lots of different areas. Which tests you do in what order is a fairly arbitrary choice; what matters here is gathering data. Feeling like you're flailing is okay, as long as you're doing one thing at a time, writing down what you do, and writing down the effects it has. Flail away, just do it with docs.
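Since the whole phase hinges on writing down what you changed and what happened, even a trivial log structure helps keep you honest about the one-change-at-a-time rule. A minimal sketch in Ruby — the entries here are invented for illustration, not from our actual investigation:

```ruby
# A bare-bones experiment log for the flailing phase: one change at a
# time, each recorded with its observed effect.
Experiment = Struct.new(:changed, :effect)

log = []
log << Experiment.new("raised write block size 4k -> 64k", "no change")
log << Experiment.new("doubled client thread count",       "degradation appears sooner")

log.each { |e| puts "#{e.changed} => #{e.effect}" }
```

Anything this shape — a wiki table, a spreadsheet, a text file — works; the point is that every change gets an entry before the next change happens.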

At some point, you catch a break. Through hard work, a lot of tests, research, debug logs, hail mary passes, etc.*, we figured out what the stressor was. And it was a change between the version that didn't show this behavior and the version that did. Change this one thing, and the system performance goes right back to where it was.

But, and this is important: you are not done yet.

All of a sudden the distinction between interesting and useful gets important. There are lots of things to know: when was this change introduced? Does making this change on the last good version of the software cause us to see the break (thus implying this is really the only aspect of the problem)? Do we have to make this change just in the one area we tried, or in more areas? How does it affect performance on other file sizes? Other write patterns? There are a slew of good ideas and interesting things to test. But our goal here is not to characterize every aspect of the system; our goal here is to resolve this one specific issue. So, if it's not about this file size and this write pattern and this hardware, it's interesting, but it's not important (not yet, anyway).

This is focused and prioritized. You're still running a lot of tests, but you need to be brutally focused on what you are testing here. All you're doing is standing on your problem definition and defining its limits. The minute you step away from the problem definition, you've lost focus and you're not helping fix the problem any more.

Long story short, you can often tell where you are in the path to diagnosing and fixing an issue by the types of tests you're doing. So ask yourself whether you're flailing around or whether you have a focus and a reason to do each thing you're doing. Then ask yourself whether that's the right stage to be in, given what you know about the problem. Remember, flailing is okay, and a strong sense of focus is okay - you just have to be using each at the right time.

* In our specific case, it was literally a dream that someone had - very cliché, but effective!

Thursday, May 29, 2008

On Reporting, Part II

I've written about reporting before, but that kind of daily report is a relatively structured thing. The triggers for sending them are fairly well defined, and there's generally an event after which daily reports stop (in our case, we start when QA takes code for release, and we stop when QA releases the build to support/deployment).

Sometimes there is a need for ad hoc reporting, too. These types of reports are when you have an issue that crosses teams and that matters to numerous people. Think client issues, think service packs, think office moves, etc.

It's really tempting to make an ad hoc report be a verbal thing, just something that "everyone working on the issue knows". Don't give in to this. No matter what is causing you to need a form of ad hoc reporting, if it's going to last longer than a day or two, set up some sort of reporting expectation.

The report doesn't need to be a major time-consuming thing. It can be a wiki page, or a bug, or an email thread; the format really doesn't matter. It just needs to be something quick that can be updated by anyone working on the issue. There are only three required elements to the report:
  • What we know. Include things we've learned and things we've tried (and their results).
  • What we'd like to know. Include things we think are the stressors or the unknowns here.
  • What we're trying next.  Include a very brief sketch of what we are going to try in what order.
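As a starting point, a skeleton with those three elements might look like the following — the placeholders are entirely hypothetical, and the exact format truly doesn't matter:

```
Issue: <one-line description>

What we know:
  - <things learned; things tried and their results>
What we'd like to know:
  - <suspected stressors; open unknowns>
What we're trying next:
  1. <next experiment>
  2. <the one after that>
```

Paste that into a wiki page or the top of a bug, and let anyone working the issue update it.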
Without ad hoc reporting you'll spend more time explaining what you know and talking about next steps than actually working on the issue. So it seems silly, but take the 5 minutes and just get it started. Three days later you'll be thanking yourself.

Wednesday, May 28, 2008

Do QA Tools Engineers Still Exist?

The primary position I'm recruiting for right now is a QA tools engineer. This is turning out to be really hard to even get close to and I'm not totally sure why. There seem to be a number of factors that are making this one particularly difficult.

QA Tools Engineer is a development position.
I know this is in the QA group, but it's a development position. The only difference is that instead of developing a server logging module inside our product, you're developing a test logging module inside our test infrastructure. It still has to handle a lot of traffic (on the order of 1 TB a day, for this particular example), and it still needs to be very robust. It just doesn't sound quite right. This isn't a QA engineer working on tools. This is an engineer that happens to write tools.

QA Automation Engineer <> QA Tools Engineer
This one is particularly bad with recruiters. The keywords are "develop", "infrastructure", and "QA". You put those all in a pot together and sometimes you come out with different things. There's the QA Automation Engineer, who basically writes automated tests all day. Use of QTP or WinRunner is common, as is use of scripts (either inherited or home-grown). Then there's the QA Tools Engineer, who is a developer that just happens to write tools that are used in testing.

Perl has multiple uses.
Once you've overcome all the other problems, you still have to worry about the technologies. Our test infrastructure is written mostly in Perl, so I'm looking for someone strong in Perl and particularly in Object-Oriented Perl.  But Perl has two common uses in QA: for test scripts, and for infrastructure and frameworks. We want the latter. The problem is that the resume generally only includes a vague phrase like "wrote Perl to do X". So you have to screen a lot of scripters out to get to the infrastructure/framework developers.
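To illustrate the script-vs-framework distinction — in Ruby rather than Perl, since I can't share our actual code, and with invented names throughout — the same little job looks very different depending on who wrote it:

```ruby
# The scripter's hammer: everything inline, used once, thrown away.
lines = ["ok 1 - connect", "not ok 2 - write timeout", "ok 3 - read"]
failures = lines.select { |l| l.start_with?("not ok") }

# The infrastructure hammer: a small reusable class that other tests
# can configure and build on.
class ResultScanner
  def initialize(marker = "not ok")
    @marker = marker
  end

  def failures(lines)
    lines.select { |l| l.start_with?(@marker) }
  end
end

scanner = ResultScanner.new
```

Both find the failure; only one of them is something you'd want at the bottom of a test infrastructure handling a terabyte of logs a day.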

So this is your test script hammer:

And when you bang it, it squeaks.

And this is your infrastructure hammer:

And when you bang it, the nail goes into the board further.

We want to drive in a nail, but asking for a hammer is resulting in a lot of squeaking and a little bit of nail movement.

I haven't come up with a solution to that one yet.

So my question to you all: do QA tools engineers still exist? If you know one (or you are one), email me! I've got an awesome job for you.

Tuesday, May 27, 2008

Hypotheses, Theories, and Explanations

In the land of tracking down issues, there are some important words that really shouldn't be mixed up.

Hypothesis. A hypothesis is basically a guess. It's generally the first step in coming up with an explanation. 
Example: "We ran out of memory!" 
Hypothesis: "A long-running job was reading a lot of objects into memory and just never flushed them."

Theory. A theory is a hypothesis with some backing. You've taken your hypothesis and compared it to all known facts, and it's holding up well. 
Example: "We ran out of memory!"
Theory: "A long-running job was reading a lot of objects into memory and just never flushed them. We see a Java core that shows an out of memory error at that time from that job. We also have logs showing the job starting and in progress, but never completing. We can make a long-running job exhibit this behavior in a test system."

Explanation. An explanation is a theory that can be reproduced on demand, with the same starting state and the same result as the original problem. Resolving the issue shown by the explanation resolves the original problem (i.e., causes it not to recur). Basically, an explanation is a theory that's been proven in the field.
Example: "We ran out of memory!"
Explanation: "On a system configured like the client, we run out of memory due to the proposed theory (see above). Modifying the behavior so that long-running jobs periodically dump their objects from memory results in no out-of-memory behaviors at the client site."

In short, until you've reproduced it, you have a hypothesis. After you've reproduced it on another system, you have a theory. When it's happened again (or been fixed and shown to not happen), you have an explanation.

I started writing this post thinking it was going to be a fairly straightforward attempt to clarify some phrases that are often used in an overlapping manner, but it occurs to me that this might be a bit controversial. The only real strong (and completely tangential) thought I have here is that you don't know everything. You may know all the relevant things, but you may not. So until you've seen the problem fixed in the place the problem originally occurred, then you cannot say with 100% certainty that you've reproduced the issue. You may have reproduced a very similar issue. The vast majority of the time, you'll have the same issue; it's that last little bit and those really subtle issues that make life really interesting.

Friday, May 23, 2008

An Example's Worth A Thousand Words

I read a fair amount of requirements. 

Some of them are incredibly sparse, like so:


Some of them are overly verbose without actually saying much, like so:

The system shall [EDIT: these seem to always start with "the system shall"] provide a login mechanism requiring two fields: username, and password. The username field shall be 56 px in width and 18 px in height. It shall require the use of characters as defined in RFC1920, part A. The password field shall be 62 px in width and 18 px in height. It shall require the use of the accepted subset of characters as defined in RFC1920, part B. ....

Some of them are a nice combination of defining without being useless, like so:

Login requires the user to enter two fields, username and password. For username definition, see registration requirements. For password security, see registration requirements. There are three allowed states:
- failure to login:  show error message
- successful login: show home page
- nonexistent user: redirect to registration page

But the best requirements include one important thing: examples. An example can clarify a requirement in far less verbosity than actually spelling everything out would. A screenshot is a classic case of an example; instead of describing layout and pixel dimensions (which would go on for pages and pages), it is a quick way to say, "here's an example. Make it look like this."

Examples are good for more than just visual things. Often they're good for allowable and not allowable inputs. To continue with our login example, we could easily describe the allowable characters (or character sets), but often people more readily understand when there is also a set of brief examples and why they're good or bad.
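To make that concrete, here's a rule paired with examples. The rule itself — 3 to 20 letters, digits, or underscores — is invented for illustration, not from any real spec:

```ruby
# Assumed rule (purely illustrative): a username is 3-20 characters,
# each a letter, digit, or underscore.
USERNAME = /\A\w{3,20}\z/

def valid_username?(name)
  !!(USERNAME =~ name)
end

valid_username?("t_rex99")   # good
valid_username?("ab")        # bad: too short
valid_username?("bad name")  # bad: spaces aren't allowed
```

The regex alone is the requirement; the three examples underneath are what make most readers actually understand it on the first pass.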

All this comes with one very large caveat: examples do not replace requirements. They supplement and clarify requirements. Unless the requirement can be entirely encompassed within the examples, you will also need to include the actual rules and restrictions of the feature.

Next time you're puzzling your way through some dense set of requirements, throw in an example or two; it's amazing how much more clear the entire thing will become.

Tuesday, May 20, 2008

"Done" Is So Fuzzy

Hooray! We're done!*

* Done is such a fuzzy word. What exactly is "done"?

As far as I can tell, all of these are definitions for the word done:
  • Feature Complete. All features have been implemented. No testing or bugfixing is included in this one.
  • Feature Complete With Tests. All features have been implemented, and automated tests are written and checked in. (The tests may or may not pass!)
  • Feature Complete With Passing Tests. All features have been implemented and automated tests both exist and are running successfully. No system-level testing (aka QA) has been attempted.
  • Demo-able. All features have been implemented, automated tests are passing, and system-level tests are complete. There are bugs, but they can be worked around. The product can be used in demonstration successfully.
  • Release-able. All features have been implemented, automated tests are passing, and system-level tests are complete. Major and many minor bugs have been fixed; the few remaining bugs are low-priority. The product can be shipped to a customer with a high probability of success.
  • Perfect. All features have been implemented, automated tests are passing, system-level tests are complete, and all known bugs have been fixed. In practice, this one doesn't happen.
Now, it's not a big deal to have different definitions of done. The big deal is to make sure that everyone agrees on that definition.

Monday, May 19, 2008


I'm on vacation this week, so my blog updates will be spotty at best. Today I just wanted to drop a quick link on the subject of forking...

Thanks go to Coding Horror for this one. It's definitely not the same as my take - the post is more about why things fork than the eventual consequences for the community(ies) as a whole - but I think the chart illustrates some of what I've been describing.

Thursday, May 15, 2008

Preparing for Being Gone

There's an old saying* that if you're indispensable you're doing something wrong. I absolutely agree with this, but would like to add one small follow-up: If you can walk away with no preparation, you're doing the opposite thing wrong.

In short, as an employee (or manager or whatever), you should be valuable but replaceable. 

If you're too irreplaceable, then you're really not training, mentoring, and documenting sufficiently. This is a great way to make a treadmill that's really hard to get off; do you really want to be the guy who can't go on vacation because things will fall apart? If you're that guy, you also can't get promoted - that's what they call a double-edged sword! You should be helping others learn to do the things you can do, whether that's your peers, your subordinates, or even your bosses. Documentation also goes a long way here. It may not be as good, but it'll be good enough.

If you can walk away with no prep and no consequences, that's a sign that you're either not very busy or that you're spending as much time documenting as doing. If you're not very busy, well, let's hope there aren't layoffs coming! If you're spending as much time documenting as doing, then your work pattern may not allow for experimentation. Be mindful that sometimes you'll head down a wrong path before you head down a correct path. In those cases, overdocumentation every step of the way doesn't help; it only slows you down. 

Ideally, walking away should take some preparation but be something that can be done cleanly. Usually, before you're gone, you'll have to do a few things:
  • Pick an alternate contact. In the "wow this can't wait" moments, who else can someone work with? If you don't pick someone, they will - and it might be you!
  • Get everything to a good stopping spot. Your projects may not be done, but they at least should be in a stable state.
  • Document next steps. This isn't the same as documenting everything, but at least let someone know about the current state of your higher-profile projects. That way at least a status update can be accomplished.
  • Don't tell people you'll be available. If you have to be, then you have to be, but this should be a bonus, not something that people ought to rely on.

* Translation: I've heard this from numerous places and I'm too lazy to actually go look up the original source.

Wednesday, May 14, 2008

Easy Forks, Hard Forks

I've started playing with GitHub. Basically, it's a repository for projects using the Git source code management system.  Think SourceForge for a good analogy. First of all, this place is very cool. There's a ton of Ruby stuff on here, and some really neat plugins. You can find everything from a plugin for changing URLs for RESTful routes to a little utility that gives you team information for a Japanese basketball team (I did not make that up!).

But... (there's always a "but", isn't there?) GitHub has one thing that just makes me cringe. When you're looking at a project, there's a big button right at the top that says "Fork". Click it, and you get a whole new project based on the existing project.

This just kills me. In my opinion*, this is one of the hardest things about Ruby and about Rails. There are about 42 ways to do something, and about 15 plugins and gems and code samples to accomplish it. It makes it really hard to build a community when they're all a little bit different.

To take a simple example, a lot of people want to run Selenium tests for Rails projects. Selenium gives you that last UI layer that Test::Unit and RSpec don't. When I last went looking for ways to integrate Selenium with my Rails project, I found a lot of options:
  • Just use it totally separately, without any integration at all.
  • Selenium-on-Rails plugin, from the OpenQA repository
  • selenium gem (gem install selenium from the default repos)
  • selenium-fu
  • polonium (newer, renamed, eventually hopefully better selenium-fu)
  • downloadable Ruby client driver from the OpenQA Selenium RC page.
This, for the record, is just from the first two pages of search results. So I picked one (I happened to use the Selenium gem). Once you get up and running, you'll invariably have questions - there are definite quirks in this setup.

Here's where things go downhill.

The first place I go when I'm having a problem and can't noodle through it myself is to a search engine. Mailing lists, blogs, etc. may give me a clue to point me in the right direction. The trouble is, I get results from people using all of the different Selenium-Rails integration tools above. So now I not only have to figure out if they're addressing a similar issue, I also have to figure out if they're using a similar tool. In bad cases this can waste hours and hours.

Forking may be necessary sometimes, but making forking so easy is really something that I disagree with. Choosing to fork a project is effectively saying, "I think this version is incompatible with my goals and so I'm going to make something similar that works in a different way." Once or twice and it's no big deal. If this happens a lot, you wind up with a lot of different tools that look a lot alike but that all behave a bit differently. And then your users get confused and too frustrated to work effectively. Now you've got a real problem.

So to review: forking is okay, but forking too easily only harms the community as a whole.

* I started to say, "in my humble opinion", but I'm not particularly humble and no one who knows me would pretend otherwise!

Tuesday, May 13, 2008

CruiseControl.rb Task Load Order

This one took me about 45 minutes to figure out last night. My tree looks like this (partial):

- app
- config
- db
- doc
- lib
   - tasks
      - cruisetask.rake
- log
- public
- script
- test
- tmp
- vendor
   - plugins
      - myplugin
        - cruisetask.rake

The rake task in lib will trump the rake task in vendor. The rake task in vendor will trump the default behavior in CruiseControl.rb. They don't combine in any way, so if you're using a project-specific rake task, you have to put it all there.
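For reference, the general shape of a project-specific task — CruiseControl.rb runs a task named cruise when your project defines one, and the body here is an invented placeholder (the include line exists only to make this snippet standalone; a real .rake file omits it):

```ruby
require 'rake'
include Rake::DSL  # not needed inside a real .rake file

# lib/tasks/cruisetask.rake -- this copy shadows any same-named task in
# vendor/plugins, which in turn shadows CruiseControl.rb's default.
desc "Everything the CI run should do, start to finish"
task :cruise do
  puts "preparing test database"
  puts "running full test suite"
end
```

Since the copies shadow rather than combine, this one file has to contain every step the CI run needs.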

And in the end I got it working!

Monday, May 12, 2008

Night Is 20 Hours Long

We have a problem. We have too many tests. Or to be far more precise, our nightly test runs take on average 20 hours to run. That doesn't leave us much time to react to what nightly has found!

The Current Situation
The nightly tests start at 7:05pm, and run until approximately 2 or 3pm the following day. At that point, the QA person of the week generates a report of all the failures, goes through each failure, logs (or updates) bugs, and then publishes this report.

Some of the Reasons
There are a lot of reasons nightly takes this long:
  • Many Long Tests. Some of the tests simply take 8 or more hours to run. As we find these we move them to a weekly run, but I don't think we've found them all.
  • Lack of Machines. Because developers and QA (you know, actual humans!) use the lab, too, we can't allow the nightly run to take over every single machine. So we can only run the nightly tests so much in parallel; some of it has to be serial.
  • Inefficiencies in Usage. This is actually more rare than you would think, but sometimes tests hang on to machines when they shouldn't, and so other tests can't run.
  • Hangs. Sometimes tests hang. They simply fail to return for hours until some human notices.
So, what do we do to make nightly runs complete faster? We have a lot of options, so let's break things down. First and foremost, a complete break is not an option. Whatever we do, we have to keep running the nightly tests in the meantime; our development model is predicated on that level of regression testing.

Small Changes
Small changes can have big effects, and they're relatively safe and cheap. We'll do them first:
  • Move Long-Running Tests to Weekly. Seek out more tests that take time and get them out of the nightly run. Put them in a weekly run that doesn't happen as often.
  • Test Hangs Are High Priority Bugs. If it's a hang, then that's a bug. Sure, the system behavior may be legitimate, but the test is broken. We make these high priority bugs and we get them fixed quickly.
  • Buy More Machines. Sometimes throwing more resources at the problem really does work! Running tests in parallel can shorten your total run time, and with more machines, the humans and the nightly are happy.
Medium Changes
Some changes are a bit bigger, but still not overly drastic.
  • Autogenerate Triage. Why wait until the end to generate triage? Have test failures available to the triage person as the tests fail. That way the bug logging/updating, etc. can take place throughout the day, and the report gets much easier to create.
  • Use an Existing Build. The first thing the nightly test run does is trigger a build, which takes about 90 minutes. We have a continuous integration system; we should just use the last known good build.
  • Merge Tests. If there are multiple tests with the same setup and teardown, but that exercise slightly different things, maybe they can be more efficiently run as a single test or single suite.
Large Changes
If all of the above changes don't get us to where we need to be, then we have to consider large changes. These are fairly high-cost (in time at least) and are higher risk.
  • Run Constantly. There's an entire blog entry in this one, after I've thought it through a bit more. The basic idea is that you run a "nightly" run all the time, and it works more like a continuous integration system. It just picks up a test and runs it against the current build. You have to put logic in there to make sure all the tests get run as often as possible, and it changes the concept of "nightly run reports", but it's a way of rolling with the test growth instead of fighting it.
  • Aggressive Test Killing. Set an upper limit on test runs. If they don't run in that time, then they get cut off and that's a bug. This puts a huge onus on the test developer, and of course the duration is something that will vary by test type, but it would certainly be an effective way to prevent a few long-running tests from destroying an entire night's run.
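The aggressive-killing idea is cheap to sketch; in Ruby it's little more than a timeout wrapper around the test body (the limits and result symbols here are invented for illustration):

```ruby
require 'timeout'

# Cut off any test that blows its time budget and report that as a
# failure, instead of letting it hang the whole nightly run.
def run_with_limit(seconds)
  Timeout.timeout(seconds) { yield }
  :passed
rescue Timeout::Error
  :killed_over_limit
end
```

Something like run_with_limit(4 * 60 * 60) { run_test } gives a test four hours and not a second more; a :killed_over_limit result becomes the high-priority bug described above.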

I don't know how far we'll have to go to get out of our current problem, but we're definitely working toward a solution.

How do you handle test duration creep, particularly in automated tests? What do you do to balance the desire for tests with the desire for fast results?

Friday, May 9, 2008

Lipstick on a Pig

When you get a system into test, it usually has quirks (don't they all?). This is perfectly normal, and so you go about logging bugs. And the bugs get fixed, and you move on. This is how software development is supposed to work.

Sometimes, however, the quirks don't go away. Fixing the quirks just exposes (or creates) more quirks. You keep logging bugs, and they keep getting fixed. At some point, you need to stop. Just put away the defect tracking system. If the system keeps getting tweaked, and quirks keep showing up, well, you may have a pig of a system on your hands.

And no matter how much you fix and polish and document and work around the quirks, in the end, you've still got....

This is the point you need to go back to development, and maybe even all the way back to the customer. Because now is when QA is calling for a fundamental rethink of the problem. Somewhere along the line, our design, or our UI, or our understanding of the problem just failed. And all the bugfixes in the world can't fix that.

Remember, you're testing not just the actual implementation, but the design and the problem definition that underlie that implementation. Don't just keep applying makeup; call the pig a pig and let's get something a little better in there.

Thursday, May 8, 2008

Failing Nicely

From Larry Osterman, a notion that application resilience is sometimes not good. This is all well and good, but there's a distinction I wish he drew. And that is this:

Not catching is different from not showing.

The point of the post is that you don't want to catch (and hide) your failure because you're likely to fail again later and debugging is a LOT harder. This is true.


You should still avoid showing the end user an exception; those tend to cause panic. Instead, provide a good error message to the user and fail appropriately.
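One way to sketch that split: let failures propagate to a single outer boundary, record the full exception there for debugging, and show the user something calmer. This is a hypothetical shape, not a prescription — the names and message text are invented:

```ruby
# Catch only at the outermost boundary. The failure isn't hidden (the
# full detail is preserved for the log), but the raw exception never
# reaches the user's screen.
def handle_request
  [:ok, yield]
rescue => e
  detail = "#{e.class}: #{e.message}"   # goes to the log, not the user
  [:error, "Something went wrong. It's been recorded and we're on it.", detail]
end
```

The point is that the rescue lives in exactly one place; everything beneath it is free to fail loudly and immediately, which is what keeps debugging sane.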

Wednesday, May 7, 2008

First Fix What You've Got

I have a routine when I'm working with tests, particularly with automated tests.

Step 1: Update the source code.
Step 2: Run the current tests.
Step 3: Write new tests.

This seems really slow at first. If I haven't touched a code base in a while and the developers haven't been maintaining the tests*, I often wind up spending an hour or so just getting the tests and the code back to where they were.

A shaky foundation is no place to start testing. So first stabilize, then extend.

* Bad developers! You break the test, you should fix it.

Tuesday, May 6, 2008

Measurement and Effectiveness

There are a lot of ways to measure how productive the people who work for you are. In the end, any system can be gamed, and whatever system you come up with will eventually be manipulated to make someone look good. The trick to measuring productivity is to make your measurements actually reflect reality.

The Good
Some ways to measure this are really good. Typically these are a complete pain to gather, but if you're really interested in measuring this on a quantifiable rather than a gut level, you've got to start digging hard.
  • On Time Ratio. Measure how on time the person is with tasks and projects. The idea is to over time average how far off the person is. For example, someone who completes a 10-day project in 9 days has an on-time ratio of -10% for that project. Someone who does the same in 11 days has a +10% on-time ratio. The idea is that over time, for productive and effective employees this should approach 0. That is, the person should, over a number of projects, prove to be accurate. Note that someone with a high negative number (so someone who is generally finishing early) is not necessarily effective; they're just padding estimates.
  • Management Ratio. I run a very self-directed team; my title is Lead, not Manager, and there's a reason for that. I don't spend all of my time managing someone. Sure, things come up, but it shouldn't be a constant. The management ratio defines how much time you spend with each person performing management duties. This is a comparative metric. If you're spending 0 time with a person, then something is wrong and you, the lead, probably aren't helping him be as effective as he could be. However, spending too much time with someone is a drain on the whole team. So, measure how much time you spend on management activities with each member of your team. If some team members are taking inordinate amounts of time, then there's a problem. The specific definition of "inordinate" will vary a little bit, but anything over twice your average is probably bad.
The good thing about these metrics is that gaming them is exactly what you want. You want your tester to try to game the system by making estimates really accurate. You want the person working for you to game the system by not requiring a lot of hands-on management.

The Bad
Bad measurements tend to be really easy to gather, and they look like they're telling people a lot. In reality, they are incredibly easy to game without actually helping the team.
  • Number of bugs found
  • Number of bugs verified
  • Number of test cases written
  • Code coverage
  • Run length of automated tests
All of these items fall short because improving or increasing them does not have an inherently positive effect on the employee, the team, or the system under test. Interestingly, these measurements tend to be automatically gathered and tend to masquerade under the fancy-sounding term "metrics".

The Ugly
Some metrics treat employees like real cogs in a wheel. I hope by now everyone has given up on these:
  • How many hours you work.
  • How many modules you have tested.
  • How much you talk in team meetings.
This is both dumb and short-sighted. Just because I can see someone doesn't make him effective.

Ultimately, a good measurement of an effective employee gets back to gaming the system. Accept that whatever system you use will be gamed. The tricky part, then, is to set up the system such that gaming it produces desirable results.

What other measurements or incentives do you use?

Monday, May 5, 2008

Strong Lead-In

I got a resume from a recruiter, and there was a small pitch for the candidate that included some basic information.

"I am sending the resume of . He is currently unemployed and was making in his last job as a QA engineer. I thought he could be a good fit for this particular position cause he is a highly motivated software test engineer ...."

I haven't even opened the resume yet and already I'm thinking this is a really weak candidate. The moral of the story:

Be mindful of how others are representing you.

I'll read the resume anyway, but this candidate is starting off a leg down through no fault of his own.

Sunday, May 4, 2008

Man Does Not Test By Tools Alone

I would add that tools are important, and automation gives you a good safety net, but automation provides only future value. Thinking provides present AND future value.

Thursday, May 1, 2008

Development Cycle for QA

Any good software development lifecycle (SDLC) is about rhythms. It's about creating a predictable, repeatable* cycle. There have been a lot of articles written about the development lifecycle, but they're generally either software-oriented - what happens to the code over time? - or developer-oriented - what does dev do now?

So, what does the development cycle look like for QA?

An Overview
Let's start with the system as a whole. I've gone with a basically agile lifecycle here just because it's most relevant to my current thinking.

So What's QA Up To?
On the surface, it sure looks like QA has a lot of free time! There's just one little segment named "testing". I think we all know by now that's not true. So let's break it down. Here's what QA is doing during each phase:

Initial Planning
This is a time for QA's planning as well. This is where you perform a lot of your projects that have a lead time. Some sample tasks:
  • Procure hardware and software.
  • Get a good understanding of your user; interview your clients, etc.
  • Gather your team and get to know each other. This may be new hires, contractors, a group of friends working on this project with you, or an existing team.
  • Figure out the kinds of testing that are going to be important to your product, your customer, and your company. Is security testing important? How about load testing? Does disaster recovery testing matter, or not yet?
Requirements
This phase is about understanding the details of what the engineering team (including QA) will do. Keep in mind that in QA you have to be prepared to test early, so some of the requirements work translates into implementation work for you. Some sample tasks here:
  • Help define acceptance criteria. These are requirements just as much as the GUI screenshot or the command line option.
  • Define your toolset. Have a log collector that needs some work? Now's the time to think about the things it needs to do. Starting implementation is a good thing here.
  • Create your test run infrastructure. If you don't already have it, make sure you complete setup (and implementation if necessary) of your continuous integration system, your test run infrastructure, your defect tracking system, etc. You need to get into implementation so that development can start using this immediately.
  • Define your system tests. As the system comes together, you'll need to test it as a whole. This is where you create the test plan that defines that.
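The first bullet above - acceptance criteria are requirements - can be made concrete by writing each criterion as an executable check. Here's a minimal Python sketch; `create_user` is a hypothetical stand-in for a feature under test, not anything from a real system:

```python
# Sketch: acceptance criteria written as executable checks.
# `create_user` is a hypothetical feature stand-in.

def create_user(name):
    """Hypothetical feature: reject blank names, return a user record."""
    if not name.strip():
        raise ValueError("name must not be empty")
    return {"name": name.strip(), "active": True}

def test_new_users_start_active():
    # Acceptance criterion: a newly created user is active.
    assert create_user("alice")["active"] is True

def test_blank_name_is_rejected():
    # Acceptance criterion: blank names are not allowed.
    try:
        create_user("   ")
    except ValueError:
        pass  # expected
    else:
        raise AssertionError("expected ValueError for blank name")
```

Checks like these double as documentation: when a criterion changes, the test changes with it, and the GUI screenshot isn't the only record of what "done" means.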
Implementation
This phase is where QA and development are really tied closely together. The goal of this phase is to have minimal time between doing something and seeing the test results; speed is important here. Patience is also important - things can get pretty broken in this phase, and that's okay as long as it doesn't stay that way. Some example tasks:
  • Accept features. As features are implemented, prove they work or identify the ways in which they don't work.
  • Verify bugs. This is an analog to accepting features.
  • Monitor automated tests. As automated tests run, QA should be always cognizant of the results. Are they failing? Are they passing but taking longer than they should?
  • Continue to improve the test infrastructure. You'll still want to tweak and improve things here.
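The "monitor automated tests" point - tests that pass but take longer than they should - can be sketched as a simple duration check against historical runs. The function, data shape, and threshold here are illustrative assumptions, not a real tool:

```python
# Sketch: flag automated tests that still pass but are slowing down.
# Assumes we keep per-test duration history (in seconds) from prior runs;
# names and the 1.5x threshold are illustrative.

def find_slowing_tests(history, latest, ratio=1.5):
    """Return names of tests whose latest duration exceeds their
    historical average by more than `ratio`."""
    flagged = []
    for name, duration in latest.items():
        past = history.get(name)
        if not past:
            continue  # new test, no baseline yet
        baseline = sum(past) / len(past)
        if duration > baseline * ratio:
            flagged.append(name)
    return flagged

history = {"test_login": [1.0, 1.1, 0.9], "test_upload": [4.0, 4.2]}
latest = {"test_login": 1.0, "test_upload": 7.5, "test_new": 2.0}
print(find_slowing_tests(history, latest))  # ['test_upload']
```

Even something this crude catches the "passing but slower" case that a red/green dashboard hides.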
Testing
This phase is about working with development to ensure that the acceptance criteria are complete and adequate for the system as a whole. Example tasks include:
  • Running system tests. These are the tests that didn't pass during implementation because things were just a bit unstable. Now that things are settling down and most of the new features are in, it's time to exercise the system working together.
  • Stress new features. Now that the new features work, find their limits. Break them, use them with other features, stress them. It's all basically functional at this point, so find the limits.
Evaluation
This phase is about assessing how you did, both as a QA team and as part of the overall engineering team.
  • Figure out what you did right. Decide what worked really well. Did you have a great tool? Was the test order just awesome at finding bugs early?
  • Figure out what you need to do better. Decide what just didn't work. Were you too slow accepting stories? Did you spend a lot of time "clarifying" (or defining) requirements during implementation?
  • Figure out what you didn't do. Do you need another tool? Is there some mind-numbing manual process that you ought to automate? Did you spend more time keeping your massive stress test cluster running than the value you got out of it?
Deployment
Full disclosure: I depart from classic agile methodologies here. Theoretically, deployment should be just that - sending it to production. I insert a step here that I call "release testing". This is because I find that it's a bit overoptimistic to say that the general agile lifecycle gets code stable early enough for QA to prove it is stable and functional prior to deployment. It simply takes longer than your average iteration to thoroughly exercise a system. This doesn't mean you break process; it just means that QA spends some time working a release behind, making sure things are as stable as they should be. Example tasks here:
  • Long-running tests. Run your tests that require a stable build to be up for a week or more.
  • Manual tests. Run your usability tests and manual tests now. This is a good time to catch how the system feels overall as it works together.
  • Customer-specific tests. Run tests against any applications or any customer scenarios that your automated infrastructure can't handle.

And THEN you deploy.

* I started to say repetitive cycle instead, but repeatable sounds much less dreary!