Tuesday, December 29, 2009

Words of Wisdom

Someone at work said this the other day:

"Can't fail" is not the same as "impossible".

Both indicate that something is important. But "can't fail" is an exhortation to the troops. A bit hyperbolic, maybe, but it's usually meant to excite and inspire. Note that it's also a bit of a cliche and often prompts the question, "yeah? what if we do fail?".

"Impossible" is demotivating. It just says that there's no way you think you can do this because everyone is just not that good in this scenario.

When you're attempting to motivate a team, neither of these is great. If you have to use one, though, say, "can't fail." Don't say "impossible".

Monday, December 28, 2009

Blast from the Past: Bugs in Iterations

A bit over two years ago, I wrote about handling bugs when you have an iteration or other timeboxed development process. I wanted to bring it back up again because I think it's still valid. I've reproduced the original article here:

One of the things about working in a de facto SCRUM environment is how you handle defects.

Basically, at the start of an iteration, you have a force-ranked list of what you're going to work on. The team walks down the list, commits to some portion of it, and the iteration starts. The list of tasks can be features, bugs, overhead work (install computers, etc).

Now, let's add a little twist (just a little one; this kind of thing happens every day):

Someone found a bug.

Okay, so that feature that you thought you had nailed had a bug in it. Now what? There are a lot of ways to handle this. You could:

Put the bug in the product backlog and handle it just like any other task.
  • Pros: Doesn't break the process!
  • Cons: If you have an urgent bug, you're basically stuck until at least the end of that iteration.
  • Net: This is great for non-urgent items. But for emergencies it's really not feasible. If you're really seriously considering this you've either got extremely patient clients or you're being overly optimistic.
Add the bug to the iteration - at the top of the queue.
  • Pros: Bugs get fixed.
  • Cons: All those tasks you committed to? Those aren't going to happen.
  • Net: This is probably swinging too far in favor of bug fixing.* It also will have you doing things your customers want less than all those other backlog items they've asked for.
Allot some amount of time for bug fixing as a task in every iteration.
  • Pros: Allows for bugs to happen, either previously existing or new, without destroying the iteration.
  • Cons: If there are no bugs and you have a lazy team, then you get people idle. Also, the amount of the iteration you need to allot is uncertain until you've done this for a while and learn what your needs really are.
  • Net: No bugs; yeah, right.
So, my preferred method is to allot some amount of time for bug fixing as a task in every iteration.
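As a rough sketch of what that allotment looks like in planning terms (the numbers and the split here are hypothetical; you'd tune the bug-fix fraction over a few iterations):

```python
def plan_iteration(capacity_points, bugfix_fraction):
    """Split an iteration's capacity between bug fixing and new backlog work.

    bugfix_fraction starts as a guess and gets refined over time as you
    learn how many bugs actually show up per iteration.
    """
    bugfix_points = round(capacity_points * bugfix_fraction)
    feature_points = capacity_points - bugfix_points
    return {"bugfix": bugfix_points, "features": feature_points}

# A 40-point iteration with 20% reserved for bugs leaves 32 points
# of committed backlog work.
plan = plan_iteration(40, 0.20)
```

The point of writing it down like this is only that the reservation is explicit and visible when the team commits, rather than bug time silently eating feature time mid-iteration.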

What have you seen tried? Do you have an answer for this dilemma?

* Disclaimer: Yes, we QA types do get to notice when something goes too far toward bug fixing. It's great when bugs get fixed, but sometimes that's not the best thing to do.

Now, looking back, we're still facing the same problem. We find bugs, some of them more urgent than others, but they all need some scheduled time. I still haven't come up with anything much better than making sure you sign up for some bug fixing time as well as new backlog time. How much time you spend will depend on how many bugs you find, but that's a topic for another day.

Wednesday, December 23, 2009

If You Only Ask One Thing...

When you have a specification to review, it's your chance to find major holes early on. The problem is that specifications are often (although not always) feature-oriented. They'll include screenshots of individual screens, or they'll have a set of API calls and a brief summary of the purpose of those calls. This is all great information. But in the end, no one uses an API call or a screen. They use a system. They use a system to accomplish tasks.

So whenever you look at a specification, take the tasks the customer will do and ask yourself one thing:

The user sits down in front of the system. Walk through what they do to accomplish task X.

This will expose a lot of holes. For example:
  • An upload API with no way in the GUI to actually review what is uploaded
  • A cleanup process that leaves the system unusable for hours on end
  • A background process with no ability to start or stop it
  • And a lot of other things...
What a system does is important, and a specification can describe it. But in order to know if the system hangs together as a whole, it's important to consider the tasks and the workflows of the system consumer.

Tuesday, December 22, 2009

Why You're Testing That

For new features, we create a user story (or multiple user stories), and we put the information we need into those stories. We describe the purpose of the change, the change itself, and the acceptance criteria for that change. That means we wind up putting the tests for a feature in the story itself.

This is a (slightly sanitized and simplified) story we've been working on recently:

Story: Cache state information
Motivation: Increase system management response times by not having to retrieve generally static information every time
Details: (proprietary bits here, but basically, cache things like IP address of all nodes, etc)
Acceptance Criteria:
  • Add/remove/kill a node and ensure that the cache is updated
  • Restart the system and ensure that the correct information is returned after restart
  • Remove a node while the system is down and ensure that after restart the node is gone
  • (Other state change type tests here)
  • Ensure that the system responds in under 3 seconds 98% of the time under our defined Load Test
All in all, it's not a bad little story, and not bad acceptance criteria. But here's the important question: in 6 months, when someone comes back and says, "yeah, I didn't know what that test was getting at", will we be able to explain it? What we have here is another case of the six month test.
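As an aside, that last criterion is the one that's easiest to mis-measure: "under 3 seconds 98% of the time" is a percentile requirement, not an average. A sketch of checking it against recorded response times (the helper function and data here are made up; only the threshold and fraction come from the story):

```python
def meets_latency_criterion(response_times, threshold=3.0, required_fraction=0.98):
    """True if at least required_fraction of responses beat the threshold."""
    if not response_times:
        return False  # no data is not a pass
    fast = sum(1 for t in response_times if t < threshold)
    return fast / len(response_times) >= required_fraction

# 99 fast responses and 1 slow one: 99% under threshold, criterion met,
# even though the one outlier would have skewed a naive average.
times = [0.5] * 99 + [4.2]
```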

It's useful to state what we're going to do - in this case, what tests we will run - but it's not sufficient. We also have to provide some insight into why. We put some thinking into developing these tests, and into identifying what the relevant things to test were. So let's write it down. In our story above, we're trying to show that introducing caching: (1) really does speed things up; and (2) still provides accurate information. So let's write down our test goal in the story, along with the tests that demonstrate that we've achieved that goal.

In six months, we'll thank ourselves.

Monday, December 21, 2009

Test Lab Overhead

Let's say we have a test lab, basically a pool of machines available to anyone who wants to develop, run tests, etc. The lab consists of several parts:
  • a pool of machines for running tests
  • various resource machines for specific branches or versions (e.g., a build machine for each version)
  • various common utilities, including a central file server, DNS, a machine reservation system, etc.
In order to successfully use the lab, we've instituted some utilities and checks. For example, every time a machine is released, we run a check script on it to confirm that it has the right packages, DNS, mounts, etc. These kinds of utilities run out of a special mount that the entire lab has access to.
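A minimal sketch of the shape of such a release-time check (the mount points, package names, and expected-state table are all made up for illustration; a real script would load the per-branch expectations from the shared utilities mount):

```python
# Hypothetical expected state for a machine being released back to the pool.
EXPECTED_MOUNTS = ["/mnt/labutils"]
EXPECTED_PACKAGES = ["rsync", "bind-utils"]

def check_machine(mounts, installed_packages):
    """Return a list of problems found; an empty list means the machine
    is clean and can go back into the pool."""
    problems = []
    for mount in EXPECTED_MOUNTS:
        if mount not in mounts:
            problems.append(f"missing mount: {mount}")
    for pkg in EXPECTED_PACKAGES:
        if pkg not in installed_packages:
            problems.append(f"missing package: {pkg}")
    return problems
```

The branch-proliferation problem described below shows up exactly here: each divergent branch wants its own version of those expected-state tables.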

When we have one branch or a few very similar branches in active development, the lab works fairly well. There's really only one correct state for a test machine, and possibly a (very) few variations on that state. Updating DNS is a matter of changing one script on one or a few branches and updating that utilities mount. Simple, and works great.

Things get more complicated when you have multiple products or multiple branches that vary widely. All of a sudden your fairly simple check script has a lot of if clauses for different package versions on different branches. Changes (a new DNS server, for example) have to be propagated to a lot of different branches. It all starts to get a bit unwieldy. Is it still possible to work effectively? Sure. It's just more work.

The other obvious option is to separate the test infrastructure code entirely in source control. Put it in a separate project with separate branching, etc. You still have to maintain test code across branches at the interface points (interfaces will eventually change), but the problem is smaller because it's isolated into this one branch. Tests themselves would of course stay with the code they're testing. The downside to this option is that your dependencies are increased between projects and that introduces a different kind of management overhead.

You can choose either option - test infrastructure in the main source branch, or test infrastructure separate - but whichever one you choose, there are a few things you should always make sure of, just to keep your odds of success up:
  • Don't support a stagnant branch. If you're going to have to support a branch in the lab, then make sure it gets used periodically (built and a smoke test run). This ensures you don't go too long without checking for breakage and handling it.
  • Notifications must always work. Even if you have to compromise on features or make something harder to run, keep your notifications working. When you have things like build failure notifications on head, the absence of those on a branch can give you a false sense of security and let breakages persist silently for longer.
  • Retirement is an option. Maintaining branches forever is a huge proposition. At some point you have to draw a line and say that you're going to stop supporting a branch. Generally this will coincide with end-of-lifing or end-of-supporting that branch in the field.
Maintaining a workable test lab is part of the overhead of a project, and a key part of the success of a project. Doing it with as little fuss as possible is something that takes active thought and maintenance - make sure you're giving it the attention it deserves.

Friday, December 18, 2009

Cold Systems

It's unusually cold here in Boston. Yes, it's winter in the northeast US, but still, even for that it's cold! I was standing waiting for my bus this morning - shifting from side to side - and trying to think about something other than, "I should have worn thicker socks". And it occurred to me that testing a cold system is something that I don't do enough of.

A cold system is a system that's not really doing anything. In a storage system, for example, there are no reads, writes, management modifications, etc. occurring. If a busy system is hot with activity, then a cold system is one in which very little is going on. (I know I'm stretching the metaphor here, but I had to get from socks to something else, and I went from socks to hibernation to sluggish to systems that aren't doing much - it's really interesting inside my head sometimes!)

So when we test a cold system, what are we looking for?
  • Background processes. Many systems have background cleanup procedures, indexing policies, and other things that occur at system idle. When the system is cold, it's much easier to see these run and look for more subtle patterns that might get lost in system activity. Look for slow memory leaks, timing patterns, background processes "taking over" when they're not drowned out by other things, overruns and other loop-type problems, etc.
  • Startup issues. When you first start using a system, there might be lags or errors that you can't see very well when many things start at once. Now's your chance to see them in a state much closer to isolation.
  • Spindown problems. Without activity to sustain them, certain things might spin down too far. For example, your disks might autonegotiate themselves slower (green drives do this) if they don't meet activity thresholds, and then the few requests you are getting might have slower response times - oops!

Testing a cold system is about looking for patterns that get lost when a system is busy. It's about finding the signal that gets lost in the noise. So slow down, cool your system off, and see what new things you can see.
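To make "look for slow memory leaks" concrete: one approach is to sample memory usage on the idle system over time and check whether it trends upward. This sketch leaves out the sampling mechanism (which is system-specific) and only shows the trend check:

```python
def leak_slope(samples):
    """Least-squares slope of (time, memory) samples.

    A persistently positive slope on an otherwise idle system is a hint
    that a background process is slowly leaking."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Memory creeping up about 1 MB per hour on an idle system: worth a look.
samples = [(0, 500), (1, 501), (2, 502), (3, 503)]
```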

Oh, and hopefully we'll get some snow soon to make the cold worth it!

Thursday, December 17, 2009

We're Here Now

You're in the middle of a problem. You found a bug in the field and you need to patch a customer. Unfortunately, you've already started working on the next release, which happens to be off the same branch. And now you have a patch and no really good place to cut a build from.

There are two things to think about here:
  • What should you have done to prevent the problem?
  • What do we do now?
Hindsight being what it is, it's relatively easy (and tempting) to sit around and discuss what you should have done. Yes, you should have branched for the next release so you had a stable branch reflecting what had gone into the field. And you should have found the bug and fixed it before it ever made it out. And ... and and. Stop.

All of these are useful conversations, but that doesn't change your current situation. While it is important to do a postmortem and figure out how you're going to prevent the problems you're having, the fact remains that you're currently in this situation.

First figure out how you're going to get out of your current situation, then figure out how to prevent it.

You have a customer waiting on a fix, and right now the most important thing to recognize is that getting out of this current problem comes first, closely followed by prevention. I'm not going to minimize the need to prevent problem recurrence, but don't forget that you have to solve a problem before you can prevent it from recurring. There is an appropriate order; use it.

Wednesday, December 16, 2009

Feature Parity Is Rarely a Goal

This is a fun requirement that we sometimes see:

"Must have feature parity with X"

Depending on what you're working on, X might be the previous version of the product (if you're doing a rewrite), or it might be a competitive product, or it might be some analog that should be followed (e.g., "must follow Apple's style guide", which is just a variant on this theme). In any case, you've been presented with the easiest test oracle ever. Just go find X, and whatever it does, make sure your product does, too!

This may be the complete correct thing to do. But before you go blindly accepting the oracle, don't forget that testing your requirements is also a useful exercise. In this case you have a requirement that seems fairly straightforward, but let's dig a little deeper. Is "must do everything X" does really interesting or useful to the consumer of the product? Maybe.

If the consumer is... a marketing type who needs to put up a list of checkboxes and check as many as the competitor does, then yes, having all the features (in some way, anyway) will give them that. Here you probably want to engage in a discussion of how much of the feature is required to meet the "checkbox requirements" and proceed based on that.

If the consumer is... someone who currently uses the product and wants to maintain functionality, then complete feature parity probably isn't actually required. What you want here is the ability to do the same things that the user was doing before. Features the user didn't actually use in the old product probably don't need to be carried forward. Rather than trying to use the entirety of product X as your oracle, concentrate on the features that are actually used.

If the consumer is... a certifying or regulatory entity and product X is a reference implementation, then you really do need to match product X step for step.

The lesson of the day is that the requirement of feature parity may sometimes mean feature parity, but sometimes it may mean something else. Success is in finding the intention of the requirement and meeting that; in the end, this is what will keep your customers happy.

Monday, December 14, 2009

On Thick Skin

We have several teams here at work, and every week those teams have a team meeting. The topics range from escalations to design discussions to estimations, etc. Some of the team meetings are kind of casual about being a team; they just happen to work fairly symbiotically and meetings are often superfluous (and skipped). Other teams have their team meetings with huge regularity and it's a perfectly calm experience. One team, though, that's a real doozy. You can hear their meetings from down the hall. Yes, they really do yell at each other. And here's the thing.... when it's done, they're all still friends and they've solved their technical problem.

I've written before about being nice. It's important to treat others with respect and generosity, and to provide both positive and negative feedback in a manner that allows them to hear it without getting defensive. But...

You have to have thick skin.

For better or for worse, people aren't always going to be nice. They're going to say something harsh, or they're going to get frustrated and lash out. Witness the yelling team, who yells because they care so deeply about their product. And you need to handle it.

So what does thick skin mean?
  • React to the problem, not the message. Someone who is frustrated probably isn't going to be expressing themselves clearly, but that doesn't make the underlying problem any less real. Your job is to find the underlying problem.
  • Don't take it personally. It's unlikely that the person is lashing out at you personally. It's more likely to be frustration with a situation or a problem, and you're just the unlucky winner.
That's really it. Yes, it would be nice if everyone were nice all the time, but it's not going to happen. So make sure your skin is thick for those times when nice ain't happening.

Friday, December 11, 2009

Autistic Testers?

I recently saw an article about a company in Chicago training people with autism to be testers. There are some real doozies in there, like this:
Aspiritech — whose board includes Brix, now retired from Wrigley, and the actor Ed Asner, whose son Charles is autistic — claims those who are autistic have a talent for spotting imperfections, and thrive on predictable, monotonous work.
"The stuff we do is boring for [others], like going through a program looking at every detail, testing the same function over and over again in different situations, but it doesn't disturb those of us with autism," says Thomas Jacobsen, an autistic employee at Specialisterne. "That's our strength."

Now I don't want to knock anyone who is autistic, or anyone who isn't. But oh boy was this an article full of depressing assumptions, mostly that testing is repetitive and monotonous. (Sure, parts of it are, just like parts of any job are sort of repetitive and monotonous.) In the end, though, I mostly walked away surprised that a media outlet would publish something like this, seemingly designed to insult both autistic people and testers.

The easy answer is to laugh at how misguided this company is. The better answer is to figure out that we have to show the difficult, innovative side of testing. And that some great testers may be autistic, while others may not be. I say, look for someone who is a good tester, and figure it out from there.

Thursday, December 10, 2009

Is That Good Or Bad?

Let's say that I have a program and a set of tests I've written. Go me! Now the time has come to change that program. So I go in there and I muck about, and then I run my tests.

All my tests pass. Is this good or bad?

Good. I was refactoring, and didn't intentionally make functional changes. My tests tell me that I didn't (at least, in the areas I'm testing).

Bad. I was refactoring, and I did break something. My tests just aren't good enough. Oh, and I'm blind to it until I find a problem later in the process (in QA or in production).

Good. I added a new feature and didn't have to touch any existing code. And apparently I really didn't.

Bad. I added a new feature that changed the behavior of an existing feature. And my tests didn't catch it. Looks like my tests aren't quite thorough enough.

Some of my tests fail. Is this good or bad?

Good. I was refactoring, screwed something up, and my (now regression) tests caught it. That's what they're there for.

Bad. I was refactoring, and I didn't mess anything up. My tests are too closely tied to the internals of my implementation.

Good. I added a new behavior that changed the behavior of my existing features. Tests affecting those features should fail.

Bad. I added a new feature that didn't change existing features but did change some of my tests' assumptions (reference data in the database, number of fields in a form, etc). Looks like my tests are a bit brittle.

All of my tests fail. Is this good or bad?

Good. I was refactoring, and broke something that's really at the core of my app. My (now regression) tests caught it. That's what they're there for.

Bad. I was refactoring, and I broke something, but that's a whole lot of tests to wade through for a problem that just happens to be in step 1 of every test (whoops! login!). My tests are too repetitive; I should refactor and introduce dependent tests so I don't just repeat the same failures.
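One way to avoid that wall of repeated login failures is to make the login a single shared precondition whose outcome is remembered, so a broken login shows up as one failure instead of failing every test at step 1. A sketch (the caching scheme and names here are mine, not any particular test framework's):

```python
_login_result = None  # None means we haven't tried logging in yet

def session_for_tests(login):
    """Run the login precondition at most once and cache the outcome.

    Tests call this instead of logging in themselves; if login is broken,
    they all get the same cached failure without re-running it."""
    global _login_result
    if _login_result is None:
        try:
            _login_result = ("ok", login())
        except Exception as exc:
            _login_result = ("failed", exc)
    status, value = _login_result
    if status == "failed":
        raise RuntimeError(f"login precondition failed: {value}")
    return value
```

Most test frameworks have a native version of this idea (shared fixtures, suite-level setup); the sketch just shows the mechanism.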

Good. I added a feature that broke pretty much every other feature. My tests caught it.

Bad. I added a feature and my test infrastructure can't handle it. Need some more robustness there.

As with most things, a passing test or a failing test isn't inherently good or bad, and the same thing applies to suites. When tests pass, you don't know your code is right; you only know that it hasn't broken in a way your tests are looking for. When tests fail, you know something is wrong, but it might be your code or your test, and you'll have to dig to find out. Either way, take a red bar or a green bar for what it is - a piece of information that will guide your next efforts.

Wednesday, December 9, 2009

Want to Participate

Rahul Khatal, this one's for you. (For those of you who aren't Rahul, see his question here). Rahul brings up two points about meetings:
  • Sometimes people go off the agenda
  • Sometimes people come to a meeting but don't actively participate (talk, basically)
There are many reasons people might go off a meeting agenda. Maybe they have something else they really want to talk about. Maybe they're not prepared for the agenda items and are trying to hide it. Maybe there's another meeting that needs to be happening and your meeting is getting co-opted. In any case, I generally find it best to "parking lot" non-agenda discussions by writing them down and moving on. Then the onus is on the meeting organizer to make sure that those parking lot items happen later on. Based on the parking lot, the agenda may be changed.

And then you have something that's kind of the reverse problem: people who show up but don't speak up. First off, ask yourself if you care. You might not mind non-speaking participants if, for example, the meeting is simply a demo to show off what you've done. However, if this is, for example, a design discussion, you probably want feedback from almost everyone, including the shy guy in the corner. The simplest way to do this is to go around the room and ask for feedback at the points in which you need feedback. Start with the shy people, and then proceed to the more outgoing of them. Ultimately, you can't force feedback; some people will respond that they have nothing to say. But if you want feedback from everyone, ask for it, individually.

We spend a lot of time worrying about how to elicit what we consider to be universal good behavior: meetings in which everyone participates, meetings that stick to the agenda. In general, you're dealing with people, which means you don't get to make all the choices, and forcing isn't always an option. So recognize that and make simple changes that everyone in the meeting can and will handle. It'll take you a long way toward your goal.

Tuesday, December 8, 2009

Stress, Don't Break

"We just got a new build, let's break it!"


You have a piece of software to test. Destruction is not the point. The point is to anticipate and subject the system to things that might happen, even if "might happen" is an extreme circumstance. If the system happens to break under those circumstances, then it's great you found it, so you can react appropriately. If it doesn't, then you know something about the system anyway. And that's good, too. What you're really doing in this case is attempting to stress the system. You're looking for the edges and the holes.

Every system has rules. Some are explicit (e.g., don't install more than 5 of these on a single server) and others are implicit (e.g., it's never going to send packets across the network at faster than wire speed). These rules should help guide your exploration of the system. You may use them to do boundary value analysis and testing, or to design a load test, or to decide at what points you want to measure latency, or whether fuzz testing is going to be worth the time it takes. In some cases, it's possible to break the rules (e.g., you might be able to install 6 of these on a single server), and that's probably a legitimate bug. It shouldn't usually be the starting point of a test; generally it's the end point.
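The boundary value analysis mentioned above can be fairly mechanical. For an explicit rule like "don't install more than 5 of these on a single server", the interesting counts cluster right around the limit (a small sketch; the limit comes from the example above):

```python
def boundary_values(limit):
    """Interesting test inputs around an upper limit: well inside,
    just inside, at the boundary, and just past it."""
    return [0, 1, limit - 1, limit, limit + 1]

# For "don't install more than 5 on a single server":
counts_to_try = boundary_values(5)  # trying 6 probes past the rule
```

Note that the last value is the rule-breaking one, which, as the next paragraph argues, is usually the end point of the test design, not the starting point.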

However, if you have to break the rules to break the system, then your goal has changed from "gather information about a system in order to help make intelligent decisions about it" to "KILL!". This misalignment of goals negates some of your testing. You're likely to go far down paths that don't provide a lot of information, just because it's a way to break the system. Here's a hint: you may be acting like a bit of a jerk about it, too.

So don't be a jerk. Exercise a system, yes, but keep in mind that your goal is information and stress, not breakage. System breakage or lack thereof is a side effect of information, not the other way around.

Monday, December 7, 2009

Beyond Agenda

By now it's pretty common to require that meetings have agendas (those links are the top three on Google at the moment, but it goes on for ages). That's not enough, though.

Meetings need a purpose.

Meetings with agendas can still be a waste of time.

So how do you know if you have a meeting with a purpose, or just an agenda? There are a number of warning signs:

If you can't say what the meeting is for in under 2 sentences, it's probably purposeless.
  • Good: "The meeting is for triaging potential release blockers." This is good because you're accomplishing a necessary business and software release task.
  • Bad: "We get all the departments together for status." This is bad because "status" is not useful to anyone by itself. Yes, communication is good, but only when it's accomplishing something.
If a meeting has the same agenda for months, it may be purposeless.
This one isn't universally true (see daily standups, for example), but a meeting with an unchanging agenda can indicate a following of form over function. Good meetings dynamically handle changing circumstances and needs, and you don't get that if you're stuck in an agenda. It may also be a sign that the meeting organizer and/or meeting attendees aren't paying attention and are just going through the motions.

When you're planning how to spend your time, and considering whether a meeting is useful, ask yourself whether it has a purpose, or just an agenda. No purpose, no reason to go.

Friday, December 4, 2009

Signs It's Not Ready

You know code is still in a bit of a raw state when a developer nearby working on some recently checked-in code says, "Don't they run this at all!?"

That code, by the way, is probably not ready for you yet!

Thursday, December 3, 2009

Context-less Factoids

We tend to throw around facts and figures to describe the work we do: "We found 11 bugs last week." "That release was 9 months long." "We have 3000 automated tests."

One of the dangerous things about facts is that they have different meanings for different people.

For example, let's say we put 10 man years of work into a product. If I'm a two-man shop, that's a lot of work! If I'm IBM and my project team is 20 people, that's a pretty short project.

So be careful when you're bragging. Your mountain may look like a molehill to the person you're bragging to.

Wednesday, December 2, 2009

It's Random

How many of us have logged or seen a bug like this?

"System does X randomly."

Oh boy. These are fun for a lot of reasons, but today let's just talk about that one word "randomly". What on earth does it mean for something to happen "randomly"? Simple:

"Random" just means you haven't found the pattern yet.

Almost everything has a pattern. It has one or more sets of circumstances under which it occurs. Sometimes those are very subtle. Maybe, for example, random means "only when the memory returns a single bit error on a specific process that is attempting a write to a bad sector on the disk." That's really rare and very subtle, but it's a pattern.

You may not need to know the full pattern to fix a bug. Often it's helpful, and usually having part or all of the pattern will help know that a bug is fixed. However, sometimes you're not going to know all the circumstances. You may get part of them, as in "it fails 60% of the time when X", and that's okay. All you need is enough of the pattern to create and confirm a fix.
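When all you have is "it fails 60% of the time when X", you can at least pin that number down by repeating the scenario and counting, then comparing the rate before and after a fix. A sketch (the operation under test is a stand-in for whatever reproduces scenario X):

```python
def observed_failure_rate(operation, trials):
    """Run the suspect scenario repeatedly and report the failure fraction.

    operation() should return True on success and False on failure.
    Comparing this rate before and after a fix is often enough to verify
    the fix, even without knowing the full pattern."""
    failures = sum(1 for _ in range(trials) if not operation())
    return failures / trials
```

For genuinely intermittent bugs, you'd want enough trials that a drop in the rate is meaningful and not just luck.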

Don't fear random. It's not good to have a lot of randomness; it indicates a lack of understanding. However, at some point, randomness is still okay; when you have enough to fix and verify, then remaining randomness is acceptable. In other words, define it, don't fret.

Tuesday, December 1, 2009

Inside Out

When we as testers approach a system, we often attempt to do so through analysis and exercise. We break down inputs, identify load characteristics, define interesting use cases, create performance paths for throughput testing, etc. These tests are all great, but they all share one major characteristic: they test from the outside in.

The net effect of testing outside in is that you work the components that are exposed to the user (UI, API, etc) fairly well, but you're not actually exercising the internal components much. The classic answer is that components are (or darn well should be!) unit tested. So we're fine. Except the ways in which we're not fine.

We still have a hole in our approach. We're exercising an internal component in isolation with our unit tests. And we're exercising externally visible components with our system tests. And we're hitting some aspects of the internal components with our system tests, but not particularly well. Think of it like a soccer game: if your team always has the ball, your forwards (your externally visible components) will get a lot of work, but your defenders aren't going to get a great workout. Sure, they're on the field, and sure they have roles and do things, but that's hardly exercising everything they should be able to do.

We need to exercise our defenders. Unit tests are great, but (to continue the analogy), they're more like drills than like an actual soccer game. We need to put it together. We need a system test of our internal components.

We need to design a test from the inside out.

Just like we did for our externally visible components, we need to take a hard look at our internal components and break them down. What kinds of inputs do they take? What can we do to those inputs (boundary value analysis, data type mismatching)? What about interactions between internal components, or with externally visible components? Then take that knowledge and design a system test to show it. You'll still get at it through external components, but you're manipulating the components in a way specifically designed to exercise the internal components in the ways you've identified.

For example, let's say your system has a UI and a database component. In this case, the UI is the externally visible component, and the database is an internal component. Certain external tests you do will validate some things about the database - like putting in a too long login name. However, the database is probably not stressed. So we design an internal test. We may notice when studying the internal component (our database) that it uses a stored procedure that locks a table when creating a user. Fine. So we're going to stress that and make sure it doesn't cause problems. So we design a load test that makes a lot of users, to see if that table lock causes any problems. The test will run through the UI using your preferred tool, but it's designed to exercise the internal component. It's a test we've designed from the inside out.
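The shape of that inside-out load test might look something like this sketch (the names are hypothetical; in practice, create_user would drive the UI with your preferred tool rather than being a plain function):

```python
import threading

def hammer_user_creation(create_user, n_users, n_threads=8):
    """Create many users concurrently to stress the table lock taken by
    the user-creation stored procedure; collect any errors for analysis."""
    errors = []
    lock = threading.Lock()
    counter = iter(range(n_users))

    def worker():
        while True:
            with lock:
                i = next(counter, None)  # hand out the next user number
            if i is None:
                return
            try:
                create_user(f"load_test_user_{i}")
            except Exception as exc:
                with lock:
                    errors.append((i, exc))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Timeouts, deadlocks, or duplicate-user errors showing up in that error list would point straight at the table lock, which is exactly the internal behavior the test was designed around.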

What exactly you wind up testing is going to be highly specific to your system and to the internal components of your system. The general rule is that you can analyze your system from the inside as well as from the outside in order to identify interesting system tests.