Tuesday, December 29, 2009

Words of Wisdom

Someone at work said this the other day:

"Can't fail" is not the same as "impossible".

Both indicate that something is important. But "can't fail" is an exhortation to the troops. A bit hyperbolic, maybe, but it's usually meant to excite and inspire. Note that it's also a bit of a cliche and often prompts the question, "yeah? what if we do fail?".

"Impossible" is demotivating. It just says that there's no way you think you can do this because everyone is just not that good in this scenario.

When you're attempting to motivate a team, neither of these is great. If you have to use one, though, say, "can't fail." Don't say "impossible".

Monday, December 28, 2009

Blast from the Past: Bugs in Iterations

A bit over two years ago, I wrote about handling bugs when you have an iteration or other timeboxed development process. I wanted to bring it back up again because I think it's still valid. I've reproduced the original article here:

One of the questions that comes up when you work in a de facto Scrum environment is how to handle defects.

Basically, at the start of an iteration, you have a force-ranked list of what you're going to work on. The team walks down the list, commits to some portion of it, and the iteration starts. The list of tasks can be features, bugs, overhead work (install computers, etc).

Now, let's add a little twist (just a little one; this kind of thing happens every day):

Someone found a bug.


Okay, so that feature that you thought you had nailed had a bug in it. Now what? There are a lot of ways to handle this. You could:

Put the bug in the product backlog and handle it just like any other task.
  • Pros: Doesn't break the process!
  • Cons: If you have an urgent bug, you're basically stuck until at least the end of that iteration.
  • Net: This is great for non-urgent items. But for emergencies it's really not feasible. If you're really seriously considering this, you've either got extremely patient clients or you're being overly optimistic.
Add the bug to the iteration - at the top of the queue.
  • Pros: Bugs get fixed.
  • Cons: All those tasks you committed to? Those aren't going to happen.
  • Net: This is probably swinging too far in favor of bug fixing.* It also will have you doing things your customers want less than all those other backlog items they've asked for.
Allot some amount of time for bug fixing as a task in every iteration.
  • Pros: Allows for bugs to happen, either previously existing or new, without destroying the iteration.
  • Cons: If there are no bugs and you have a lazy team, then you get people idle. Also, the amount of the iteration you need to allot is uncertain until you've done this for a while and learned what your needs really are.
  • Net: No bugs; yeah, right.
So, my preferred method is to allot some amount of time for bug fixing as a task in every iteration.

What have you seen tried? Do you have an answer for this dilemma?


* Disclaimer: Yes, we QA types do get to notice when something goes too far toward bug fixing. It's great when bugs get fixed, but sometimes that's not the best thing to do.


Now, looking back, we're still facing the same problem. We find bugs, some of them more urgent than others, but they all need some scheduled time. I still haven't come up with much better than making sure you sign up for some bug fixing time as well as new backlog time. How much time you spend will depend on how many bugs you find, but that's a topic for another day.

Wednesday, December 23, 2009

If You Only Ask One Thing...

When you have a specification to review, it's your chance to find major holes early on. The problem is that specifications are often (although not always) feature-oriented. They'll include screenshots of individual screens, or they'll have a set of API calls and a brief summary of the purpose of those calls. This is all great information. But in the end, no one uses an API call or a screen. They use a system. They use a system to accomplish tasks.

So whenever you look at a specification, take the tasks the customer will do and ask yourself one thing:

The user sits down in front of the system. Walk through what they do to accomplish task X.

This will expose a lot of holes. For example:
  • An upload API with no way in the GUI to actually review what is uploaded
  • A cleanup process that leaves the system unusable for hours on end
  • A background process with no ability to start or stop it
  • And a lot of other things...
What a system does is important, and a specification can describe it. But in order to know if the system hangs together as a whole, it's important to consider the tasks and the workflows of the system consumer.

Tuesday, December 22, 2009

Why You're Testing That

For new features, we create a user story (or multiple user stories), and we put the information we need into those stories. We describe the purpose of the change, the change itself, and the acceptance criteria for that change. That means we wind up putting the tests for a feature in the story itself.

This is a (slightly sanitized and simplified) story we've been working on recently:

Story: Cache state information
Motivation: Improve system management response times by not having to retrieve generally static information every time
Details: (proprietary bits here, but basically, cache things like IP address of all nodes, etc)
Acceptance Criteria:
  • Add/remove/kill a node and ensure that the cache is updated
  • Restart the system and ensure that the correct information is returned after restart
  • Remove a node while the system is down and ensure that after restart the node is gone
  • (Other state change type tests here)
  • Ensure that the system responds in under 3 seconds 98% of the time under our defined Load Test
All in all, it's not a bad little story, and not bad acceptance criteria. But here's the important question: in 6 months, when someone comes back and says, "yeah, I didn't know what that test was getting at", will we be able to answer that question? What we have here is another case of the six month test.

It's useful to state what we're going to do - in this case, what tests we will run - but it's not sufficient. We also have to provide some insight into why. We put some thinking into developing these tests, and into identifying what the relevant things to test were. So let's write it down. In our story above, we're trying to show that introducing caching: (1) really does speed things up; and (2) still provides accurate information. So let's write down our test goal in the story, along with the tests that demonstrate that we've achieved that goal.
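To make that concrete, here's a minimal pytest-style sketch of what "goal plus tests" could look like. The cluster and cache fixtures and their methods are hypothetical stand-ins for the proprietary bits; the point is that the "why" travels with the tests:

"""Cache state information -- test goal.

Why these tests exist: caching should (1) actually speed up system
management calls and (2) still return accurate node information after
any state change. Each test below says which goal it demonstrates.
"""
import time

def test_cache_updates_on_node_removal(cluster, cache):
    # Goal 2 (accuracy): removing a node must invalidate its cached entry.
    node = cluster.add_node()
    assert cache.get_node_info(node.id).ip == node.ip
    cluster.remove_node(node)
    assert cache.get_node_info(node.id) is None

def test_cached_reads_meet_response_target(cluster, cache):
    # Goal 1 (speed): a warm-cache management call should come back well
    # under the story's 3-second target.
    node_id = cluster.nodes[0].id
    cache.get_node_info(node_id)          # warm the cache
    start = time.monotonic()
    cache.get_node_info(node_id)
    assert time.monotonic() - start < 3.0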

In six months, we'll thank ourselves.

Monday, December 21, 2009

Test Lab Overhead

Let's say we have a test lab, basically a pool of machines available to anyone who wants to develop, run tests, etc. The lab consists of several parts:
  • a pool of machines for running tests
  • various resource machines for specific branches or versions (e.g., a build machine for each version)
  • various common utilities, including a central file server, DNS, a machine reservation system, etc.
In order to successfully use the lab, we've instituted some utilities and checks. For example, every time a machine is released, we run a check script on it to confirm that it has the right packages, DNS, mounts, etc. These kinds of utilities run out of a special mount that the entire lab has access to.
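As a rough idea of what one of those checks looks like, here's a minimal Python sketch. The package list, mount points, and DNS address are hypothetical placeholders; the real script lives in that shared utilities mount:

#!/usr/bin/env python
"""Machine-release check: verify a lab machine is in the expected state."""
import subprocess
import sys

REQUIRED_PACKAGES = ["build-essential", "nfs-common"]   # hypothetical package list
REQUIRED_MOUNTS = ["/mnt/labutils", "/mnt/builds"]      # hypothetical mount points
EXPECTED_DNS = "10.0.0.53"                              # hypothetical lab DNS server

def missing_packages():
    installed = subprocess.run(["dpkg-query", "-W", "-f=${Package}\n"],
                               capture_output=True, text=True).stdout.split()
    return [p for p in REQUIRED_PACKAGES if p not in installed]

def missing_mounts():
    with open("/proc/mounts") as f:
        mounted = [line.split()[1] for line in f]
    return [m for m in REQUIRED_MOUNTS if m not in mounted]

def dns_ok():
    with open("/etc/resolv.conf") as f:
        return EXPECTED_DNS in f.read()

def main():
    problems = ["missing package: " + p for p in missing_packages()]
    problems += ["missing mount: " + m for m in missing_mounts()]
    if not dns_ok():
        problems.append("wrong DNS server")
    if problems:
        print("\n".join(problems))
        sys.exit(1)
    print("machine OK")

if __name__ == "__main__":
    main()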

When we have one branch or a few very similar branches in active development, the lab works fairly well. There's really only one correct state for a test machine, and possibly a (very) few variations on that state. Updating DNS is a matter of changing one script on one or a few branches and updating that utilities mount. Simple, and works great.

Things get more complicated when you have multiple products or multiple branches that vary widely. All of a sudden your fairly simple check script has a lot of if clauses for different package versions on different branches. Changes (a new DNS server, for example) have to be propagated to a lot of different branches. It all starts to get a bit unwieldy. Is it still possible to work effectively? Sure. It's just more work.

The other obvious option is to separate the test infrastructure code entirely in source control. Put it in a separate project with separate branching, etc. You still have to maintain test code across branches at the interface points (interfaces will eventually change), but the problem is smaller because it's isolated into this one branch. Tests themselves would of course stay with the code they're testing. The downside to this option is that your dependencies are increased between projects and that introduces a different kind of management overhead.

You can choose either option - test infrastructure in the main source branch, or test infrastructure separate - but whichever one you choose, there are a few things you should always make sure of, just to keep your odds of success up:
  • Don't support a stagnant branch. If you're going to have to support a branch in the lab, then make sure it gets used periodically (built and a smoke test run). This ensures you don't go too long without checking for breakage and handling it.
  • Notifications must always work. Even if you have to compromise on features or make something harder to run, keep your notifications working. When you have things like build failure notifications on head, the absence of those on a branch can give you a false sense of security and let breakages persist silently for longer.
  • Retirement is an option. Maintaining branches forever is a huge proposition. At some point you have to draw a line and say that you're going to stop supporting a branch. Generally this will coincide with end-of-lifing or end-of-supporting that branch in the field.
Maintaining a workable test lab is part of the overhead of a project, and a key part of the success of a project. Doing it with as little fuss as possible is something that takes active thought and maintenance - make sure you're giving it the attention it deserves.

Friday, December 18, 2009

Cold Systems

It's unusually cold here in Boston. Yes, it's winter in the northeast US, but still, even for that it's cold! I was standing waiting for my bus this morning - shifting from side to side - and trying to think about something other than, "I should have worn thicker socks". And it occurred to me that testing a cold system is something that I don't do enough of.

A cold system is a system that's not really doing anything. In a storage system, for example, there are no reads, writes, management modifications, etc. occurring. If a busy system is hot with activity, then a cold system is one in which very little is going on. (I know I'm stretching the metaphor here, but I had to get from socks to something else, and I went from socks to hibernation to sluggish to systems that aren't doing much - it's really interesting inside my head sometimes!)

So when we test a cold system, what are we looking for?
  • Background processes. Many systems have background cleanup procedures, indexing policies, and other things that occur at system idle. When the system is cold, it's much easier to see these run and look for more subtle patterns that might get lost in system activity. Look for slow memory leaks, timing patterns, background processes "taking over" when they're not drowned out by other things, overruns and other loop-type problems, etc. (One cheap way to watch for a slow leak is sketched just after this list.)
  • Startup issues. When you first start using a system, there might be lags or errors that you can't see very well when many things start at once. Now's your chance to see them in a state much closer to isolation.
  • Spindown problems. Without activity to sustain them, certain things might spin down too far. For example, your disks might autonegotiate themselves slower (green drives do this) if they don't meet activity thresholds, and then the few requests you are getting might have slower response times - oops!
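Here's that slow-leak watch: a rough Python sketch that samples a background daemon's resident memory on an otherwise idle Linux box and lets you eyeball whether it creeps. The daemon name and sampling numbers are made up:

"""Watch a process's resident memory on an idle system to spot slow leaks."""
import subprocess
import time

PROCESS_NAME = "cleanupd"      # hypothetical background daemon
SAMPLE_INTERVAL = 60           # seconds between samples
SAMPLES = 60                   # an hour's worth

def rss_kb(name):
    # ps reports resident set size in KB; returns None if the process is gone.
    pids = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True).stdout.split()
    if not pids:
        return None
    out = subprocess.run(["ps", "-o", "rss=", "-p", pids[0]],
                         capture_output=True, text=True).stdout.strip()
    return int(out) if out else None

readings = []
for _ in range(SAMPLES):
    readings.append(rss_kb(PROCESS_NAME))
    time.sleep(SAMPLE_INTERVAL)

print("rss over time (KB):", readings)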

Testing a cold system is about looking for patterns that get lost when a system is busy. It's about finding the signal that gets lost in the noise. So slow down, cool your system off, and see what new things you can see.

Oh, and hopefully we'll get some snow soon to make the cold worth it!

Thursday, December 17, 2009

We're Here Now

You're in the middle of a problem. You found a bug in the field and you need to patch a customer. Unfortunately, you've already started working on the next release, which happens to be off the same branch. And now you have a patch and no real good place to cut a build from.

There are two things to think about here:
  • What should you have done to prevent the problem?
  • What do we do now?
Hindsight being what it is, it's relatively easy (and tempting) to sit around and discuss what you should have done. Yes, you should have branched for the next release so you had a stable branch reflecting what had gone into the field. And you should have found the bug and fixed it before it ever made it out. And ... and and. Stop.

All of these are useful conversations, but that doesn't change your current situation. While it is important to do a postmortem and figure out how you're going to prevent the problems you're having, the fact remains that you're currently in this situation.

First figure out how you're going to get out of your current situation, then figure out how to prevent it.

You have a customer waiting on a fix, and right now the most important thing to recognize is that getting out of this current problem comes first, closely followed by prevention. I'm not going to minimize the need to prevent problem recurrence, but don't forget that you have to solve a problem before you can prevent it from recurring. There is an appropriate order; use it.

Wednesday, December 16, 2009

Feature Parity Is Rarely a Goal

This is a fun requirement that we sometimes see:

"Must have feature parity with X"

Depending on what you're working on, X might be the previous version of the product (if you're doing a rewrite), or it might be a competitive product, or it might be some analog that should be followed (e.g., "must follow Apple's style guide", which is just a variant on this theme). In any case, you've been presented with the easiest test oracle ever. Just go find X, and whatever it does, make sure your product does, too!

This may be the completely correct thing to do. But before you go blindly accepting the oracle, don't forget that testing your requirements is also a useful exercise. In this case you have a requirement that seems fairly straightforward, but let's dig a little deeper. Is "must do everything X does" really interesting or useful to the consumer of the product? Maybe.

If the consumer is... a marketing type who needs to put up a list of checkboxes and check as many as the competitor does, then yes, having all the features (in some way, anyway) will give him that. Here you probably want to engage in a discussion of how much of the feature is required to meet the "checkbox requirements" and proceed based on that.

If the consumer is... someone who currently uses the product and wants to maintain functionality, then complete feature parity probably isn't actually required. What you want here is the ability to do the same things that the user was doing before. Features the user didn't actually use in the old product probably don't need to be carried forward. Rather than trying to use the entirety of product X as your oracle, concentrate on the features that are actually used.

If the consumer is... a certifying or regulatory entity and product X is a reference implementation, then you really do need to match product X step for step.

The lesson of the day is that the requirement of feature parity may sometimes mean feature parity, but sometimes it may mean something else. Success is in finding the intention of the requirement and meeting that; in the end, this is what will keep your customers happy.

Monday, December 14, 2009

On Thick Skin

We have several teams here at work, and every week those teams have a team meeting. The topics range from escalations to design discussions to estimations, etc. Some of the teams are kind of casual about being a team; they just happen to work fairly symbiotically and meetings are often superfluous (and skipped). Other teams hold their team meetings with great regularity and it's a perfectly calm experience. One team, though, is a real doozy. You can hear their meetings from down the hall. Yes, they really do yell at each other. And here's the thing.... when it's done, they're all still friends and they've solved their technical problem.

I've written before about being nice. It's important to treat others with respect and generosity, and to provide both positive and negative feedback in a manner that allows them to hear it without getting defensive. But...

You have to have thick skin.

For better or for worse, people aren't always going to be nice. They're going to say something harsh, or they're going to get frustrated and lash out. Witness the yelling team, who yells because they care so deeply about their product. And you need to handle it.

So what does thick skin mean?
  • React to the problem, not the message. Someone who is frustrated probably isn't going to be expressing themselves clearly, but that doesn't make the underlying problem any less real. Your job is to find the underlying problem.
  • Don't take it personally. It's unlikely that the person is lashing out at you personally. It's more likely to be frustration with a situation or a problem, and you're just the unlucky winner.
That's really it. Yes, it would be nice if everyone were nice all the time, but it's not going to happen. So make sure your skin is thick for those times when nice ain't happening.


Friday, December 11, 2009

Autistic Testers?

I recently saw an article about a company in Chicago training people with autism to be testers. There are some real doozies in there, like this:
Aspiritech — whose board includes Brix, now retired from Wrigley, and the actor Ed Asner, whose son Charles is autistic — claims those who are autistic have a talent for spotting imperfections, and thrive on predictable, monotonous work.
...
"The stuff we do is boring for [others], like going through a program looking at every detail, testing the same function over and over again in different situations, but it doesn't disturb those of us with autism," says Thomas Jacobsen, an autistic employee at Specialisterne. "That's our strength."

Now I don't want to knock anyone who is autistic, or anyone who isn't. But oh boy was this an article full of depressing assumptions, mostly that testing is repetitive and monotonous. (Sure, parts of it are, just like parts of any job are sort of repetitive and monotonous.) In the end, though, I mostly walked away surprised that a media outlet would publish something like this, seemingly designed to insult both autistic people and testers.

The easy answer is to laugh at how misguided this company is. The better answer is to figure out that we have to show the difficult, innovative side of testing. And that some great testers may be autistic, while others may not be. I say, look for someone who is a good tester, and figure it out from there.

Thursday, December 10, 2009

Is That Good Or Bad?

Let's say that I have a program and a set of tests I've written. Go me! Now the time has come to change that program. So I go in there and I muck about, and then I run my tests.

All my tests pass. Is this good or bad?

Good. I was refactoring, and didn't intentionally make functional changes. My tests tell me that I didn't (at least, in the areas I'm testing).

Bad. I was refactoring, and I did break something. My tests just aren't good enough. Oh, and I'm blind to it until I find a problem later in the process (in QA or in production).

Good. I added a new feature and didn't have to touch any existing code. And apparently I really didn't.

Bad. I added a new feature that changed the behavior of an existing feature. And my tests didn't catch it. Looks like my tests aren't quite thorough enough.

Some of my tests fail. Is this good or bad?

Good. I was refactoring, screwed something up, and my (now regression) tests caught it. That's what they're there for.

Bad. I was refactoring, and I didn't mess anything up. My tests are too closely tied to the internals of my implementation.

Good. I added a new behavior that changed the behavior of my existing features. Tests affecting those features should fail.

Bad. I added a new feature that didn't change existing features but did change some of my tests' assumptions (reference data in the database, number of fields in a form, etc). Looks like my tests are a bit brittle.

All of my tests fail. Is this good or bad?

Good. I was refactoring, and broke something that's really at the core of my app. My (now regression) tests caught it. That's what they're there for.

Bad. I was refactoring, and I broke something, but that's a whole lot of tests to wade through for a problem that just happens to be in step 1 of every test (whoops! login!). My tests are too repetitive; I should refactor and introduce dependent tests so I don't just repeat the same failures.

Good. I added a feature that broke pretty much every other feature. My tests caught it.

Bad. I added a feature and my test infrastructure can't handle it. Need some more robustness there.



As with most things, a passing test or a failing test isn't inherently good or bad, and the same thing applies to suites. When tests pass, you don't know your code is right; you only know that it hasn't broken in a way your tests are looking for. When tests fail, you know something is wrong, but it might be your code or your test, and you'll have to dig to find out. Either way, take a red bar or a green bar for what it is - a piece of information that will guide your next efforts.
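As a tiny illustration of the "too closely tied to internals" and "brittle" cases above, here's a hypothetical shopping-cart example (not from any real project; the cart fixture is made up):

# Brittle: asserts on internal structure, so any refactoring of how the
# cart stores its items fails the test even when behavior is unchanged.
def test_add_item_brittle(cart):
    cart.add("widget", price=5)
    assert cart._items == [{"name": "widget", "price": 5}]

# Sturdier: asserts on observable behavior, so it only fails when something
# the user would actually notice has changed.
def test_add_item_behavioral(cart):
    cart.add("widget", price=5)
    assert cart.total() == 5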


Wednesday, December 9, 2009

Want to Participate

Rahul Khatal, this one's for you. (For those of you who aren't Rahul, see his question here). Rahul brings up two points about meetings:
  • Sometimes people go off the agenda
  • Sometimes people come to a meeting but don't actively participate (talk, basically)
There are many reasons people might go off a meeting agenda. Maybe they have something else they really want to talk about. Maybe they're not prepared for the agenda items and are trying to hide it. Maybe there's another meeting that needs to be happening and your meeting is getting co-opted. In any case, I generally find it best to "parking lot" non-agenda discussions by writing them down and moving on. Then the onus is on the meeting organizer to make sure that those parking lot items happen later on. Based on the parking lot, the agenda may be changed.

And then you have something that's kind of the reverse problem: people who show up but don't speak up. First off, ask yourself if you care. You might not mind non-speaking participants if, for example, the meeting is simply a demo to show off what you've done. However, if this is, for example, a design discussion, you probably want feedback from almost everyone, including the shy guy in the corner. The simplest way to do this is to go around the room and ask for feedback at the points in which you need feedback. Start with the shy people, and then proceed to the more outgoing of them. Ultimately, you can't force feedback; some people will respond that they have nothing to say. But if you want feedback from everyone, ask for it, individually.

We spend a lot of time worrying about how to elicit what we consider to be universal good behavior: meetings in which everyone participates, meetings that stick to the agenda. In general, you're dealing with people, which means you don't get to make all the choices, and forcing isn't always an option. So recognize that and make simple changes that everyone in the meeting can and will handle. It'll take you a long way toward your goal.

Tuesday, December 8, 2009

Stress, Don't Break

"We just got a new build, let's break it!"

Wrong.

You have a piece of software to test. Destruction is not the point. The point is to anticipate and subject the system to things that might happen, even if "might happen" is an extreme circumstance. If the system happens to break under those circumstances, then it's great you found it, so you can react appropriately. If it doesn't, then you know something about the system anyway. And that's good, too. What you're really doing in this case is attempting to stress the system. You're looking for the edges and the holes.

Every system has rules. Some are explicit (e.g., don't install more than 5 of these on a single server) and others are implicit (e.g., it's never going to send packets across the network at faster than wire speed). These rules should help guide your exploration of the system. You may use them to do boundary value analysis and testing, or to design a load test, or to decide at what points you want to measure latency, or whether fuzz testing is going to be worth the time it takes. In some cases, it's possible to break the rules (e.g., you might be able to install 6 of these on a single server), and that's probably a legitimate bug. It shouldn't usually be the starting point of a test; generally it's the end point.
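To make the boundary-value idea concrete, here's a hedged pytest sketch against that "no more than 5 per server" rule. The server fixture and its try_install call are hypothetical, and each parametrized case assumes a fresh server:

import pytest

MAX_INSTANCES = 5   # the hypothetical rule: no more than 5 instances per server

@pytest.mark.parametrize("count,should_succeed", [
    (MAX_INSTANCES - 1, True),   # just under the limit
    (MAX_INSTANCES,     True),   # exactly at the limit
    (MAX_INSTANCES + 1, False),  # one past the limit -- should be refused
])
def test_instance_limit(server, count, should_succeed):
    # Assumes a fresh `server` fixture per case; try_install returns True if accepted.
    results = [server.try_install() for _ in range(count)]
    assert all(results[:-1])                 # everything up to the boundary installs
    assert results[-1] == should_succeed     # the boundary case is what we're probing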

However, if you have to break the rules to break the system, then your goal has changed from "gather information about a system in order to help make intelligent decisions about it" to "KILL!". This misalignment of goals negates some of your testing. You're likely to go far down paths that don't provide a lot of information, just because it's a way to break the system. Here's a hint: you may be acting like a bit of a jerk about it, too.

So don't be a jerk. Exercise a system, yes, but keep in mind that your goal is information and stress, not breakage. System breakage or lack thereof is a side effect of information, not the other way around.

Monday, December 7, 2009

Beyond Agenda

By now it's pretty common to require that meetings have agendas; a quick search turns up pages and pages of advice on the subject. That's not enough, though.

Meetings need a purpose.

Meetings with agendas can still be a waste of time.

So how do you know if you have a meeting with a purpose, or just an agenda? There are a number of warning signs:

If you can't say what the meeting is for in under 2 sentences, it's probably purposeless.
  • Good: "The meeting is for triaging potential release blockers." This is good because you're accomplishing a necessary business and software release task.
  • Bad: "We get all the departments together for status." This is bad because "status" is not useful to anyone by itself. Yes, communication is good, but only when it's accomplishing something.
If a meeting has the same agenda for months, it may be purposeless.
This one isn't universally true (see daily standups, for example), but a meeting with an unchanging agenda can indicate a following of form over function. Good meetings dynamically handle changing circumstances and needs, and you don't get that if you're stuck in an agenda. It may also be a sign that the meeting organizer and/or meeting attendees aren't paying attention and are just going through the motions.

When you're planning how to spend your time, and considering whether a meeting is useful, ask yourself whether it has a purpose, or just an agenda. No purpose, no reason to go.

Friday, December 4, 2009

Signs It's Not Ready

You know code is still in a bit of a raw state when a developer nearby working on some recently checked-in code says, "Don't they run this at all!?"

That code, by the way, is probably not ready for you yet!


Thursday, December 3, 2009

Context-less Factoids

We tend to throw around facts and figures to describe the work we do: "We found 11 bugs last week." "That release was 9 months long." "We have 3000 automated tests."

One of the dangerous things about facts is that they have different meanings for different people.

For example, let's say we put 10 man years of work into a product. If I'm a two-man shop, that's five years of work - a lot of work! If I'm IBM and my project team is 20 people, that's a six-month project - pretty short.

So be careful when you're bragging. Your mountain may look like a molehill to the person you're bragging to.

Wednesday, December 2, 2009

It's Random

How many of us have logged or seen a bug like this?

"System does X randomly."

Oh boy. These are fun for a lot of reasons, but today let's just talk about that one word "randomly". What on earth does it mean for something to happen "randomly"? Simple:

"Random" just means you haven't found the pattern yet.

Almost everything has a pattern. It has one or more sets of circumstances under which it occurs. Sometimes those are very subtle. Maybe, for example, random means "only when the memory returns a single bit error on a specific process that is attempting a write to a bad sector on the disk." That's really rare and very subtle, but it's a pattern.

You may not need to know the full pattern to fix a bug. Often it's helpful, and usually having part or all of the pattern will help know that a bug is fixed. However, sometimes you're not going to know all the circumstances. You may get part of them, as in "it fails 60% of the time when X", and that's okay. All you need is enough of the pattern to create and confirm a fix.

Don't fear random. It's not good to have a lot of randomness; it indicates a lack of understanding. However, at some point, randomness is still okay; when you have enough to fix and verify, then remaining randomness is acceptable. In other words, define it, don't fret.

Tuesday, December 1, 2009

Inside Out

When we as testers approach a system, we often attempt to do so through analysis and exercise. We break down inputs, identify load characteristics, define interesting use cases, create performance paths for throughput testing, etc. These tests are all great, but they all share one major characteristic: they test from the outside in.

The net effect of testing outside in is that you work the components that are exposed to the user (UI, API, etc) fairly well, but you're not actually exercising the internal components much. The classic answer is that components are (or darn well should be!) unit tested. So we're fine. Except the ways in which we're not fine.

We still have a hole in our approach. We're exercising an internal component in isolation with our unit tests. And we're exercising externally visible components with our system tests. And we're hitting some aspects of the internal components with our system tests, but not particularly well. Think of it like a soccer game: if your team always has the ball, your forwards (your externally visible components) will get a lot of work, but your defenders aren't going to get a great workout. Sure, they're on the field, and sure they have roles and do things, but that's hardly exercising everything they should be able to do.

We need to exercise our defenders. Unit tests are great, but (to continue the analogy), they're more like drills than like an actual soccer game. We need to put it together. We need a system test of our internal components.

We need to design a test from the inside out.

Just like we did for our externally visible components, we need to take a hard look at our internal components and break them down. What kinds of inputs do they take? What can we do to those inputs (boundary value analysis, data type mismatching)? What about interactions between internal components, or with externally visible components? Then take that knowledge and design a system test to show it. You'll still get at it through external components, but you're manipulating the components in a way specifically designed to exercise the internal components in the ways you've identified.

For example, let's say your system has a UI and a database component. In this case, the UI is the externally visible component, and the database is an internal component. Certain external tests you do will validate some things about the database - like putting in a too long login name. However, the database is probably not stressed. So we design an internal test. We may notice when studying the internal component (our database) that it uses a stored procedure that locks a table when creating a user. Fine. So we're going to stress that and make sure it doesn't cause problems. So we design a load test that makes a lot of users, to see if that table lock causes any problems. The test will run through the UI using your preferred tool, but it's designed to exercise the internal component. It's a test we've designed from the inside out.
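Here's roughly what that inside-out user-creation load test could look like as a Python sketch. The endpoint, user counts, and worker counts are hypothetical; in practice you'd drive it through your UI tool of choice, but the design target is the internal table lock:

"""Load test designed from the inside out: hammer user creation to stress a
stored procedure that takes a table lock, and watch for errors or timeouts."""
import concurrent.futures
import time
import urllib.parse
import urllib.request

CREATE_USER_URL = "http://testbox.example.com/users/create"   # hypothetical endpoint
USERS = 1000
WORKERS = 20

def create_user(i):
    data = urllib.parse.urlencode({"login": f"loaduser{i}", "password": "x"}).encode()
    start = time.monotonic()
    try:
        with urllib.request.urlopen(CREATE_USER_URL, data=data, timeout=30) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    return ok, time.monotonic() - start

with concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(create_user, range(USERS)))

failures = sum(1 for ok, _ in results if not ok)
slowest = max(elapsed for _, elapsed in results)
print(f"{failures} failures out of {USERS}; slowest call {slowest:.2f}s")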

What exactly you wind up testing is going to be highly specific to your system and to the internal components of your system. The general rule is that you can analyze your system from the inside as well as from the outside in order to identify interesting system tests.

Monday, November 30, 2009

Helpfulness Balance

Interacting with your team and with external parties is all about preserving balance. You want to be helpful, but not stifling. You want to get help but not be seen as leeching off someone. Sometimes you'll be the giver of help, and other times the asker of help, but it all needs to even out in the end. If it doesn't, you start to get a bad rap.

So for everyone you interact with, you've got a little green bar showing your balance of helpfulness. Think of it like this:

You can ask for help, which drags the green bar down, or give help, which moves the green bar up.

So what happens when you first meet someone? Where's the bar then? After all, you haven't had time to establish a balance yet. Where you sit depends on your relationship with the person. Let's look at a few examples:
  • The person is a potential client. You're getting ready to ask for something (and you can bet the potential client knows it), so you're already implicitly in debt. Your green bar is pretty low.
  • The person is a formal mentor. You're in a mentorship program and both of you know it. This person entered the relationship seeking to help you. Your green bar is quite high.
Okay, so we have to give to receive, and vice versa. So what?

So use this to figure out your behavior. If your green bar is low, keep your requests and questions few and carefully worded. Look for ways to be helpful, so you bring the bar up. If your green bar is high, don't feel guilty about asking questions and making requests. Help where you can, of course, but don't feel like you can't do anything until you've helped.

None of this is rocket science, but as you're getting ready to ask for help, it pays to think about where your helpfulness balance is with someone so you can make your request in an appropriate way.

Wednesday, November 25, 2009

Steel Threading

This is a true story. Names of people and components have been changed to protect the innocent.

Background:
There are a few things you need to know:
  • We work in two week iterations.
  • We basically work from stories in the XP(ish) style.
  • Stories as written have customer benefit
The Problem:
We wanted to put in a major feature (let's call it the "Behemoth"). The Behemoth was going to basically be a UI replacement. It was going to be great for our customers, and give us better UI scalability and testability, too. There was only one downside: the Behemoth was huge. As in a year or so, rough estimate. There's no way it was going to fit in an iteration, or even in a single release.

Options:
As with most things, we have options. We could....
  • ... branch and send a team off to work on this. When it's done, merge, test, and release!
  • ... hold the release until it's done (eep!).
  • ... break it down into parts.
Now that last one, that's interesting. What if we could break the Behemoth down to some size that would fit in a release, or even better, in an iteration? That sounds good except we have a mandate to not put anything in that doesn't help the customer.

Normally, when we break a Behemoth down, we'd do it into components - say, coordinator, and renderer, and config, for example. We'd then build each component, and at the end we'd string them all together and we'd have a Behemoth. Trouble is, a coordinator is no good to any of our customers, so now we have unused code in the system and we're not providing customer benefit. That's not particularly good of us.

Enter steel threading.

Steel threading is when you break down a project into the smallest end-to-end thing you can do. Then you do another small end-to-end thing. Repeat until you have the thing you're looking for. I don't actually know where the name came from, but I think of it like bridge cable - lots of long skinny steel threads all wrapped up together to make a huge cable that holds up a bridge.


We can use the same trick on our Behemoth. Instead of building a coordinator and then a renderer, we're going to do one tiny use case. For example, we're going to do "view a single widget" in the Behemoth, and we're going to write just enough code to be able to do that. It's far from the customer's full need, but it provides some marginal benefit to the customer, and we can write it small. Next iteration, when we've done "view a single widget", we're going to do "view 2 widgets". Then we'll do "add a widget", followed by "view 1000 widgets". And we'll just keep going until the whole Behemoth is built. This also tends to reduce integration problems, because you've had to integrate your components the whole way along, so if your coordinator can't talk to your renderer, you find out before you've written a whole lot of code.

As with any technique, steel threading is not universally appropriate. In cases where you're faced with a huge task, though, give it a shot.

Tuesday, November 24, 2009

Partial Payment

So we have this piece of code. It's been around for a while, and it's been hacked on by a number of people. Some of them knew what they were doing, some not so much. It started out as one thing, and then suffered from "wouldn't it be neat if we also..." syndrome. Now it does several things, mostly related. And it's getting a bit.... crufty. It's not awful, but it is definitely starting to smell a little bit. (I'm pretty sure we all have a piece of code like this!)

So the right thing to do here is go in and refactor it when we next need to make a change. In principle, no problem.

As luck would have it, I happen to be the next person daring to venture into this piece of code. I want to add something fairly simple that makes it generate a list of successes in addition to the list of failures. Should be straightforward. Oh, except I need to go do some refactoring.

My task used to be this:
  • add a method to get a list of passed tests
  • add a few lines to some pre-existing methods to generate the file and header information
  • add a few lines to dump the list of passed tests into the output file.
But because I need to do the refactoring, my task now looks like this:
  • generalize the getFailedTests method to get either passes or failures, and refactor all the calls to that method
  • add a few lines to some pre-existing methods to generate the file information (separate output for successes)
  • refactor the existing generateHeader method to give me a different header based on whether I want successes or failures or both
  • refactor out the formatting of the output so it can be HTML (failures show up on a web page) or plain text (passed tests just go into an archive)
All these are good things, but my 2 hour task just became most of a day. Ouch!
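For instance, the first item on that list - generalizing getFailedTests - might look something like this Python-flavored sketch (pseudonyms for what is really our reporting code):

# Before: the reporting code had one hard-coded query.
#
#     def get_failed_tests(results):
#         return [t for t in results if t.status == "failed"]
#
# After: generalize it, and keep the old name as a thin wrapper so the
# existing callers keep working while they're migrated.

def get_tests_by_status(results, status):
    """Return every test result whose status matches (e.g. "passed", "failed")."""
    return [t for t in results if t.status == status]

def get_failed_tests(results):
    return get_tests_by_status(results, "failed")

def get_passed_tests(results):
    return get_tests_by_status(results, "passed")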

The risk with doing a refactoring next time you touch the code is the same as the reason you didn't refactor last time you touched the code - you're just out of time. Yes, this is how technical debt grows and stays around.

So how do we overcome this? We don't have time to pay back as much technical debt as we should here, and there's a risk that I'm simply not going to add the feature because I don't have a day to give to the feature+refactoring. So we compromise. I did some of the refactoring, but not all of it, and I added my feature. Total time: 4 hours. The code isn't where it should be, but it's closer.

Just like any other debt, technical debt takes time to pay down. You don't have to pay it all at once, but every little bit you pay is helpful. So don't be afraid of how much is there. Just start paying it off, bit by bit. The goal isn't to get rid of everything crufty in one fell swoop. The goal is to leave the code you touch better than you found it. Do that often enough, and you'll pay off your debt.

Monday, November 23, 2009

Fail Safe

Like many people, we have scripts that do various tasks. They update libraries on lab machines, check and clean out temporary directories, archive old test results, and myriad other things.

There's one thing that all our scripts must have before we'll begin using them:

A fail safe.

That's right. The problem with these kinds of background cleanup mechanisms is that when they go bad they go really really bad. Updater installs a package that leaves the machine inaccessible over the network? Multiply that by several hundred and you have a real problem. Test archiver fills up its target? Continuing to flood in requests isn't going to get you anything but network traffic.

Having learned that lesson the hard way, all utilities we use have to have a simple fail safe. They already check their operations to make sure there's no problem. If an operation fails, or fails more than n times, the utility shuts itself down. This prevents all kinds of runaway code problems, and things move a lot more smoothly.
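In code, the fail safe can be as simple as a counter around the operation. A minimal Python sketch; the archive_old_results call, the threshold, and the interval are stand-ins for whatever your script actually does:

"""Run a cleanup pass repeatedly, but shut down after too many failures."""
import logging
import sys
import time

MAX_CONSECUTIVE_FAILURES = 3     # hypothetical threshold
RUN_INTERVAL = 300               # seconds between passes

def archive_old_results():
    """Stand-in for the real work: archive old test results somewhere."""
    ...

def main():
    failures = 0
    while True:
        try:
            archive_old_results()
            failures = 0                       # a success resets the count
        except Exception:
            failures += 1
            logging.exception("archive pass failed (%d in a row)", failures)
            if failures >= MAX_CONSECUTIVE_FAILURES:
                logging.error("fail safe tripped; shutting down instead of flailing")
                sys.exit(1)
        time.sleep(RUN_INTERVAL)

if __name__ == "__main__":
    main()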

You'll need utilities and scripts to perform cleanup and maintenance tasks. Go ahead and write them. Just be sure that you put in a fail safe so it doesn't go from helpful to nasty!




Friday, November 20, 2009

Find the Heartbeat

Projects generally have heartbeats. These are the rhythms of a team, and they're both large and small. You probably have a small team heartbeat - a daily standup, a weekly meeting. Then the project probably has a larger heartbeat - a bi-weekly iteration, or a monthly release.

A heartbeat implies conformance, repeatability, closure. Think about what happens, for a minute, when the human heart beats:
  • the heart expands, allowing blood in
  • the heart squeezes, moving the blood around inside the heart
  • the heart squeezes differently, moving the blood out of the heart into the body
  • during all this, the valves of the heart open and close
(I'll note I'm definitely NOT a doctor.)

When a heart does things in a kinda sorta way, you've got a problem. A leaky heart valve, that's a problem. A heart that doesn't pump all the blood through, that's a problem. These are inefficiencies in the human heart, and when they get bad enough then you're in serious trouble.

The same thing is true of a development project. We don't create and maintain a project heartbeat because it feels good. We do it because if we don't have rhythm then we're showing inefficiencies, and when those get bad enough then we're in serious trouble. For example, if our stories no longer fit in an iteration, that's a leak. Once or twice is okay, but if it happens a lot or badly, then the project is in trouble, and we're likely to be late or broken, or both.

Don't have a project heart attack. Keep an eye on your project's heartbeat.

Thursday, November 19, 2009

Cheap Usability

Usability testing is an art form. To do it properly takes significant experience, time and resources. But....

If you lack the time, resources, expertise, or will to do full usability testing, don't give up. You can do one thing that will be a huge first step:

Draw what you're talking about.

That's it. Simple, huh?

You can use a whole lot of words describing something. But if you just draw it out, you'll start to see some of the big problems. It might or might not be the most usable thing in the world, but shy of doing actual usability testing, it's a good start.

Wednesday, November 18, 2009

The Untrained Tester

I'm pretty much an untrained tester. And yet somehow I generally know how to test stuff. Huh? Okay, so my testing classroom time is limited. I am, after all, an autodidact in test. I'm untrained. But I'm not unlearned. That's an important distinction.

There are many ways to learn how to test something:
  • Training. Yes, this can work. Classroom or online, both count.
  • Books. I still swear by "The Complete Guide to Software Testing" by Bill Hetzel. It's a bit outdated in some ways, but it's got a lot of things I can poke at and say, "oooh, is that relevant and how would I apply it?"
  • Blogs and other online guides.
  • Google. This one's great for learning specifics, like how to work with certain tools. I tend to hit this one late in the process.
  • Past experiences. Things we tried that worked, or dev techniques, or things that failed. I learn a lot from coworkers, both testers and developers.
So keep in mind that learning counts, whether it's formal training or something a lot more informal. If you can say, "what do I need to know?", then you can go learn it. Don't wait for the formal training. Just go learn.

Monday, November 16, 2009

One New Thing

One of the amazing things about testing is that you get a chance to try something over and over again. Every release, you get a new chance to try a similar process. Every time you run automation, you get a chance to make it stronger, whether that's per build, nightly, weekly, whatever. We're lucky to have so many opportunities to do roughly the same thing.... better.

Take advantage of that opportunity.

Every time you test, ask yourself what one thing you can do better this time. If you're feeling ambitious and things are generally under control, go for two or three things better. Try a new test technique. Try a new ordering of the test plan to shake out problems earlier. Fix a couple of the reporting oddities in your test infrastructure that have been bothering you.

This isn't news. I've written about changing up your test plan before. It still bears repeating. Do what you were doing... and do one thing better.

Friday, November 13, 2009

Project Doldrums

Sometimes a project gets into a pretty frustrating state, in which:
  • it's "mostly" done
  • it's highly visible
  • it's just starting to get tried by a broad audience, and not all of them know the background and details of the project, just this thing they have been asked to try.
If you're not careful, this is where you get stuck in the project doldrums. Now is the time to avoid stagnating on the project. You're getting feedback, which is probably introducing new requirements or ideas. You're probably finding a few issues. It's likely that you have one or two things you already knew you needed to do. And those things just sort of keep piling on each other.

It's now up to you to get control of it and get momentum again. (I could keep the doldrums metaphor going and say that you have to turn on the motor and get out of the listless winds.)

Getting momentum isn't hard, really. There are only three key things that you must do:
  1. Time box it. You should be happy to take feedback, but you're only giving people a set amount of time to provide it. After that, no new requirements, no whining. It will go into production as it is spec'd.
  2. Make your task list public. You have a set of things you now need to do (fix bugs, update config, add a few features). Publish it, and publish where you are on that list. That way you don't get the same complaints over and over, and when it's fixed, you can tell people to try again. It's a way to show that feedback is not ignored, that you will get to it, and that you are making progress.
  3. Do only your tasks. Don't make random changes or other changes. Every change you make should be based on a task in your list. It is imperative that your task list be complete. If you find something else, add it to the task list, then do it. You don't want to give your (now very public) audience the impression that you're flailing around making random changes. It makes them lose confidence in you.
Projects can hit the doldrums. It will happen eventually. Don't worry about it overly; you can get out of them. Just do it with momentum, and do it with confidence.

Thursday, November 12, 2009

Rituals

After you work in a team for long enough, you start to develop rituals, formal and informal. A standup is a daily ritual. An iteration retrospective is a ritual. The guy who brings in donuts most Wednesdays is a ritual.

Rituals are great. They are the affirmation that you're a team, and that the team is almost a living organism. It has a heartbeat and habits - and those are your rituals.

But...

Rituals are only affirming if they continue to have meaning. There's no point to having a retrospective if you're no longer coming to small stopping points every iteration. Otherwise it's just a standup, only longer; you can't retrospect in the middle of something. There's no point to having donuts on Wednesdays if you're forced to bring them in; the beauty of that ritual is the small thrill of informality.

Embrace the rituals you have. But evaluate them to make sure there is still meaning behind your rituals. As soon as they lose their meaning, stop doing them. There is no affirmation in empty rituals.

Wednesday, November 11, 2009

Break It Down

We're working on an internal project that involves (among other things) sending notifications programmatically to Jabber users. It's at that stage where it works for some people and not for others. There are two versions of code doing the sending (different OSes). There are two OSes on the clients, and there are about 10 different clients.

ACK!

So it's time to break it down. It's overwhelming to try to tackle it all at once, but if we make a table we can see what works and what doesn't, and start to try to get all "Y"s into the table, and see if there are any patterns. It gives us a base to work from.



When you're lost, write down what you know, manipulate the data to visualize it, and you'll see a way out.

Tuesday, November 10, 2009

Sufficient Quality

How do you measure yourself? How do you know your release is of acceptable quality? You've found a lot of bugs, and you've fixed a lot of bugs. You have a set of great new features, and you've done all sorts of interesting security and usability testing. It's a great release! Or is it?

Your release is of sufficient quality if your customers are sufficiently happy.

The real trick here is to define "sufficient". You could have hundreds or thousands of bugs in the product, and if your customers don't hit them or don't mind them (or think a bug is a feature!), then it's still a release of acceptable quality. You could have a total of 5 bugs, but if your customers hit them a lot and they're bad, then this is not a release of sufficient quality.

So if you want to know how you as an engineering (and requirements gathering and sales) team are doing, ask your customers. They're the ultimate arbiters.

Monday, November 9, 2009

Grunt Work

I've been working a bit on some data analytics projects. I've been looking at two major things:
(1) what kinds of issues we find in the field; and (2) what kinds of issues we find late in a release. To do this, I go diving through our defect tracking system. We use Jira, so this is mostly creating filters, and generally runs along the lines of "show me all the issues by client in the customer escalations project".

The problem - and this happens with many things - is that our reporting now has more data than it used to. For example, we didn't use to track the client an escalation was opened at as a queryable field (it was just in the text). We now track it as a separate field, but that means that all the old issues were never updated. So I have two choices: I can either construct a special query that pulls the info out of the comments, or I can backfill the new field on all the old issues where it's not populated.

The advantage to a special query is that I can construct it and I don't have to touch a lot of bugs. The disadvantage is that I have to reuse and maintain that fairly complex query every time I need the information. (And if someone else wants to use it, well I hope they can figure out how!) So instead I'm going to make our old issues comply with our new practices - and populate the field we're now using.
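That backfill is a small one-off script. With the Python jira client it's roughly this shape; the server URL, JQL, custom field id, and the regex that fishes the client name out of the text are placeholders for whatever your setup actually uses:

"""One-off backfill: populate the new client field on old escalation issues."""
import re
from jira import JIRA    # pip install jira

jira = JIRA(server="https://jira.example.com", basic_auth=("me", "api-token"))

CLIENT_FIELD = "customfield_10123"   # hypothetical id of the new field
issues = jira.search_issues('project = ESC AND "Client" is EMPTY', maxResults=False)

for issue in issues:
    # Our old convention put the client name in the description text,
    # e.g. "Client: Acme Corp" -- pull it out and copy it into the field.
    match = re.search(r"Client:\s*(.+)", issue.fields.description or "")
    if match:
        issue.update(fields={CLIENT_FIELD: match.group(1).strip()})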

The moral of today's story is:
Sometimes, you just have to do the grunt work.

It's not fun, and sometimes it's more manual than I'd like, but your future self will thank you.

Thursday, November 5, 2009

Window of Opportunity

It's relatively easy to decide to change things. It's even fairly easy to decide what you're going to do differently, generally. In order to be successful, though, you also have to consider when you make the change.

Any change has a window of opportunity, a time period during which it is most likely to be effective.

For example, let's say you decide to change your release process. Instead of simply sending an email to operations letting them know that the release is ready, you're going to appoint a "development liaison" who will work with operations to get the release into production. The goal of this change is to prevent unintentional misconfigurations (which you've had a problem with in the past). You could make this change right at the beginning of your development effort, but it wouldn't really buy you a lot - after all, you're not releasing, so you're not going to try your great new change. No, instead your window of opportunity is a bit before release.

As another example, let's say you're doing iterations and you're not quite perfect at it yet, so the end of an iteration is a bit... frantic. Don't introduce change when you're frantic - it'll only make you more frantic. Your window of opportunity is earlier in the iteration.

So describe your goal, describe your change, and then think of your window of opportunity. All those together will help you gain success.

Wednesday, November 4, 2009

State Your Purpose

Being a tester, I see a lot of tickets. Some tickets, unfortunately, hang around for a while, and tend to be worked on by multiple people. These wind up with the basic ticket writeup and a series of comments by different people. Particularly when the ticket is a difficult one, there are theories being tried and discarded.

Let's use an example:
We have an issue where access to the system, either for standard use (reading, writing data over mounts) or for diagnosis (logging in to the box), was slow. The system had several exported mounts, was performing replication, and was deployed in our lab. That's about all we knew going into it.

As we worked on the ticket, a lot of theories came up, ranging from load on the box to a kernel problem to a network issue (it turned out to be saturation of the switch when other systems on that same switch were engaging in network-intensive operations).

So the question becomes, how do we talk about this in the ticket? There are good and bad ways to write this up.

A Poorly Written Comment
The replication schedule is:
- 20:00 (average duration: 90 min)
- 07:00 (average duration: 45 min)
- 13:15 (average duration: 80 min)

A Well Written Comment
We noticed that the slowness described only occurs sometimes. Looking at what the box is doing at the time, it always seems to be replicating.

The replication schedule is:
- 20:00 (average duration: 90 min)
- 07:00 (average duration: 45 min)
- 13:15 (average duration: 80 min)

We've seen slowness at:
9/12 21:10
9/13 07:08
9/16 07:14

Earlier comments in this ticket indicate that load average is not the problem, but what else might replication be triggering? Early thoughts: increased threads, increased memory use, increased network use...

The Six Month Test
A good comment is one that makes sense six months later, after you've forgotten all the details. This means it needs to:
  • describe how it relates to the issue as a whole
  • describe what the reader is intended to do or take away from the comment
Just like your bugs, write your comments for posterity. Future you will thank you.

Tuesday, November 3, 2009

Paralysis

There are days when I walk into work and have a whole lot of different things that need doing, none of them short. A typical list would look like:
- reinstall an object store (1 hour)
- finalize a task list (45 min, and needs people)
- run a scanner utility against a test data set to gather a baseline (2 hours)
- write up how to do a big configuration I've been working on (3 hours)
- provide feedback on a document (1 hour or so)

And I don't want to start any of 'em because that means I'm not making progress on the others! This is a form of paralysis. Fortunately, it's mild.

There's only one way I know to get out of it, and that's to write down my task list for the day, pick one, and start. It doesn't matter which one I pick, as long as it's one single item.

In this case, I invoked my particular prioritization method:
  • first the stuff that's blocking other people
  • then the stuff I'm going to forget if I don't do it soon
  • then the stuff that others are waiting for but not blocked by
  • then everything else.
That day, I did the feedback on the document first, followed by the task list (also needed by others). After that, I did the configuration writeup, and then started on the scanner utility (and then the day was over!).
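
For what it's worth, the prioritization itself is simple enough to write down as a sort. Here's a minimal sketch (Python; the bucket assignments are just my guesses for that particular day):

    # Priority buckets, lowest number first.
    PRIORITY = {"blocking others": 0, "will forget": 1, "waiting, not blocked": 2, "everything else": 3}

    # The day's list, tagged with a bucket (the tags here are illustrative guesses).
    tasks = [
        ("reinstall an object store", "everything else"),
        ("finalize a task list", "blocking others"),
        ("run the scanner utility for a baseline", "everything else"),
        ("write up the big configuration", "will forget"),
        ("provide feedback on a document", "blocking others"),
    ]

    for name, bucket in sorted(tasks, key=lambda t: PRIORITY[t[1]]):
        print(f"{PRIORITY[bucket]}: {name}")

The value isn't in the code, of course; it's that writing the list down and tagging each item forces you to pick one and start.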

Does this ever happen to you? How do you deal with the "too much to do to even get started" problem?

Monday, November 2, 2009

The Rest of the Product

We have a test plan, and it's great. It covers all the features, and all the workflows of the application. We've got stories, we've accepted them. We've written some automated tests.

Congratulations, you're now half done with the product.

The product is not useful until you can actually use it in production. So now that you've built the darn thing, it's time to think about:
  • What's production going to look like? How many machines? What configuration?
  • How are you going to get the software into production?
  • How about the config info? How're you going to go from dev to test to prod? (Hope it's not hard coded into the war file or rpm somewhere! There's a small sketch of one alternative after this list.)
  • Okay, once you've got it into production, how're you going to start it?
  • Come to think of it, how're you going to stop it?
  • One day you're going to have to maintain this thing. Got a plan for that? Is down time okay? Can you do some rolling upgrades or maintenance?
  • How are you going to see what's going on? Got logs? Got a way to get logs back to dev for analysis?
  • How will you know it's running? Any monitoring? Notifications?
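
On the config question in particular, here's a minimal sketch (Python; the setting names and defaults are invented) of one way to keep environment-specific configuration out of the build artifact and read it at startup instead:

    import os

    # Environment-specific settings read at startup, not baked into the war/rpm.
    # The names and defaults here are invented for illustration.
    CONFIG = {
        "db_url":    os.environ.get("APP_DB_URL", "postgres://localhost/dev"),
        "log_dir":   os.environ.get("APP_LOG_DIR", "/var/log/myapp"),
        "smtp_host": os.environ.get("APP_SMTP_HOST", "localhost"),
    }

    def require(key):
        """Fail fast at startup if a setting needed in this environment is missing."""
        value = CONFIG.get(key)
        if not value:
            raise RuntimeError(f"missing required setting: {key}")
        return value

    if __name__ == "__main__":
        print("starting with db:", require("db_url"))

The same artifact can then move from dev to test to prod untouched; only the environment changes.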

There are a lot of questions to answer once you've done the basic implementation. Don't forget to include those when you're thinking about testing it, too.

Friday, October 30, 2009

Taking Notes

I tend to take notes in meetings. As I was doing that today, it occurred to me that there are different kinds of notes that I take:
  • All notes all the time. These are extremely extensive notes. They capture almost everything (but not quite - I'm not that fast). I take notes like this when I'm not sure what's going to be important. Typically these are Q&A sessions with customers, or a presentation for which I'm totally unprepared. I try not to do this often because it's really hard to listen and take notes this detailed at the same time. If I'm going to have to share notes with people not in the meeting, these are the notes I take.
  • Reminder notes. These are much more outline-like notes that I take for most meetings. These are intended to just provide triggers for my memory. These are the notes I take if I need to share them with people who are in the meeting.
  • No notes. I do this for a lot of meetings. If I need to be actively participating in (or leading) a meeting, I generally don't take notes. I'd rather be fully engaged while I'm there.
Do you take notes?

Thursday, October 29, 2009

Choosing

We make tool choices constantly, sometimes explicitly and sometimes implicitly. For example:
  • I write a bash script to grab some network info off multiple machines. Tool chosen: bash. Didn't even think about it, just did it. (There's a rough equivalent sketched after this list.)
  • We're moving parts of our test plan into Jira. Tool chosen: wiki + Jira. This one we discussed for a while, and eventually made our choice based on some cruft with the wiki. I'm not sure it's going to work, but we're giving it a shot.
  • I burned a CD of our latest installer. Tool chosen: Disk Utility on my mac. This one is quick and handy, and I haven't gotten a bad burn off it yet.
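That first bullet, by the way, is usually only a handful of lines. A rough Python equivalent of that kind of throwaway script (the original was bash; the hostnames and the command here are invented) might look like:

    import subprocess

    # Hostnames and the command are invented for illustration.
    HOSTS = ["node01", "node02", "node03"]

    for host in HOSTS:
        print(f"=== {host} ===")
        # Any quick info-gathering command works here; this one dumps interface info.
        subprocess.call(["ssh", host, "ip", "addr"])

Which is exactly the point: for a task that small, the tool decision isn't worth more thought than the script itself.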
As we make all these tool choices, we're implicitly considering the properties of the tool and comparing that to the requirements of the task. We have to think about only a few things:
  • What is the tool good for? Jira, for example, is good for workflows but horrible for documentation. A wiki is good for documentation, but for workflow it's simply awful. Some tools are better equipped for long-term projects and growth than others; other tools are a lot lighter and good for quick or small projects.
  • How convenient is it? The tool I already have will usually trump the tool I don't have, just because of setup overhead. It's not universally true, but it takes a really great feature - or a seriously large annoyance with what I have - for me to switch.
  • How accessible is it? Whatever tool I use needs to be accessible to everyone who needs it. IMing out info is no good for my boss, for example, who doesn't use IM. If he needs to know, then I can't use the IM tool.
Many times tool choice is a really quick, almost unconscious decision. Other times it takes a lot of evaluation and explicit consideration (especially when it's expensive or has far-reaching ramifications). In the end, though, what tool you choose really only comes down to a few simple questions. So don't stress about it too much. In the end, it is just a tool.



Wednesday, October 28, 2009

Postmortems

After a release goes out the door, we hold a postmortem. It's pretty standard stuff, usually. We talk about what we did well, what really didn't work out, and what we didn't anticipate.

Timing is an issue, though. You can hold a postmortem right after release, or you can wait to see how it actually does in the field and then hold a postmortem. They each have benefits.

When you hold a postmortem right after you release, you get:
  • Motivation. People are still stinging from the things we didn't do so well, and are generally aching to fix them. If it was a good release, getting together to remember it will also provide a good boost to people's egos (rightly so!).
  • Recency. Memories are better right after the release, and you'll be able to have a better discussion about why you did things and what specifically didn't work about them. You'll have a much more precise discussion while everyone's memories are fresh.
  • Ability to change. If you want to make changes, the sooner you start them, the better chance you have.
When you hold a postmortem after you have some field experience with it, you get:
  • Perspective. That process y'all tried that seemed painful right after you did it might not be so painful now. Maybe you've learned it better, maybe you've started reaping benefits, maybe it wasn't as bad as it seemed.
  • Field Experience. Maybe that release that seemed really shaky has performed like a champion in the field. Perhaps that awesome release y'all tested extensively has had all sorts of problems. These are things you don't know until it's been out for a while.
My not-so-innovative solution is to realize that our postmortems take us about 45 minutes. That's not very long, so we do both! We hold a postmortem within a week after we release. Then, we hold another one about 6 months later, when the release has been in the field for a while, and we ask ourselves how we really did.

In the land of making things better for ourselves, postmortems are a valuable tool. Holding them twice just lets us learn more from the software and from our development process than we could with just one. Give it a shot.

Tuesday, October 27, 2009

The Other Fence

A lot of derision has (rightly) been spilled on the idea that development writes code and chucks it over the fence at QA. Fortunately, at least in the places I've worked, this doesn't really happen any more. That development-QA fence is basically gone (hooray!).

Now maybe we should start working on the other fence.

Other fence?

Think for a second about what happens when you're done testing (and developing) on a release. What do we do? We chuck it over the fence at operations (or professional services, depending on how installations and upgrades get done).

Oh, that other fence.

I've been thinking about this fence, and wondering if it's bad. After all, we didn't use to think it was bad that development finished and chucked the code to QA for work! Now we know better. Maybe we should start learning better as we chuck a build out of engineering and into operations.

Let's posit for the moment that the engineering-ops fence is bad. What kinds of things might we do to break down the fence, and how might that help?
  • Change how we structure our builds to make them more releasable. This is somewhat analogous to writing more testable code.
  • Help deploy. Just like our developers do some testing now, maybe we can help deploy, or create some utilities to help. Rake tasks, deployment scripts, hand installations - how does this stuff get deployed, and can we make it easier or better? (There's a toy sketch after this list.)
  • Get help building and packaging. Just like development sometimes asks a tester how best to approach a TDD problem, engineering can get some advice from operations on how to handle a configuration issue, or a packaging question.
  • Pair on problems. When there's a problem in the field, we don't have to look in isolation, or bounce questions back and forth. We can work on it together. With two different views and skills looking at the problem, you're more likely to figure out a problem that has a foot in both worlds.
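As one example of the "help deploy" idea, even a toy script that copies the build out and restarts the service in a known order removes a lot of over-the-fence ambiguity. This is a sketch only; the hostnames, paths, and service name are all invented:

    import subprocess
    import sys

    # Everything below is invented for illustration.
    HOSTS = ["app01.example.com", "app02.example.com"]
    ARTIFACT = "build/myapp-1.2.3.tar.gz"
    REMOTE_PATH = "/opt/myapp/releases/"

    def run(cmd):
        """Run a command, echo it, and stop the deploy on the first failure."""
        print("+", " ".join(cmd))
        if subprocess.call(cmd) != 0:
            sys.exit(f"deploy step failed: {' '.join(cmd)}")

    for host in HOSTS:
        run(["scp", ARTIFACT, f"{host}:{REMOTE_PATH}"])             # copy the build out
        run(["ssh", host, "sudo", "/etc/init.d/myapp", "restart"])  # restart it the same way every time

Even if operations ends up rewriting it, writing the first version together is a good way to start tearing the fence down.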
Depending on your current organization, and on who receives your code, your list may be different. Maybe you're working with support right after release, or sales. Maybe engineering owns operations, so you don't have this problem. At this point, this is just something to think about.

What do y'all think? Is there a fence after engineering? And is it time to start talking about that fence?

Monday, October 26, 2009

Org Charts

There are a lot of different ways to set up an engineering organization. Generally, they fall into one of two categories: function-oriented, and team-oriented.

A function-oriented structure says that people with similar skills and responsibilities should be a team. That team then provides their collective services to other teams. A team-oriented structure says that everyone working on one goal (project, product, feature area, whatever) should be on the same team.


A Function-Oriented Organization

This is a simplistic example of a function-oriented organization. You have your basic disciplines (development, test, product management) as separate groups, and within those groups you have different breakdowns based on the projects that you do.

Pros:
  • In-discipline learning is easier and more fruitful. Devs will feed off what other devs are doing, testers will see test innovation and build on it - all because you're working with people who are thinking about the same things you are.
  • Allows dynamic resource allocation. If you need an extra tester on a project, great, we can add a tester.
  • Explicit thought leadership. You have a head developer who is explicitly charged with improving architecture and development practices. You have a QA manager who is explicitly responsible for evaluating and refining test practices.
Cons:
  • People are serving multiple masters. They're trying to help their teams and also to conform to their (function-oriented) organization structure. This leads to some conflicts of interest.
  • Higher risk of silos. If it's a separate group, then you're more likely to have problems with communication.
Use this when:
  • You lack predictability in your projects. This happens in consulting a lot, but it can happen in other places, too. If you can't predict how many devs you'll need, it helps to have a pool of devs to draw from.
  • You have unusual requirements on one or more of your groups. If you're doing some really unusual testing, for example, you may need to keep your testers together so you pick up the learning and innovation effect.
Warning:
  • Avoid this if you're attempting to embrace SCRUM or some other cross-functional team ownership mentality. The "multiple master" problem will get you in this case.

A Team-Oriented Organization

This is a simplistic example of a team-oriented organization. Each team is a group, and contains members from all relevant disciplines.

Pros:
  • Unity of purpose. The team is all working toward the same goals. There is no secondary or other goal.
  • Breakdown of silos. If you can get true team ownership, you start to find developers testing, and testers helping with product management, etc.
  • No need for functional management. The role of "QA Manager" goes away here. Instead you have team leads.
Cons:
  • Harder to drive functional change. When you have several teams with a few testers each, it's a lot harder for testers to innovate or learn from each other. The same goes for developers. The groups are simply too small to get that kind of momentum.
  • Hard to handle changing needs and moving through software development phases. You run the risk of having idle testers as you start an effort, and idle developers at the end of an effort. This is something that can be overcome, but you have to encourage cross-functional work, and be sure to plan appropriately.
Use this when:
  • You're using SCRUM.
  • You can have generally stable teams. This implies your projects (or products) are pretty consistent in size and resource needs.
Warning:
  • Avoid this if you have a particularly weak functional area (or more than one). There's a large risk that isolating those few people inside stronger cross-functional teams will make the weak area even weaker.

So Which To Use?
I've seen organizations of both types - functional and team - work great. And I've seen them both fail spectacularly. The trick is to align your teams with your development and business philosophies. Have you embraced SCRUM? Are your projects generally consistent in size and skill sets needed? Cool - you probably want a team-oriented structure. Do you have highly specialized needs in one or more areas, or an extremely lumpy (in terms of resources needed) plan? Consider a function-oriented structure.

In the end, pick the structure that works for you. Just do yourself a favor and pick a single structure. Trying to mix and match will lead to heartache, but pick a single way and you'll give yourself a good chance at success.


Friday, October 23, 2009

Merging

We work in what I imagine is a fairly typical environment. We code away on HEAD for a while. Once we're feature complete, we branch (so now we have HEAD and RELEASE). Then we fix stuff on HEAD and merge it to RELEASE until we hit code complete. We also go ahead with the next features on HEAD, but that's not currently the point.

The closer you get to code complete, and particularly after code complete, the trickier things get. What do you merge?

There are, after all, several kinds of changes that might be candidates for merging into your branch:
  • Code changes to production code. Bug fixes, new features, etc.
  • Code changes to test code. Change the tests, not the code that actually ships.
  • Infrastructure changes. Change something underlying about the lab or environment (e.g., update the default fstab that gets installed)
So what do we do?

Code changes to production code tend to be the most commonly considered case. You evaluate the risk of the change, how much retesting you need to do, the benefit of the change (and how many of your customers are likely to benefit), and the amount of time left before you really, really have to ship. Based on that, you choose to take it or not.

Code changes to test code are trickier. On the one hand, change is change, and all change introduces risk. Sure, this code doesn't ship with your product, but it's still change. Plus, you have to consider risk here, too: if your test change breaks something, you might get less information out of your tests in the future, and that would be bad. On the other hand, it probably has some benefit, too: tests run faster so you can do more of them; or a test gets past a failure point and lets you expose other problems that occur later in the test; or perhaps you just spend less time looking at an error that isn't telling you anything new. For me, the bar for taking test code changes is lower than the bar for taking production code changes, simply because the risk to our actual (field) customers is lower, but there are some things that I try to consider (sketched as a toy checklist after this list):
  • Is the change caused by a failure or just a cleanup/enhancement/nice to have? The former is more likely to get put in than the latter.
  • Is the change going to fix something that causes problems for other tests? (e.g., a hang that stops all later tests in the suite from executing). A bad citizen like that is more likely to get fixed.
  • Is the change risky? The same types of analysis apply here as for product code. Avoid big, sweeping, likely-to-break-something changes.
  • How many more test runs are we going to have? The closer we get to the end, the more likely we are to just live with the problem rather than fix it.
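Here's that checklist as a toy yes/no helper (Python, purely illustrative; the thresholds are arbitrary and this is not a real tool):

    def take_test_code_change(fixes_failure, unblocks_other_tests, risky, runs_remaining):
        """Rough yes/no for merging a test-code change to the release branch."""
        if risky:
            return False            # same analysis as product code: avoid sweeping changes
        if runs_remaining <= 2:
            return False            # close enough to the end to just live with it
        if unblocks_other_tests:
            return True             # bad citizens that block other tests get fixed
        return fixes_failure        # cleanups and nice-to-haves usually wait

    # Example: a fix for a hang that blocks later tests, with five runs left -> take it.
    print(take_test_code_change(fixes_failure=True, unblocks_other_tests=True,
                                risky=False, runs_remaining=5))

In practice the answer is a judgment call, not a function, but writing the questions down keeps the judgment consistent.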
Infrastructure changes are generally not really optional. If you want your release tests to keep running in your infrastructure, they have to keep up with changes to your infrastructure. That being said, make sure you really need that infrastructure change, and be mindful of making the changes as small and safe as possible.


Merging is a tricky business, and the closer you get to a release the more of a "gut feel" kind of thing it turns into. So before you get into the thick of it, think about what you will merge and why. It'll save you some arguments later!

Thursday, October 22, 2009

All The Other Tests You Did

I've been verifying bugs for the past day or so. It's actually work I really enjoy. The vast majority of the time, it's concrete evidence of the product being better, which is awesome. Plus it's very easy to see the progress I'm making, which appeals to the list maker in me (I love checking things off!).

Here's the thing, though: I'm not just verifying bugs. I'm performing lots of other tests at the same time.

For example, a bug I verified was a display problem with replication progress. This is a small issue, but, hey, it's fixed, so we'll verify it.

To verify it, I had to:
  1. install two systems
  2. create a volume on one system
  3. configure replication between the two systems
  4. configure replication on the volume

So, just to verify one bug, I had to do an installation test, a volume creation test, and a replication test. All it took was a quick check to confirm these weren't throwing errors invisible to the end user, and each step became another small test done. Repeat that enough times, and it adds up to rather a lot.

So next time you're verifying a simple bug, ask yourself what other tests you're doing. You may be accomplishing more than you think!

Wednesday, October 21, 2009

Today's Top N

I don't drive very often - I live and work within the city, and I tend to take the T to work and pretty much everywhere else, too. Consequently, I don't listen to the radio very often. But this weekend I was on a road trip and I had the radio going. I kept hearing the same theme over and over as I flipped around the stations:

"Today's Top 10"
"the Weekly Top 20"
"Top 5 Countdown"

And I remember thinking, "What a good idea!"

After all, the top 10 songs, or top 20 albums, or whatever, are in some ways like the parts of our test plans:
  • some of them are the same over and over again: how many weeks is one song at the top of the charts, or at position two or five? Same thing with areas of code.
  • some of them change each time: eventually a song falls off the list, and eventually we're comfortable enough with an area of our test plan that we move on.
  • they sound a little different every time: maybe this week it's the dance mix and next week it's the acoustic version - same underlying thing, just a bit different.
So why not? Let's embrace the theme!

We happen to be really close to a release. So for this week, we're having the QA Top 4 (there are four of us, so this makes it easy). Every morning, we come in and pick the four areas we're currently least comfortable with, as a group. We all throw ideas around until we agree on the four. Then we go work mostly on those four items - they're the top of our list for the day. The next day, we repeat the procedure. Maybe it's four different items, maybe some of them are the same and some new - doesn't matter, really. But that's the new QA Top 4. So we work those new four for the day.

The idea here is that we get a chance, every day, to re-identify the scariest areas of the code. And then we work on them. If they're still scary, we'll work on them again the next day. If not, we'll work on the new scariest areas.

There are a lot of ways to prioritize things, but having new ways to think about it sparks new ideas. This is just a new (to me, anyway) way to present that old risk evaluation, and hey, it's kind of fun.


(And by the way, you should see the jokes.... "Replication, by the Again Agains" at #1, and "Defect Verification" off the "Oh Boy It Worked!" album at #2. We're pretty easily amused!)

Tuesday, October 20, 2009

Good Citizen

As we're testing our software, we have lots of different kinds of requirements. We have use cases, functional requirements, performance requirements, usability requirements, testability requirements, etc. One of the requirements that we don't usually talk about explicitly is the good citizen requirement.

Wait, what's a "good citizen"?

A bit of definition:

Software that is a good citizen behaves in a manner consistent with other software in its interactions with the systems and assets around it.

That's kind of a pompous way of saying that software is behaving like a good citizen when it does what the systems around it expect (e.g., logs in a way that centralized logging tools can handle) and doesn't create excessive load or resource usage (e.g., doesn't attempt to create hundreds of DNS entries when one will do). In other words, this is software that does what it ought to do, and doesn't behave badly.

Software that is being a good citizen does:
  • support logging in a common format (e.g., NT Event logs, etc)
  • use centralized user or machine management (e.g., Active Directory or NIS)
  • does automatic log rolling (there's a small sketch after these lists)
  • can be configured to start on its own after a power outage or other event
  • can be disabled or somehow turned off cleanly (to allow for maintenance, etc)
Software that is being a good citizen does not:
  • log excessively (at least, except maybe in debug, which should be used sparingly)
  • create excessive traffic on infrastructure servers (DNS, Active Directory, mail, firewall, etc)
  • send excessive notifications (e.g., a notification for every user logging in would probably be overkill)
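
On the log rolling point, here's a minimal sketch, assuming a Python application (the file path and sizes are invented), of rolling logs automatically instead of letting one file grow forever:

    import logging
    from logging.handlers import RotatingFileHandler

    # Keep at most 5 files of roughly 10 MB each; the path and sizes are invented.
    handler = RotatingFileHandler("/var/log/myapp/myapp.log",
                                  maxBytes=10 * 1024 * 1024, backupCount=5)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    log = logging.getLogger("myapp")
    log.setLevel(logging.INFO)
    log.addHandler(handler)

    log.info("service started")  # rolls over automatically once the current file fills up

The equivalent for a non-Python service is usually a logrotate entry or the platform's own rolling appender; the point is that nobody should have to remember to clean up after your logs.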

Normally, the good citizen requirement is not explicit. Sometimes you'll find mention of it in requirements indirectly (e.g., must support Active Directory for user interaction), but sometimes you won't. You usually won't find the negative requirements (e.g., doesn't renew its DHCP lease too often) at all. Good citizen requirements are generally assumed rather than written down - but if you miss one and your software misbehaves, you can bet you'll hear about it!

As you're testing, ask yourself, "is my software being a good citizen?"

Monday, October 19, 2009

"Certifying" Clients

We're a storage company. Lots of people write to us with lots of different programs. These run the gamut from drag-and-drop to homegrown bash scripts to full HSM solutions, and lots of points in between. Sometimes we'll get asked to "certify" a client application. What's going on here?

Let's break it down:

Who's asking
For us, it's usually sales or support asking. Sometimes sales has a customer who wants to use a particular program and wants a guarantee it will work. Other times sales has a customer who hasn't picked a client program yet and wants to know what we recommend. Alternatively, support might have a customer who's attempting to use a program and finds something they don't like about it (doesn't work, works too slowly, etc).

Let's say you're not in a storage company (some of us aren't!). This could be a browser, if you're a web app. It could be a reporting or monitoring tool (anyone else ever had a client ask to point Crystal Reports directly at your database?).

Either way, someone's now looking for a guarantee that a client program will work with our software.

"Guarantee"
I'm usually afraid of the word "guarantee". You can do all the testing in the world, and a new patch of a client program will come out and break in a truly spectacular manner. Or the customer will use an obscure undocumented flag you didn't test and... kablooey! (tm Calvin and Hobbes). Or the client will install it on some totally unsupported hardware and scratch his head when it doesn't work. "Guarantee" is a very strong word that means "it's totally my problem to fix".

I usually get around this by saying, "here's what we've tested" rather than "we guarantee this".

Certification Levels
There are a number of different things you can do and call it certification.
  • The standards approach. This is where you point to some external standard and say, "we conform to this. Any client that works with this will work with us." By external standard, you should make sure you choose a public standard: NFS v3, or W3C compliance, or whatever's appropriate for you. In this case, you don't actually have to test the client. However, you'd better be darn sure you conform to the standard, or this one may eventually bite you.
  • The "we test this" approach. This is where you offer up the version and configuration you test, and you say that has been tested and will work. Any deviance from that configuration or version may work but isn't guaranteed.
  • The "certification program" approach. This is where you turn it around on the client application, and offer a certification program. The idea is that they conform to you, rather than the other way around. You offer a set of criteria, test systems (or a lab for people to come test in), possibly scripts and reporting mechanisms, and you let people run your tests. Then you analyze their results, and either put your stamp of approval on or not (think "runs on Vista", etc). If you're large enough and important enough, people will do the compatibility testing for you. This doesn't work so well if you're kind of a tiny nobody in your industry. I've not done this one personally.

So What to Do?
In the end, what you do is driven by how much time you have, how big your team is, and how sensitive the customer is. The real goal here is customer (or potential customer) comfort. So you do what you have to do to achieve that comfort, within bounds appropriate to the value of that customer.

My first approach generally is to do a test for that customer. If it's an important customer, we can get from them (or create, if they don't know) a configuration that will work for their situation. Then we test it (and retest it on new versions of our code). It's customer-specific, but it gives us the comfort to go to the customer and say, "follow our advice and this will work."

My next approach, if the volume gets to be too much, is to publish a "known good" configuration (including version) of the client software. We test that client config with every release we do. We tell customers what works, and then let them experiment from there, if they need to.

These two approaches have gotten me far enough, so far. In the end, there's no substitute for trying it at the customer, but short of that, you can give them comfort. And all "certification" really means, at least in this sense, is comfort.