Friday, October 30, 2009

Taking Notes

I tend to take notes in meetings. As I was doing that today, it occurred to me that there are different kinds of notes that I take:
  • All notes all the time. These are extremely extensive notes that capture almost everything (but not quite - I'm not that fast). I take notes like this when I'm not sure what's going to be important. Typically these are Q&A sessions with customers, or a presentation for which I'm totally unprepared. I try not to do this often because it's really hard to listen and take notes like this at the same time. If I'm going to have to share notes with people who weren't in the meeting, these are the notes I take.
  • Reminder notes. These are much more outline-like notes that I take for most meetings. These are intended to just provide triggers for my memory. These are the notes I take if I need to share them with people who are in the meeting.
  • No notes. I do this for a lot of meetings. If I need to be actively participating (or leading) a meeting, I generally don't take notes. I'd rather be fully engaged while I'm there.
Do you take notes?

Thursday, October 29, 2009

Choosing

We make tool choices constantly, sometimes explicitly and sometimes implicitly. For example:
  • I write a bash script to grab some network info off multiple machines. Tool chosen: bash. Didn't even think about it, just did it. (There's a rough sketch of this kind of script after this list.)
  • We're moving parts of our test plan into Jira. Tool chosen: wiki + Jira. This one we discussed for a while, and eventually made our choice based on some cruft with the wiki. I'm not sure it's going to work, but we're giving it a shot.
  • I burned a CD of our latest installer. Tool chosen: Disk Utility on my Mac. This one is quick and handy, and I haven't gotten a bad burn off it yet.
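To give a sense of how small a choice like that first one is, here's a rough sketch of the kind of script I mean - shown in Python rather than bash, with made-up hostnames and a made-up command, so treat it as illustration rather than the actual script:

    #!/usr/bin/env python3
    """Rough sketch: grab some network info off multiple machines over ssh.
    The hostnames and the command are placeholders - adjust for your own lab."""
    import subprocess

    HOSTS = ["lab01", "lab02", "lab03"]   # hypothetical machine names
    COMMAND = "ip addr show"              # or ifconfig, netstat -rn, whatever you need

    for host in HOSTS:
        print(f"===== {host} =====")
        result = subprocess.run(["ssh", host, COMMAND],
                                capture_output=True, text=True)
        if result.returncode == 0:
            print(result.stdout)
        else:
            print(f"failed on {host}: {result.stderr.strip()}")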
As we make all these tool choices, we're implicitly considering the properties of the tool and comparing them to the requirements of the task. We have to think about only a few things:
  • What is the tool good for? Jira, for example, is good for workflows. It's horrible for documentation. A wiki is good for documentation but simply awful for workflow. Some tools are better equipped for long-term projects and growth than others. Other tools are a lot lighter and good for quick or small projects.
  • How convenient is it? The tool I already have will usually trump the tool I don't have, just because of setup overhead. It's not universally true, but it takes a really great feature - or a seriously large annoyance with what I have - for me to switch.
  • How accessible is it? Whatever tool I use needs to be accessible to everyone who needs it. IMing out info is no good for my boss, for example, who doesn't use IM. If he needs to know, then I can't use the IM tool.
Many times tool choice is a really quick, almost unconscious decision. Other times it takes a lot of evaluation and explicit consideration (especially when it's expensive or has far-reaching ramifications). In the end, though, what tool you choose really only comes down to a few simple questions. So don't stress about it too much. In the end, it is just a tool.



Wednesday, October 28, 2009

Postmortems

After a release goes out the door, we hold a postmortem. It's pretty standard stuff, usually. We talk about what we did well, what really didn't work out, and what we didn't anticipate.

Timing is an issue, though. You can hold a postmortem right after release, or you can wait to see how it actually does in the field and then hold a postmortem. They each have benefits.

When you hold a postmortem right after you release, you get:
  • Motivation. People are still stinging from the things we didn't do so well, and are generally aching to fix them. If it was a good release, getting together to remember it will also provide a good boost to people's egos (rightly so!).
  • Recency. Memories are better right after the release, and you'll be able to have a better discussion about why you did things and what specifically didn't work about them. You'll have a much more precise discussion while everyone's memories are fresh.
  • Ability to change. If you want to make changes, the sooner you start them, the better chance you have.
When you hold a postmortem after you have some field experience with it, you get:
  • Perspective. That process y'all tried that seemed painful right after you did it might not be so painful now. Maybe you've learned it better, maybe you've started reaping benefits, maybe it wasn't as bad as it seemed.
  • Field Experience. Maybe that release that seemed really shaky has performed like a champion in the field. Perhaps that awesome release y'all tested extensively has had all sorts of problems. These are things you don't know until it's been out for a while.
My not-so-innovative solution is to realize that our postmortems take us about 45 minutes. That's not very long, so we do both! We hold a postmortem within a week after we release. Then, we hold another one about 6 months later, when the release has been in the field for a while, and we ask ourselves how we really did.

In the land of making things better for ourselves, postmortems are a valuable tool. Holding them twice just lets us learn more from the software and from our development process than we could with just one. Give it a shot.

Tuesday, October 27, 2009

The Other Fence

A lot of derision has (rightly) been heaped on the idea that development writes code and chucks it over the fence at QA. Fortunately, at least in the places I've worked, this doesn't really happen any more. That development-QA fence is basically gone (hooray!).

Now maybe we should start working on the other fence.

Other fence?

Think for a second about what happens when you're done testing (and developing) on a release. What do we do? We chuck it over the fence at operations (or professional services, depending on how installations and upgrades get done).

Oh, that other fence.

I've been thinking about this fence, and wondering if it's bad. After all, we didn't use to think it was bad that development finished and chucked the code to QA for work! Now we know better. Maybe we should start learning better here, too, as we chuck a build out of engineering and into operations.

Let's posit for the moment that the engineering-ops fence is bad. What kinds of things might we do to break down the fence, and how might that help?
  • Change how we structure our builds to make them more releasable. This is somewhat analogous to writing more testable code.
  • Help deploy. Just like our developers do some testing now, maybe we can help deploy, or create some utilities to help. Rake tasks, deployment scripts, hand installations - how does this stuff get deployed, and can we make it easier or better? (There's a small sketch of one such helper after this list.)
  • Get help building and packaging. Just like development sometimes asks a tester how best to approach a TDD problem, engineering can get some advice from operations on how to handle a configuration issue, or a packaging question.
  • Pair on problems. When there's a problem in the field, we don't have to look in isolation, or bounce questions back and forth. We can work on it together. With two different views and skills looking at the problem, you're more likely to figure out a problem that has a foot in both worlds.
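To make the "help deploy" idea a little more concrete, here's a minimal sketch of the sort of utility I have in mind. Everything in it - the artifact path, the host, the service name, the restart command - is hypothetical; the point is that a few lines of automation beat a page of hand-installation notes:

    #!/usr/bin/env python3
    """Minimal deployment-helper sketch. All names (artifact, host, service,
    paths) are hypothetical - the real process is whatever operations does today."""
    import subprocess
    import sys

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)   # stop at the first failing step

    def deploy(artifact, host, service="myapp"):
        run(["scp", artifact, f"{host}:/tmp/{service}.tar.gz"])
        run(["ssh", host, f"sudo tar -xzf /tmp/{service}.tar.gz -C /opt/{service}"])
        run(["ssh", host, f"sudo systemctl restart {service}"])
        run(["ssh", host, f"systemctl is-active {service}"])   # crude smoke check

    if __name__ == "__main__":
        # usage: deploy.py build/myapp.tar.gz ops-host-01
        deploy(sys.argv[1], sys.argv[2])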
Depending on your current organization, and on who receives your code, your list may be different. Maybe you're working with support right after release, or sales. Maybe engineering owns operations, so you don't have this problem. At this point, this is just something to think about.

What do y'all think? Is there a fence after engineering? And is it time to start talking about that fence?

Monday, October 26, 2009

Org Charts

There are a lot of different ways to set up an engineering organization. Generally, they fall into one of two categories: function-oriented, and team-oriented.

A function-oriented structure says that people with similar skills and responsibilities should be a team. That team then provides their collective services to other teams. A team-oriented structure says that everyone working on one goal (project, product, feature area, whatever) should be on the same team.


A Function-Oriented Organization

This is a simplistic example of a function-oriented organization. You have your basic disciplines (development, test, product management) as separate groups, and within those groups you have different breakdowns based on the projects that you do.

Pros:
  • In-discipline learning is easier and more fruitful. Devs will feed off what other devs are doing, testers will see test innovation and build on it - all because you're working with people who are thinking about the same things you are.
  • Allows dynamic resource allocation. If you need an extra tester on a project, great, we can add a tester.
  • Explicit thought leadership. You have a head developer who is explicitly charged with improving architecture and development practices. You have a QA manager who is explicitly responsible for evaluating and refining test practices.
Cons:
  • People are serving multiple masters. They're trying to help their teams and also to conform to their (function-oriented) organization structure. This leads to some conflicts of interest.
  • Higher risk of silos. If it's a separate group, then you're more likely to have problems with communication.
Use this when:
  • You lack predictability in your projects. This happens in consulting a lot, but it can happen in other places, too. If you can't predict how many devs you'll need, it helps to have a pool of devs to draw from.
  • You have unusual requirements on one or more of your groups. If you're doing some really unusual testing, for example, you may need to keep your testers together so you pick up the learning and innovation effect.
Warning:
  • Avoid this if you're attempting to embrace SCRUM or some other cross-functional team ownership mentality. The "multiple master" problem will get you in this case.

A Team-Oriented Organization

This is a simplistic example of a team-oriented organization. Each team is a group, and contains members from all relevant disciplines.

Pros:
  • Unity of purpose. The team is all working toward the same goals. There is no secondary or other goal.
  • Breakdown of silos. If you can get true team ownership, you start to find developers testing, and testers helping with product management, etc.
  • No need for functional management. The role of "QA Manager" goes away here. Instead you have team leads.
Cons:
  • Harder to drive functional change. When you have several teams with a few testers each, it's a lot harder for testers to innovate or learn from each other. The same goes for developers. The groups are simply too small to get that kind of momentum.
  • Hard to handle changing needs and moving through software development phases. You run the risk of having idle testers as you start an effort, and idle developers at the end of an effort. This is something that can be overcome, but you have to encourage cross-functional work, and be sure to plan appropriately.
Use this when:
  • You're using SCRUM.
  • You can have generally stable teams. This implies your projects (or products) are pretty consistent in size and resource needs.
Warning:
  • Avoid this if you have a particularly weak functional area (or more than one). There's a large risk that isolating its members inside stronger teams will make that area even weaker.

So Which To Use?
I've seen organizations of both types - functional and team - work great. And I've seen them both fail spectacularly. The trick is to align your teams with your development and business philosophies. Have you embraced SCRUM? Are your projects generally consistent in size and skill sets needed? Cool - you probably want a team-oriented structure. Do you have highly specialized needs in one or more areas, or an extremely lumpy (in terms of resources wanted) plan? Consider a function-oriented structure.

In the end, pick the structure that works for you. Just do yourself a favor and pick a single structure. Trying to mix and match will lead to heartache, but pick a single way and you'll give yourself a good chance at success.


Friday, October 23, 2009

Merging

We work in what I imagine is a fairly typical environment. We code away on HEAD for a while. Once we're feature complete, we branch (so now we have HEAD and RELEASE). Then we fix stuff on HEAD and merge it to RELEASE until we hit code complete. We also go ahead with the next features on HEAD, but that's not currently the point.

As you get closer to code complete, and particularly after code complete, things get tricky. What do you merge?

There are, after all, several kinds of changes that might be candidates for merging into your branch:
  • Code changes to production code. Bug fixes, new features, etc.
  • Code changes to test code. Change the tests, not the code that actually ships.
  • Infrastructure changes. Change something underlying about the lab or environment (e.g., update the default fstab that gets installed)
So what do we do?

Code changes to production code tend to be the most commonly considered case. You evaluate the risk of the change, how much retesting you need to do, the benefit of the change (and how many of your customers are likely to benefit), and the amount of time left before you really, really have to ship. Based on that, you choose to take it or not.

Code changes to test code are trickier. On the one hand, change is change and all change introduces risk. Sure, this code doesn't ship with your product, but it's still change. Plus, you have to consider risk here, too: if your test change breaks something, you might get less information out of your tests in the future, and that would be bad. On the other hand, it probably has some benefit, too: tests run faster so you can do more of them; or a test gets past a failure point and exposes other problems that occur later in the test; or perhaps you just spend less time looking at an error that isn't telling you anything new. For me, the bar for taking test code changes is lower than the bar for taking production code changes, simply because the risk to our actual (field) customers is lower, but there are some things that I try to consider:
  • Is the change caused by a failure or just a cleanup/enhancement/nice to have? The former is more likely to get put in than the latter.
  • Is the change going to fix something that causes problems for other tests? (e.g., a hang that stops all later tests in the suite from executing). A bad citizen like that is more likely to get fixed.
  • Is the change risky? The same types of analysis apply here as for product code. Avoid big, sweeping, likely-to-break-something changes.
  • How many more test runs are we going to have? The closer we get to the end, the more likely we are to just live with the problem and not bother to fix it.
Infrastructure changes are generally not really optional. If you want your release tests to keep running in your infrastructure, they have to keep up with changes to your infrastructure. That being said, make sure you really need that infrastructure change, and be mindful of making the changes as small and safe as possible.


Merging is a tricky business, and the closer you get to a release the more of a "gut feel" kind of thing it turns into. So before you get into the thick of it, think about what you will merge and why. It'll save you some arguments later!

Thursday, October 22, 2009

All The Other Tests You Did

I've been verifying bugs for the past day or so. It's actually work I really enjoy. The vast majority of the time, it's concrete evidence of the product being better, which is awesome. Plus it's very easy to see the progress I'm making, which appeals to the list maker in me (I love checking things off!).

Here's the thing, though: I'm not just verifying bugs. I'm performing lots of other tests at the same time.

For example, a bug I verified was a display problem with replication progress. This is a small issue, but, hey, it's fixed, so we'll verify it.

To verify it, I had to:
  1. install two systems
  2. create a volume on one system
  3. configure replication between the two systems
  4. configure replication on the volume

So, just to verify one bug, I had to do an installation test, a volume creation test, and a replication test. All I had to add was a quick check to confirm these weren't throwing errors invisible to the end user, and then a few other small things were done, too. Repeat enough times, and this adds up to rather a lot.

So next time you're verifying a simple bug, ask yourself what other tests you're doing. You may be accomplishing more than you think!

Wednesday, October 21, 2009

Today's Top N

I don't drive very often - I live and work within the city, and I tend to take the T to work and pretty much everywhere else, too. Consequently, I don't listen to the radio very often. But this weekend I was on a road trip and I had the radio going. I kept hearing the same theme over and over as I flipped around the stations:

"Today's Top 10"
"the Weekly Top 20"
"Top 5 Countdown"

And I remember thinking, "What a good idea!"

After all, the top 10 songs, or top 20 albums, or whatever, are in some ways like the parts of our test plans:
  • some of them are the same over and over again: how many weeks is one song at the top of the charts, or at position two or five? Same thing with areas of code.
  • some of them change each time: eventually a song falls off the list, and eventually we're comfortable enough with an area of our test plan that we move on.
  • they sound a little different every time: maybe this week it's the dance mix and next week it's the acoustic version - same underlying thing, just a bit different.
So why not? Let's embrace the theme!

We happen to be really close to a release. So for this week, we're having the QA Top 4 (there are four of us, so this makes it easy). Every morning, we come in and pick the four areas we're currently least comfortable with, as a group. We all throw ideas around until we agree on the four. Then we go work mostly on those four items - they're the top of our list for the day. The next day, we repeat the procedure. Maybe it's four different items, maybe some of them are the same and some new - doesn't matter, really. But that's the new QA Top 4. So we work those new four for the day.

The idea here is that we get a chance, every day, to re-identify the scariest areas of the code. And then we work on them. If they're still scary, we'll work on them again the next day. If not, we'll work on the new scariest areas.

There are a lot of ways to prioritize things, but having new ways to think about it sparks new ideas. This is just a new (to me, anyway) way to present that old risk evaluation, and hey, it's kind of fun.


(And by the way, you should see the jokes.... "Replication, by the Again Agains" at #1, and "Defect Verification" off the "Oh Boy It Worked!" album at #2. We're pretty easily amused!)

Tuesday, October 20, 2009

Good Citizen

As we're testing our software, we have lots of different kinds of requirements. We have use cases, functional requirements, performance requirements, usability requirements, testability requirements, etc. One of the requirements that we don't usually talk about explicitly is the good citizen requirement.

Wait, what's a "good citizen"?

A bit of definition:

Software that is a good citizen behaves in a manner consistent with other software in the way it interacts with the systems and assets around it.

That's kind of a pompous way of saying that software is behaving like a good citizen when it does what the systems around it expect (e.g., logs in a way that centralized logging tools can handle) and doesn't create excessive load or resource usage (e.g., doesn't attempt to create hundreds of DNS entries when one will do). In other words, this is software that does what it ought to do, and doesn't behave badly.

Software that is being a good citizen:
  • supports logging in a common format (e.g., NT event logs)
  • uses centralized user or machine management (e.g., Active Directory or NIS)
  • rolls its logs automatically
  • can be configured to start on its own after a power outage or other event
  • can be disabled or otherwise turned off cleanly (to allow for maintenance, etc.)
Software that is being a good citizen does not:
  • log excessively (at least, except maybe in debug, which should be used sparingly)
  • create excessive traffic on infrastructure servers (DNS, Active Directory, mail, firewall, etc)
  • send excessive notifications (e.g., a notification for every user logging in would probably be overkill)

Normally, the good citizen requirement is not explicit. Sometimes you'll find mention of it in requirements indirectly (e.g., must support Active Directory for user interaction), but sometimes you won't. You usually won't find the negative requirements (e.g., doesn't renew its DHCP lease too often) at all. But if you miss one and your software misbehaves, you can bet you'll hear about it! Good citizen requirements are generally assumed, even though they're often not mentioned directly.

As you're testing, ask yourself, "is my software being a good citizen?"

Monday, October 19, 2009

"Certifying" Clients

We're a storage company. Lots of people write to us with lots of different programs. These run the gamut from drag-and-drop to homegrown bash script to full HSM solutions, and lots of points in between. Sometimes we'll get asked to "certify a client" application. What's going on here?

Let's break it down:

Who's asking
For us, it's usually sales or support asking. Sometimes sales has a customer who wants to use a program and wants a guarantee it will work. Other times sales has a customer who hasn't picked a client application yet and wants to know what we recommend. Alternatively, support might have a customer who's attempting to use a program and finds something they don't like about it (doesn't work, works too slowly, etc.).

Let's say you're not in a storage company (some of us aren't!). This could be a browser, if you're a web app. It could be a reporting or monitoring tool (anyone else ever had a client ask to point Crystal Reports directly at your database?).

Either way, someone's now looking for a guarantee that a client program will work with our software.

"Guarantee"
I'm usually afraid of the word "guarantee". You can do all the testing in the world, and a new patch of a client program will come out and break in a truly spectacular manner. Or the customer will use an obscure undocumented flag you didn't test and... kablooey! (tm Calvin and Hobbes). Or the client will install it on some totally unsupported hardware and scratch his head when it doesn't work. "Guarantee" is a very strong word that means "it's totally my problem to fix".

I usually get around this by saying, "here's what we've tested" rather than "we guarantee this".

Certification Levels
There are a number of different things you can do and call it certification.
  • The standards approach. This is where you point to some external standard and say, "we conform to this. Any client that works with this will work with us." By external standard, you should make sure you choose a public standard: NFS v3, or W3C compliance, or whatever's appropriate for you. In this case, you don't actually have to test the client. However, you'd better be darn sure you conform to the standard, or this one may eventually bite you.
  • The "we test this" approach. This is where you offer up the version and configuration you test, and you say that has been tested and will work. Any deviance from that configuration or version may work but isn't guaranteed.
  • The "certification program" approach. This is where you turn it around on the client application, and offer a certification program. The idea is that they conform to you, rather than the other way around. You offer a set of criteria, test systems (or a lab for people to come test in), possibly scripts and reporting mechanisms, and you let people run your tests. Then you analyze their results, and either put your stamp of approval on or not (think "runs on Vista", etc). If you're large enough and important enough, people will do the compatibility testing for you. This doesn't work so well if you're kind of a tiny nobody in your industry. I've not done this one personally.

So What to Do?
In the end what you do is driven by how much team, time and sensitivity you have. The real goal here is customer (or potential customer) comfort. So you do what you have to do to achieve that customer comfort, within the bounds of the worth of that customer.

My first approach generally is to do a test for that client. If this is an important client, we can get from them (or create, if they don't know) a configuration that will work for their situation. Then we test this (and retest it on new versions of our code). It's client-specific, but it gives us the comfort to go to the client and say, "follow our advice and this will work."

My next approach, if this starts to get to be too much volume, is to publish a "known good" configuration (including version) of the client software. We test that client config with every release we do. We tell clients what works, and then let them experiment from there, if they need to.

These two approaches have gotten me far enough, so far. In the end, there's no substitute for trying it at the customer, but short of that, you can give them comfort. And all "certification" really means, at least in this sense, is comfort.

Wednesday, October 14, 2009

Strict

We're making a big switch in the lab; we're upgrading the underlying operating system on our machines. This is something that we kind of have to do - staying on an ancient OS only makes things like security patching harder. I can't say it's my idea of a good time, though - it's a lot of work!

Anyway, once we've done all the basics - setting up FAI, setting up build and test systems, getting the lab migrator to run (so we can move machines from old to new and back again), etc. - then we can start the tests.

We start slowly - one night on the new OS, and then not again for a while. This way teams get a chance to go through their problem areas and fix them before they get hit with the same ticket again. At this point, too, our number of failures is generally quite high, and one or two problems will take out entire swaths of tests ("can't talk to the NTP server: 42 machines can't be cleaned up!" or "stunnel configuration is different: killed two entire suites!"). It's a fairly quick and easy way to find problems that affect each of the teams. It's a learning experience, it makes a bit of a mess, and we all join in cleaning up after it.

At some later point, then, we make the decision that it's time to switch. At this point it's time to get strict. So now we worry about things in the following order:
  1. First, resolve compilation errors. If it doesn't compile, not much is going to run.
  2. Second, resolve bugs that cause machines to leak. If a test causes machines to not clean up after it's done, then they're not available for other later tests. This causes the entire lab to grind to a halt. Generally this is accompanied by cries of "we're out of machines!" and tests just not finishing because there's no machine for them to run on.
  3. Third, resolve bugs that are likely to hide other bugs. If I have a bug in my test setup, who knows what will happen when I get to actually exercising the thing I thought I was testing!
  4. Fourth, handle everything else. Once you've gotten through the first three items, then just start fixing bugs according to your preference.
The overall goal is to expose the bugs. Get things running, then follow up with getting them running right. Hopefully this whole process doesn't take you too long, but sometimes when you're mired in the land of "this underlying thing broke a lot!" it helps to step back and think about what to prioritize. You can make the whole process go a bit more smoothly if you think for a minute, then leap in and start fixing.

Good luck, and happy resolution!

Tuesday, October 13, 2009

Just Do It

At any point in time, when I sit down at my desk, there are about fifteen things I could do. For example, when I sat down after today's dev leads meeting, I had my choice of:
  • verify some bugs targeted for the next release
  • verify some bugs on head
  • answer a question from a sales engineer
  • review a potential client sizing worksheet
  • fix one or more of about three small-ish bugs assigned to me ("fetchlogs doesn't fetch when... whoops!")
  • read a white paper
  • work on an in-progress FAI server I'm configuring
In the end, which one I pick isn't nearly as important as one thing:

Just pick something already. Then just do it.

To a certain extent, what I do doesn't matter as much as simply accomplishing something. I have a general priority list:
  • active fires
  • stuff for clients
  • stuff for sales
  • stuff other people on my team can't do
  • other stuff
Based on that, I chose to answer the email first (and do the analysis that I needed to do to provide a good answer). But I could have chosen almost any one of these, and it all would have helped our current situation. And that's the point - just do something. That'll help.

Monday, October 12, 2009

Chatting

I just read this article by Johanna Rothman, talking about a "low level buzz" with chats, emails, and talking going on all the time. She basically says that having a chat open all day, or high email traffic with quick responses, might work for some people, but it doesn't work for her.

Our team is colocated, and we still chat all the time. It's a little odd. Basically, most of us have a chat window open on our desktops constantly, and we take a look at it when we're waiting (for compilation, a grep to finish, etc). It's that low-level background buzz. In addition we have, and frequently use, the option of simply walking over and talking to each other.

We tend to use chat for things that the entire group might be interested in, like:
  • build failure notifications, and their causes
  • discussion of which branch to run in the nightly automated tests
  • notification when weekly lunch arrives (the important stuff!)
  • code review requests
  • heads up about interesting or risky checkins
  • pairing requests
  • general pleas for help (e.g., "how do I do X in perl?")
We get together from there. Someone will wander over and answer how to do X in perl, if it's non-trivial, or will start pairing. It generally works for most of us. I like it in particular because that way you don't have to feel bad about asking questions - you just ask the collective group and whoever is least concentrating at that point, or currently waiting for something, will answer. I'm not then interrupting someone who's really deep in something.

Use for yourself with caution!

Thursday, October 8, 2009

Magic Words

Just like there are some magic numbers that can point you to the source of a problem, there are some magic words that also have power. The words you use to describe a problem will color how other engineers (developers, other testers, managers, even you) look at the problem and what they do to track it down.

Just like with the numbers, some of these are common to many systems:
  • Crash. Most people will be looking for a core file and an uncontrolled shutdown in this case.
  • Hang. Often you'll get asked how long before you mark something as being hung. Also, this word means that absolutely no progress is being made. If it's progressing at a snail's pace, it's still not a hang; it's just really slow.
  • Failed. This one means completed with error. It doesn't mean "still going".
  • Wedged. This is imprecise but generally seems to make people think deadlock.
Other magic words are more specific to your product. For example, in our product we discuss:
  • Data loss. The system could lose data in this scenario (YIKES!). This one raises red flags all over the place.
  • Failure. This word means a system change that resulted in the loss of availability of a system component. Generally it's used for hardware failure (of a disk, node, power supply, network connection, etc). Generally "failure" is modified by some component name (e.g., power supply failure), and is not used for the more general "software didn't work" case.
  • Simultaneous vs. sequential failure. In a redundant, self-healing system like ours, the number of failures (see above) matters, as does whether the second one occurred at the same time as the first or after it. Depending on which one happened, a whole different debug path will be invoked.
What other magic words are there? I'm pretty sure I've missed some.

Wednesday, October 7, 2009

Magic Numbers

There are some numbers that I call magic numbers. These are the special numbers that have meaning in the context of your system. These numbers are typically diagnostically important, or triggers to identify problems or potential problems.

Some of these are common on many systems:
  • 86400 seconds: As in, "the test timed out after 86400 seconds". This is a day. As in, the darned thing didn't finish up in a whole day. Oops.
  • 2^32 or 2^32 - 1 (4294967296 or 4294967295): If you're on a 32-bit system, you're starting to wrap here. Look for really large negative numbers where you're expecting positives, etc.
  • 45 seconds: the default connection timeout for network mounts in Windows. If something fails after this long, you're probably looking at a timeout.

Some of these are specific to your system. For example, in our system we know:
  • 6 minutes: The RPC timeout for a query to a remote system is 2 minutes, and it does 3 tries. 6 minute waits mean you're hitting this.
  • 5 minutes: The frequency of heartbeat checks for certain operations. Failover after 5 min means these never succeeded.
  • 25: default max number of simultaneous connections. Can be increased indefinitely, but if you start to see slowdowns or connection timeouts and you have 25 clients in use, you're probably going to want to change it.
  • XX: Java heap size. (I don't remember this one off the top of my head, but I know it when I see it)
What is interesting is not simply knowing the numbers. What is interesting is the shorthand debugging that they offer you. For example, if support calls up and says that a customer is complaining that the management functions are "very slow to load", and it turns out to be about 6 minutes, then the first place I'll look is to see if it's trying to talk to a remote system, and whether there's some sort of problem in that communication.
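If you want to make that shorthand explicit, the lists above fit in a few lines of code. This is just a toy sketch (Python, seeded only with the numbers from this post), more a note-taking format than a real tool:

    #!/usr/bin/env python3
    """Toy sketch: check a suspicious duration against known magic numbers."""

    # seconds -> likely explanation (entries taken from the lists above)
    MAGIC_SECONDS = {
        86400: "test timeout - the thing ran for an entire day",
        45:    "Windows network mount connection timeout",
        360:   "remote RPC: 2-minute timeout x 3 tries",
        300:   "heartbeat frequency for certain operations",
    }

    def explain(observed_seconds, tolerance=0.1):
        """Return the magic numbers within `tolerance` (fractional) of what you saw."""
        hits = [f"~{magic}s: {reason}"
                for magic, reason in MAGIC_SECONDS.items()
                if abs(observed_seconds - magic) <= tolerance * magic]
        return hits or ["no magic number matched - dig deeper"]

    if __name__ == "__main__":
        # "management functions are very slow to load"... about 6 minutes
        for hint in explain(6 * 60):
            print(hint)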

It's not perfect, but knowing your system's magic numbers can often be a shortcut to finding its problems.

Tuesday, October 6, 2009

Real Time Feedback

This is a trick I use in particular for UI bugs.

The Problem
I'm working on a web-based application and am testing it across five browsers - Safari, IE 7/8, and Firefox 3/3.5. At the moment I'm working on layout and styling.

I have identified an issue related to buttons. Specifically, they look like this on IE7:


Normally I would do what I generally do when I find a bug: log it. I'd attach a screenshot, explain which browser(s) were affected, and move on. There'd be a cute title like "buttons look funny", but most of the explanation would be in the screenshot.

And then a developer would decide to work on it, and he'd move pixels around, and he'd recut the button so his CSS sprites were lined up. And he'd generally get it looking okay. Mark the bug resolved, and it's time to go to lunch!

So along I'd go, and great, it looks fine on IE7 but now the text is way too high and falling off the button on Firefox 3. I reopen the bug (or log a new one), and kick it back to the developer.

Lather, rinse, repeat.

The Approach
Let's bypass the defect-tracking-system-as-communication-tool technique that we're currently using. The developer and I set up a time to properly fix the buttons. He's got Safari and Firefox 3.5. I launch IE7, IE8, and Firefox 3. We're going to sit next to each other, make a change, and check it until we're both happy with those darn buttons in all browsers.

Here's how it goes:
  • Both of us open our various browsers
  • Both of us point all those browsers to the developer's machine. Now we can see everything in all browsers pretty quickly.
  • We make sure no one's caching CSS or JS (Often this is a configuration setting on the development environment, and probably already set up, but it's worth checking.)
  • Developer makes a change.
  • We both reload all browsers. Nope. Not there yet.
  • Developer makes a change.
  • We both reload all browsers. Success on IE7, but we broke Firefox 3. Darn.
  • Developer makes a change.
  • We both reload...
  • ... etc etc etc ...
  • Developer makes a change.
  • We both reload all browsers. Success!
  • Check in.
What We've Done
By sitting side by side and knocking out the problem, we've substantially reduced the feedback loop duration. It's easy for the developer and for QA both to see the behavior in all the browsers we care about; everyone can look at everyone's machine and we don't have any lost information in terms of visual behavior. I should note that this can work virtually, but in that case it's easier if you can see each other's screens in some way.

Obviously, this technique won't work for all bugs. But for the visual ones that are mostly a matter of messing around with CSS or with JavaScript, it tends to work really well. The total time to get rid of the bug is much much shorter than if you'd both sat at your respective desks and bounced the bug back and forth.

Give it a shot!

Monday, October 5, 2009

Always a Requirement

When you ask a customer or potential customer what their requirements are for your system, they've generally got a list. It needs to support 50 concurrent client connections. It needs to not lose their data (hey, we are a storage company!). It needs to support ingest and reads from their client application of choice. And so on.

But that's not all. That's just the ones they've thought about.

Most sales teams will poke at this a little bit. They'll go through the potential system configuration with the customer and identify some more requirements, like required replication, or limits on the amount of power draw. They will also likely identify some areas where the customer doesn't feel there are requirements. In particular, it's not uncommon to hear the phrase: "There are no performance requirements."

This is a trap.

Even when there are no performance requirements, there is always a performance requirement.

The customer has expectations about the performance of any system. They may not expect it to be fast, but they expect something. To be fair, the customer probably doesn't actually know what his performance requirement is. But that doesn't absolve you of responsibility. When it takes 2 minutes to open a file off the system, that's going to be a problem. The customer probably doesn't expect 2 seconds, but 2 minutes... nope. Same thing goes for ingest, replication, page load times, pick your relevant metric.

So now that you know you have a performance requirement, your job is to suss out what that requirement is. You have to back into the customer's performance needs. To do this, look at two things:
  • Any mention of times in the project. Consider backup windows, data flows that include your system (e.g., data stored on primary for 30 days, then on your backup product for 60, then on tape forever - which means you have to be able to take in one day of generated data every day, because that's what'll come off primary to keep primary at 30 days), the speed of the current solution (if any), any number you can get. From these, you can calculate performance requirements: the number of messages passed over an interface, divided by the hours available for batch feed processing, equals the required messages per hour for you. (There's a rough sketch of this kind of back-calculation after this list.)
  • Find comparables. Most things, particularly performance, are relative. If a customer is opening a file off your system, look at how fast the customer can open a file off another network drive. That's your implied performance requirement. If a customer can open his web-based EMR page in 3 seconds, then your web-based scheduling system probably ought to load pages in about 3 seconds. You can be off from comparables a bit, but you want to be pretty close to keep your customer happy.
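As a worked example of that first bullet (the numbers here are invented, but the arithmetic is the point): if primary generates 2 TB a day and the nightly backup window is 8 hours, the implied ingest requirement works out to roughly 69 MB/s. In sketch form:

    #!/usr/bin/env python3
    """Sketch: back into an implied ingest requirement from a backup window.
    The inputs are invented; use whatever numbers you can pry out of the project."""

    def required_ingest_mb_per_sec(data_per_day_tb, window_hours):
        """Daily data that has to land on us, divided by the time we get to take it in."""
        mb_per_day = data_per_day_tb * 1000 * 1000      # TB -> MB (decimal units)
        return mb_per_day / (window_hours * 3600)

    if __name__ == "__main__":
        # e.g., 2 TB generated per day on primary, 8-hour nightly backup window
        rate = required_ingest_mb_per_sec(2, 8)
        print(f"implied ingest requirement: {rate:.0f} MB/s")   # ~69 MB/s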
I'll note that you may not meet these performance requirements. Your potential customer's expectations may be completely unrealistic. The point is not to blindly meet the implied requirement; the point is to figure out what it is. If you can meet the requirement, great. If you can't, you need to reset your customer's expectations appropriately. It'll save angst later.

Any other tricks up your sleeve for implied or unexpressed requirements?

Friday, October 2, 2009

Calculating Velocity

When we're doing estimations, one of the tools I use is what I call a breadbox sizer. Basically, this is a tool to help us figure out how much work we should be committing to in an iteration - it's a measure of our velocity. Here's how it works:

Categorize the things you've done.
The idea here is to create a list of the things you work on. Usually this is features or product areas. You can also put performance testing or other categories on there. This is basically how you categorize your testing effort. For example, my categories include:
  • GUI management
  • CLI
  • Test Infrastructure
  • HA
  • Replication and snapshots
  • ...
Try to keep this between 10 and 20 categories. Fewer than that and you'll be putting too much in each category. More than that and it gets unwieldy to work with.

Put your past stories in the categories.
Go back over the last three or four (or more if you can) iterations and put the stories/features/tasks you worked on into the categories you've created. For example, we just did a story "add separate timeouts to setup, test run, and teardown". That would go in the "Test Infrastructure" category.

Layer in the estimate sizes for each item.
This part is important. Take the size of the story as you estimated it, and add that to the category for that iteration. I don't care how long it actually took you. I care about how much time you thought it was going to take up front. (We're going to use this for estimates, so I care about how much of your estimated time you actually got done. We don't need another layer in the middle to translate "estimated time" to "actual time" worked.)

For each story, decide if it was "small" (1-3 units), "medium" (3-5 units), "large" (5-10 units), or "extra large" (bigger). If one category had multiple stories, add them up and put the overall time spent in that category. You'll wind up with something that looks like this:



Translate that into work done.
On the spreadsheet we're going to total the work done. Basically, we add 3 for every small, 5 for every medium, etc. This gives us the total amount of estimated work that we actually wound up doing in each iteration. Our example now looks like this:


In our first iteration, we managed 8 units of work, total. In our second, we got 15 units of work. In our third iteration, we did 16 units of work.

Get to a common denominator
Now we start dividing by the things that change. For example, during iteration 1, one of our QA engineers was on vacation, so we had two engineers. During iterations 2 and 3, we were at full strength. So we're going to divide this up to define how much work per engineer per iteration we can do. In addition, our iteration 3 was three weeks long (doesn't matter why for this example), but the first two iterations were two weeks long each. So we divide that up, as well. This gives us a number per person per week.
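In spreadsheet-free form, the arithmetic is just a few lines. Here's a sketch using the unit totals and staffing from this example (the actual spreadsheet isn't reproduced here):

    #!/usr/bin/env python3
    """Sketch of the velocity arithmetic above. The unit totals and staffing are
    the ones from this example; the actual spreadsheet isn't reproduced here."""

    # (estimated units completed, QA engineers available, iteration length in weeks)
    iterations = [
        (8, 2, 2),    # iteration 1: one engineer on vacation, two weeks
        (15, 3, 2),   # iteration 2: full strength, two weeks
        (16, 3, 3),   # iteration 3: full strength, three weeks
    ]

    for number, (units, engineers, weeks) in enumerate(iterations, start=1):
        per_person_week = units / (engineers * weeks)
        print(f"iteration {number}: {per_person_week:.1f} units per engineer per week")

    # From these per-iteration figures, pick the number you'll plan against
    # (we settled on roughly 2.5 units per engineer per week).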




Make Conclusions
So now we know that each QA engineer can do about 2.5 units of estimated work each week. When we go into the next estimation session, that's where we'll draw the line for test work. We estimate just like we always do, and we then will walk down the list committing to 2.5 units of work per week. When we run out of allotted time, we'll stop. (By the way, our 2.5 is days, so we're getting about 50% effectiveness; my general experience says that's about right, and the rest of the work day is spent on email, meetings, escalations, sales assistance, and other tasks.)

There are a number of ways to calculate velocity. Many of those methods involve calculating how much time you actually spent, getting a velocity of actual time, and then working to get your estimates to match up how much time you spent. You can choose to go this way; it's perfectly legitimate. However, I prefer the way I've outlined here simply because it avoids the fuzziness factor you get when trying to figure out how much time you actually spent (which I find really hard to do).

So see what you find when you calculate your velocity. Good luck!

Thursday, October 1, 2009

Our Diva Moments

Many of us consider ourselves rational creatures. We like to think that we evaluate tools and environments on their merits and come up with the best tool for the job.

Then the flame wars start:
Linux vs Microsoft
Emacs vs vi
Language zealots (I think most languages have zealots)

There are some serious cliches here. And in almost every case, the arguments devolve rapidly from rationality into "well... just because!".

When a decision isn't founded on merits, and is based solely on some irrational belief or preference, that's what I call a Diva Moment. "I use Ruby all the time because I just can't get anything done in a language as overbearing as Java!" Diva moment. "I can't possibly work on a Mac; it's just too cute." Diva moment.

Most of us have our diva moments (or our diva topics). I, for example, refuse to use iChat because it just feels wrong. So I use Adium. Would using iChat kill me? Nah. Is iChat plus Messenger really any worse than Adium? Not really - both of 'em let me talk to people. It's just my diva moment.

It's normal to have diva moments, but we need to recognize them for what they are. And then we need to recognize the cost. That guy who refuses to use Java might miss out on a cool new job because it's a Java shop. Sorry, buddy. The anti-UNIX geek creates a persona, intended or not. That is the choice we make when we choose our irrational preferences. And that's okay. Just recognize that it is a choice, and the choice is yours, and the consequences are yours, too.