Friday, October 31, 2008

Who Cares?

We're in the throes of a release cycle, which leads to all sorts of fun conversations, many starting like this:

"So, is bug 123 a blocker?"

Well, that's an interesting question. Like many organizations, we have guidelines for this sort of thing:
  • if it results in data loss or corruption, it's a blocker
  • anything that makes the system crash is a blocker
  • anything that's going to create excessive support calls is a blocker
It's more subtle than that, though. If fixing a blocker will make you miss the release date, and you have revenue riding on that date, is it still a blocker? What if the bug won't affect the customer providing the revenue?

Ultimately, the real way to understand whether a given bug is a blocker is to ask:
  • who cares?
  • what will this entity who cares do if they hit this bug?
  • what are the consequences of fixing this bug?
  • which is worse - what happens if the bug occurs in the field, or what happens if we go ahead and fix the bug?
If it's worse to fix it, then it's not a blocker. If it's worse to hit it, then it's a blocker.

In the end, whether an issue is a blocker depends on who your real customer is and what they will do in the "fix it" scenario and in the "don't fix it" scenario.




* An aside for the (quite large) school of thought that says testers provide information and do not make these decisions: well, I'd rather not get into that argument. After all, we're not talking about who makes the decisions here; merely about how the decisions get made.


Thursday, October 30, 2008

Too Busy For a Solution

Go read this about being so busy dealing with a problem that you never get to fix it.

Sure, we all know we need to balance today and tomorrow, but that's one of the first things it's easy to lose sight of. And yet you have time to read this blog.

So here's the deal I'll make with you. I'm going to stop this particular blog entry here and save you the 30 seconds more you would have spent reading this.

You breathe. Take the time you just saved and think of one thing you can do to make your tomorrow better. Then go do it.

Wednesday, October 29, 2008

Wacky Alternate Methods

Welcome to my anti-cygwin tirade.

First, let me say that cygwin has its place. It's great when you have a lot of utilities that are UNIX-based and you need to introduce Windows. For example, my company uses a reservation system for machines. That reservation system is a UNIX script (I'm oversimplifying slightly), and cygwin lets us use the same reservation system for our Windows machines.

But

Cygwin is a crutch.

Because we have cygwin...
  • we can mount drives through cygwin instead of standard Windows methods
  • we can copy files using cygwin rather than through drag-and-drop or Windows copy commands
  • we don't have to learn Windows
That crutch has made us weak, particularly because of that last point. Just because you can get a UNIX-like environment on Windows doesn't mean you should always use that environment. Just because you have a crutch doesn't mean you should always use it. Save the crutch for when your leg is broken and you have no other choice.

End anti-cygwin tirade. Promise!

The problem really isn't cygwin. The problem is choosing to use that instead of the native (Windows) OS wherever possible. The problem is that when you use cygwin you're not really doing what your users do on Windows - you're using wacky alternate methods. So you copied a file with cygwin. That's different than if you copy a file with Windows Explorer, and one day you're going to miss a bug because of that.
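
To make that concrete, here's a minimal sketch of the distinction in a test helper. It's Python, it assumes a Windows test machine with cmd.exe plus a cygwin install (the C:\cygwin path is just a guess at a typical location), and the only point is that the two copies exercise different code paths:

    import subprocess

    def copy_like_a_windows_user(src, dst):
        # Native Windows copy: closer to what a customer on Windows does.
        subprocess.run(["cmd", "/c", "copy", "/Y", src, dst], check=True)

    def copy_via_cygwin(src, dst):
        # Cygwin cp: convenient for us, but not the path our users take.
        # The C:\cygwin install path is an assumption.
        subprocess.run([r"C:\cygwin\bin\cp.exe", src, dst], check=True)

Neither one is a real Explorer drag-and-drop, but only the native copy tells you anything about what your customer's copy will do.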

If you really do have to put something powerful and rather odd on a system (aka a crutch), that's fine. Do it. Just recognize that you're forcing something into a place it doesn't quite fit, and don't use it for anything but that one purpose. Wherever possible, use the native functionality instead. Is it more work? You betcha. Is it better in the long run? Absolutely.

It may be a bit unfamiliar, but you'll learn it, and in the end you'll be much better for having both.

Tuesday, October 28, 2008

Bugs As Records

A bug in a defect tracking system has a lifecycle:
  • it's logged
  • it's triaged
  • it's discussed
  • it's fixed
  • it's verified
  • it's (maybe) reopened
Most of the time, after a bug is fixed and verified, no one ever looks at it again.

But....

There is one other major use of a bug, and that's as a record.

When you don't know what's going on, and don't know where to start debugging an issue, the defect tracking system can be a great reference. Look up the error message you're getting and see if it sparks anything. Look at all the things you might find:
  • Maybe you have to reopen the bug (oh no!)
  • Maybe you find out that the bug was fixed a bit later than you remembered and the fix is in the next release, so you've just found another occurrence of the problem
  • Maybe it didn't happen quite that way, but it points you toward another log that has some interesting information
  • Maybe that module didn't throw the error, but another one did and the calling module is the same
  • Maybe you find out that this really has never happened before (at least that your defect tracking system knows about)
The point isn't that looking at the defect tracking system may help find a duplicate. The point is that even if it doesn't find a duplicate, looking at the defect tracking system may help you think about the bug a bit differently.  It's another way to think about solving the issue.
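
As a concrete (and hypothetical) example, if your defect tracking system has any kind of search API, the lookup can be as small as this sketch - the URL and field names here are made up, not any particular product:

    import requests

    def search_all_bugs(error_message):
        # Search open AND closed bugs for the log message we're seeing.
        # The URL and the response fields are hypothetical.
        resp = requests.get(
            "https://bugs.example.com/api/search",
            params={"text": error_message, "status": "any"},
            timeout=30,
        )
        resp.raise_for_status()
        for bug in resp.json().get("bugs", []):
            print(bug["id"], bug["status"], bug["summary"])

Even when the search comes back empty, that's information: as far as your defect tracking system knows, this has never happened before.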

So when you're puzzled by a problem, don't forget that closed bugs are a resource, too.

Monday, October 27, 2008

Bug Verification Checklist

Verifying bugs is a bit of an art. It's also a time to make very very sure you're right. After all, there are only four possible scenarios:
  • You verify a bug and it's actually fixed. This is what we want to have happen.
  • You verify a bug and it's not fixed. This means you're going to find it in the field. Your customer will be unhappy AND you'll have egg on your face. Not good all around.
  • You kick a bug back to dev and it's not fixed. This is the second best scenario; a fix would have been better but, hey, at least we caught it. In the end, it's not much worse than finding the bug in the first place.
  • You kick a bug back to dev and it's actually fixed. This is where we waste time. It's a minor embarrassment and it erodes developer trust in you a bit (rather like finding a bug that's clearly not a bug).
So, with only one happy outcome, one mediocre outcome, and two chances for us, the testers, to embarrass ourselves, let's approach defect verification carefully.

Here's what I do to verify the bug:
  1. Make sure I'm running a build that has the fix in it. In particular when there are a number of branches this is something that needs double-checking. Rely on check-ins and build tags for this, not on bug comment time stamps.
  2. Repeat the steps that reproduced the issue and make sure the behavior is what I expected. This is the obvious part. I try the thing that broke before and see if the behavior has changed. If there's zero change in behavior (i.e., the exact same thing happens), I'm really suspicious - after all, the fix attempt is likely to have at least modified the system behavior, even if the fix is incomplete.
  3. Make sure I got all the way through. I can't prove this bug is resolved unless I can prove I exercised the thing that used to cause the bug. A failure before we even get as far as the bug leaves me in a verification version of Schrödinger's cat - I can't prove whether it was fixed or not!
  4. Look for markers of resolution. Often a bug fix will include a mark or a note that serves as a secondary way to know the fix is present. Usually this is a message that should now appear in the log ("XX complete") or one that should no longer appear (failure message "blah"). Look for the positive indicators of a fix - the success message - in addition to the negative indicators - the absence of the prior error.
  5. Reread the bug. Think I got it? Great. I'm going to read the bug one more time, start to finish, especially with a really long bug. Maybe the behavior morphed over time, or maybe there is a reference to another problem that sometimes hides this problem. Maybe there's a reference to some documentation change that should be opened as a separate request. Once this bug is closed out, it's unlikely to be read again, so make sure you get everything out of it that you need.
Once you've done all that, only then can you mark the bug as verified or failed, depending on what you found.
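
Step 4 is the most mechanical of these, so it's the one I'm most tempted to script. A rough sketch, where the marker strings are stand-ins that would really come from the bug and the fix:

    def has_resolution_markers(log_path, success_marker, old_failure_marker):
        # Check for the positive indicator (success message present)
        # and the negative indicator (the old failure message absent).
        # The marker strings are stand-ins supplied by whoever verifies the bug.
        with open(log_path, errors="replace") as f:
            log = f.read()
        return (success_marker in log) and (old_failure_marker not in log)

It doesn't replace rereading the bug, but it's a cheap second opinion on whether the fix's markers are really there.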

Friday, October 24, 2008

Wabbit Hunting

There are two kinds of escalations that come in from support: those where the issue is still ongoing and those where the issue is no longer occurring but we'd like to understand what happened so we can fix whatever first caused the problem.

I like to think of these as issues that are alive and issues that are dead (and just need a postmortem).

Issues that are dead are, in their own way, simpler. You take the logs, the issue description, and any other information that has been gathered, you apply your 5 whys or other form of analysis, and you state what you believe occurred. Since the issue isn't ongoing, proof is difficult to come by; you're looking for the most likely cause and what you can do to prevent recurrence of that most likely scenario.

Issues that are alive - that are still ongoing - are different. Now we're wabbit hunting.


The issue is still occurring. Either it hasn't been fixed at all, or recovery has been attempted and the problem has happened again. Your goal here is different; it's not about finding ultimate cause now. It's about getting the customer running again.

To be sure, a lot of your analysis techniques still apply, but don't be afraid to start fixing. This isn't the time for a leisurely analysis. It's a time to balance analysis with action. Got a problem? Great, get that problem to stop. Then see if there's another problem. As long as you're nondestructive and you're actually looking for the cause of the problem rather than simply hiding it, getting the customer up trumps creating an elegant theory. 

"What could we have done better?" is a question for a dead issue. 
"What can we do now?" is a question for a live issue.

To stretch the analogy: Shoot the wabbit. THEN figure out how it got into your garden.

Thursday, October 23, 2008

Extraneous Parts

What's red and smells like blue paint?
.
.
.
.
.
.
Red paint.


Remove the word blue and the joke gets easier, although less of a groaner.

So drop the extraneous parts, find what doesn't matter, and the answer will be much more clear.

Wednesday, October 22, 2008

"With Just a Few Small Changes"

I just ran across this article on trying baseball today. I encourage you to read it (it's funny, or at least I think it is!). The upshot is that you can say you're doing something "with just a few changes" and be totally off base.

This lesson applies to test infrastructure as well.

What We Do In the Real World
One of the things we have to test is an upgrader for our system. Here's roughly what happens:
  1. upload the upgrader package
  2. run an upgrade prep script
  3. run the upgrader
  4. wait while the upgrader checks the health of the system
  5. wait while the upgrader puts the system in maintenance mode
  6. wait while the upgrader deploys packages to the system
  7. wait while the upgrader reboots each machine in the system
  8. wait while the upgrader reapplies specific configurations
  9. wait while the upgrader puts the system back into normal mode
  10. check everything
What We Do In Test
The good news is that we have a number of automated tests for this very process. They run automatically and assert that various things that are supposed to change during an upgrade change. They even assert that various things to be preserved across upgrade are preserved.
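
The shape of those assertions is roughly this - a sketch only, with made-up helper names standing in for our real test harness calls:

    # installed_version, read_config, and run_upgrader are hypothetical
    # stand-ins for the real test harness calls.
    def test_upgrade_preserves_config_and_bumps_version(system):
        before_version = installed_version(system)
        preserved_config = read_config(system)        # things that must survive

        run_upgrader(system)                          # roughly steps 2-9 above

        assert installed_version(system) != before_version   # supposed to change
        assert read_config(system) == preserved_config        # supposed to be preserved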

However, in our automated tests, we're exercising upgrade with just a few small changes. We deploy packages just a bit differently, for example. We have some extra shares mounted to get at test code, etc. So it's an upgrade test... almost.

Is It Legitimate?
So is our automated upgrade test legitimate? Can we really say we've gathered some useful information?

I think it's a pipe dream to say that your test automation is going to be exactly like your manual test, or to say that either of those is going to be exactly like what will happen in the field. Instead, we have to strive for the best achievable.

The goal isn't to be exactly the same. The goal is to understand the differences and account for them. I'm not going to worry, for example, that my upgrader runs on a system that faces south when most of our customers deploy with systems facing east; that's a difference I'm willing to accept. I understand that package deployment changes mean I may be able to install a package in my automated test that would never install in the field, so I cover that with manual tests that more closely mimic the real world. So what's the lesson here?

Understand the changes you're making - intentionally or unintentionally - and do not expect your tests to provide information in areas where a change has increased your risk. Each test has its purpose; a change from the field simply informs and influences what that test's purpose can be.



* Note that what I've shorthanded here as "automated tests" means "test scripts that run from a cron job every night on the latest code and make assertions without human intervention." Human intervention is required to identify the cause of any failures or errors.

Tuesday, October 21, 2008

First Step

We've been talking about the way we develop software, and as usual with process discussions, it seems like a great idea to make huge sweeping changes.

"We'll define a good process, and then switch to that!"

No. Let's not.

Let's do something achievable. We'll pick the one thing that bugs the most people the most, and we'll make that better.

We write software incrementally, making it just a little better each day. Let's treat our process the same way: make it just a little better each day.

Monday, October 20, 2008

High-Volume Bug Lists

There's a bit of back story to this:

Last Thursday, I wrote about parent and child tickets, which we use internally to help with root cause analysis and that period between analysis of an issue and fixing the issue. Glenn Halstead commented that it "seems to assume that all test failures result in a defect being logged."

So I thought I'd step back and talk about how we handle our automated tests and repeat failures, etc.

Here are the basics:
  • Every night at 7:05 pm the automated tests start. (We call these tests "nightly".)
  • The test runner machine triggers a build.
  • The test runner machine then starts running all the automated tests. There are details about how this is done, but we can talk about that later.
  • (Hours pass)
  • In the morning, someone in QA checks on the progress of the automated tests. We use a script that gives us a quick overview of progress.
  • When the automated tests are done, the test runner machine is quiet.
  • QA then performs what we call triage.
So what can happen in triage?
There are several things that can happen:
  • The test finished successfully. (Hooray!)
  • The test finished but there was a failure (by this we mean assertion failure).
  • The test never finished (by this we usually mean it hung or timed out).
  • The test finished but threw an error not in an assertion.
  • The test finished but left its infrastructure (machines, network configuration) in an unclean state.
That's a lot of options, I know. Let's address each in turn:

The test finished successfully.
What It Means: This is the standard test passes case. This case covers the vast majority of tests.
What QA Does: Nothing. The passing test result is already in the test logs, and QA takes no further action.

The test finished but there was a failure.
What It Means: This is what I think of as the standard "test fails" case. There was an assertion, and the actual result did not meet the expected result.
What QA Does: Logs or updates a bug. This may be a new bug or added to the bug for an existing issue, but either way, the failure gets noted every single night right in the defect tracking system. This makes it a lot easier when we're later looking at a bug and trying to identify frequency of failure, when it started failing, if it's still happening, etc.

The test never finished.
What It Means: One of two things happened here. The test may have timed out and been cut off. Or the test may have simply sat there "forever" (where forever actually means until it had been far longer than the test should have taken and a human went in and killed it). This happens for us when a test is waiting for a copy to finish over a mount that has died, for example.
What QA Does: Logs or updates a bug. Again, the bug might be new or added to an existing bug. What we're looking for here are bugs that aren't in assertions. Sometimes these are test infrastructure bugs (the test killed the mount underneath the file copy, for example), and sometimes this is a symptom of an issue that the test simply doesn't directly assert for (say, for example, a kernel problem not properly closing the connection when timing out NFS mounts).

The test finished but threw an error not in an assertion.
What It Means: These are what I think of as "crufty" bugs. They're not going to kill you, probably, but they sure make later debugging of client problems harder because there's a lot more noise in the logs. And often they're indicators of an inefficiency. For example, maybe we're trying to use an interface before the network is configured. As long as there's a retry it'll work eventually, but it means you have a logic flaw in your code.
What QA Does: Logs or updates a bug. As usual, the bug might be new or added to an existing bug. These tend to be lower priority bugs and may not get fixed as quickly, but they are still in the defect tracking system. Where this comes in really handy is when you're analyzing logs from a client site, and you see a bunch of errors. A quick search of the defect tracking system can reassure you that sure it's an error, but it's not fatal, and you need to keep looking for the real problem.

The test finished but left its infrastructure in an unclean state.
What This Means: Our tests are expected to clean up after themselves: remove mounts, tear down special network configurations, etc. This way the machine that the test is running on can be used for another test and we know it's clean. When this doesn't happen, a machine "leaks" (i.e., cannot be used until it's manually fixed).
What QA Does: Handles an automatically logged bug. This is the only instance in which our test infrastructure logs a bug automatically, and QA cleans up the machine and figures out if there's a bug or if it's something like failed hardware.

There's a definite pattern there: if the test doesn't succeed completely - including setup, running without error, passing all assertions, and teardown - then it's almost certain to result in a bug. That bug might be new, or it might be a notation in an existing bug. This means we have a pretty high volume of bugs, but it also means our collective test history is pretty much contained within the defect tracking system. For my part, I think it makes pattern analysis easier, and it makes sure we don't miss a bug - any issues we're not directly asserting on still get caught and handled.
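
If you wanted to write that pattern down, the decision tree looks roughly like this sketch. The attribute and helper names are made up - in reality the "function" is a person in QA doing triage:

    def triage(result):
        # log_or_update_bug and handle_autologged_bug are hypothetical helpers.
        if not result.finished:                       # hung or timed out
            log_or_update_bug(result, kind="never finished")
        elif result.assertion_failed:                 # expected != actual
            log_or_update_bug(result, kind="assertion failure")
        elif result.errors_outside_assertions:        # "crufty" errors
            log_or_update_bug(result, kind="crufty error")
        if not result.clean_teardown:                 # leaked a machine
            handle_autologged_bug(result)
        # A clean pass needs nothing; the result is already in the test logs.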

Friday, October 17, 2008

Fun Words

If I were writing a technical dictionary, I would definitely include the following words*:

Horked: Broken, generally in an interesting and not yet explained way. Has connotations of machine unresponsiveness or other major breakage.

Slew: More than a few, less than a ton. Varies by the general size of what you're talking about (slew of bugs, slew of files, slew of checkins), but typically something two or three times the normal volume.

Dinky: Small and unmeasured (except by gut). Can have good connotations, as "that replication took a dinky amount of time!" or bad connotations as in "that was an awfully dinky file for it to fail on".

Ton: A lot. Generally at least 10x normal volume. Can refer to pretty much anything, usually something that the person speaking has to handle.

Greedy: When a thing is using resources and others (things or people) are feeling the lack of resources. Often refers to the largest or most aggressive consumer. For example, "the automated tests are sure greedy with CPU! My report can hardly run." This can go along with anthropomorphizing scripts and other machine actions, as in "continuous is greedy with lab machines".

Sluggish: Slow but not measurably slower than normal rates. Almost always refers to times or rates, as in transfer rates or run times for tests.

What else would you include?




* Note that I really did hear all of these (or say them!) at work this week.

Thursday, October 16, 2008

Child Bugs

Here's a dilemma I've been trying to figure out.

Sometimes we have multiple bugs that turn out to have the same root cause. And sometimes we know that they have the same root cause but we haven't yet fixed it. So the bug keeps happening. With a manual test, that's no big deal - we just don't log it again. But what do we do when it's an automated test and it just... keeps.... failing?

The Normal Case
If we're in a normal case, this is an isolated bug that only shows up in this same test. We just keep logging the failure into the same bug. This way, when we go back we can see how often it failed and if there were any other factors that affected the way in which it failed.

The Root Cause Case
It gets a bit more difficult when you have multiple tests failing. So now I've got two (or five or ten) bugs, and they have the same root cause. How do we handle this? We've got options:
  • Close one bug as a duplicate of another
  • Leave the bugs separate and just comment that they have the same root cause
  • Merge or otherwise combine the bugs
  • Create a bug for the root cause and link the different instances in (how you link is up to you)
We use the last of these and call it "creating a child bug". We log a new bug just for the root cause, and then mark all the other bugs as its children. Then we keep logging errors into the child bugs.

The upside to this is that it's easy to see the root cause (and all analysis based on that) without cluttering it with test failures. The downside is that you have to go to each of the children to ensure that it really is fixed in all situations.
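
That downside is at least easy to check for. Here's a small sketch - the tracker object and its methods are hypothetical - of the walk you'd do before closing a root cause bug:

    def ready_to_close(tracker, root_cause_id):
        # tracker is a hypothetical wrapper around the defect tracking system.
        # The root cause isn't done until every child bug is verified fixed.
        children = tracker.children_of(root_cause_id)
        still_open = [b.id for b in children if b.status != "verified"]
        if still_open:
            print("Root cause", root_cause_id, "still has open children:", still_open)
            return False
        return True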

How do you handle this scenario?

Wednesday, October 15, 2008

Quiet Mistakes

This morning I was coming back from the gym, heading to my apartment. In the lobby there was a man talking to the guy who does the dry cleaning. In the time it took me to cross the lobby and get on an elevator (about 90 seconds), it had escalated from a conversation to shouted accusations on both sides and repeated imprecations: "Don't call me a liar!"

Way to resolve the situation, guys.

Our jobs are often about telling others that something is wrong, and it's very easy for this to get negative quickly. Things have definitely gone downhill if we're getting into arguments about "your code is wrong!" and "how can you think someone would ever do that?".

Many testers and quite a few developers will pay lip service to the value of test (or QA). It's about proving that things work as well as we hope. It's about not being surprised when the product hits the field. It's about gathering information for others (the "them" in this particular us vs them mindset) so that they can make good decisions.

When a bug gets logged, though, there's a little bit of a "darn! screwed up!" feeling for the recipient of the bug. Heck, I've been on the receiving end of these, and I still think that every single time. It's not that I'm sorry someone found the issue; it's that I'm annoyed with myself that I didn't think of that.

So how do we keep this from going negative?
  • Be polite while logging bugs. This is hugely important. "Nanny nanny boo boo" should never be part of a bug report.
  • Handle the negative in private and the positive in public. This is the mistake that man made when I was walking through the lobby this morning. It's a lot easier for someone to say - to themselves or to others - that they screwed up or that they missed something when it can be done quietly. Leave people their public pride and talk about problems only in the audience that needs to know; public scourging only makes people more defensive.
  • Don't overlog your bug. Logging it over and over is just hitting someone over the head with their problem. Log the bug once and move on. Don't keep finding followups (bug 12346: "because of bug 12345, which crashed the system, if I then attempt to write to the system it fails!") that are obviously direct consequences of the bug. Be complete and specific, and then let it go.
Because of what we do all day, we're likely to end up in situations that turn negative. It's incumbent on us to do everything we can to avoid negativity and instead work on solutions. 

Tuesday, October 14, 2008

Laptops in Meetings

I've heard two schools of thoughts about laptops (cellphones, iphones, blackberries) in meetings:
  • School 1 says: Meetings are for paying attention. Get in, get business done, get out. No laptops (et al).
  • School 2 says: We're all adults here. Don't treat people like kindergartners by taking their toys away.
Now, here's the thing. I like to say that I'm all for School 1. However, I'm sitting in a meeting working on this blog entry. And a fair number of people in the office (hello, informal survey) are the same way. We take laptops to meetings because we feel like we're more efficient that way.

So here's the deal I will make with any person running a meeting I'm in:

I won't use my laptop if this meeting is effective.

Parsing this out:
I won't use my laptop
  • I will pay attention
  • I will have my laptop closed or not present at all (unless I'm the official notes taker)
this meeting is effective.
  • The meeting starts on time
  • The meeting ends on time
  • The meeting has a goal (and usually an agenda)
  • The meeting ends when there's no more to say, even if it runs short
  • The meeting doesn't change purpose just because a useful group of people happens to be in the room
Hopefully no one treats anyone like a kindergartner, and we can all spend a little less time in meetings.

Monday, October 13, 2008

Please Ask

We all have those days (or weeks, or even months) where we're struggling hard. Too much work, too few hours in the day, and everything you touch just falls apart. It's always tough to ask for help, because you're just having a bad day/week/month, and it's really easy to tell yourself that you're a good worker and you'll just power through it.

Don't.

Stop.

Ask for help.

Now picture being on the other side of it, watching a peer struggle. You want to help, but when the other person keeps saying, "I got it," you have to either believe them or start managing them. The former leads to continued struggle. The latter leads to a power imbalance and eventually to resentment all around.

Don't put anyone else in that position. Your struggles are not yours alone. Ask for help; you'll be helping more than just yourself.

Friday, October 10, 2008

Overheard Phrases

A QA guy and an IT guy walk into the development lab, and this is all we hear them say: 

"It's easier to ask for forgiveness than permission."

At this point you learn how much you trust your employees. Do you go see what's going on or do you keep doing what you're doing?




For the record, I stayed where I was. Whatever's going on, it's either right or it will be fixed quickly. Either way, they can take care of it.

Thursday, October 9, 2008

Pairing is Not Just Programming

At work, we spend a lot of time pairing. No code gets checked in to the source tree without a pair or a reviewer, and there's a definite bias toward pairing. Now, most of the time you hear about pairing, it's shorthand for pair programming. So how does pairing help a test organization? Just the programming we do?

No way. You can pair in a lot of ways.
  • Pair testing. We do this a lot. Think exploratory testing, only with two heads instead of one.
  • Pair programming. Yes, we do this. Not so much for quick scripts, but in particular for test harness modifications and additions.
  • Pair issue analysis. This we do with support quite often when they have a problem they can't solve. Sure, we can take the logs and the problem description and get back to them with a notice that a hard drive failed or something. Or we can pair with a support person and that person can learn exactly how we figured out a drive failed.
Pairing is something I really enjoy having my team do. It increases our productivity and focus (it's a lot harder to waste half an hour on your newsreader when someone's sitting there working with you). It helps alleviate the transience of institutional memory. It increases confidence that the work is well-done; issues have fewer leaps of faith if someone has had to explain each step to someone else, and designs are stronger when two people are looking at them.

All in all, I'm a big fan of pairing for a lot of test work. Pairing isn't just for dev!

Wednesday, October 8, 2008

Your Leg Ain't Broke

Yesterday I wrote about how asking questions is great. And it is!

Sometimes, though, you need to think about who or what you ask. Think of all the people/things you can walk up to and ask a question:
  • Other people. Preferably they are more knowledgeable than you are.
  • A system. After all, isn't testing ultimately just asking questions of a system with purpose?
  • A book. This is often called "research".
  • Yourself. Sometimes you know the answer and just need to think about it for a while.
So please, ask all the questions you want. Just be sure you're asking the right questions of the right people/things. A really fast way to look needy and make others not want to help you is to ask questions that show you didn't put any effort into thinking it through yourself. Don't be lazy. If you can answer a question yourself, do it. If you can't, then go ask someone else.

A simple example:
Someone I work with goes to a meeting every week. So the question here is, "what am I going to be asked?" Don't go ask that meeting coordinator, "hey, what are you going to ask me?". Stop for a minute and think about it. If they've asked about bug counts every week, they're probably going to ask again. If they never ask about the temperature setting of the air conditioner, they're not likely to ask now.

Once you've come up with a list, the question changes and becomes more refined: "I know we're going to cover X, Y and Z. Anything unusual I should prepare for?". Now the person you're asking knows you've put some effort into it, and your chances of getting a good answer just went up.

Tuesday, October 7, 2008

Simple Questions

Asking questions is a wonderful thing. Even a simple question that you feel kind of dumb asking is generally a good question to ask. For example, one of the questions we're working on is:

How does WidgetDataMover write files?

Sounds easy, right? In the WidgetDataMover GUI there is a drag and drop feature that looks pretty much like Windows Explorer. You click on a file on the right side, drag it to the left side, and release the mouse button.

But....

Under the covers it does the following (all after you release the mouse button):
  • Write the source file path into a database with a pending flag
  • Check to make sure that the destination path exists and is writable
  • Write the destination path into the database
  • Move the file
  • Set file attributes
  • Write the success of the write into the database
That simple question is actually not so simple!
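
In rough Python-flavored pseudocode (with made-up helpers standing in for the database and attribute pieces), that one mouse release turns into something like:

    import shutil

    # db, check_destination_writable, and set_attributes are hypothetical stand-ins.
    def widget_data_mover_drop(db, source_path, dest_path, attributes):
        db.insert(source=source_path, state="pending")        # record the source, flagged pending
        check_destination_writable(dest_path)                  # destination exists and is writable?
        db.update(source=source_path, destination=dest_path)   # record the destination
        shutil.move(source_path, dest_path)                     # move the file
        set_attributes(dest_path, attributes)                   # set file attributes
        db.update(source=source_path, state="success")          # record the result

Six steps, each of which can fail in its own interesting way - which is exactly why the "simple" question was worth asking.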

Never be afraid to ask a simple question - there's generally a lot more under the covers than you think.

Monday, October 6, 2008

Codeless Bug Fixing

A coworker and I were having a conversation today, and at some point the following exchange occurred:

Him: "We should just get rid of oldScript entirely since newTool does the same thing."
Me: "But there are still bugs in newTool."
Him: "It's just that newTool counts things in a slightly different way."
Me: "Satisfactory explanation of the discrepancy counts as a fix."

The point here is that fixing bugs does not always require changing code.

After all, a bug is ultimately a difference between expected and actual (behavior, appearance, performance, experience, and so on). Resolving that difference is the point of fixing a bug. However, and this is the tricky part, there are two halves to the difference - the software and the user (person or system). Either side, or both, can be changed to fix the bug.

The myriad ways we can fix a bug include:
  • write, modify, or delete code to alter the behavior of the system
  • explain what's going on such that the user no longer perceives the behavior as unexpected
  • modify the configuration of one or more elements in or outside the system, resulting in modified system behavior or user experience
  • change the hardware or underlying components to produce a change in system behavior
  • some combination of the above
It's easy to look at a bug and then immediately look at the code. Sometimes, though, it's a better idea to consider the system as a whole and figure out where your fix really ought to be. There are, after all, several valid choices. The code is important, but sometimes your bug fix can be codeless.

Friday, October 3, 2008

Quiet At The End Of The Week

My weekly schedule looks something like this:

Monday: A couple of meetings, say 75 minutes total
Tuesday: Meeting day, part 1. Team meetings, standing support training, and dev leads meeting, total about 3-4 hours
Wednesday: Meeting day, part 2. Client issues meeting and XP customer team meeting, total about 2 hours.
Thursday: QA team meeting, say 45 min.
Friday: No meetings.

I have two thoughts about this:
  1. Ouch that's a lot of meetings
  2. Fridays are amazing
Let's leave the "ouch that's a lot of meetings" for later.

Having a day without meetings, especially at the end of the week, is a huge boost to productivity. I'm very strict about not having meetings on Fridays, and I get a lot of work done. So why Fridays?
  • I have a backlog of work from the week, so I know what I need to be doing.
  • I have the weekend available if it spills beyond Fridays. Oddly, knowing that I have that extra time seems to help make me more productive since I spend little to no time worrying about what will happen if I don't get it done. After all, there's always the weekend, even if I don't use it.
  • Friday nights are "date night" for my husband and me, and this way no meeting ever runs into date night!
  • If I'm going to take a long weekend, I can take Friday off and not have to worry about moving a lot of meetings around.
  • Conversely, the office tends to be quieter on Fridays, with people either taking it off or otherwise just plowing through their days.

Sometimes it is really hard to do, but give yourself that quiet no meeting day at the end of the week. It's the best part of the week.

Thursday, October 2, 2008

Complain, Then Educate

I've noticed that when we testers get together, there's kind of a lot of complaining.* And the complaints often go something like this:

"I'm a TESTER. Why don't they understand that my job is to provide information? It's not to find bugs! It's not even to validate the system! What does validate even mean? It's not like it's possible to prove the nonexistence of problems!"

First of all, it's great to have these kinds of discussions in the testing community. Second, it gets just a bit repetitive. I've been a tester for a number of years, and even when I came in, the voice of the community was pretty consistent that test wasn't about finding bugs. So why do we still talk about it?

The problem is that while we in the (enlightened, of course!) testing community mostly understand what testing is and is not, the people around us often don't. Our job now is to educate people tangential to the testing community - testers who don't participate in thinking about the purpose and future of test, developers who haven't worked with a modern testing group, executives and sales people who think that they can outsource the finding of bugs and we just need some metrics for success. So let's talk education. How do people, and more importantly, how does a corporate entity, learn to value the things we value and learn to measure things that actually describe our success?

I make the following assertion:

Corporate and community learning occurs only when that learning provides value to the corporation or community.

So the trick is not "how do we teach people?", but "what value will changing their perceptions provide?" And how do we measure value to the community or corporation, or more precisely, how does the community or corporation measure value?
  • It must be concretely measurable. "80%" or "$15000" or "3x" is good. "Better" is not good.
  • It must be reinforced with the tools and terms you use. If you don't want to be thought of as a bug finder, don't bring in the "bug hunter" paraphernalia. And don't go into every meeting with the defect tracking system open. Go in with a reporting mechanism more like the business side of the house uses, whatever that is. Summaries are your friend.
  • It's your job to constantly portray the way you want to be seen. Don't be part of the problem.
We can educate our consumers - corporations and communities - but we have to use their language and fit within their framework. Trying to force wholesale change is going to result in us beating our heads against the wall. Create adaptation by causing change within the framework of current thinking. We'll get a lot farther that way.


* Complaining is definitely not limited to testers. Developers do it, dancers do it, nurses do it - I think we're all pretty much complainers.

Wednesday, October 1, 2008

Bug vs Feature

It's pretty common for something to come up, and for the question to eventually be raised:

Is this a feature, or is this a bug?

This one can cause a lot of hand-wringing and a lot of heated tempers, although I'm really never sure why. It gets into the "well, it SHOULD do this, so of course it's a bug and we need to fix it really fast" versus "we didn't screw up and this is something new to us so it's a new feature" argument. This is the realm where people get a bit defensive, particularly when you're facing a cluster of problems that are causing pain.

In truth, it doesn't really matter. Both stories and bugs need to be prioritized, tracked, implemented/resolved, tested, and shipped. However, to forestall meetings where this is argued over, I tend to publicly apply a pretty simple rule:

If it's something we've never tried to do before, then it's a feature. If it's something we've tried to do and have messed up or missed an edge case, then it's a bug.

It's not perfect, and there's still a lot of grey, but it sounds definitive. Usually it's enough to drop it in one bucket or another and move on. Anyone got any better ideas for distinguishing feature from bug?