Tuesday, March 31, 2009

Physical Labor

We've just moved offices - not far, just across the street. However, that meant the whole lab - some 500 machines - had to be turned off, unracked, moved, reracked, and recabled.

I have to say, I really enjoy the physical labor.

Not often, and not for long, but standing there plugging in cables provided a great time to think. One part of your mind is muttering to itself "port 23", but the rest of it is free to think. No meetings, no interruptions, no email. Just cables, and thought.

At the end of it, I'd figured out how I want to restructure my test plan for the next major release.

If you have to be at work, and you need some time to think, go find a physical task to do. It's amazing how much you'll get done!

Monday, March 30, 2009

If Your User Were Different...

I find myself sometimes getting into a rut, particularly around users. We've met some users, and we've done some research on usage models, so we're not completely in the dark. However, it gets really easy to start making assumptions about the product based on the usage model you think you know.

So every once in a while I find it a useful exercise to challenge my assumptions about the product by changing my assumptions about our users.

Our user is generally an enterprise employee - think big company with thousands of employees. There are two main classes of users: storage admins and end users. End users pretty much just care about having a big storage space in the sky that they access with a network mount that's always there - space and uptime are pretty much it. Storage admins have more needs, mostly around the ability to set caps on storage use, ease of seeing who's using what space, etc. Pretty simple, in the end.

But what if we change those assumptions? Let's assume that we're now going to sell to the lone IT guy in a 50-man company. What's different? What about our product is now hard to use? What's easier?

You may never sell to your mythical user. You may never change your product to accommodate this mythical user. The point is merely to shake some assumptions loose. The point is to change your thinking and get you questioning what should really be an assumption and what shouldn't.

What do you do to change your cobwebby assumptions?

Friday, March 27, 2009

IIS Localhost Note

I'm putting this here mostly because I ran into it yet again, and I wanted to remember for next time.

Setup
I installed IIS on a computer, then installed ASP.NET 2.0. I then installed my default website (in this case, it's a product called Brocade FLM). Lastly, I went into the website properties in IIS Manager and set ASP.NET to v2.0.

The Problem

When I went to http://localhost/FLM, I got a standard 403 error. When I went to http://[my computer IP]/FLM, I got the login prompt, but it then redirected to a 403 page.


The Fix

  • In IIS Manager, click on Web Service Extensions.
  • In the extensions list, select ASP.NET 2.0 and click "Allow" to enable the extension.

I'm now able to load my website at http://localhost/FLM no problem.

Not rocket science, but I had forgotten and must have spent 10-15 minutes on this.

Thursday, March 26, 2009

How Long Is Your Forever?

I found this in an old ticket I had logged:

Note: For this bug, "forever" means 75-90 minutes.


I'm apparently quite impatient!

Wednesday, March 25, 2009

Hard to Accept

There are a lot of different names for the units of work that go through development and wind up in QA: features, stories, backlog items, functionality, tasks, bugs, work items, etc. Whatever you call them, they're something that ultimately has to be accepted (tested, verified, confirmed, checked) by the testers in the group.

Sometimes this is really simple: button X used to say cancel and now it should say Cancel. Check the install case, the upgrade case, the doc, any marketing or training materials, and you're pretty much set.

Sometimes this is more complex but achievable: a new login mechanism. There are numerous test cases to try here, but in general it's a straightforward problem.

Then there are the really hard ones. How do you accept a story (feature, backlog item, etc.) that simply says, "spend 1 month hardening the HA mechanism in the product"? It's really tempting to say something like, "all existing tests must pass, and the bug count for this particular product area must be below some threshold."

I think this is wrong, though. In a sufficiently fast-moving product there may not be a time when all tests pass on all branches, or that window may be really tiny and very late in the product cycle. Do you really want to wait that long to decide whether you really think you've accomplished your goal?

I think instead that stories (features, backlog items, etc) should be treated as unacceptable (ha!). These aren't things that are going to be tested directly, and therefore they shouldn't be accepted directly. After all, what you're really hoping to accomplish is more of a general statement about that area of the product - "It's more stable" - than a specific goal - "It's 2x faster". So treat it as a general acceptance, and fold it into the general testing that you're doing, both during and after development. Also, make sure that any substories that are specific can be drawn out and tested separately. Whatever's left - this general statement - is the thing you have to treat generally and test generally.

There's danger in attempting to make the general too specific; the things you've left out of the generality are the things you're not testing - and that's risky.

I'm not completely sold on this yet, but it's the best thing I've come up with.

Tuesday, March 24, 2009

Don't Always Go Together

I think words are great. There are words that are rather fun, words that are Dilbert-esque in their business speak ("utilize", anyone?),  really pompous words, jargon....

But then you start putting words together into phrases, and while the phrases don't always have meaning, it does get really fun.

An Oracle splash screen: Productivity with Choice

I have no idea what this means. Without choice, we just twiddle our thumbs? It definitely amuses me!

The InFocus projector startup screen: InFocus. See clearly.

This one is mostly fun because the projector is usually out of focus when it starts and displays this screen.

What are your favorite phrases that don't always go together?

Monday, March 23, 2009

Hope and Plan

It's really easy to want some future state to be, and to dream about how great it's going to be. "We're gonna have a plan to fix any customer issue within 24 hours!" "We're gonna accept every story within a week." "I'm gonna get my commute to under 45 min."

Those are great. And if we stop there, what we have is hope.

If we really want it to happen, though, we need hope with a plan.

A plan is what will bring you from merely hoping for something to actually seeing it happen. Even if it seems silly, you have to have an idea of how you're going to get to your particular future utopia.

So what distinguishes a plan from some lightly-defined hope?
  • A plan describes what will be done and what will be sacrificed for achieving the goal, whereas hope just describes interim goals. For example, if the goal is to accept stories within a week, hope would say, "we're gonna spend 2 hours a day on story acceptance", and a plan would say, "we are extending all our current due dates by 20% to allow for 20% (each person for two half days a week) of time to be spent on story acceptance."
  • A plan describes who will do things. "We" rarely do anything at all (there's somehow a gap between each member of that "we" into which the group effort falls). Instead, Bob does something, or at least "each member" does something. Hope alone is generally something "we" can do!
  • A plan has measurable milestones. We can see if we're making progress or not with a plan. With hope, well, we're either in that future or not.
Hope is great, and sometimes hope is all that a goal really merits. But if you want to improve your odds of actually reaching that future happy state, try turning your hope into a plan.

Friday, March 20, 2009

Case Study in Small Worlds

I've worked in a few major software locales - the SF Bay Area and Boston. In both places it's not uncommon to hear someone remark that, "it sure is a small community here. Don't piss someone off, because they're sure to know the next guy who's going to hire you".

As if to prove the point, I heard just this week that someone I worked with a few jobs ago is now out of work and looking. How do I know? His resume landed on the desk of another friend of mine, and I got a phone call. Small world!

What I struggled with a bit, though, is what to say. This guy was.... okay. He was very very young, very cocky, and didn't have the skills to back it up at that point. And this was about two years ago.

So what can you talk about, when you're talking about an ex-colleague?

In general, you can talk about things that are unlikely to change. Things that may have changed, though, should be off limits.

Things to Discuss:
  • Ability to learn. This is huge and I'm pretty well convinced it's innate - either people learn in an environment or not. Do allow for some different results in a different environment, though.
  • Attitude. Attitudes are extremely difficult to change, and in my experience they generally don't. Someone who always had to feel like the smartest person in the room before will almost certainly still be going around trying to feel he's the smartest person in the room.
  • Work ethic. If the person couldn't focus on anything before, the amount of focus is not likely to have increased. If a person regularly came in late and left early, and slacked off a lot, well, you can probably expect that to continue.
  • Talent. Talent to me is a person's innate ability; it's the things that person simply finds easy. Some people are simply talented testers; parsing a system and effectively interacting with it just sort of happens. Others are not; they may be able to test, but it's often a lot harder for them. Same goes for coding and even communicating. Skills can be learned, but it's more of a struggle for some than for others.

Things To Avoid:
  • Experience. Since you last worked with the person, he may have learned that Linux he was lacking. He may have picked up test automation, even though he never used to write a line of code. Don't ding him based on what used to be true.

That being said, if you're a candidate, you can hazard a guess that your ex-colleagues will find out about your job search and could influence it. So even in your current job, be mindful of the impression you're leaving.

Thursday, March 19, 2009

No Substitute for a Real Client

We testers spend a good portion of our working lives trying to anticipate users. What will they do? How much load will they throw at us? What do they expect to see?

No matter how good a guesser you are, you, the tester, are probably wrong:
  • You understand the internal workings of your product, so you know what to steer around and what to do. 
  • Your job is to think about this product all day every day. For most users, your product is probably just a tool, and is not the actual point of their day.
  • You've probably had a feature or area of the product explained to you, possibly multiple times (hey, sometimes I don't get it the first time around!). Your customer might have gone to training... maybe... once... a few years ago on an older version of the product.
So.... gloom and doom, we're not going to succeed. But we can get closer, and we can do better. We can always do better.

First and foremost, any time you get to talk to a customer or watch a customer work, do it. I got a chance to visit a customer this week, and it was really eye opening. They had all these crazy theories that I would never have thought of, and some of 'em were really good ideas.

Second, consider that each usage model is composed of several component behaviors. Maybe those behaviors are things you can (or do) anticipate and try yourself.  I've talked about this approach before - looking at behaviors and considering usage from there, rather than looking at usage and considering behaviors. I call this approach testing in the small. The net result is that you may not have considered this precise use case, but you've probably gotten through parts of it, so you're not as far off as it seems.

And last, consider what your customers are telling you indirectly. Do you get logs from client systems? We do, at least, when there's a problem. Let's look at the logs and build up a picture of the usage. How many clients are using this simultaneously? What kinds of operations are they performing? Don't have this kind of info? That's something that ought to be added, and right quick. It's like watching a movie instead of live action, but it gets you closer.
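
Even a crude pass over those logs beats guessing. Here's a minimal sketch of the kind of tally I mean, assuming a made-up log line format of "timestamp client=<ip> op=<operation>" - substitute whatever your product actually writes:

    from collections import Counter, defaultdict

    # Hypothetical log lines; real ones will need their own parsing.
    LOG_LINES = [
        "2009-03-19 10:23:01 client=10.0.0.5 op=READ",
        "2009-03-19 10:23:02 client=10.0.0.7 op=WRITE",
        "2009-03-19 10:23:02 client=10.0.0.5 op=READ",
    ]

    ops = Counter()
    clients_per_minute = defaultdict(set)

    for line in LOG_LINES:
        timestamp, client_field, op_field = line.rsplit(" ", 2)
        minute = timestamp[:16]                        # "2009-03-19 10:23"
        clients_per_minute[minute].add(client_field.split("=", 1)[1])
        ops[op_field.split("=", 1)[1]] += 1

    print("busiest minute:", max(len(c) for c in clients_per_minute.values()), "clients")
    print("operation mix:", dict(ops))

Crude, yes, but even "how many distinct clients in the busiest minute" and "what's the read/write mix" gets you a lot closer to the real usage than a guess does.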

And in the end when we're testing something, we're not the real client. But we can always be better. We can always get closer.

Wednesday, March 18, 2009

Stopping Later

We're having a discussion about stopping support for one of our hardware types. This is good news - that hardware is older and slower and getting annoying to support. However, even when you make the decision, QA doesn't get to stop worrying about it.

Say we decide to end support for a piece of hardware today. What does this really mean?

It means we stop selling it.

It specifically does not mean that we yank it out of the field immediately. Customers running this hardware still get some time before that hardware is out of warranty and their replacement cycle occurs. During that time, they're still full-fledged customers; they get support and upgrades just like everyone else.

So dev still has to worry about the hardware, at least as much as they've ever had to (translation: most of the software isn't hardware-specific). And QA still has to certify new releases, patches, etc. on that hardware.

Stopping support for something doesn't mean no longer worrying about it. Stopping support for something merely starts a timer that means sometime in the future we won't have to worry about it.

Tuesday, March 17, 2009

Aging Reports

When we talk about measuring engineering - both development and QA - there is pretty much always a need to measure ourselves. There are about as many ways to do this as there are to build software, and you have to be very careful about how you implement it. Any metric can, and probably will, be gamed. So you want to make sure that gaming the metric produces results that you find desirable.

One of the metrics I often like to consider is an aging report.

What Is It?
An aging report is simply a measurement of how long something takes from identification to conclusion. Most of the time, this is considered to apply to defects, but it can also be applied to backlog items or stories, if you're interested in measuring that. The key point here is that you have to be sure you're measuring things the group can control; to measure the team as a whole, look at all states combined. To measure individual groups, look at only states which that team controls.

Setting this up for a defect tracking system:
  • Identify each state in your defect tracking system, and which states you are interested in measuring
  • For each item, you are going to need to know how long it spent in each interesting state. How long was it in the "opened" or "reopened" states? How long was it in the "Ready for QA" state?
  • Keep in mind that a bug may be in some states more than once. You need to count the sum of all these times.
  • You're going to want a script for this if your defect tracking system doesn't do it for you.

Setting this up for stories or backlog items:
  • This is basically the same as for defect tracking systems, except that you may have to define your own states and change how you actually get the measurements out of the system.

Once you have this set up, you need to consider the stats:
  • Largest, smallest, and median show you both outliers and also an average that you can look at for trends.
  • Treat these as rolling figures, usually over the last 30 days, so you're watching a trend rather than a single snapshot.

So our defect aging report winds up showing us, for example, "average time bugs were Ready for QA but unverified over the last 30 days". Substitute other values as you need!
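
To make that concrete, here's a minimal sketch of the core computation - summing the time a bug spent in the states you care about, counting repeat visits. The transition-history format is made up; your defect tracker's export will differ, so adapt the parsing accordingly:

    from datetime import datetime

    # Hypothetical export: one (timestamp, new_state) transition list per bug.
    transitions = {
        "BUG-101": [
            ("2009-03-01 09:00", "opened"),
            ("2009-03-03 14:00", "ready for qa"),
            ("2009-03-04 10:00", "reopened"),      # bounced back to dev
            ("2009-03-06 16:00", "ready for qa"),
            ("2009-03-07 11:00", "closed"),
        ],
    }

    INTERESTING = {"opened", "reopened"}   # states the measured team controls
    FMT = "%Y-%m-%d %H:%M"

    def hours_in_states(history, states):
        """Sum the hours a bug spent in the given states, across repeat visits."""
        total = 0.0
        for (when, state), (next_when, _) in zip(history, history[1:]):
            if state in states:
                start = datetime.strptime(when, FMT)
                end = datetime.strptime(next_when, FMT)
                total += (end - start).total_seconds() / 3600
        return total

    for bug, history in transitions.items():
        print(bug, round(hours_in_states(history, INTERESTING), 1), "hours")

Run that over everything resolved in the last 30 days, take the median (or average) per day, and you have your trend line.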

What's It Good For?
This is a metric intended to show how quickly we can push things through our process. A presupposition is that speed is good when resolving issues or implementing backlog items. While it's most often used for bugs, because they're relatively easy to track this way, it can be used to measure any time a single group has responsibility for making progress. The real trick here is to look for trends rather than a specific value; you want to see the number go down as engineers get faster at working through things.

Caveats
As with all metrics, there are some downsides:
  • If you don't count total time spent in a given state (e.g., if a bug gets reopened, you need to count both open times), then there is a tendency for things to be ... eagerly moved.
  • If your development effort is very spiky, the rolling averages will swing with the spikes, and the trend becomes much harder to read.

Note that this is one of an occasional series on metrics.

Monday, March 16, 2009

Frustration Inflation

Let's say there's a problem (gasp!). Further, let's say the problem happens more than once. This is a real concern, and particularly when it's difficult or embarrassing to deal with, the people handling the problem are going to get frustrated. Repeated hardware DOA issues - support and professional services will be frustrated. Repeated reopenings of the same issue - QA is going to get frustrated. Repeated reporting of the same (nonexistent) bug - dev is going to get frustrated.

And rightly so.

The thing you have to be a bit careful of, though, is what I call frustration inflation. 

Let's say you're trying to get to the bottom of one of these issues. One of the normal questions you will ask is, "how often has this occurred?" or "what's the frequency of this?". Someone who is frustrated is going to remember the frustrating incidents - the recurrences of the problem - more than the situations in which the problem did not occur. So they will unintentionally exaggerate the frequency of occurrence. You'll start to get answers like, "all the time!" or "like 80% of the time", or "way too often". Take these with a small grain of salt. These statements deserve credence as an expression of frustration, but may not be strictly accurate.

So what do you do when you're in a situation where "frustration inflation" is likely occurring?
  • First and foremost, check your facts. You may be the one frustrated and inclined to discount reporting from those closer to the issue.
  • If it's not doing harm, let the frustrated person vent. There's not necessarily a need to call someone out on this; you'll only increase the frustration.
  • Be ready with facts. If there is some frustration inflation going on, and it gets to a point where you have to address it (especially if inaccuracies are getting to a client or to a boss), then the only way to counteract it is to be extremely accurate. Dates and occurrences help here.
  • Do not attack the person who is frustrated. There is likely a reason for this frustration - assuming we're all professionals here! Keep it accurate, and let the other stuff slide at least until things are calmer.

Friday, March 13, 2009

My Favorite Interview Question

I do a fair number of interviews for various QA positions. And there's one question I ask of anyone who is going to be testing:

Tell me about your favorite bug.

I ask this because it's a hugely open ended question, and almost no one has thought about it beforehand. It's amazing the different responses I get, and just how illuminating they are. People talk about:
  • how they can't think of anything (for the record, this is a bad sign).
  • how very very many there are (also a bad sign - you really can't differentiate?).
  • who found the bug (was it them alone? a team? did they really take the lead here? or was it a customer, and they did the after-the-fact diagnosis?).
  • what's important about a bug to them (hugely visible GUI issue? funny but embarrassing problem? subtle issue that shows how smart and thorough they are?)
  • why that problem stuck with them (a recent thing that was the first that popped into their head? - not generally good, by the way)
I find this a good opening for conversation, and it can lead in all sorts of directions. Plus, it's nice to get people off a well-rehearsed path.

What do you ask in your interviews?

Thursday, March 12, 2009

Red Flag Words

There's what you say. And there's how you choose to say it. When you're choosing how to describe something, you need to consider the effect your words will have on your audience. Some words, in particular, are what I call "red flag" words.


Don't use these words unless you want your audience - typically a customer - to panic. Of course, no one is asking you to lie, and in the right contexts these words are fine. In front of most customers, though, try to use words that convey the same meaning but in a softer way.

Crash
Why it's bad: Think car crashes, plane crashes.... crashes sound unrecoverable.
What to use instead: "Service interruption in which the software stopped"

Disappear
Why it's bad: Poof! Magic? That kind of magic makes your product sound flaky.
What to use instead: "Not display" or "does not show up"

Panic
Why it's bad: AHHHHHH!!!!!! Even if you do usually mean a kernel panic.
What to use instead: "Issue with the underlying kernel" or "kernel ceases to process requests correctly"

Core dump
Why it's bad: It sounds like it left itself in pieces all over the floor, and it sounds like you can't recover.
What to use instead: "Take a snapshot of what the process is doing" or "diagnostic core" (hey, you're likely using the core for diagnosing what happened)

Unexplained
Why it's bad: If you can't explain it, how will you ever know what fixed it?
What to use instead: "Not fully understood" or "probably"

"I don't know"
Why it's bad: It sounds like you're giving up.
What to use instead: "We don't know yet, but here's what we're doing to find out." Give them some idea that lack of knowledge is hopefully a temporary state.


What words do you avoid around your more sensitive listeners? What phrasing do you use instead?

Wednesday, March 11, 2009

Non-Software Lesson

Last night I went to a speaker series here in Boston. This particular one was a debate between Ann Coulter and Bill Maher. It was fairly interesting, and certainly politically charged. Lots of topics came up, ranging from whether Democrats should be glad about their winning margin in the last election, to whether the stimulus bill was a good idea, to arguments about stem cell research, to whether creationism should be taught in schools. (Hey, I didn't pick the topics!)

You know what we talked about as we were leaving?

The technical problems the Wang Theatre had.

Not what we thought about stem cells. Not where the various figures and numbers in the arguments about the size of the stimulus bill came from. Not what the president could possibly do in the first seven minutes after hearing the US was under attack (again, I didn't pick the topics!).

Nope, we talked about how embarrassing it was that the Wang Theatre had problems with two of the three mics on stage.

It was a simple problem, and an easy fix. The mics simply weren't turned up enough. So someone would start talking, a voice from the balcony would shout, "we can't hear you!" and they'd turn it up. What was annoying was that it happened on two different mics, and it took at least five "okay, is that better?" / "no" exchanges to fix it.

There's a lesson for us here.

One annoyance, mishandled, can taint an entire experience.

If almost everything is useful, and does what the customer expects, one problem isn't a big deal. Mishandle that problem, though - say it's fixed when it's not - and that problem becomes the focus. Because now it's not just a problem; it's a problem that keeps happening. And that's annoying. That's what you start to remember - that repeated failure.

Having annoyances is inevitable. How you handle them is up to you. Handle them properly, or those annoyances will rapidly become what your customer talks about. Let's get them talking about the repeated successes they're having instead.

Tuesday, March 10, 2009

Test Your Procedures

There are many parts of a release: software, hardware, documentation, marketing materials, tools and analysis helpers, etc. In the mess, it's easy to forget something. One thing you really don't want to forget, though, is your support procedures.

Support What?
Support procedures. We all have 'em, some more formal than others. These are the things that support or dev does on a system in the field when it's in trouble. Sometimes they are scripts, sometimes they're a set of commands, sometimes physical actions; it could be a lot of different things depending on your system. You'll find these on your wiki, or in the support file server, or (worst case) by talking to your support engineer.

Usually support procedures take the form of a diagnosis and one or more actions. For example, a support procedure might be as simple as, "if Tomcat hangs and won't accept connections, restart Tomcat with /etc/init.d/tomcat5 restart". It can get more complex as well.

What's To Test?
Well, things change. Some of the time you're going to know. A release may contain a feature specifically to address or change a support procedure. For example, if restarting a system involved five manual steps, the software may have been changed to include a "restart all" option that does those steps for you. Great, your support procedure has changed (and it sounds like for the better!).

Other times you may not know. If someone writes code assuming a certain startup flag is first when it used to be second, a support procedure adding the flag will stop working. It's unlikely that anyone will remember to note this change.

So you'd better test your support procedures rather than just assuming they'll work.
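
One way to keep yourself honest is to fold the documented procedures into a scheduled check against a test system. Here's a minimal sketch, assuming you can boil each procedure down to a shell command and an expected exit status - the entries below are hypothetical apart from the Tomcat restart mentioned above:

    import subprocess

    # (description, command) pairs pulled from the support wiki.
    # Hypothetical list, except for the Tomcat example.
    PROCEDURES = [
        ("restart Tomcat", "/etc/init.d/tomcat5 restart"),
        ("rotate app logs", "/usr/local/bin/rotate_app_logs.sh"),  # made-up script
    ]

    def check_procedures(procedures):
        """Run each documented procedure on a test box and report failures."""
        failures = []
        for name, command in procedures:
            result = subprocess.run(command, shell=True,
                                    capture_output=True, text=True)
            if result.returncode != 0:
                failures.append((name, result.returncode, result.stderr.strip()))
        return failures

    if __name__ == "__main__":
        for name, code, err in check_procedures(PROCEDURES):
            print(f"FAILED: {name} (exit {code}): {err}")

Plenty of procedures won't reduce to a single command and exit code, of course, but even a partial harness like this will catch the "someone moved that flag" class of breakage before a customer does.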

Okay, Okay, We'll Get to It
Great! The real urgency here is when you'll be using the procedures. If you're in a state that a support procedure is necessary, then something isn't right on the system. It could be minor or it could be complete inaccessibility of your system, but either way there's something negative occurring. Now is not the time to have something else - your support procedure - fail.

Better to test the procedures before you need them. That way, when you're servicing your system you greatly improve your chances of success. Happier support, happier customers, happier us!

Monday, March 9, 2009

End of the Minute

I'm highly dependent on my calendaring app. I'll be working away and it'll pop up and tell me where to go (or where to call in). I'm not exactly a power user, just forgetful of the time while I'm in the middle of something. So I use iCal - easy, comes with the machine I'm on, does the three things I need it to.

There's just one small thing....

Notifications happen at the end of the minute.

Let's say I have a meeting at 10:00am. The notification pops up at the END of the 10:00am minute. Which means 1 second later, it's 10:01 and I'm late. Granted, I should probably have left two or three minutes ago, but it's a small office and I can make it into most conference rooms within 45 seconds.

How annoying!

Friday, March 6, 2009

Defect Leakage

In a conversation recently we were talking about problems that occur at customers, and what we can do to help reduce that number. One of the questions that comes up in these situations is whether we're finding new things at customer sites, or whether we're simply running into issues that we've already fixed. In other words, are we finding stuff and just not getting it to customers correctly/efficiently, or are we flat out missing stuff?

The best way I know to figure that out is to look at our defect leakage rate.

What Is It?
Defect leakage is the number of bugs that are found in the field that were not found internally. There are a few ways to express this:
  • total number of leaked defects (a simple count)
  • defects per customer: number of leaked defects divided by number of customers running that release
  • % found in the field: number of leaked defects divided by number of total defects found in that release
In theory, this can be measured at any stage - number of defects leaked from dev into QA, number leaked from QA into beta certification, etc. I've mostly used it for customers in the field, though.
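
As a quick sketch of the arithmetic behind those three forms (the counts here are invented, purely to show the expressions side by side):

    # Hypothetical counts for one release.
    leaked_defects = 12        # found in the field, never seen internally
    total_defects = 240        # everything found against this release, anywhere
    customers_on_release = 30

    total_leaked = leaked_defects                                    # simple count
    defects_per_customer = leaked_defects / customers_on_release     # 0.4
    percent_found_in_field = 100.0 * leaked_defects / total_defects  # 5.0%

    print(total_leaked, round(defects_per_customer, 2),
          f"{percent_found_in_field:.1f}%")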

What Is the Goal?
The goal of all of the defect leakage metrics is to identify what kinds of things our internal development and testing are missing. By looking at the rate we can see whether we're getting closer to anticipating our customers' usage and environments, or farther away.

Why Do I Like It?
Defect leakage is one of my (very few) pet metrics. I really like that it's a directly customer-affecting measurement. If a customer hits a bug we've never seen before, we have to do the diagnosis, resolution, and workaround/fix deployment all on the customer's time. If a customer hits a bug we've seen before, even if it's not fixed, it's generally a shorter and less painful process to get a workaround or a fix deployed at their site. I also like that it's a measurement of engineering as a whole, not just of QA. Lastly, measuring defect leakage requires you to really consider every customer incident, and that kind of introspection helps me see patterns in problems. It really points out where our coverage is lacking and, conversely, where our product is pretty good.

Caveats
As with any metric, measuring defect leakage is not perfect. It certainly doesn't apply to all situations, and there are some flaws to be aware of.
  • It's kind of a pain to report on. For each defect found in the field, you have to figure out if you found it internally at any point in the development process, and somehow keep a report of all this. It's not generally as simple as running a query against your defect tracking system. If you're low volume, this isn't a big problem. At higher volumes it gets a lot harder.
  • It's not useful as a measurement of QA alone. To keep your defect leakage rate low, you have to have development, QA, product management, and support all working together. This is actually one of the things I like about it, but it makes it a bad measure of QA alone. If, for example, product management incorrectly guesses how customers will use the product, your leakage rate can go up even if QA's effectiveness is unchanged.

Thursday, March 5, 2009

Postmortems

By now, at least in the circles I run in, doing a postmortem after a release is a fairly common thing. We all get together in a room and discuss what went well and what didn't. We talk about what things we want to do in the same way, and what things we want to try doing differently.

We've started doing the same thing for major customer issues. While the issue is going on, just get through it - you're wabbit hunting! But once the issue is resolved and the customer is back up and running, consider having a postmortem.

Just like you can build software better, faster, and more reliably, you can solve your customers' problems better, faster, and more smoothly. So get the team together - support, development, the account rep, and maybe even the customer - and see what you can learn. Questions to consider:
  • What could have prevented this from happening in the first place?
  • What is the customer's perception of how this issue was handled? Were they pleased? What would have made them more pleased with your handling?
  • What changes should be made in the product or in procedures that would have enabled earlier detection of the problem? More precise detection?
  • Could this problem have been caught internally? If so, what can we change to catch these types of problems before they get to a customer?
  • How disruptive was this issue internally?  Is there anything we can do to reduce that?
  • What is the "signature" of this problem? How will we recognize it if we see it again?
Not every issue needs a postmortem, but for the large customer problems, pausing after it's over to think about how to make it better next time can certainly prevent future headaches.

Wednesday, March 4, 2009

Few Words

I spend a lot of time reporting. I do daily reports, I do verbal reports, I do ad hoc written reports. One theme of all these reports is this.... use as few words as possible.

Don't be the guy who's all talk and nothing said.


Reports are not books. Get in, make your point, and stop. It's harder to write, but your audience will appreciate it.

Tuesday, March 3, 2009

Volume Does Not Mean Cutting Corners

Often when you're doing a test, you find several things in one area. You may find eight different login bugs, for example, or there may be seven different issues going on when you get the system to an overly loaded state. This kind of bug clustering is generally not surprising.

However, just because you now have a fairly large number of things to log does not mean you get to cut corners on logging them. Each bug must still be able to stand on its own. After all, six months from now when you see something similar you won't look at the cluster of bugs first; you'll look at the single most similar bug first. Plus, anyone else who goes to look at the bug will ask you the basic questions you didn't answer, and then you'll have to go look them up (there goes any time you "saved").

No matter how high your volume gets, do not cut corners.

If it's bugs, don't forget the system configuration and version information. If it's a customer issue, don't forget to get all relevant logs (not just the log of the proximate cause). If it's a bug verification, don't forget to note the other things you tested - regression tests, for example.

You wouldn't cut corners if you only had one thing to do. Don't use having a lot of things to do as an excuse to change your work. Keep your standards up!

Monday, March 2, 2009

If You Can't Figure It Out...

Today it's snowing in Boston pretty good, so there are a lot of people not in the office. One of the people not here is someone who started a test on Friday that ran over the weekend. Now I'm curious about the results of the test.

Fortunately, he sent a note to the team indicating where the test would be running and referencing the problem he was trying to show was fixed. So I know something. From there, I had to figure out the answers to my questions.

And it occurred to me, this is what we do for our customers. Given a problem description and a location (for logs or whatever), can you figure out what's going on? We have questions about what the system has been doing; can we get answers to those questions?

Sure, I can wait until tomorrow or send an email - after all, the tester has the answers about what he was doing. But with a customer that's not always possible; sometimes customers simply don't know the details of what their customers are doing. In the end, the definitive source of answers to my questions had better be the system itself. Any other source is going to be less available.

If I have a question and the system cannot answer it, that's an opportunity to improve the system.

Of course we rely on multiple sources of information. Those sources of information change, though, and some are more reliable than others. Ultimately, you can rely only on the things you can control - your system and its logs. The rest is just something you hope you can get - probably available, but not guaranteed.