Wednesday, September 30, 2009

Failing Is Okay

A bit of background:
We have a test infrastructure that is fairly large and has grown over a number of years to the state it's in today. Like any piece of code that has grown and been extended and morphed over time, it's gotten a bit crufty in places.

The project:
A couple of engineers wanted to add a new feature to the test framework, so that tests could set different timeouts for setup, test run, and teardown. As they started to get into it, they discovered that adding the feature would be a lot easier if they could do some refactoring as well.

So they did. At this point they were pretty deep in the test infrastructure code, and touching a lot of things that are used by a lot of tests.

The result:
Disaster.

We walked in the next day to a hundred or more machines that tests had leaked: reserved overnight and never released. Tickets were getting autologged constantly, no one could reserve machines, and it took a few hours to clean it up.

The conclusion:
Blowups happen, and in the land of disasters this was pretty darn minor. Our automated tests had one bad night, and our defect tracking system and reservation code got a bit of a workout. That was it. No customers were affected, no releases slipped, and no smoke emerged from the lab.

I'd rather see my team tackle the big problems and occasionally fail (and failure really doesn't happen that often) than have them be afraid to try things. It's important to go in there and refactor code that's getting crufty. It's important to extend and enhance the test infrastructure. Let's not let fear of breaking something get in the way of that.

It's okay to fail. Most of the time you will succeed, and in any case, it's better to fail than to not try.

Tuesday, September 29, 2009

Sharing Feedback Systems

Pop quiz:

You have a client, on whose behalf you are building the next Twitter (or whatever). You're a nice, forward-thinking, modern software kind of guy, so you've agreed to take client feedback throughout the project and to put up interim builds frequently to show progress. In addition, you've got a tester who will work on this project as it gets built.

Do you let your client see your defect tracking system? The one your tester is using?

Let's think about this for a minute:

Pros of having the same defect tracking system for internal (your tester) and external (your customer) feedback:
  • No more copying and pasting from emails or one system into another. You've just bought someone hours of time!
  • Fewer duplicates. Your tester will confirm things your customer saw rather than double-reporting them, and (hopefully) vice versa.
  • More comfort. Your customer knows exactly what he's getting and gets some comfort that you're not just writing code and waiting on him, but that you're actually using the product and working out the kinks before it goes live.
Cons:
  • All your warts? Totally visible. This is really the big downside here.
  • You all have to watch your language a bit. No putting "the customer is always right, even if they're total idiots" in bugs.
  • Less comfort. You mean you're not perfect? If the developers didn't catch this, then what are they missing?

Notice that customer comfort appears on both lists; both of them are probably in play a bit, and which one dominates will depend on your customer. Most or all of the customers I've worked with, though, have fallen into the more comfort camp. They know software in general has problems, and not finding them means you're not looking. And if the customer doesn't see you finding them, then as far as he's concerned, you're not providing that comfort.

My preference is to err on the side of transparency. I think that showing the client you're not hiding anything is part of building a good relationship. I also think that you shouldn't be denigrating your client even in private (you never know what might slip out into public!). At least for the team I often work with, knowing that the client is going to see everything also incentivizes the developer to poke at it a bit harder before putting it out for public review (not true of everyone, I'm sure, but having an incentive to not slack off never hurts!).

If you want to not share with your customer, ask yourself what you're hiding, and what would happen if your customer knew. And then go out and fix those things. The things you hide, those are your problem areas. Being transparent will make you fix them.

There are probably situations where an internal-only system is needed, but I haven't run into one yet.

Monday, September 28, 2009

Your Customer's Wants

There are many many people in this world who care about what we do as testers and as software engineers, in one way or another. A significant subset of these are people we would consider customers. Our customers are the people that consume our work. And they have wants and needs.

Our customers want a lot of reasonable things:
  • information about the product's current state
  • information about the severity and likelihood of the issues they have found, might find, or hope they won't find
  • test coverage information
  • risk assessment of our products
Our customers also sometimes want a lot of things that are impossible:
  • Zero defect releases
  • The ability to "make up for engineering" by testing in some quality (darn it!)
And then there are the things that our customers want that are something of a grey area:
  • a go/no go release vote
  • something to measure how good their testers really are; a "standard" or "guideline" or "best practice" or "certification" or "metric" (the words are different; the desire is the same)
Telling your customer that they can have something they want is easy. The other two categories are much more difficult.

Telling your customer they can't have the impossible is not a fun conversation but, assuming you're dealing with a rational human being, generally goes fairly well. For example, if a customer asks for a zero defect release, you can explain the difference between known defects and potential defects, and describe your find rate and fix rate currently. Do some preparatory math: if your find rate is decreasing at 10% per week, and you're finding 10 bugs per week currently, then in 10 more weeks you will have a 0 find rate, probably. Do the same with the fix rate, add in the regression rate if necessary, and you will have a target date for zero known defects. Then you simply let your customer decide if they really want to ship with zero known defects, or if their release date is earlier than that can happen. Other examples are similar. The impossible requests are generally counter-able with logic and a few numbers.
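
To make that math concrete, here's a back-of-the-envelope sketch in Python. The numbers, the linear weekly decline in the find rate, and the function name are all invented for illustration; plug in your own rates.

# Rough projection of a "zero known defects" date from current find and fix rates.
# Everything below is invented for illustration.
def weeks_to_zero_known_defects(open_bugs, find_rate, fix_rate, find_decline_per_week):
    # Step forward week by week until nothing is being found and nothing is left open.
    weeks = 0
    while (open_bugs > 0 or find_rate > 0) and weeks < 520:  # cap it so a bad input can't loop forever
        open_bugs = max(0, open_bugs + find_rate - fix_rate)
        find_rate = max(0, find_rate - find_decline_per_week)
        weeks += 1
    return weeks

# Example: 25 bugs open, finding 10 a week (dropping by 1 a week), fixing 12 a week.
print(weeks_to_zero_known_defects(open_bugs=25, find_rate=10, fix_rate=12, find_decline_per_week=1))

Run against your real history, a projection like this gives you a date to put next to "zero known defects" so the customer can weigh it against the release date.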

The really hard part is when your customer is asking for something that's not out and out wrong, but that is a matter of opinion. A good example here is whether "QA votes in the release decision". There are a number of things going on with these grey areas:
  • There's no tester consensus on this one. Some schools of thought say that test is an information providing function only and should not make a decision about the location of the software in the development life cycle (moving through any phase, including release). Other schools of thought embrace the notion of testers as a hub of information and therefore put the release decision squarely in the test corner.
  • This isn't about something industry-wide. For every "testers are information providers" argument there's a "testers have a place at the release table" argument, and they're both valid. These are arguments about role and function within an organization, which means they're specific to that organization. What they do at "other company X" is only marginally relevant.
So what do we do? We're faced with kind of a difficult choice here.

Ultimately, what you do is up to how far you're willing to go, and will be dependent on your relationships with your customers, your own comfort level, and even where your future ambitions lie and how much you're willing to step outside your "tester" label.

You can choose to follow what your customer wants. Take your place at the release table and treat it like any other test. Gather information from all available sources, consult whatever oracle you have, and make a decision. (In this case, the decision is whether to release, not whether to log the bug, but the underlying process is similar.) Stick as many caveats around it as you want, but ultimately keep pleasing your customer.

You can choose to dig your heels in and refuse to provide anything resembling a decision. Eventually they'll stop asking. Note that normally this costs you politically, but that may be okay with you.

You can try to talk your customer out of it. Explain your views; perhaps your customer will agree. If not, you've at least provided contextual information about how this might not be a good choice that your customer's making. (If you're in the test-is-information-providing mindset, well, you've just provided information about their decision; we're starting to get a bit meta here!) The goal here is to be transparent and to think rationally about this decision. Describe why this might seem like a good idea and why you think it's a bad idea. But ultimately, your customer is making the decision here.

What you do about your customer's requests is up to you. If they're asking for something reasonable or something impossible, that's fairly straightforward. But when you're in a situation where they're asking for something grey, think long and hard before you answer. In the end, you're going to win some of the grey areas, and lose some of the grey areas. Your job is to make the choice transparent. In the end, this is a relationship, and transparency of decision is more important for the long-term relationship than any single decision.

Friday, September 25, 2009

Changesets

I know not all testers do this, but at least where I am, testers write code and check it in. (Yes, I recognize that this is completely normal for some shops, including ours.) Now, when we check in code, we have to behave like good engineers, and that means following the guidelines of good source code management practices.

For example, today I made a change to the crontabs that run our nightly test suites. It was a simple change, just moving some start times around to accommodate some maintenance work IT is doing in the lab. In addition, we recently identified an issue where we would lose some test logs if the machines were rebooted. We traced the issue to the crontabs, where we were tee-ing the logs to /tmp (which gets cleaned out on reboot in our configuration).

So as long as I'm mucking about in the crontabs, I said I'd fix it. The bug fix is also a simple change: just add it to my change set to move times around, right? Wrong.

Instead I'm going to follow two principles:
  • Create atomic change sets
  • Isolate your change sets
Creating atomic change sets means that I put everything related in the same change set. This morning when I was moving around start times, I put all the crontab changes in a single change set, even though it was four different files.

Isolating change sets means that the first change I checked in only makes the start time changes. It doesn't do anything else. I did a second change set to fix the bug.

The goal of creating atomic, isolated change sets is to ensure that they can later be manipulated easily and effectively. Maybe later I will merge the bug fix change set to a different branch. Maybe my start times are wrong and I need to revert that change set. Because I did them as two separate change sets, I still have that option. Because I did each separate task (changing start times and fixing a bug) as one change set, I can easily do my merges and reverts with no danger that I'm going to get myself into an inconsistent state.

Just like when you're testing, or when you're designing a test, make sure that you code for the future. A little thought now can save a lot of trouble down the line.

Thursday, September 24, 2009

Test Estimate Trick

We've been building software for a while, and we've been using stories for a while. You'd think estimating would get easier, but at least for me it really doesn't. There are several techniques we use to create test estimates, and I thought I'd share another one.

Basically, the issue is that humans tend to be optimistic (how's that for a huge generalization!). So when I'm sitting down to estimate a test, I break it down and I look at all the things I'm going to have to do. Usually this includes things like data generation, boundary analysis, component interaction analysis, test code modification and creation, actually running the tests, writing up results, etc. Then I just add it all up. Hooray! Test estimate complete!

Except not.

Because I haven't accounted for something. I don't know what it is (or I would have accounted for it), but I've definitely missed something. Maybe on one story it's that the data generation takes a lot longer than I thought. Maybe on another story it's some subtlety about how two components interact that I simply never anticipated. Maybe it's a problem in my translation from work time to calendar time (who'd have thought it took a week to find 4 hours for doing this test?). Whatever it is, I find that these things usually take me longer than I planned.

So I use history as my guide.

We have a record of all the stories we've done, and of how long we actually spent on them. It's all right there in our wiki (or your Jira instance, or your Test Director instance, or whatever). So we can use it. Let's do a little data mining.

Here's how we get some information:
  1. Estimate your current stories. Do this with whatever model you like, just get to the "this will take X" point.
  2. Go through past stories and group them into "big" "medium" and "small". This is a grouping of test effort here, and it reflects how hard it looked to test. Your gut feel applies here, and you're welcome to include other metrics (e.g., "that team does a ton of refactoring, so their stories are always medium or larger"). Be sure to do this over as long a period of time as possible, so you can flush out any really weird circumstances.
  3. For each story, determine how long it actually took you to test. Use calendar time here: from the day you started until the day you stopped. If you had it in test multiple times (test it, failed, retest it), count them all up together.
  4. Do a calendar-to-estimate ratio. Let's say your calendar was 8 days, and your estimate was 8 hours. Congratulations, one of your estimated hours is a real world day in this case. Calculate this into the form 1-hour-to-?? (e.g., 1 hour is 1 day).
  5. Average the estimates within the buckets. At this point you have a list of calendar-to-estimate ratios. They probably look something like this: 1h:2h, 1h:1day, 1h:3days, 1h:4h. Now we simply average these. Add 'em all up, divide. If you're feeling fancy, throw out the biggest and smallest outliers. The result is a single ratio: 1 hour estimate time is ??? calendar time. This is your real estimate.
  6. Adjust your estimates. Now, go back to your current story estimates. If, for example, on medium stories, your average ratio is 1 hour to 4 hours, and your current estimate was 2 hours, then your new estimate is 8 hours. (A rough sketch of this math in code follows the list.)
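
Here's a rough sketch of that arithmetic (steps 4 through 6) in Python. The historical numbers mirror the example ratios above, and the 8-hours-per-calendar-day conversion is an assumption; adjust both to your own data.

# Sketch of the calendar-to-estimate ratio math. The history below is invented.
HOURS_PER_CALENDAR_DAY = 8  # assumption: how many hours one calendar day represents

# (estimated hours, actual calendar days) for past "medium" stories,
# i.e., the 1h:2h, 1h:1day, 1h:3days, 1h:4h example above
medium_history = [(1, 0.25), (1, 1), (1, 3), (1, 0.5)]

ratios = [(actual_days * HOURS_PER_CALENDAR_DAY) / estimate_hours
          for estimate_hours, actual_days in medium_history]
if len(ratios) > 2:                        # if you're feeling fancy, drop the outliers
    ratios = sorted(ratios)[1:-1]
average_ratio = sum(ratios) / len(ratios)  # calendar hours per estimated hour

def adjusted_estimate(raw_estimate_hours):
    # Scale a gut-feel estimate for this bucket by the historical ratio.
    return raw_estimate_hours * average_ratio

print(average_ratio, adjusted_estimate(2))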

Is this precise? No. We're playing the law of averages here. We're probably going to get each individual story a bit wrong; over the course of a test cycle with a number of stories, though, the idea is that the little disparities will wash out and our efforts will approach that average. It's a way to bake in risk and slippage implicitly, without having to account for them item by item.

Give it a shot - let's see how test estimates go. Over time, hopefully we'll see ourselves improve.

Tuesday, September 22, 2009

Flexible Work Hours

One of the benefits of working where I do (and in many other companies) is the flexibility around work hours. There really is an embrace of work being about what you do, not where you're sitting at any particular hour of the day. This is a blessing and a curse, really.

On the plus side, people can really set up a schedule that works for them, and be happy about it. Are you a morning run addict? Do it and show up a bit later. Natural night owl? Show up in the afternoon and work into the night. Think prime time is bed time? Show up early and leave midafternoon.

On the down side, flexible hours are a rope you've been given and you have to be a bit careful not to hang yourself. If you take "flexible" to mean "don't bother working as much", you're not going to go very far.

There are a few things that can help your chances of success in a highly flexible work environment:
  • Some showing up is beneficial. In order to form the relationships that will make you successful in your job and your career, you need to be around. Get totally off schedule, or work from home constantly, and you'll have a harder time forming those relationships; it's not impossible, just harder.
  • You still have to work. It's up to you to get your work done, and you have the added burden of having to set up a schedule (or at least refine one) that gives you the structure you need to be successful.
  • Consistency matters. It doesn't matter when you show up, or when you leave, but your coworkers should generally be able to count on you being around at about the same times every day (or week or month or year). That way when they're scheduling meetings, they will know to schedule it for 11 because you get in at 10, etc. If you come in at 7am one day and 3pm the next, you're making your coworkers work a lot harder to include you. Of course, doctor's appointments and whatnot do occasionally happen; just try for general consistency.
  • Consider core hours. This is related to the showing up bit. You have to show up when people are there, and generally you need to allow for a few hours of overlap so you can actually have working sessions with your coworkers. It's helpful to have 2 or 4 hours where generally everyone's there. That is when you start to find design sessions, code reviews, meetings, etc. scheduled.
  • Be willing to go outside your normal. Sometimes a meeting will be scheduled outside your normal flexible schedule. Well, flexible goes both ways - you get flexibility in your schedule, and your employer should get flexibility in your schedule, too. That means that if some early bird has scheduled a one-time 8 am meeting, well, go ahead and make the effort to come in, even if you're normally an 11am kind of person. Accommodate others just like you ask them to accommodate you.
As with many other things, maintaining a successful flexible schedule is something that can work really well, as long as you're willing to put in the work and remember to compromise. After all, a flexible schedule is for everyone's ease of use - so use it, and be flexible with it.

Monday, September 21, 2009

Actionable Issues

I was talking with someone at work today, and he was telling me about an issue he's having. He had a problem on one system (bad switch), and he logged a bug and it was dealt with (new switch!). He now has another system with the same type of (bad) switch in it, and it isn't a problem, but he's worried it might become an issue. As a preventive measure, he's gathering logs and starting to collect data; if it fails, he'll have some good prefailure data.

(By the way, I applaud this effort. It shows some real thinking ahead. But that's not the point of the story.)

Anyway, he came to me asking if he should log a bug for this.

And I said no. He is missing one key criterion:

Bugs should be actionable.

Let's say I'm on the receiving end of this bug. There are only a few things I'm going to ask myself:
  1. What's happening?
  2. How is that different from what we would like to have happen?
  3. What do I do?
If you can't answer those three questions, you don't have a bug (or issue or ticket or whatever you want to call it). You might have something that will one day become a bug, but you don't have a bug.

So whenever you're logging an issue, make sure you've elucidated the things the recipient will need to fix the problem. Give the recipient desire and action. One without the other will not help your problem get solved.

Friday, September 18, 2009

Tools

On my Mac OS X laptop, I'm currently running:
  • OS X (duh)
  • Debian Linux in a terminal
  • Windows 2003 Server in a CoRD (remote desktop) session
  • Windows XP in VMWare Fusion
I'm amused by the sheer number of underlying environments we encounter in a given day. (There's also my phone, my iPod, and probably many others I haven't even thought of.)

These are all just tools, and each is giving me something different. I'm using WinXP for Office because I like Windows Office better than Mac Office. I'm using Windows 2003 Server as an AD domain controller (test domain in the office). I'm using Debian Linux for compiling our software. And I'm using OS X for mail, IM, and internet.

So here's my reminder to myself for the day: we don't really get to be zealots about tools. We have to be able to work with a lot of different tools and a lot of different systems because our customers do. Our customers use Linux and AIX and Windows. Our customers use AD and they use NIS. Our customers use Firefox and Safari and IE. So we get to give a lot of those a shot.

The chance to play with all these toys - I mean, tools - is one of my favorite parts of being a tester. Find 'em, learn 'em, use 'em. An experienced tester gets a pretty big toolbox, and I think that's a lot of fun.

Thursday, September 17, 2009

Extended Smoke Test Estimation

As we were planning the next release cycle, the QA team where I work was looking at ways we can improve our testing this time around. One of the things that came up is that we were going too deep in certain areas at first, and not hitting the breadth of the product early enough. Thus, it was a bit uncomfortably late in the test cycle before we did much peeking at some of the more obscure features. Granted, we hadn't found anything bad at the last minute, but it did feel like maybe we should make sure we touched everything a bit earlier just for comfort.

So we decided to do an extended smoke test. This isn't a huge change for us, just a restructuring of the test plan to encourage us to touch a feature, move on to another feature, and come back for a deep dive on the feature after we'd gotten some breadth.

But how long should we spend on this extended smoke test? At some point we have to go deeper, and deeper takes a while, so if we spend too long going broad we risk some obscure problem in a really commonly-used feature.

We made a couple decisions early on:
  • This needed to be timeboxed.
  • We were going to be making it up a bit as we went along, so the modified test plan would be a guideline, not a rule.
And then we worked our way into an estimate. First, we went around the room getting gut feels on four questions:
  • What is a smoke test?
  • How long should a smoke test take?
  • How long do you think the whole test cycle is?
  • How many hours a day do you actually spend testing, on average?
Looking at this, we had the following:

[Table of each tester's answers to the four questions above]

Basically, we all agreed on what a smoke test was, and that the extended smoke test should take about 1/8 of the time of the entire test cycle. Further, we figured that the team was getting about one man-day of testing done per calendar day (tester 4 is on another project, hence the zero).

Our actual test cycle this time is 8 weeks, so 1/8 of that comes to about a week: we had to fit the extended smoke test into 5 working days.

We've split our test plan up into 15 sections, which means we need to cover 3 sections a day. Now we're getting somewhere: that's something we can check to see if we're on track. Some sections are bigger than others, so figure "small" sections can have 1-2 hours of smoke test, and "large" sections can have 2-3 hours of smoke test.
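
For what it's worth, the arithmetic is simple enough to sanity-check in a few lines of Python, using the numbers above:

# Sanity check on the extended smoke test timebox, using the numbers above.
cycle_weeks = 8
working_days_in_cycle = cycle_weeks * 5          # 40 working days in the cycle
smoke_test_days = working_days_in_cycle / 8      # ~1/8 of the cycle -> 5 days
sections = 15
sections_per_day = sections / smoke_test_days    # 3 sections a day
print(smoke_test_days, sections_per_day)         # 5.0 3.0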

And that's how we got to the extended smoke test estimate.

Caveat time: This is what made sense for us, and is intended only as an example. Feel free to modify it to suit your needs.

Tuesday, September 15, 2009

Getting to User Stories

We're loosely an XP shop. One of the things we've adopted is the idea of basing our work off user stories. Pretty much anything we implement, whether it's a new platform or a new feature, becomes a user story. In everything I've read, XP just sort of assumes that stories come into existence. They're created by (or with) the customer and estimated, and that's all in a step called "create story".

But how do we do that step, "create story"?

For us, at least, sometimes the gap between the idea and the actual story that's ready for implementation can be months or longer. (We tend to have a lot of ideas.) So how do we handle all these proto-stories? How do we keep the ideas around? And how do we intelligently work them into stories, assuming they're important enough for that?

We've come up with a process for this, and I thought I'd share just in case anyone else is in the same boat.

A few caveats up front:
  • We've been using this for a couple of years, and it works pretty well for us. Your own mileage may vary.
  • This process looks shockingly complicated, but when I sit down to think about it, there's no step that we always skip (and that would therefore be unnecessary).
  • This process is contained pretty much outside implementation; it's really just something our "customer" (in our case, product management) uses.
  • I'm pretty sure this is not strictly XP. Oh well.
So here's what we do:
  1. Create a story stub candidate
  2. Either accept it (story stub) or defer it (deferred) or delete it (not needed!)
  3. Work the story stubs to add details about what this thing will really do.
  4. Estimate the story stubs
  5. All agree that they're complete (make them stories!).
At some point, then, we have each of these things:

Story Stub Candidate
This is the idea. This is usually pretty general: "Hey, let's use bigger drives!" or "Wouldn't it be cool if the system could email you logs automatically when it was sick." Anyone can create a story stub candidate, and in practice they've come from all over - sales, development, product management, support, etc.

When an item is here, the "customer team" (an internal group proxying for real customers) reviews it, and one of three things happens:
  • This feature exists, so we delete the idea that's not actually new.
  • The feature isn't important now, so we defer it.
  • The feature is useful now, so we make it a story stub.
Deferred Story Stubs
These are all the things we thought of that we decided weren't important (or at least not important right now). We go through them every once in a while to make sure they haven't changed in importance, but otherwise they're mostly there as records.

Story Stubs
These are proto-stories. With enough work, and estimation, one day they can grow up to be real stories. Basically, story stubs are the ideas we've decided are interesting enough to report on. Each of them gets an owner who is responsible for fleshing out the details, defining acceptance criteria, and getting an estimate on the story.

Stories
When a story stub is fully fleshed out, it goes back to the "customer team". The team agrees it's ready and puts it in the product backlog, or else kicks it back to the owner for more information. At this point it becomes a full fledged XP-ish user story, and it goes into the dev cycle from there.


We keep these all in the wiki, and use categories to move stories through this process. Over time it grows to be a pretty big repository of the work we've done and the work we've chosen not to do.
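
If it helps to see the flow as code, here's a toy sketch of the states and the allowed moves. The names and the transition table are just my own restatement of the process above, not anything we actually run:

# Toy model of the story workflow described above; the names are mine.
from enum import Enum

class State(Enum):
    CANDIDATE = "story stub candidate"
    DEFERRED = "deferred story stub"
    DELETED = "deleted (already exists / not needed)"
    STUB = "story stub"
    STORY = "story (in the product backlog)"

# Moves the "customer team" (or a stub's owner) can make.
ALLOWED_MOVES = {
    State.CANDIDATE: {State.STUB, State.DEFERRED, State.DELETED},
    State.DEFERRED: {State.STUB},   # revisited every once in a while
    State.STUB: {State.STORY},      # or it stays a stub and goes back to its owner for more detail
    State.STORY: set(),             # ready for the dev cycle
    State.DELETED: set(),
}

def move(current, target):
    if target not in ALLOWED_MOVES[current]:
        raise ValueError("can't move from %s to %s" % (current.value, target.value))
    return target

print(move(State.CANDIDATE, State.STUB).value)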

How do you get from idea to story? (or whatever you call story)

Monday, September 14, 2009

Test Dependencies

Full disclosure: This is still in the idea stage. I haven't actually implemented it yet.

Background
We have a suite of automated tests. They basically do the following:
  • reserve some machines
  • engage in prep work (configuring the system, creating a volume, writing some data)
  • perform an action (this is the test itself)
  • check the assertion (or time the action, or whatever we're trying to look for here)
  • tear down
  • release the machines
These run overnight, and in the morning we have a list of tests and their results.

The Problem
Most of the time, this works great. We walk in, take a look at the failures, and go on our merry way (logging bugs, cheering our successes, etc).

Some mornings it's different. Every once in a while, someone breaks a really fundamental thing and we walk in to hundreds or thousands of test failures. This is a really rare event, but we're all humans here, and it happens.

What's happening as this occurs is that someone broke something, typically in setup or teardown, and it affected a whole lot of tests. For example, as part of setup, almost all of our tests create virtual interfaces. When someone breaks the virtual interface utility, every single test is going to fail. One of those failures is a direct test of virtual interface creation (yes, we test our test infrastructure, so there really is a test called test_virtualInterfaceUtil). The rest of the failures are innocent victims.

It really sucks going through all the innocent victims.

The Proposal
I call this a proposal because I haven't actually tried it.

I would like to make a "dependency tree" for the tests we run. Basically, this is something that says "if test X fails, test Y is going to fail, so don't bother to run it". The idea is that we would run tests that produce real failures and not create innocent victims.
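
As a sketch of what I have in mind: the dependency map below is hand-maintained, and every test name except test_virtualInterfaceUtil is hypothetical.

# Hypothetical sketch: skip tests whose prerequisites already failed, instead of
# letting them fail as innocent victims. Assumes tests run in dependency order.
DEPENDS_ON = {
    "test_createVolume": ["test_virtualInterfaceUtil"],
    "test_writeData": ["test_virtualInterfaceUtil", "test_createVolume"],
}

def run_suite(test_names, run_one):
    # run_one(name) returns True on pass, False on fail.
    failed = set()
    results = {}
    for name in test_names:
        broken_deps = [dep for dep in DEPENDS_ON.get(name, []) if dep in failed]
        if broken_deps:
            results[name] = "SKIPPED (depends on %s)" % ", ".join(broken_deps)
            failed.add(name)        # anything downstream of a skip gets skipped too
            continue
        passed = run_one(name)
        results[name] = "PASS" if passed else "FAIL"
        if not passed:
            failed.add(name)
    return results

# Tiny demo with a canned runner where the virtual interface utility is broken.
print(run_suite(["test_virtualInterfaceUtil", "test_createVolume", "test_writeData"],
                run_one=lambda name: name != "test_virtualInterfaceUtil"))

A bug logged against the one direct failure could then note how many tests were skipped because of it, which ties into the notification idea below.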

There are several gotchas with this:
  • You may mask other failures. A test you skip might have failed for a completely different reason, before it ever hit the broken dependency, and now you won't run it. A missed chance!
  • Dependency detection needs to be managed. I'm not sure how to detect dependencies automatically, and manual detection is sort of a pain to maintain. I think there's probably something here by checking calls to failed libraries, but I haven't completely thought it through.
  • A fix is implied. When 1,000 out of 20,000 tests fail in a night, it's really obvious that something needs to be fixed right quick. When 1 out of 24 tests fails in a night, it's a little less obvious. After all, there was only one failure! (And it's easy not to look at the bottom number.) I think this one is pretty easy to overcome just by making each bug also note, "because of this bug, X tests did not run", and triggering our automatic notifications.

This is still in the brainstorming stages, but it's something I'd like to keep poking at. If anyone's doing anything similar, I'd love to hear the war stories!

Friday, September 11, 2009

Autodidact

"I'm an autodidact. I share the insecurities of all autodidacts, that maybe we missed something along the way. Maybe I'm doing it wrong. Maybe if I had some sort of plan instead of a cloud of vague notions. I wait for one of the ideas to coalesce into an idea-shaped particle, and then I sit down and start writing. "

I got this out of a Jon Carroll column. It really hit home. After all, in testing I'm an autodidact, too. Sure, I went to college, and sure I got a degree, but nothing there taught me testing. I've been to some conferences, and taken a few classes, and I read a fair amount, but for the most part, what I learned I've learned on my own.

I'm missing something.

But you know what? Even if I had gone to school for testing, and been trained in testing by someone who knew a lot about the field, I would still be missing something.

I would be missing all the times I'd read an article on some testing technique and said to myself, "hey, that would be really cool if I used it for this totally different thing". (For example, I read an article about a guy who was using the VMWare CLI to start and stop machines for installer testing, and I said, "hey, I could use that to dynamically configure machines in a Selenium Grid").

I would be missing all the times I'd screwed up a test and discovered that actually I'd been testing something completely different - and that was useful information.

I'd be missing all the times I stumbled across a tool I'd never heard of and made it do something with help from Google and from friends.

So yeah, I'm an autodidact. I learn a heck of a lot from myself and from all the information I seek out by myself. I suspect most of us do, and I think a lot of us do a pretty darn good job with it.

Thursday, September 10, 2009

Buckets

I was reading a paper yesterday, and it was a pretty generic "why software development is like X" paper. One sentence, though, stood out:

"Requirements -- probably the most misused word in our industry -- rarely describe anything that is truly required."

That guy is dead on, at least in this one point. "Requirements" is a word we tend to use in a rather slapdash manner. In the strict sense of the word, a requirement is something we are compelled by law or regulation to do. That thing where the SEC says that all stock sales by insiders must be submitted within X days? That's a requirement. That thing where marketing says it absolutely must be blue? Yeah, not actually a requirement.

Other than semantically, it really doesn't matter. There are only three buckets:
  • Must have
  • Will take
  • Don't want

Legal requirements (aka "true" requirements) better fall in the "must have" bucket, and they better stay there. Other requirements might fall into the first bucket, but they might move. Things in the "will take" bucket also move around. Later in a release cycle, things tend to move down a bucket - from must have to will take, and from will take to "don't want" (which at this point gets nicknamed "please don't destabilize things").

It's easy sometimes to get caught up in subtleties and semantics, but in the end, keep in mind that everything in the product falls into one of only three buckets. So cut through the details and find your bucket.

Tuesday, September 8, 2009

Git Me Checking In

I was working on a project today and happened to be working from a project stored in GitHub. Now, I'm not particularly familiar with Git, so I kind of stumbled through with a lot of help from Google and some help from a friendly engineer who knows more than I do about this stuff. I'm coming from SVN and Perforce, mostly, so please forgive the analogies.

This is what I wound up doing:

A note on topography
With Git, you have several "code chunks" going on at once.
  • Remote master: This is the actual code that will be deployed somewhere. It's the equivalent of the server in, say, SVN or Perforce. If you're working on a project with people, their changes will eventually end up here.
  • Local master: This is your local copy of that remote master branch. Note that this doesn't automatically update when other people check stuff in. You have to update it ("pull" in git, roughly equivalent to "sync" in perforce). No surprise there. (We'll call this "master" below.)
  • My working branch: This is the actual branch you're working on. Near as I can tell, it's basically branched from your local master. (We'll call this "mine" below.)

To "Check Something In"
By "check something in" I mean "I'm done with it and ready for it to go into the remote master so that the world can see it and so it can be deployed to the staging/production server".

1. Start on your branch
git checkout mine
2. Make sure you've committed your changes
git commit files
You'll see something like this:
[mine b0b6f83] Add namespace help
1 files changed, 8 insertions(+), 2 deletions(-)
3. Switch to your local master
git checkout master
You'll see something like this:
Switched to branch "master"
4. Update your local master so it has everything that others have done since you last updated
git pull
You want to see something like this:
Updating 51dd07f..c19dab1
(files)
5. Switch back to your working branch ("mine")
git checkout mine
6. Pull changes from your local master to your local working branch
git rebase master
You might see something like this, but it doesn't appear to affect anything:
warning: 3 lines add whitespace errors
7. Run your tests and make sure it all still works
8. Switch back to your local master branch
git checkout master
9. Merge all changes from your local working branch to the local master branch
git merge mine
10. Finally, send it all to the remote master
git push

And you're done. Obviously, there are a lot more things you can do at every step here. This is just the path of least resistance I found to get started. Once you have this down, I'd definitely encourage you to play with branches and different commits.

And just for kicks, there are a couple other tricks I picked up along the way:

Reverting a file you changed and didn't want to (before you commit):
git checkout -- db/schema.rb

Making your name and email show up properly:
git config --global user.name "Catherine Powell"
git config --global user.email "myemail@example.com"

Many thanks for all the assistance, and in particular to these blog entries:

Friday, September 4, 2009

Customer Demos

One of the things QA handles where I work is demos and evaluations of early-stage features. For example, a potential customer might need a feature and come in to see an early draft of it, or might take a beta system in house to make sure the feature does what he expects. Generally there's a fair amount of pressure on these evaluations because a sale or analyst presentation or something similar is riding on it: "If it goes well, we'll buy/present/rave about you."


Okay, take a deep breath. High pressure. You can deal with it. There's only one rule:

Don't do anything you haven't done before.

That's it!

Basically, an evaluation is a big demo. So use your internal demos to prepare for it. You won't be able to control everything (citywide power outage, for example), but you can control a lot of things (which flags to use when starting the product, for example). By trying everything on your own first, you can make sure the things in your control go right.

When we test a product, we try to avoid ruts where we're just doing the same thing over and over again. With a demo, we want to find a rut. We're looking for a well-worn path here, and we're doing it by making sure we've tried everything before the demo.

So my evaluation preparation goes something like this:
  1. Get an idea of what needs to be shown
  2. Create a list of all the things that we'll show (down to test cases)
  3. Add the exact commands I'll be running and data I'll be using to that list. In the demo I'm not going to type; I'm pretty much just going to copy and paste.
  4. Run those commands on the build I will be using in the environment I'll be using (or the closest I can get to that environment). Time these.
  5. Repeat step 4 until I feel like I can do this in my sleep.
It takes a lot of time, but by practicing first, demos and evaluations can be approached with confidence and with a much lower rate of screw ups. It alleviates some of the pressure, at least for me.

Wednesday, September 2, 2009

Kinds of Tools

Tools are everywhere. We use a hammer and nail to get a picture onto a wall. Chimps use a stick to pull termites out of mounds (snack time!). And we use software tools to help us build and test (and ship) software.

One thing I've discovered is that there are a lot of different kinds of tools:
  • Throwaway tools. These are the bash scripts I put together in about 10 min because without them I forget to start the server before I start the client and then the client blows up (darnit!). I chuck 'em when I'm done with whatever test I'm doing.
  • Third-party tools that just work. Wireshark, a debugger, etc. Some of them save me a lot of time. Install and go, and I'm a happy tester.
  • Tools to work with other tools. This is when I'm staring at a tool that kinda sorta does some stuff but needs some help to be used or to do what I want (I'm looking at you, Selenium!). So I write a bunch of glue code to stick the pieces together, add some error handling to attempt to reduce the brittleness, expand (or create) the reporting, etc.
  • Test infrastructures. These are the things that you write as a basis for long-term tests. I'm going to use these over and over for years, across a lot of different kinds of tests, and they have to work reliably. These we design and build as we would a product (complete with tests that test the framework!).
I'm sure there are more types, but these are the ones coming to mind at the moment. I should also note that something I started as a throwaway script sometimes ends up in the test infrastructure, but usually I rewrite it a bit as it goes in.

I find it helps to think about the purpose of the tool as I'm writing or starting to use it. That way I don't spend a lot of time on something that doesn't need it, but if it needs to be done right I at least don't shoot myself in the foot with a hack job.

Tuesday, September 1, 2009

"One" Difference

When we're reproducing an issue, or when we're debugging a problem, we generally try to isolate the variables and then eliminate them one by one until we've found the subset of things that matter. This is, then, the simplest possible way to reproduce the problem.

In the end, you get down to wanting to change one thing at a time. Let's say, for example, that you find a problem on a directory that is compressed, encrypted, being written to by 4 NFS clients and 2 CIFS clients, and happens to be named "my volume". Coming up with a list of potentially relevant variables seems easy:
  • directory name
  • number of clients
  • type of clients
  • compression
  • encryption
Then we just start trying and eliminating variables. Try it again with compression disabled. If the problem still reproduces, then you know that variable is not relevant (hooray!). Try it again with encryption disabled. Lather, rinse, repeat until you've got it.

Here's the problem:
You haven't got anywhere near all the variables.

The problem also happened on a Tuesday. It happened on a system containing four servers that was 35% full. It happened at 8pm and a cleanup process was running on the servers. And we haven't really considered other processes on the same box, network traffic (in a multi-system configuration in particular), etc.

Ooooh...

There are a lot of variables in any sized system. Fortunately, most of them don't matter most of the time. That bug where we can't write to directories containing underscores really doesn't care about day of the week or hardware configuration, or whether the directory is compressed, or anything else.

There are three lessons we can take from this:

Lesson 1: You're never going to be able to change just one variable.
In a system of any real size, more than one thing is going to change between runs. It's just that you'll change one intentionally and others unintentionally.

Lesson 2: Most of the time that's okay.
We all have tales of some doozy of a bug that only occurred on alternate Tuesdays while standing on our heads and clicking with the left ring finger. Those are usually pretty rare. Most of the time something that's going to fail is going to fail because of the interaction between a couple things, or just one broken thing. (Hence: pairwise testing).

Here the best trick I know is to divide my variables into "proximate" and "background", with "proximate" being the ones that I believe are more likely to be relevant here. You can figure out likely relevance by your gut if you've been working with a system long enough. Base it on past bugs, the system architecture, and other things you've been testing on this build. Then manipulate the "proximate" variables and don't worry about the background variables for a moment.

Then, just do each test twice. Think you've reproduced it? Try again, preferably on a different system. If a background variable is relevant (and you haven't picked up on that) it's likely to have changed between your two so-called identical tests. Inconsistent behavior means that you've missed something and need to go digging deeper.
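
Here's a sketch of that loop in Python. The variables are from the example above, and reproduce() is a canned stand-in (in this toy version the bug only shows up when encryption is on):

# Hypothetical sketch of the proximate-variable loop; reproduce() is a stand-in.
baseline = {"compression": True, "encryption": True, "nfs_clients": 4, "cifs_clients": 2}
proximate_changes = {"compression": False, "encryption": False, "nfs_clients": 0}

def reproduce(settings, system):
    # Stand-in for actually driving the repro; here the "bug" needs encryption on.
    return settings["encryption"]

for variable, new_value in proximate_changes.items():
    settings = dict(baseline, **{variable: new_value})
    # Run twice, on two systems if possible; disagreement points at a background variable.
    first = reproduce(settings, system="systemA")
    second = reproduce(settings, system="systemB")
    if first != second:
        print(variable, ": inconsistent; a background variable is probably in play")
    elif first:
        print(variable, ": still reproduces, so it's probably not relevant")
    else:
        print(variable, ": bug goes away, so it looks relevant")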

Lesson 3: Think first, then change.
I suspect some people are getting sick of hearing this from me! But I'll say it again since I think I need reminding myself:

Slow down. Think about what you've seen. Then make a deliberate change and proceed.



So what do I do when I'm trying to narrow down a bug?
  1. See a potential bug
  2. Try it again until it's somewhat reproducible
  3. Compare times when it did happen to times when it didn't happen and come up with a list of differences. These are my "proximate" variables.
  4. Retest, changing one of these at a time and doing each test twice (on two different systems if at all possible).
  5. Repeat for each of my proximate variables until I can make it happen every time.
  6. If step 5 fails, revisit step 3 and continue.