Thursday, April 30, 2009

The Bad Days

We run a large suite of tests every night - it numbers in the thousands. As the suite has gotten rather large we've built some infrastructure around it, and we watch it pretty closely. The suite actually catches things, too - race conditions, regressions, deadlocks - that are hard to find manually.

When things are running well, this isn't a huge deal. It's an hour or so every morning checking to see if there are new failures, etc.

And then there are the bad days....
The day someone checks in a bug in the infrastructure and 500+ tests fail.
The day someone adds a new package, does it incorrectly, and in the morning there are 150 machines stuck in tests.
The day a hard drive eats itself and takes out an entire set of 200+ tests.

Those days are enough to make you think you should scale back your regression suite. They're enough to make you think you should just stop.

Don't stop. Stopping is overreacting.

Instead, ask yourself how you can make this powerful tool (remember, this thing finds some deep, nasty, hard-to-find bugs) a bit friendlier. Maybe you can make cleanup easier. Maybe you can make it easier to test changes to the infrastructure. Maybe you can improve error reporting so you can handle the 200 identical failures in a single stroke. One way or another, you can probably address the problem while keeping the good.
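
For example, the error-reporting improvement might be as simple as bucketing failures by a normalized error message, so 200 identical failures read as one line instead of 200. Here's a rough Ruby sketch - the failure format and the normalization rules are invented for illustration, not our actual infrastructure:

# Hypothetical failure records: a test name plus the error it died with.
Failure = Struct.new(:test_name, :message)

# Normalize away details that vary per test (paths, numbers) so
# identical root causes land in the same bucket.
def signature(message)
  message.gsub(%r{/\S+}, 'PATH').gsub(/\d+/, 'N')
end

def summarize(failures)
  failures.group_by { |f| signature(f.message) }.map do |sig, group|
    "#{group.size} failure(s): #{sig} (e.g. #{group.first.test_name})"
  end
end

failures = [
  Failure.new('test_read_01', 'cannot mount /dev/sda1 on node12'),
  Failure.new('test_read_02', 'cannot mount /dev/sdb1 on node40'),
]
puts summarize(failures)
# => 2 failure(s): cannot mount PATH on nodeN (e.g. test_read_01)

One triage pass over buckets instead of over individual failures makes the bad mornings a lot shorter.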

Your first reaction isn't always the right one, particularly on the bad days. In our case it's a large test suite. In your case it may be something else. Either way, don't throw the baby out with the bath water, and don't make a big decision rashly, no matter how bad the day is.

For every bad day, you'll find a good day. Be patient and you will get through it.

Wednesday, April 29, 2009

Paper and Tidiness For Success

QA works in a lot of different areas - our desks, our conference room, other conference rooms, the lab. 

My single biggest productivity tip? Keep it all clean.

My second biggest productivity tip? Keep pen and paper readily available in each area.


Cleanliness first.

We're all adults around here. You'd think we'd keep a workspace clean. But pictures of messy desks and the rise of the organization industry say I'm wrong. Nonetheless, keeping a clean workspace lets us all work together effectively. If there's space on the table for our laptops, we can actually use them. If we have to look through a stack of 50 CDs to find the three that actually have today's build on them, we're wasting time. So keep it neat and we'll find more time to do the actual work (instead of just shuffling stuff around).

And paper.

Most of the time, we don't use much paper. Sometimes, though, it's essential. Have an idea and need to sketch it out? Paper should be right there for you. Caught a glimpse of an error message on a terminal? Quick, grab a piece of paper and write it down; you may or may not catch it again. Having paper around - at least for me - sometimes makes the difference between forgetting and really having the information.

They're small things, really, and low effort. It takes almost no time to keep a workspace tidy (once it gets there). And it takes very little effort to put a notepad and pen in each workspace. It doesn't take a lot of gains to make it worth it. Little things can make work life a lot smoother.

What do you try to do to make your workspaces better?

Tuesday, April 28, 2009

Outside In

When faced with a problem or potential bug, the logical first step is to reproduce it. Simple, right? Just redo the thing you were doing when it happened!

Sometimes that's harder than it seems. When a bug is not simple to reproduce, there are a couple of ways to approach it. Fundamentally, they break down into approaching the problem from the outside in versus approaching the problem from the inside out.

Approaching the problem from the inside out is my usual approach. I start as simple as possible, and keep adding circumstances until it reproduces. For example, if the program crashed when I clicked "Add Node", I would do this on a clean system:
  • click "Add Node" (shoot! didn't happen)
  • make sure I have the same number of nodes already in the system, then click (shoot!)
  • make sure I have the same node names, then repeat all of the above
  • ...
  • ...
  • ...
  • make sure event log collection is going on for that node type, with the same number of nodes of that type, then click (Got it!)
This often works, but sometimes there's a mysterious something going on and you just can't pin down the problem this way. When going inside out doesn't work, consider going outside in.

Finding a problem from the outside means that you don't just blindly do what you did before. Start with the analysis, and then reproduce. It will feel backwards at first, but it becomes more natural over time.

The basic steps to an outside in analysis are nothing you wouldn't do; it's just the order that's different.
  • Identify the proximate failure point. With a crash that's pretty easy. Other times it may be a bit more difficult, but we're not looking for the root cause, only the immediate problem you saw.
  • Look through the logs to find the failure point. Once you know what your proximate failure is, go find that point. In most logs, you're just going to look right at the time it happened. Any errors? What was it doing? If it crashed, are there cores or assertion failures? (There's a sketch of this step below.)
  • Figure out why that failure happened. What was missing? What extra something was there? What does the code say? Is there a workflow or sequence here? A dependency? You're only looking to go one step back.
  • Repeat. Keep working your way into the problem, one step at a time. Maybe your crash occurred because something was nil and shouldn't have been. Once you know that, go find what was nil. Once you know that, go find why it was nil.
Eventually you'll have started to understand the problem. You may or may not make it all the way to root cause here. Either way, having worked through the failure chain, you can characterize much more cleanly what circumstances are required to get the error to occur. It will make reproducing the problem more reliable (no one's saying it will be easy now, though - sometimes the setup is difficult!), and it will make the fix more certain.
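
To make the log-reading step concrete, here's a rough Ruby sketch that pulls the log lines around a known failure time. The log format (a leading timestamp on each line) and the one-minute window are assumptions for illustration:

require 'time'

# Return the log lines within window_seconds of the failure time.
def lines_around(log_path, failure_time, window_seconds = 60)
  File.readlines(log_path).select do |line|
    stamp = line[/\A\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}/]
    stamp && (Time.parse(stamp) - failure_time).abs <= window_seconds
  end
end

suspects = lines_around('server.log', Time.parse('2009-04-28 03:12:44'))
# Scan the window for the usual proximate causes first.
suspects.grep(/error|assert|exception|nil/i).each { |line| puts line }

Nothing fancy - the point is to start from the failure and work backwards, not to blindly re-run the clicks.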

It's easy to form habits sometimes: "I found something. I should reproduce it." When that doesn't work, or when it's not giving you the results you need, consider your alternatives. Sometimes you need to turn your plans on their head - and work from the outside in. 

Monday, April 27, 2009

Overall Feeling

I go to a meeting every Monday in which the question is asked, "How do you feel about [next release]?"

That's not the easiest question to answer. We're still in active development, so I certainly wouldn't want to see it shipping! However, there are a lot of nuances between "not started" and "ready to ship".

There are a lot of things to consider when you're thinking about how a development effort is going:
  • What's implemented. You said you'd do n features. Have you done some of them?
  • Burndown. If everything that's ready is small stuff and the harder/larger stuff is left, that might be a problem. Check your remaining estimated tasks against the time you have left to see if you're on track or not (a back-of-the-envelope check like the sketch after this list is enough).
  • Bugs. Are you finding lots of bugs? This may be a good thing. If you're finding the same number of bugs, but two weeks earlier, then you're improving. If your find rate is far higher, though, that's a red flag.
  • Can you get it running? If you're in a good state, you can probably install the software and get it basically running in a QA environment. If you can't do this, you're in a pretty unstable state. If this is planned, that's okay. If it's not planned, we have a big red flag going on.
  • How far are you? Early in the release cycle instability, inability to install, etc. are generally not a big deal. Late in the release cycle, those things start to become problems.
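The burndown check in that list doesn't need tooling; it's arithmetic. A Ruby sketch, with every number invented for illustration:

# Remaining task estimates, in ideal person-days (made-up numbers).
remaining = { 'replication UI' => 5.0, 'event log collection' => 8.0,
              'upgrade installer' => 3.0 }

working_days_left = 12
engineers         = 2
focus_factor      = 0.6  # fraction of a day that's real project work

capacity = working_days_left * engineers * focus_factor
needed   = remaining.values.reduce(:+)

printf("need %.1f person-days, have %.1f -- %s\n",
       needed, capacity, needed <= capacity ? 'on track' : 'red flag')
# => need 16.0 person-days, have 14.4 -- red flag

Crude, but it turns "how do you feel?" into something you can defend.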
When you're asked how things are going, don't just go with your gut. Take a few minutes and do a little research. Get a good picture of everything - dev, testing, etc. - and only then talk about how the release is going.

Friday, April 24, 2009

Don't Look Right At the Sun

When you want to see an eclipse, you don't look directly at the sun. You look around the sun.

Testing is much the same way. I find most of my bugs in peripheral vision, not directly in the thing I thought I was testing. I wrote about this a while ago: seeing around the problem.

I have a really hard time teaching other people about this. Some people just get it; others don't. How do you teach someone to see a bug when they're looking at something just next to it?

Thursday, April 23, 2009

What's the Least?

I don't remember the last time I was talking to an engineer who had too much time, or too many resources, or not enough to do. We're all resource constrained! (well, at least in my sample set)

So we're all trying to cram more in.


There are two ways to look at this:

  • What's the most I can do with the resources I have?
  • What's the least I need to accomplish X?

When I'm asked to do something, I realize that I'm looking at one of these two questions. Either my resources - people, time, machines - are static and what I can get done is variable. Or what I can get done is static and my resources are variable.

It's important to figure out which question I'm facing. If I'm being asked what I can do with my resources, then there's no point in trying to change the resources; it's time to take a hard look at what I can actually accomplish with them. If I'm being asked to do something, well, better to say what I need than to fail and not do it. Sometimes, though, it's tricky to figure out which question I'm really facing.

A sample:

Question: We need to improve our CIFS testing.
Which is it?: Resources are probably not variable here (unless you have an open req or some budget). You're probably looking at a question about what we can accomplish in this area without altering our schedule.

Question: Our client is going to walk unless performance is 30% better by the end of the month.
Which is it?: Your goal is set (and probably relatively clear!). Now is the time to ask for the resources you need. In a crisis like this, resources can be made available. Use them. And then keep the client. Even better, exceed your client's expectations - they'll probably be happier than if there had never been a problem (but that's a topic for another time).

Question: When can we ship X?
Which is it?: Trick question. If X is defined, then your resources - particularly time - are variable. X may not be defined, though. For this one, you can (and should) play both sides; try to get the resources you need and push at what exactly needs to be done.


The reality is, most questions are about balancing. There will be some variability in resources (usually time, sometimes money), and some looseness in the definition of what the thing under discussion really is. There are very few things about testing that are inviolate. Stick to your principles, and beyond that, be prepared to balance each situation as it comes up.

Wednesday, April 22, 2009

On Balance

I have a big project I've been working on for a while. It's in that stage where I just want it done. And it's close! It really is! So whatever time I have, I've been pouring into this project. The problem comes when I start to neglect other things: "Oh, I'll just work on that bug when this project is done." "I did the bare minimum on the test plan for the next release; I'll flesh it out when this project is done."

I don't really have that luxury.

It's awful to have a lot of half-finished things lying around. In their half-finished state they don't help anyone. But you can't swing to the other extreme and just work on one thing to the exclusion of everything else (at least, not usually). It's important to find balance. You need to do enough of the other stuff that things don't fall apart, while still making progress on your project.

So where's the balance?
  • Set aside project time in large chunks - for me it's Mondays and Fridays.
  • Meetings are meetings; don't bring project work (or any other work) into them. If everyone pays attention the meeting is actually likely to end faster. That means me, too.
  • Set aside time for non-project work, and stick to it. For me this is Tuesdays between meetings and an hour or two most mornings.
What do you do to balance all your projects?

Tuesday, April 21, 2009

Fixing the Symptoms

Well, I've gone and gotten a cold. I'm sneezy and sniffly (and a real joy to be around!).

Our nightly automated tests also caught a cold this weekend. They showed an inability to talk over certain interfaces on about 25 machines.

When it comes to a head cold like I have, all you can do is treat the symptoms. We know that there is no cure (currently, anyway) for the underlying disease: the common cold. So we try to fix the symptoms - antihistamines, painkillers, etc. When we know we can't cure the problem, treating the symptoms is the best we can do.

Same thing for nightly automated tests, right?

Wrong.

We know what the symptoms are - some 25 machines can't talk to other machines over certain interfaces. But we don't know what the underlying cause is. Maybe it's curable.

So before we go treating symptoms of the problem, writing code to handle it, let's see if we can just fix the problem. It may save a whole lot of coding, and it's okay to not report on something if that something isn't going to happen.

It's only good to spend time fixing symptoms if you can't fix the underlying problem. If you can fix the problem, don't waste your time on symptoms.

Monday, April 20, 2009

Two Roads Converge

My mom and my husband both have iPhones.

There's nothing too remarkable about that. What struck me was how incredibly different their paths to iPhone ownership were.

My husband is what I think of as your typical geek. The iPhone came out and he had to have one. So he got one. Then he figured out what he could do with it. And he's come up with things - email, GPS for the ZipCar, etc.

My mom is definitely not a geek. She had a set of problems - couldn't use her (CDMA) phone overseas, got lost when walking around strange cities, and didn't like being away from email for days at a time. So she looked around and got an iPhone.

Same result. But definitely two different roads converging on the same solution.

I think sometimes as engineers we start to act like geeks. We want it because it's cool. If our product is for engineers, that's great. Usually, though, it's not. So if we're going to make our product sellable, we have to play down cool, and play up the solution. It's not good enough for most of our users to go from solution to justification. Instead, we have to help them get from problem to solution.

Friday, April 17, 2009

Heard In the Office

We have quarterly meetings at which all departments present what they've been doing and what they are going to do in the next quarter. In engineering, each of the dev leads takes a turn.

This quarter's dev lead is a very good engineer, but he's not hugely experienced with PowerPoint. He's been trying to put together a slide presentation, and getting rather frustrated. It's been a lot of fun listening.

This gem came out today:

Last night I was googling "Powerpoint for emacs users". It didn't turn up much.

Now it will turn up one more thing. Useless, but there!

Thursday, April 16, 2009

Technical Debt: Paying it Off

Let's face it. This isn't going to be easy. Paying off your technical debt is like paying off your credit card debt. You do the work, make the payments, and the only reward is watching your balance go down. You don't actually get any snazzy new features marketing cares about. This means it's going to be a rough sell politically. Your only hope is the promise of future reward: with less technical debt you'll be able to produce features better, faster, and more reliably... you just have to spend some time paying down your debt to get there.

So, how do we start paying it off?

Be a Salesman
In general you can pick and choose your approaches to paying off technical debt. Do some things that work for you, and don't do things that aren't helpful in your particular case. Selling the idea of paying down technical debt, though, is pretty much always going to be necessary. Unfortunately, as an engineering team you don't usually get to pick everything you do. You're captive to the needs of the company as a whole. Marketing needs things, sales needs things, support needs things, engineering needs things. If "paying back technical debt" is going to be one of those things, you need to make sure it gets on the list of tasks and stays high, even in the face of all the other important things. Congratulations, you're now a sales guy.

So, what arguments can you make to help others see the importance of paying down this debt?
  • We need to polish some features. Let's face it, dev has cut a few corners to meet dates and features. Bring up specific and recent examples of those cuts, and emphasize what your users can't do or your marketers can't say. For example, we did a feature that was basically a script that gathered information off each system and emailed it back to support. It's incredibly useful, but it wasn't really done. It wouldn't automatically move to another node if the node it was running on failed; it couldn't detect if it was running on two separate nodes (and would send back data twice), etc. Finishing that off will make support's life a lot easier, and prevent us from having to call up clients and get on the system to restart the thing. The point to emphasize here is that cleaning up technical debt improves the product directly. This same basic argument applies when you're trying to fix bugs.
  • It will make implementing future features faster. Right now implementing a feature means working around the refactorings you haven't done yet. If you can do those refactorings, it will make it faster - and you need to specify to some extent how much faster - to do certain features in the future. Be sure you also note how often you have to do these things. You're trying to show that you get a net benefit from doing this.
  • We can test more efficiently, which will shorten our release cycles. When code is cleaner there is generally less test overhead. Running each test may or may not be faster, but you'll spend less time chasing down bugs that appear only in some places and not others. In clean code, bugs are more likely to appear in all code paths, rather than hiding behind long dependent sequences of events. The net effect will be fewer bugs found, fewer bugs found late in the process, and fewer bugs found after release.
  • It will make our code more attractive to a partner/customer/buyer. This argument may not apply in your situation, but if you're a startup with an exit strategy, or if you want to raise a round of funding, or if you sign deals with partners or clients that include source code sharing, then it starts to matter. There are popular websites dedicated to laughing at bad code; do you really want your company to show up? Clean code inspires confidence, even before you run it, and that's the kind of confidence you want to instill in someone doing due diligence.

As with all convincing, specificity is your friend. Specificity using examples your audience cares about is even better. With support, using a support-related example showing simplicity and reliability will help. With marketing, talking about the great new features will help, etc.

Just Pick Already
A really good way to avoid paying off technical debt is to spend a lot of time figuring out what to do. Don't do this. Yes, analysis is important, but it must be timeboxed. Spend no more than 2-3 hours figuring out what technical debt to pay off first. Then go do it. Lather, rinse, repeat. The trick is to not get bogged down in analysis.

Also make sure you pick something that you can accomplish. The trick, as with much development work, is to pick things small enough you can do them before you get interrupted. Think "refactor file system management code", not "refactor entire system management framework".

Balance the Customers
Now that you've sold the idea of paying off technical debt, you need to show improvements to everyone you've convinced. So when you've done a refactoring to make dev/QA happier, follow up with some polish on a feature that didn't quite get done. Your goal here is to be able to show positive effects to multiple stakeholders - marketing, sales, support, dev, etc.

Fix Recent Pain
You're never going to get all the time you want to fix all the technical debt; this is not a fast process. So you need to prioritize fixing things that have hurt you recently. It's fresh in your mind and fresh in your stakeholders' minds, and any change you make to fix recent pain will help you keep going because the benefit is easy to see.

Whiteboard of Ideas
I keep a whiteboard up in a common area in the office. On this whiteboard, anyone can put technical debt as they see it. Got a story and didn't have time to polish the edges? Put it up. Had a problem at a customer site and needed manageability changes? Put it up.

Now, this is a dangerous thing, and you have to police it. This is not a place to put all enhancement ideas. And nothing on here can be vague. Rather, this is your list of accomplishable technical debt to pay off. Think "refactor Performance tests to use standard base test class", not "make the product 5% better". For this reason, I generally filter ideas through a small subset of people (team leads) - only if they agree it's actually technical debt do we put it up.

Measure It
You've promised improvements. You've said that spending time reducing technical debt will reduce the number and severity of bugs found. You've said it will result in fewer calls from customers with issues. You need to show it as well, or you won't be as believable next time. So before you actually start work, you need to figure out how you're measuring your results.

Measure your bug find rate before you start and again 6 weeks after you've done some work, to see if it's really going down. Start tracking your customer call rate before you start work; in three months, see if it's gone down. Check your estimation accuracy before and after refactoring - look for both shorter implementations and, more importantly, more accurate estimates. Numbers speak volumes.
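
As a sketch, the before/after bug find rate comparison might look like this. The export format (one filed date per bug) and all the dates are invented:

require 'date'

# One Date per bug filed - hypothetically exported from your tracker.
found_dates = %w[2009-01-20 2009-02-02 2009-02-10 2009-02-16
                 2009-03-12 2009-04-01].map { |s| Date.parse(s) }

def weekly_rate(dates, from, to)
  dates.count { |d| d >= from && d < to } / ((to - from) / 7.0)
end

payoff = Date.new(2009, 3, 1)  # when the debt-reduction work landed
before = weekly_rate(found_dates, payoff - 42, payoff)
after  = weekly_rate(found_dates, payoff, payoff + 42)
printf("find rate: %.2f bugs/week before, %.2f after\n", before, after)
# => find rate: 0.67 bugs/week before, 0.33 after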

What I'm Not Going to Tell You
I'm not here to tell you how to actually go about refactoring, or finishing features, or whatever. Use your development process for that. This is really about recognizing, identifying, and finding ways to pay off your technical debt.

I'm also not going to tell you this will be quick or easy. This is something you're going to be doing for the life of the product - a long time, I hope! Plan for this to be part of a reliable, sustainable product creation process (notice I didn't say development process - this involves everyone).

Technical debt is a big scary thing, but it is conquerable. Put your mind to it, and keep it in the forefront, and your technical debt can be paid down and kept to manageable levels.


Other posts in this series:
Technical Debt: What Is It?
Technical Debt: Warning Signs

Wednesday, April 15, 2009

Technical Debt: Warning Signs

Let's get this one out of the way: You have technical debt.

The real question is how much and how bad it is. A little debt isn't enough to worry about, but a lot can make it almost impossible to accomplish anything. So really it's about finding your technical debt, getting a good accounting of it, and knowing what you have to tackle.

So, what increases our likelihood of technical debt, or shows us that we have debt?

Development methodology
Some development methodologies are more prone to technical debt than others. Very heavy, very slow methods that place more value on being complete and correct than on being on time are going to discourage technical debt. Think NASA: getting a rocket to the moon and back correctly is far more important than doing it in 2009 versus 2010.

XP and other incremental methodologies are particularly prone to technical debt. They don't do design first, so they are more dependent on refactoring during and after development of features - an area that's ripe for technical debt. After all, that refactoring, well, we could ship without it, right? I mean, the feature works. (Technical debt alert!) These methodologies are also highly interested in customer-visible features, and sometimes that tends to leave out the invisible stuff that needs to happen on the back end.

Team Silos
Once you are large enough to have multiple development teams or multiple development areas this factor comes into play. I suspect we've all seen this scenario: the server team adds a reporting element ("% compressed") and forgets to tell the UI team they've done it. So the UI team doesn't add it to the report filter, and it only shows up when you do the "show all" report. Is the feature usable? Mostly, yes. Is it truly done? Well, no. You've gotten a little technical debt here.

Not Addressing Potential Debt Explicitly
When you're defining a feature, or setting out the tasks you will need to do as part of the feature, this is the time to recognize and call out potential technical debt. For example, if the feature is to add a second type of widget, the tasks for adding the second type should include "refactor the first widget type to use the base widget class". The goal is to make more public the work you need to do to avoid technical debt. Make the decisions about accumulating technical debt explicit decisions; don't make them out of ignorance.

Bugs
Take a look at your defect tracking system. Do you see lots of bugs that say things like, "happens with volume type A but not with volume type B"? How about bugs that are reopened with the comment, "fixed in X but not in Y"? Those are signs of technical debt. Seeing many bugs or large bugs come in after a feature is "development complete" is also a sign of technical debt - complete isn't as complete as maybe it seemed at first.

Code Complexity
Generate a class diagram, a state diagram, and if possible a workflow diagram. Do you see neat little boxes or a whole lot of spaghetti? Spaghetti is technical debt, staring you in the face. If you can't generate or draw a class diagram at all, well, that's a really big problem.

Complexity of code isn't inherently bad. However, large and complex dependencies are often indicators that your model is ripe for refactoring, or that you have several half-thought-out ideas going on.

Date-Based Releases
Once you've committed to a release date and a feature set, it can be hard to change. And to change it because you really want to put a button on one more screen? Not likely. The "we have to ship on X because X is the date" mentality is very common (and rightly so - you can't be late forever because you're chasing perfection). However, to meet that date you're likely to cut corners, especially if you've underestimated how much time the feature really takes, or how much other stuff is going on.


More in this series:
- What Is It?: Defining technical debt
- Warning Signs: How do you find your technical debt?
- Paying It Off: How do you reduce your technical debt?

Tuesday, April 14, 2009

Technical Debt: What is it?

Technical debt. Two words guaranteed to strike fear into the hearts of your senior engineers....

... and evoke a big "huh?" from everyone else.

Technical debt, in a nutshell, is the stuff you should've done but didn't. It's the refactoring you didn't do when you bolted on that new feature. It's the feature that you did 80% of, but you just never got around to really finishing up. It's the package you should have upgraded, but you didn't want to break anything.

None of these on their own is huge. However, over time it all adds up to a lot of technical debt.

And there the trouble comes. Because the consequence of technical debt is huge: development slows down. Leave it alone enough and development can grind almost to a halt.

Let's take a simple example: We had a concept of "nodes" - individual addressable machines. We added the concept of "grids" - a collection of nodes. They are distinct entities but they have some things in common (a name, an addressable IP, etc). When we added "grids", we should have refactored to make these things work together, but we didn't. At that point, we accumulated a little bit of technical debt. In and of itself, no big deal; it doesn't change how the feature works. However, now every time I want to make a change to a common thing (say, the way the IP is resolved), I have to do it in two places - nodes and grids. Thanks to that little bit of technical debt, things are taking time-and-a-half or twice as long. Multiply that by lots of instances of technical debt, and all of a sudden your half-day jobs take two or three days.
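
As a code sketch (the class names come from the example above; the bodies are invented), the debt and the skipped refactoring look like this:

# With the refactoring skipped, the lookup logic lives in two places:
module Before
  class Node
    def resolve_ip
      # ...lookup logic, copy one...
    end
  end

  class Grid
    def resolve_ip
      # ...the same lookup logic, copy two - remember to fix bugs here too!
    end
  end
end

# The refactoring we should have done: pull the common behavior up, so
# the next change to IP resolution happens exactly once.
module After
  class Addressable
    def resolve_ip
      # single copy of the lookup logic
    end
  end

  class Node < Addressable; end
  class Grid < Addressable; end
end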

On top of the slowness of development, technical debt:
  • Frustrates developers. And then you get all the negative consequences of this up to and including employee loss.
  • Promotes bugs. Changing things in several places, or working around inconsistencies, makes it a lot more likely you'll introduce bugs.
  • Slows down delivery. See above about things taking longer.
  • Makes your code less useful to third parties. Open sourcing something? Better clean it up if you want anyone to look at it. Trying to get bought? Not having too much technical debt is part of what makes you attractive.
Like any other form of debt - credit cards, home equity loans - technical debt is the result of living beyond your means as an engineer. Eventually it all comes due.

More over the next few days:
- Warning Signs: How do you find your technical debt?
- Paying It Off: How do you reduce your technical debt?

Monday, April 13, 2009

Do What You Wanna

Sometimes life slows down a little bit. There's lots to be done, still (there's always lots to be done!), but it's slow enough that you can take on projects. You can clean up that tool that has always had some rough edges. You can work on getting your test Active Directory installation moved to virtual servers. You can... well, lots of things.

It's tempting here as a team lead to go through and prioritize what you think is important and hand it out to the team. I'm not saying this is wrong, but one thing that I always try to put on the list is "do what you wanna".

Do what you wanna means everyone on the team gets to pick what they want to do, at least with some of their time. Sometimes someone will pick something I think is minor at best (making a script log a ticket correctly, for example), but that's okay. Maybe I was wrong and it's a good win. Maybe it's just something that the engineer really thought he could do. Maybe it was an excuse to learn a new language or technique.

The point is that there's no need to micromanage everything. Ask for the really important stuff, but don't sweat everything else. There will always be more work. There will always be more time (eventually). Give your engineers some freedom, and make "do what you wanna" a task just like everything else.

Friday, April 10, 2009

Meeting-Free Fridays

In case y'all can't tell, I'm something of a creature of routine. I have my ambitious Mondays. And I have my meeting-free Fridays.

Basically, I've designated Fridays for working hard all day. That means the only meeting I set up is the daily standup we do. Other than that 10 minutes, the whole day is for knocking things out.

The trouble with Fridays is that by the time they come around, I'm probably behind where I wanted to be. I've got some detritus in my inbox, I've got an accumulated list of things I'd really like to get to, and I've got loose ends to tie up all over the place. The last thing I want is for that to hang over into my weekend - I'll just wind up either working or worrying about it all weekend.

So Fridays are for tying up loose ends. My typical Friday looks like this:
  • Check on nightly automated tests from the previous night. This is pretty much every day.
  • Clean out my inbox. Go through and file or respond to each email that I haven't yet handled. This is generally the biggest one, because I'll have accumulated several emails of the "please review this" or "can you try to reproduce issue Y?" variety. If I can't get to zero, it's been a bad week.
  • Get projects to a stable spot. I'm not trying to finish things here. I'm just trying to get them to a stable spot so I can come in Monday and really get stuff done.
Meetings can be useful things sometimes, but we all need time to just work, too. For me, that's Fridays and Monday afternoons. You can pick any day or half day you like, but try giving yourself a meeting-free time to work. You'll be amazed at how much you can do.

Thursday, April 9, 2009

Why As Much As What

When a feature comes into QA for the addition of acceptance criteria, it usually talks about what's going to happen. It describes what the feature is going to do.

Okay, that's nice, but I'm not accepting what it will do.

I'm accepting the solution to the user's problem.

Absolutely we care about what this feature will do. But we also care about why we're implementing this feature. What is the user's need that this is trying to solve? We could build the best car in the world, and it won't help the user get from New York to Paris, thanks to that darned ocean in the middle! Part of our job in reviewing and accepting features is to make sure we're meeting our customer's need, and we can't do that if we don't know what the need is.

A caveat here: we are probably not the best people to be guessing what our customer's needs are. That's where product management (we hope!) really excels. I'm mostly looking for whether there's another thing that will prevent this solution from accomplishing what product management hopes it accomplishes.

For example, if the customer need is "I need to see statistics on how much data we have processed in the last 24 hours", product management may have come up with a new screen in the GUI that shows these stats. This won't help if our customers can't get to the screen thanks to the access control features.

We're looking for the gotchas and the hidden preventers. That's only possible if we know what the net effect is supposed to be, not just how product management is hoping we get there.

So when you look at a feature or a story or whatever you call it, ask yourself not just "how will I know it's done?" but also "how will I know I've solved the real problem?".

Wednesday, April 8, 2009

Watching Someone Else

On some occasions I get a chance to get on the phone with senior support or QA from another company. When I'm really lucky I get to watch them try to debug a problem.

Treasure these opportunities.

It's incredibly illuminating watching someone else debug a problem. Even if it's not your software, there are lessons to be learned in other people's thoughts, as expressed in mouse clicks, key presses, and spoken (or muttered) imprecations. Think of all the things you can look at:
  • What is their architecture? Sometimes you care, sometimes you don't, but it's interesting to note that someone is using Java, or that app is all Perl. In particular if you're trying to use scripts or an API it can help to know about the underlying architecture so you can walk around the minefields posed by the language (after all, all languages have minefields!).
  • Lessons for next time with this app. If you're going to have to interact with the app again, it's useful to know what they look at to debug the problem. You may not have all the tools you need, but you can probably use some of their commands to see where they look for things. Did they check a log file you didn't know existed?
  • General debug lessons. Where do they look for things that aren't related to their app? Are they suspicious of IIS or tomcat? Do they immediately go to the registry or to wireshark? How do they trace that suspicion - what events are interesting and what do they dismiss? In what order do they take debug steps? Sometimes you can learn efficiencies or new tricks just by watching someone.
When we're working without outside influence, it's easy to fall into ruts - both as an individual and as a team - and that means we start to miss things. Watching someone else is a good chance to break out of those ruts a little bit. So when you're watching someone debug, don't automatically alt+tab and do something else. Watch for a while... you never know what you might learn!

Tuesday, April 7, 2009

MySQL Versions and ETL

I've been working on migrating the company from RT to Jira for defect tracking. I've got Jira all set up, I've got the scripts we use for logging bugs automatically reconfigured. I've got the users created and passwords coming out of LDAP. The only thing left is to migrate the old issues from RT to Jira.

This should be a straightforward ETL operation, right?

Well, yeah, should be.

I decide I'm going to do this in Ruby (hey, why not), so here's what I do:
  1. install the Ruby MySQL gem (sudo gem install mysql)
  2. pick an ETL library (ActiveWarehouse)
  3. follow the instructions
Here's the problem: we're running RT 2, which is old. It runs on MySQL 3.23, which is also old (unsupported since 2006). And the standard Ruby gem simply doesn't work with it. Heck, nothing in MySQL 5 works with it.

Here's what happens when you try to log in with the MySQL 5 client:


catherine-powells-macbook-pro:etl cpowell$ mysql -u rt_user -p -h rt rt2
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.


It hangs there for some undetermined period that seems to be about two hours. After that, you're in and it works fine; obviously, this is going to slow down the ETL process a lot. 

What's interesting is if you get on the system from another client and do SHOW PROCESSLIST, you get this:

| Id | User | Host | db | Command | Time | State | Info |
| 615 | rt_user | vpn-21.permabit.com | rt2 | Sleep | 75 | | NULL |
So you've logged in; you're just... hanging.

I finally figured out that it's some version incompatibility. A MySQL 5 client - either the Ruby MySQL gem or the client you get when you install MySQL - will do this every time. MySQLAdmin, curiously, doesn't have this problem.

I must have spent 5 hours searching before I figured this out. So if there's anyone out there with a MySQL client that hangs on login against an old MySQL server (3.23 for sure, not sure about others), try using an old client as well. I wound up using an old Ruby-mysql library.
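
For anyone hitting the same wall, the workaround looks roughly like this. I'm assuming the old pure-Ruby ruby-mysql library here; the password is a placeholder and the query is illustrative rather than RT 2's actual schema:

require 'mysql'  # the old ruby-mysql, not the MySQL 5 client library

# Host, user, and database taken from the transcript above; password invented.
db = Mysql.connect('rt', 'rt_user', 'secret', 'rt2')

res = db.query('SELECT id, Subject FROM Tickets LIMIT 10')  # illustrative query
res.each { |row| puts row.join(' | ') }

db.close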

As for the ETL project, well, now that I've solved this I can actually get started.

Monday, April 6, 2009

On Not Being Involved

As a new QA Manager, it was really easy to succumb to temptation and to truly believe that "nothing goes out of here before I look at it!"

I like to think I'm wiser now.

Sometimes the best thing you can do for a project is to not get involved. Yes, you are the QA department. And yes, the quality of the software that leaves your company reflects on both your company and on you specifically.*

But.

Sometimes it's okay for quality to have a different bar, one you haven't touched personally. Sometimes you can't add value. And in those cases, you're just adding overhead and it would be better if you stepped away (and found something where you can add value).

Consider not getting involved if the project is:
  • A demo. Particularly if the demo is controlled (i.e., done by someone you know, following a set script), then you don't necessarily have to get involved. The lack of variability and the incentive to the demonstrator to have it go well will help protect you. The people doing the demo and the people building the demo can do the testing that's needed.
  • A prototype. A prototype isn't designed to be shipped. It's designed to show whether something is feasible at all. It's a long way from prototype to shipping product, and lots of things are likely to change. So don't worry about it too much; once the prototype is done and actual design/implementation begins, then you can get involved.
  • A dependent component. Let's say a new hardware platform is being qualified for the software you produce. Qualifying that hardware platform may not be the best use of your time; sometimes your vendors and your lab team can do this for you. This is a borderline case; your specific situation will determine whether you ought to get involved, but it's worth asking yourself what would happen if you didn't stick your nose in.
Knowledge is comforting, and being involved in projects lets you help control them and mitigate their risks. However, you can't be involved in everything; there simply aren't enough hours in the day. It's better to explicitly not be involved than to try to be involved and do a rush job of it. So ask yourself sometimes whether this really concerns you, and know that sometimes it's okay for the answer to be "no".




* It's not fair, but I think this is true. If you work in QA for a company that ships a low quality product, you'll be tainted by that, even if it's not your fault.

Friday, April 3, 2009

Scaling Up

We sell large systems. These "storage in the sky" things are really really easy to put data in, and deleting is not actually a common operation for most of our customers. After all, an online archive system is pretty much made for companies that have to keep data around for 5 years, 10 years, 20 years just in case they get audited. (SOX was great for storage companies!)

That's great, but it poses a rather large testing dilemma. Can we really test a system as large as sales can sell?

The short answer is not really.

Let's say our system is 250 TB. Even if we were to do nothing but pump data in 24 hours a day at 20 MB/s per machine, it would take about 151 machine days to fill it. Multiply that by the number of different releases you have to support and the myriad interesting tests (fill with small files! fill with huge files! different directory structures!), and you've got a really big job on your hands. The hardware costs alone are enormous. I'm also pretty sure sales can dream bigger than that.

So, we can't actually expect to do everything our customers do on a system level. What now?

After all, big system and full system are both boundaries, and are areas where we would expect to find bugs.

There are a number of things we can do to help identify and prevent defects related to large and/or full systems:
  • Code inspection. Stop treating this like a black box, and start looking for where there might be breaks. Think queues, memory allocation, references to data, etc. You're already looking at the code (I hope); ask yourself as you look what happens if item X gets really large or really numerous.
  • Unit tests. Let's say I find in my code inspection that we're creating a pointer to every piece of data written. I don't have to actually write all the data; I can create a test that simply creates the pointers and then exercises them (retrieves one, deletes some, etc). Much faster, much cheaper to run, and it'll show me how that particular data structure scales (see the sketch after this list).
  • Added constraints. If you can't scale the system up, you can sometimes scale the environment down. Using Java? Set the heap to half the size you normally run. Ship with 4GB memory? Try running it on a system with 1GB of memory. That way you hit the constraints a lot earlier. Your false positive rate is probably higher, but it can expose some edges.
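
Here's the unit-test idea from that list as a rough sketch, using test/unit (bundled with Ruby back then; a separate gem nowadays). DataIndex stands in for whatever structure holds a pointer per piece of data; the class and the counts are invented:

require 'test/unit'

# Stand-in for the real structure that keeps one pointer per datum.
class DataIndex
  def initialize;  @pointers = {};                          end
  def add(key);    @pointers[key] = "blob-location-#{key}"; end
  def fetch(key);  @pointers.fetch(key);                    end
  def delete(key); @pointers.delete(key);                   end
  def size;        @pointers.size;                          end
end

class DataIndexScaleTest < Test::Unit::TestCase
  def test_millions_of_pointers_without_writing_data
    index = DataIndex.new
    5_000_000.times { |i| index.add(i) }  # pointers only, no real data
    assert_equal 'blob-location-123456', index.fetch(123_456)
    index.delete(123_456)
    assert_equal 4_999_999, index.size
  end
end

It runs in minutes, not machine-days, and it exercises exactly the structure most likely to break at scale.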

The common theme here is faking it. Treat the system by its component parts instead of as a larger mass. System-level tests are great, but sometimes the more effective test is one that attacks the potential underlying problem directly. You have to think a bit harder to anticipate the potential problem, but you'll test it more effectively, faster, and cheaper in the end.

Thursday, April 2, 2009

Power Through

I have a great team.

Not because they're solid testers (they are).
Not because they can write a defect explanation for a client and for the developer fixing it... and have it make sense to both of those very different audiences (they can).
Not because they can code up a Perl module to handle thousands of tests and write tests for that module just in case (they do).

I have a great team because they have absolutely no qualms about doing any work that needs doing. 

We just moved offices, and most of the team's time has been spent helping IT get our lab up. 

Everyone's desk needs to be wiped down to get rid of the dust? No problem.
Racks need rails and machines and cables and power? Sure, point me at a rack. 
QA switches need VLANs? All right, I know I saw a serial cable around here somewhere.
Temps need something to do? Okay, let's see what else needs to be done and get 'em started.

And all of it cheerfully.

Sometimes being a good tester isn't about testing. Sometimes it's about recognizing that the best way you can move your product forward is by helping out another team for a while. It's about accepting that your role isn't to test; it's to do everything you can to help ship good product.

Thanks, guys.

Wednesday, April 1, 2009

Reporting Structure

Someone asked me this today: how do you structure a QA team? Let's talk about some background so this question makes sense.

The Scenario
This is an "agile company" and they have approximately 16 developers. There are no testers yet (they're hiring), and they're bringing in one QA manager. They're splitting into teams of total size between 5 and 8 people and following basically a SCRUM model.

The Question
So the question is how should the reporting structure work? Is QA part of each team? Is QA a separate team?

Let's Think About This
There are two basic schools of thought here that I can see. The first thought says that in a SCRUM (or agile or agile-like) environment everyone's on the team, and that means the testers are on each team. The second school of thought says that testers are a bit special and that they should be on a separate team.

We are all one team
  • Team means team, not "team, but...". The whole team, including the tester, should be aligned in focus and goals. This also fosters increased trust among team members, a "we're all in this together" mentality.
  • Depth of product knowledge. Having the tester constantly with the team gives that tester a deeper insight into the product or portion of the product he's working on. Focus provides depth, and that means you'll test the product better.
  • More likely to hit goals. If your tester is dedicated to your team, then there's less chance that your team will miss a goal because your tester had to do something else for another team. A dedicated resource - any dedicated resource - is less likely to fail.
Testers are a team
  • Bus factor goes up. If you have a test team, you have a group that can back each other up. John is sick? No problem, Matt can test that feature today. This doesn't work if there's one tester per team, and this is huge across vacations, illnesses, and testers eventually leaving.
  • Discipline thinking. Being around other testers, talking to other testers makes you a better tester. If you're on your own surrounded by developers, your test thinking and techniques are likely to get stale.
  • More flexible coverage. Not all projects have the same testing needs, or even the same level of needed testing, at all times. So one dev team may have very few testing needs while another dev team may be in a heavy testing point. With a test team, you can scale up and down with teams by simply shifting resources.

Bottom Line
The bottom line is that it depends on your situation. I think it can work to put testers with teams (option "we are all one team") if you:
  • have a group of strong, confident testers
  • foster cross team communication so testers continue their discipline-specific communication
  • have large enough teams that you have more than one tester per team
If you don't have those factors, then in general I would err toward creating a test team and having testers work generally but not exclusively with a given development team.