Monday, August 30, 2010

turbo = true

I have a client who is starting to look at the performance of their product. They've spent a good amount of time building the product, adding features, creating documentation, etc., but they haven't done anything explicitly to make it faster. And now it's time to make it faster.

A group of us working on the "performance improvement" project got together to talk about how we'd start going about this, and the first question that came up was this:
"Well, which configuration are we going to do?"

This is a legitimate question. This software can be deployed in two different modes and several different sizes. Which one should we try to make faster first?

Here's a little secret from every single performance optimization effort I've worked on: It doesn't matter. If you haven't done any performance optimization at all, your first attempts will almost certainly help everything. That first round of performance enhancements is like a global "turbo = true" setting in your code.

Of course no one intentionally writes slow code, and we don't intentionally design it to be slow. But inefficiencies creep in. Maybe they're the fast way to implement something; maybe they're the easy way to implement it. Maybe it's a more junior developer who doesn't yet know that you don't do a linear search here because this data structure can get large. Whatever the cause, there's probably a bit of sloppiness, and fixing that will help performance across the board.
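
To make that concrete, here's the flavor of thing I mean (the names are made up, not from any real codebase): a linear scan that's harmless with ten items and painful with a hundred thousand, next to the obvious fix.

# Hypothetical example: looking up a user by name on every request.
# Array#find is a linear scan -- fine for ten users, painful for a hundred thousand.
def find_user_slow(users, name)
  users.find { |u| u[:name] == name }
end

# Build a Hash index once; each lookup is then constant time instead of a scan.
def build_user_index(users)
  users.each_with_object({}) { |u, index| index[u[:name]] = u }
end

def find_user_fast(index, name)
  index[name]
end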

Pick a configuration to work on, absolutely. That will help the performance team produce comparable results and provide some focus. But don't stress too much over the choice. At least at the beginning, you're likely to improve more than the thing you're focusing on. Later on this will change, but for as long as it lasts, accept the turbo button you're enabling in your product.

Friday, August 27, 2010

Test This

If I had to write a script for testing an elevator, when I got to the section where I had to push buttons, I would write something like this:

def pushFloorButtons(floor)
  pushButton(floor)
  confirmDoorsOpenOn(floor)
end

(1..10).each do |floor|
  pushFloorButtons(floor)
end


My script would never have anticipated the elevator I got on today:

And yes, I did get off at 2.5. It looked a lot like 2. Just another parking garage.
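
If I were updating that script, I'd stop assuming the floors are the integers 1 through 10 and ask the panel what buttons it actually has. A rough sketch, using a hypothetical buttonsOnPanel helper:

# Enumerate the buttons that really exist instead of hard-coding the floors.
# buttonsOnPanel is a hypothetical helper returning whatever labels are on the
# actual panel -- today that list would have included 2.5.
buttonsOnPanel.each do |floor|
  pushFloorButtons(floor)
end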

Thursday, August 26, 2010

Always an Opinion

Let's say we have a new engineering leader - a QA lead, a support manager, or a development manager. One of the things that usually happens when someone is promoted to lead is that a lot more information becomes available to them. There are more meetings that involve other groups. There are email threads that expose information previously unseen.

A few examples:
  • the new QA Manager now sees all support tickets automatically
  • a new tech lead for a product now gets an email describing marketing plans for that product
  • the new development manager goes to a cross team weekly status meeting involving product management and several other development teams
Now we have someone with a lot of new information and a feeling that they need to "step up" and "show they can do this job". This combination can lead to a dangerous new attribute:

They always have an opinion.

You can tell this is happening because all of a sudden your newly minted manager is replying to every email thread and speaking up in every section of every meeting, even when those things are at best tangentially related to his area of expertise.

For example, the new tech lead gets cc'd on a communication indicating this release is going to be formally launched at a major industry conference. He replies, "What percent of our intended users are going to be at that conference? Do we know how we're going to measure if this is the right conference so that we can connect to next year's conference?" The question is valid, and almost certainly well-intentioned, but the tech lead is way out of his knowledge area at this point. In addition, the tone of the response is probing and almost aggressive; sure, it's "just a question", but it's a question that indicates doubt, with connotations of "I doubt your decision". If it happens once or twice, then it's no big deal - just someone learning about the new things to which he has been exposed. If it happens frequently, then there's a problem.

This kind of probing questioning outside a person's core area is a sign of a lack of trust.

Whether or not our new tech lead actually doesn't trust marketing to do its job (or sales, or support, or other dev teams), frequent responses and offers of opinions and doubts on topics outside his core area certainly makes it look like he doesn't trust his new peers.

My advice to the new manager is this:

Sit back and watch for a while. Keep in mind where you came from; that's your core area about which you will almost always have opinions, and that's probably why you're here. But in things that are new to you and that are outside your core area, just watch and listen. You don't have to have an opinion yet - you just have to learn.

Over time you'll figure out how to tell when a decision is good or not in these new areas. You'll also discover how to tell when the outcome of a decision matters and when it's not important enough to raise a fuss over. Then - and only then - it's time to start speaking up.

Show trust. It's not necessary for you to weigh in on everything. Let everyone else do their job, too, without your interference.

Monday, August 23, 2010

Why The Metric

I fell into a discussion about software metrics on the software-testing mailing list last week and over the weekend. One assertion that was repeated a few times was the idea that "we need metrics because management wants them".

Let's examine that. Why does "management" want a metric? (By the way, anyone with Lead, Manager, or Director in their title: you're officially management. Welcome to the party.)

Here's the first thing I learned about metrics when I joined the ranks of management: the number isn't what matters. What matters is the information within and around that number that lets me make the decision I need to make. It's a sound bite and when done well it conveys an entire tone and history that I can easily consume.

A metric is simply information distilled into a manageable chunk.

So if you don't like providing metrics like "number of showstoppers found per release" or "defects per thousand lines of code", that's fine. Find another way to provide "management" with what they need.

It's rather like the "quants" who use detailed statistical analysis of past market information in an attempt to predict future market movements. They're really just creating metrics with a whole lot of math behind them. And those metrics are for one thing only: to predict the likelihood of future events. In the "quant" case, they want to predict whether stock X will increase in price. In our case, we want to predict whether product Y will ship on the currently-targeted date with Z level of acceptable post-release work for handling any problems.

Without using metrics, then, how do we tell management what they need to know?

Let's take as a given the following:
  • any number the team can measure and that is tied to their "worth" (bonus, likelihood of remaining employed, continued donut Fridays) is a number the team will eventually figure out how to game
  • "management" wants a measurement that isn't reliant on a single person's "gut" or ability to divine status and release readiness. I don't want my "magic release predictor" to leave the job because then I'm really out of luck.
  • measurements are proactive, taken and identified prior to a release decision

Notice that there are certain things we have specifically not excluded:
  • qualitative (i.e., non-numeric) metrics
  • the reliance on our skills and expertise.
Metrics can have room for humans; it's not all numbers and systems (or it doesn't have to be).

Here's a metric:
Every week the engineering (dev, test, prod mgmt, etc) team takes an anonymous vote: "would you release the software today?". When that number passes some threshold (say, 85% yes), then the engineering team feels pretty good about the product, and that metric turns to "go". (You could even baseline this by asking that same team, "Knowing what you know now, would you have released the last version? The version before?". After all, some people are so risk-averse, they'll probably never say yes. Others are so risk-accepting, they'll probably say yes immediately.)
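
If you want to see just how small the math behind that metric is, here's a minimal sketch (the 85% threshold is just the number from the example above):

# votes is an array of true/false answers to "would you release the software today?"
def ready_to_release?(votes, threshold = 0.85)
  return false if votes.empty?
  yes_fraction = votes.count(true).to_f / votes.size
  yes_fraction >= threshold
end

ready_to_release?([true, true, true, true, true, true, false])  # => true (6 of 7 is about 86%)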

Here's another metric:
What was the total cost of defects found in the field for the past few releases (say, number of defects, plus cost to handle, plus % of customers affected)? Is that number going up or down? If it's going up, then we've got a problem. If it's going down, then this team is doing better and better. Let's keep doing what we're doing.
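
And a sketch of the second one. How you combine the pieces is up to you - the weights below are made up - but the idea is one comparable number per release, and then you watch which way it's moving:

# Each release summarized as a hash, e.g. { defects: 12, handling_cost: 40_000, pct_customers_affected: 3 }.
# The weights are arbitrary; pick ones that reflect what the pain actually costs you.
def field_cost(release)
  release[:defects] * 1_000 +
    release[:handling_cost] +
    release[:pct_customers_affected] * 5_000
end

def trending_down?(releases)
  costs = releases.map { |r| field_cost(r) }
  costs.each_cons(2).all? { |earlier, later| later <= earlier }
end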

Are these metrics that can be gamed? Sure - see the above assumption that any metric can be gamed. Do they have risks? Sure - and so does everything we do. If we want to stop hiding behind numbers, then let's do it. But let's recognize what management needs - information so that they can make the best decision they know how to - and let's figure out how to distill our knowledge, thoughts, and opinions down for them.

We're communicators, and a metric is an opportunity for communication.

Use your metrics as a distillation technique; that's all they are - a sound bite. It's lossy, but it's still communication. Embrace the chance to provide information.

Friday, August 20, 2010

The "Ehh" Phrase

J Michael Hammond happened to mention on a mailing list that he had created a heat map and in doing so had semi-intentionally popularized the phrase "pukey green".

"Pukey green" is a great descriptor.

We've been around long enough to know that "red" is bad or failing. "Green" is good, or passing. That permeates our builds, our tests, our status reports - it's everywhere.

"Pukey green" is great for that precise reason: it's green, so it must be good. But it's pukey, which sounds bad. This is the verbal equivalent of a shrug.

I'd use "pukey green" to describe something that just doesn't feel right, even though so far it looks okay. On the outside, it's working fine, but there's something I can't yet put my finger on that makes me not trust it.

"Pukey green" items are the ones that you can't say why you shouldn't ship them, but you just know you're going to get a lot of customers complaining about it later.

Thanks, good Mr. Hammond, for making my day, and giving me a really vivid way to describe that "yes, but..." feeling.

Thursday, August 19, 2010

Don't Preclude Principle

We're working on a rather large project at the moment. It's fun, it's from scratch, and it's basically engineering driven, since our internal engineering team will be using it. We have enough ideas for it that we could build for at least six to nine months.

There's no way we can wait nine months before we ship it. We need to be using this thing within about two months. Earlier would be better.

So as with any product, we have to trim down our feature set. We'll have to ship something first, and then add to it. This is causing a lot of angst because we know that our consumers will spend weeks building out files in this format, and changing the format later is going to be hard. We don't want to have to ask our customers to scrap the work that they've done, but we can't make them wait until it's perfect.

We use the Don't Preclude Principle:
You don't have to build it yet; you just have to leave yourself room to build it later.

We don't actually have to build everything we're going to wind up needing. We simply have to decide what the format will look like when we're done. Then we build pieces of it, and make sure we leave holes for the pieces we aren't doing yet. This gives us a much cleaner upgrade path. In the case of this file format, it means that right now we have something that simply ignores the values in some places. Later, when we build those features, we'll figure out what to do with the values. It's a simple thing, and only a tiny amount of work now to ignore something, but it'll prevent us from having to migrate formats later.
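
Concretely, the "ignore it for now" part can be as small as this. A sketch, assuming the format is simple key/value pairs; the field names are hypothetical:

# Fields we act on today; everything else is deliberately ignored (not rejected),
# so files written against the full format keep working until we catch up.
KNOWN_FIELDS = %w[name input_path output_path].freeze

def read_config(pairs)
  config = {}
  pairs.each do |key, value|
    config[key] = value if KNOWN_FIELDS.include?(key)
    # Unknown keys fall through untouched; later features will give them
    # meaning without forcing a format migration.
  end
  config
end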

Think first, trim second, build third. Don't preclude your future ideas, but don't worry about building it all now. Just think far enough ahead that you know what you're not doing as well as what you're doing.

Tuesday, August 17, 2010

Generalization Words

I've written before about words that trigger certain reactions. There is an additional class of words and phrases that we need to be very careful about: generalizations.

What We Say: "It fails randomly"
What We Mean: "I ran this test three times. It failed twice and passed once, and I really don't understand why."
What's the Problem: "Random" really means without a known pattern. Using the word "random" here is an overgeneralization; you need more attempts to see whether there's a pattern or not.

What We Say: "It's slow"
What We Mean: "It ran at 50 MB/s. I was hoping for 100 MB/s."
What's the Problem: Slow means something different to everyone. Sometimes a little slow is okay, but very slow is not. Sometimes it isn't the product that's slow; your expectations are what's in error. Here it pays to be more precise: "about half the speed I usually see in X other test".

Be careful of generalizing too soon. Many of us engineers - developers, QA, support, etc. - thrive on detail. Skipping that detail and going to generalization simply makes you sound either lazy or panicky. So when you're using words that are imprecise:
  • random,
  • slow,
  • sometimes,
  • weird,
ask yourself if you can characterize it more specifically. You'll get a lot more help from the people you're working with that way.

Friday, August 13, 2010

Test Plans Are Scary

I've been in a situation at work recently in which we've had to provide a test plan to our customer. This isn't a big deal; we're happy to provide our test plan. Interestingly, we sent this to project management for review before it went out, and got a bunch of feedback that wording should be changed in various areas to make the test plan less scary to customers. Here's the thing:

The test plan is supposed to be scary.

Your test plan describes all the nefarious things that you will do to the software before those nefarious things can happen in the field. You're going to put it under heavy load to see where it cracks. You're going to subject it to repeated hardware failures. You're going to feed it bad parameters and missing parameters. Sure, you'll test the happy path, too.

Your test plan prepares your software and your developers just like training prepares a soldier. You certainly hope these things don't happen in the field, but you train for them just in case.

Make sure your test plan looks like this:



Don't let it look like this:


For better or for worse, a scary test plan is comforting to your customers. It tells them that you've thought about what happens when things go as expected, but also that you're prepared for when things don't. Life happens and sometimes it's scary; let your test plan prepare you for it.

Wednesday, August 11, 2010

Really an Error?

We have a problem this morning in the lab at the office. We have DNS entries that are basically laid out like this:
hardware-type-1: 10.0.21.1XX
hardware-type-2: 10.0.21.2[01-75]
hardware-type-3: 10.0.21.2[76-254]

In general this works rather well; it sure makes it easier to figure out what type of machine you have, where it is in the lab, etc.

Unfortunately, we just installed our 76th machine of hardware-type-2. And we're out of IPs in our DNS scheme. Oops. Time to change our DNS scheme. It's not really a big deal, and it's not really an error. After all, when we set up this scheme, we had a total of 5 machines of hardware-type-2 and didn't think we'd ever get more than about 50.

All we have here, as my colleague John calls it, is a case where "good planning meets contact with the enemy".

Tuesday, August 10, 2010

When to Interfere: A Decision Tree

Sometimes when you're working with someone, you see a project start to go a bit wrong. Maybe it's not proceeding as quickly as it should, or maybe it's someone trying something new. As a manager, how do you know when to step in?

This is the heuristic I use:


It's pretty simple: if they're going to be publicly embarrassed, if they're never going to finish, or if it can't be fixed, speak up. Otherwise, keep your mouth shut and let the person doing the project learn.

Monday, August 9, 2010

Positive Change

Sometimes good things happen, and how we deal with the good changes is just as important as how we deal with the negative changes.

Case in point: I just promoted someone on my team to QA Lead. (Congratulations, John!) He's been around for a few years as a QA Engineer, and when the slot opened up, he stepped up and asked for it. He's going to do really well at it: all in all, this is a positive change.

It's still a change. And we still have to be a bit careful about it.

There are a few things we need to look out for here:
  • Making the transition (aka getting through the change itself)
  • Broadcasting the change
  • Providing autonomy
  • Providing support
So what did I do? In this case, I left. I went away for three days on vacation and didn't answer any emails or phone calls that the new QA Lead should be able to handle. Instead, I just forwarded everything his way. We made the transition by refusing to allow the old ways to work; he did everything the QA Lead should be doing, and I did none of it. Because he was publicly responding to requests, we broadcast that this was his area now. And by simply forwarding things without telling him what to do, he got autonomy in deciding how to respond. Privately, I answered any questions he sent me, giving him a safety net.

In a nutshell, we need to make the change actually happen. In order for the new way to take effect, the old way needs to be put behind us. So we took away the old way (me) and left only the new way (the new QA Lead)... publicly. Privately, we established an open line between the old way and the new way, so that the new way had all the information he needed.

When we make a change, even a positive one, we need to address the change directly to make sure it goes smoothly. Let's keep the positive, well, positive.

Thursday, August 5, 2010

Statistics As Oracles

We use oracles when we test - things to which we can compare the behavior of our system. Usually an oracle is something we consider correct (just like the one at Delphi). Sometimes, the challenge of a test is identifying an oracle.

And sometimes there is no oracle that is right all the time. Sometimes the best oracle we have can only tell us if something is probabilistically correct (i.e., whether it's "plausible").

For example:
Let's say we're testing a system and that system distributes data among machines according to the first letter in that data. Let's further say that the data is all letters and there's a lot of it - billions of entries.

-----------------------------------------
| 1: A->F | 2: G->N | 3: O->S | 4: T->Z |
-----------------------------------------

We ran the data through and want to know if it got distributed correctly.

How do we do it?

There are a lot of ideas we can try:
  • grep in each of the buckets! Here's a hint: at any real volume, you're just going to hang your computer - this one doesn't really work.
  • run a sample data set through. This one works for small scale tests but won't tell you if you've got a problem in the field or a subtle bug at volume. This is a great first test, but not an only test.
  • statistics. This one is powerful, although not deterministic.
Let's talk about creating an oracle from statistics. If, for example, each of our entries is an English word, we can look up the frequency of first letters in English words. That can be our oracle: the letter counts in each bucket should roughly match the overall frequencies.

Because this is a statistics-based, or probabilistic, oracle, it won't be perfectly correct. You may be off by a half percent or a percent. The point is that over time and over enough entries, you should expect your statistics to roughly match your oracle. And thus you have a probabilistic oracle.
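
Here's roughly what that check could look like in code. The expected frequencies and the tolerance are placeholders - you'd pull real first-letter frequencies from a corpus you trust - but the shape of the oracle is just "the observed fraction is within some band of the expected fraction":

# expected: { "A" => 0.11, "B" => 0.05, ... } -- first-letter frequencies for
#           English words (the values shown are placeholders, not real data).
# observed_counts: { "A" => 1_203_441, ... } -- counts pulled back out of the buckets.
def plausible_distribution?(observed_counts, expected, tolerance = 0.01)
  total = observed_counts.values.sum.to_f
  expected.all? do |letter, expected_fraction|
    observed_fraction = observed_counts.fetch(letter, 0) / total
    (observed_fraction - expected_fraction).abs <= tolerance
  end
end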

Bugs

I was flying home from CAST 2010 yesterday, and I found a bug. (Guess those testing conferences rub off on you!) I was on a two-part flight (Grand Rapids -> Baltimore -> Boston), and had upgraded to an exit row seat on one of them, at a cost of $20. The second flight was a standard seat.

This is what the terminal showed me when it came time to pay for the upgrade:




For the record, I was only charged $20. Go Airtran kiosk!