Tuesday, December 27, 2011


I work for startups, mostly on web-based software and services (SaaS). Tools are everywhere, and we need to get a lot done. Fast. Our tools have to work for us, not against us. The right tool makes life a lot easier and gets features out the door faster. The wrong tool slows the whole team down, wastes money, and wrecks morale.

So what do I use?

Before I talk about specific tools, let me lay a bit of groundwork. When I'm working for a client, I frequently use their toolchain. When I'm running engineering, we frequently use the toolchain I describe below. Most of the projects I work on are web or mobile web applications, or software APIs (think SaaS). We're almost always a distributed team, and we usually use some variation of agile methodologies (Scrum-like or kanban-like). The important part to take away is that we're generally working on multiple things at once, we try to keep those things small, and we push new features or changes into production when they're ready, without waiting for a defined release cycle.

Feature Tracking:
Product: Pivotal Tracker
Cost/Level: $18/month will hold until your dev team passes 5 people, then $50/month
Summary: I'm not in love with this one. It works pretty well, but there are a few features I wish it had, like: (1) distinguishing between done and deployed; and (2) a UI that doesn't feel so darn squished.
Other candidates worth considering:
  • Jira: Too heavy and too pricey. Annoying to configure, and really encourages excessive time spent on tracking ("ooh! pretty graphics!"). It's useful if you have nervous and overbearing management involved.
  • Lighthouse: Good for very small projects with savvy business types. Good email and source code integration and very simple, but little to no workflow or enforcement of procedures.
  • Trajectory: I've heard good things, but not used it. Somewhat niche.
Bug Tracking:
Just use the same thing you're using for feature tracking. Don't overthink it.

Source Code Management
Product: GitHub
Cost/Level: $12/month will hold until your dev team passes 5 people, then $22/month
Summary: It's hosted. It's stable. It's Git, which lets the dev team do amazing things.
Other candidates worth considering: Just use GitHub

Hosting
I can't make a single recommendation for hosting, since it depends so much on your product and your team. There are a couple of scenarios that might make sense:
  • If your goal is cheap, you're on Rails, and you don't have a sysadmin type on staff, use Heroku.
  • If your goal is reliability and you can do your own administration, use Amazon AWS and put your database on an RDS instance.
Product: Heroku
Cost/Level: varies based on usage, but the add-ons add up
Summary: This is cheap and it's the easiest scaling I've ever done. It's also well configured, which makes a big difference. Big downsides: frequent downtimes (especially if you use the shared database), and it will only back up once an hour, so data loss is a possibility. Also, use the kumade gem if you're using Heroku. The deployment model of Heroku alone is pretty bad; kumade helps.

Product: Amazon AWS + RDS
Cost/Level: varies based on usage
Summary: This lets you start small and will scale with you. It's also very reliable (and can be made more reliable if you have more money to spend). Interaction with Amazon is a solved problem, so scaling and deployment are simple problems. Big downsides: you have to do your own configuration, which means you need some sysadmin skills to set it up.

Development Environment
This one varies based on your product and personal experience. I mostly work on web apps, and my preferences look like this: Ruby on Rails, TextMate (vim is fine, too), Cucumber, RSpec. As long as there ARE build and testing tools in place, let the dev team pick what they like.

Build and Deployment
This is another one based on product and personal preferences. Just make sure there IS one. Oh, and the deployment should be something that's run automatically from the build system. The answer here is a tool, not a manual process.

That's my stack. If I've forgotten an element, let me know and I'll be happy to talk about it. If you disagree or recommend a different tool, let me know. I'm always curious to see if there are better tools for my toolkit!

Always And Never Usually Aren't Either

One of the quirks of coming to software development through test and through business is the healthy fear it provides of absolutes. Things that should "never" be true or are "always" bad turn out to be true on rare occasions or sometimes beneficial.

This holds in the software we write. For example, you should never use the singleton pattern because it adds a hidden global dependency.... But a logger can be a singleton because you really want to unify logging into one place that's configured by multiple things.

This also holds for practices and procedures. For example, we should never use a test script.... But perhaps there is a situation where that would be useful, like in an automated smoke test where the script is in the form of code. There are no best practices at all ever... But I have yet to find anyone who thinks that keeping code in source control is a bad idea (maybe it's a best practice!).

Regardless of the topic within software, any time you hear an absolute - always or never - be suspicious. When someone talks in absolutes, there's usually a reason they're speaking with such exaggeration. Sometimes they're simply naive. Sometimes they're insecure and trying to hide it. Sometimes they're just seeking attention through dogmatic pronouncements.

No matter the cause, be smarter than that. Notice the absolute statements, recognize that they say more about the speaker than about the statement, and move on. Things that seem absolute probably aren't.
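
The logger exception is easy to see in Python, where the standard logging module hands back the same logger object every time you ask for the same name. This is only a minimal sketch: the logger name "app" and the two component functions are made up for illustration.

```python
import logging


def billing_component():
    # Hypothetical component: asks for the shared logger by name.
    return logging.getLogger("app")


def email_component():
    # A second hypothetical component asking for the same name.
    return logging.getLogger("app")


# logging.getLogger returns the same object for the same name, so both
# components share one logger and whatever configuration it carries.
shared = billing_component()
shared.setLevel(logging.WARNING)

same_logger = billing_component() is email_component()
level_seen_by_email = email_component().level
```

Configuring the logger in one place changes what every caller sees, which is exactly the hidden global dependency people warn about, and also exactly why it's convenient here.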

Thursday, December 22, 2011

Describe the Unusual

As with many engineers, documentation is a part of my job. I write documentation for end users, training material, documentation for other developers (API docs, anyone?), bug reports... all sorts of things. And like many documentation creators, I can talk for a long time about some of these things.

Saying too much is a very good way to frustrate your audience. They don't want to know everything; they can't absorb it all and they lose the important parts. So follow the cardinal rule of documentation:

Describe the unusual.

Let's assume for the moment that documentation is about describing interaction. It attempts to explain to the reader how to successfully work with the system - whether that interaction is through an API, a GUI, or whatever - and how to understand what the system is doing (e.g., error states, message meanings, etc).

Describing interactions is a huge thing, which is why it's so easy to write a whole lot of documentation. But that's the wrong thing to do. The truth is, most of the time you can assume some basic knowledge, and just describe what's different.

For example, I'm working right now on a logging module for a system. The system right now does ad hoc logging; each component logs in its own way. My logging module will provide centralized logging so we get consistency in log level, storage location, format, etc. When I'm documenting this, what can I assume, and what should I write about?

My audience is other developers who work for this company. They pretty much understand the system, the language, and general development concepts. Great!

So I describe the things that are unusual:

  • instantiating this (custom) logger I wrote
  • requirements around uniqueness and sharing of loggers (class instances)
  • whether it is thread safe (yes) and process safe (no)
  • deployment requirements and the configuration format, for the IT guy
  • log size and rotation rules, which I got from the IT guy
I don't describe the things that developers generally expect from logging:
  • the existence of log rotation and aging
  • when to log at various levels (DEBUG vs INFO, etc)
  • the format of the log entry
I can skip what people already know, which makes the documentation much shorter and makes the important parts more readily apparent. It's easier for the consumers to read and much more likely to be actually useful.
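
To make that concrete, here is a rough sketch of what such documentation might look like as a docstring. The class name CentralLogger and its for_component method are hypothetical, not the module described above; the point is that the docstring covers only the unusual parts (how to get an instance, sharing, thread and process safety) and says nothing about log levels or entry format.

```python
import logging
import threading


class CentralLogger:
    """Shared logger for one process.

    Unusual things to know:
    - Get an instance with CentralLogger.for_component(name). Instances
      are shared per name, so do not call the constructor directly.
    - Thread safe: yes. Process safe: no; use one instance per process.
    """

    _instances = {}
    _lock = threading.Lock()

    def __init__(self, name):
        self._logger = logging.getLogger(name)

    @classmethod
    def for_component(cls, name):
        # The lock keeps two threads from creating duplicate instances.
        with cls._lock:
            if name not in cls._instances:
                cls._instances[name] = cls(name)
            return cls._instances[name]

    def info(self, message):
        self._logger.info(message)


first = CentralLogger.for_component("billing")
second = CentralLogger.for_component("billing")
```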

So when you have to document something, whether it's a multi-day tutorial or a simple comment describing how to use a method, focus on the things that make it unique. Describe the unusual.

Tuesday, December 20, 2011

Breathing Room

One of the fun parts about the end of the year is how many big initiatives tend to pile up. The culmination of a big project - whether that's a new website, a major refactoring, a release, or something else -  often results in a release. In many ways this makes sense. It is, after all, the end of the year, which is arbitrary but a really common breakpoint and time to assess accomplishments. It's also a relatively quiet time, with many people on vacations, so usage tends to be relatively light (except if you're retail!).


But... (Ha! I always say "but".)

There's a catch. Big projects are fine, and putting them out at the end of the year makes sense in many settings. The risk is in trying to put out too many too quickly. Any big project comes with risk. When you do many of them at once, you compound your risk, and make it harder to figure out what broke when you're troubleshooting.

If at all possible, give your big projects some breathing room. Push them out and then go work on something small for a brief period. It doesn't have to be long - a few hours or so - but it'll make life a lot easier. Use the breathing period to:

  • get a new baseline of system behavior and identify behaviors and patterns caused by your large change
  • reflect on what you've done and tie up any loose ends
  • review the next big change in light of the new system behavior, configuration, etc.; does anything need to change?
Breathing room is good for humans and good for systems. Take it, if you can.

Thursday, December 15, 2011

Dependency and Redundancy

My software is, of course, perfect. (HA! but go with it). The trouble is that the software I write uses other software; I have external dependencies. For example, Textaurant uses Heroku for hosting, Twilio for text messaging, New Relic for monitoring, and Airbrake for error aggregation.

There are several good reasons to have external dependencies like these, including:
  • Development speed. Using providers for pieces of the solution lets us focus on the core value we provide rather than on the common parts.
  • Better reliability. People who specialize in monitoring, for example (like New Relic), are going to be a lot better at monitoring than someone who doesn't think about this all the time.
But.... there is a big downside.

When your external provider goes away, you're in big trouble. If Heroku goes down, my app is unavailable. If Twilio goes away, I won't be sending any text messages. It doesn't matter that the outage is on the service provider end - to my customers, it's just the application they use not doing what it's supposed to. And that's my problem.

So we have dependencies, which are really useful and also introduce risk. What can we do about it?


Let's take a simple example. My hosting provider had an outage on December 3rd that took down our application. What could have prevented us from being unavailable, even as Heroku was having a No Good Very Bad Day? We could:
  1. Add a second hosting provider, writing into a shared database (or db cluster). Properly load balanced, the hosting provider who was not affected could simply have taken over all hosting duties.
  2. Fail over to a secondary hosting provider as soon as we realize we're down, again using a shared database or database cluster.
  3. Use local data storage in the browser to allow users to keep working. It wouldn't provide full functionality, but it would have given us 85%+ of our features, which is a lot better than simply being down.
There are two common themes running through these options: redundancy and cost. We can increase redundancy.... as long as we're willing to pay for it. How far you go toward ensuring redundancy is tempered by how much time and money you're willing to spend. In the end, it's up to you and to your particular needs. Just consider your dependencies... before they go down and make you consider them in a panic!
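
Option 2 above (fail over to a secondary provider) boils down to "use the first host that passes a health check". A minimal sketch, assuming made-up host names and a health check that is passed in as a function; in practice the check would be some kind of HTTP probe, and the hard part (the shared database) is not shown.

```python
def choose_host(hosts, is_healthy):
    """Return the first host that passes the health check, or None."""
    for host in hosts:
        if is_healthy(host):
            return host
    return None


# Hypothetical host names, listed in order of preference.
HOSTS = ["primary.example.com", "secondary.example.com"]


def primary_is_down(host):
    # Simulates the outage: only the secondary answers.
    return host != "primary.example.com"


active = choose_host(HOSTS, primary_is_down)
nothing_up = choose_host(HOSTS, lambda host: False)
```

Note the None case: if every provider is down, the code has to say so rather than pretend, which is where option 3 (local storage in the browser) would come in.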

Tuesday, December 13, 2011

Selective Vacation

It's never a good time to take time off. I'm pretty terrible at vacations in general. And yet, faithful readers will notice that I haven't written here in a week, which is a long time for me - that's practically a vacation!

One of the consequences of being a consultant and having several clients is that someone's almost always on a deadline. If client A just got through a big push (and is now taking a collective few days to decompress), that's frequently when client B is just facing down a big deadline. It makes scheduling vacations... tricky.

So I take selective vacations.

A selective vacation is a vacation... from part of my life. This past week it was a vacation from my blog. Sometimes it's a vacation from a client, or not starting a contract immediately. Sometimes it's a vacation from writing articles. Sometimes it's a vacation from cooking dinner! And sometimes it's a vacation from being home - I spent last week in California visiting family and working.

There are a lot of benefits to a vacation: a change in routine can spark new ideas; getting away can stave off any potential burnout; health benefits accrue from lowering stress on vacation; fun vacation photos or stories spark conversations with your coworkers. At least for me, vacations also provide an appreciation of routine (I do like my routine!). I get a lot of these benefits from a selective vacation - it sparks creativity, and gets me out of a routine temporarily. I also find it very relaxing - not only do I not have all the normal pressures and deadlines, but I'm doing enough work that I don't get stressed over missing important client calls or deadlines.

The point is that if you're like me and find it hard to stop everything and take a vacation, then don't. Take a selective vacation. It offers a lot of the same benefits of a real vacation, but it's a lot more doable.

Tuesday, December 6, 2011

Your Software's Personality

Software is a very cold and abstract thing, really. It's nothing we can touch or interact with. So we tend to anthropomorphize it. We think some software is cute, and other software is a workhorse, and still other software tries hard.

For example, Twitter is cute. It has fun little error messages. It features pastels, lots of colors, a good amount of white space, and cheerful graphics.

At the other end of the spectrum, Photoshop is a workhorse program. It has eight bazillion buttons and menus (I know, I counted), and it feels really powerful if you can just get it to do what you want. Of course, getting a program that big to do what you want is something like knowing how to use a workhorse - it takes practice!

As software engineers, designers, product managers, etc., we can help shape our software's personality. We can encourage users to think our software is fun, or simple, or dense and informative - all by the decisions we make while implementing it.

The first step is to decide what kind of personality we want. Are we building a consumer game that should be fun and irreverent? Are we building statistical analysis software where we want it to be prescient and nonintrusive? Are we building software for highly trained users, where we want it to be very consistent and provide hints without getting in the user's way?

Once we understand what our personality is, then we can find ways to express that personality through software. Consider:

  • The tone of the text
  • Graphics and colors
  • Layout - how dense?
  • The presence (and intrusiveness) of help and guidance
  • The workflows - wizards or menus? how short or long?
Your users will give your software a personality - it's up to you to make it the personality you want.

Wednesday, November 30, 2011

On Tone

I was reading a blog post the other day that included a number of very similar comments. The blog post itself wasn't hugely important (feel free to read it here). The important part was that the blog post author was showing an implementation of something, and that implementation could have been simplified by using a different API call. Several of the comments pointed out the existence of this other API call.

This is all pretty simple, and quite common, but let's look at HOW they pointed out that the author's implementation could have been improved. Here are three comments that all say the same thing:



They take quite different tones, even though they're saying the same thing. Comment 1 is very short and neutral in tone. Comment 2 is quite aggressive, even belittling (or teasing, depending on how thick your skin is) the author. Comment 3 is much more gentle, posing feedback in the form of a question, even though the question almost certainly presumes an answer of "no advantage. I should use DictReader." Comment 3 is also the only one that provides a link to the referenced API call.

None of these comments is inherently better or worse than the others. Using the tone in comment 2 has the risk of making others think you're kind of a jerk. Using the brevity of comment 1 probably works best when it's safe to assume some level of knowledge (e.g., that the user can go find the docs for csv.DictReader). Comment 3 is the least likely to offend but the most likely to make the speaker look tentative or soft. The point is more that you can express the same information many different ways. Take into account your relationship with the recipient of the comment and the type of comment, and use that to find an appropriate tone.
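
For anyone who hasn't met csv.DictReader, here is roughly the kind of simplification the commenters had in mind. The original post's code isn't reproduced here, so the sample data and the "manual" version are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for the original post's file.
SAMPLE = "name,city\nAda,London\nGrace,New York\n"

# Manual approach: read the header row, then zip it with each data row.
reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
manual_rows = [dict(zip(header, row)) for row in reader]

# csv.DictReader does the same header-to-value mapping for you.
dict_rows = [dict(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
```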

Monday, November 28, 2011

Cute On Occasion

There is a trend in consumer software - and to some extent in business software - to be cute. That can be a cute logo, or fun field naming in forms, or humorous error pages. It's all in good fun, and can frequently help personalize software. After all, software is kind of a remote, sometimes dull thing. Why not have some fun with it!?

I'm all for fun with it. I'm even all for fun with errors or error pages (see the Twitter fail whale for example).


Be careful not to take cute too far. Cute is only fun when it's occasional. When it's frequent, cute just becomes frustrating.

So when you're going to do something cute and fun, that's great. Before you go for it, though, ask yourself: "how often will my users see this?". If the answer is "not very often", then go for it! If the answer is, "kind of a lot" or "every day", then don't be cute.

After all, it's all fun, until it isn't. Keep it fun.

Monday, November 21, 2011

Watching the Logs Go By

I was sitting at a client site the other day, watching our production logs scroll by. And then the client boss came by:
"You're just sitting there!"
"I'm watching the logs go by."
"Yeah, just sitting there!"

Not exactly. I'm learning from the system in a normal state. Understanding what's normal is the first step to figuring out what's wrong... when there's something wrong. For example, I can only conclude that the frobbler crashed because the widget didn't frobble if I know that the widget normally frobbles, and specifically if I know that it normally shows up in the log. If I didn't know it was usually there, I wouldn't notice its absence.

To take another example, suppose I'm looking at normal logs and notice that third party API calls are taking about 3-4 seconds. There won't be any errors in the logs, just the usual timestamps and info messages. That might still be a problem - maybe those API calls should be taking 1-2 seconds - even though the system is behaving "normally".

Take some time to watch the system as it behaves normally. Only by understanding what normally happens can you then figure out what is abnormal in a problem scenario.
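
If you'd rather not rely on your eyes alone, the same idea can be sketched in a few lines: pull the durations out of a stretch of normal logs and keep the median as your baseline. The log format, the message text, and the 2-second expectation below are all assumptions for illustration.

```python
import re
import statistics

# Hypothetical log format; real logs will differ.
NORMAL_LOG = """\
2011-11-21 10:00:01 INFO third_party_api call took 3.2s
2011-11-21 10:00:07 INFO third_party_api call took 3.9s
2011-11-21 10:00:12 INFO widget frobbled
2011-11-21 10:00:15 INFO third_party_api call took 3.5s
"""

DURATION = re.compile(r"third_party_api call took (\d+(?:\.\d+)?)s")


def api_durations(log_text):
    """Extract API call durations (in seconds) from log text."""
    return [float(m.group(1)) for m in DURATION.finditer(log_text)]


durations = api_durations(NORMAL_LOG)
baseline = statistics.median(durations)

# There are no ERROR lines anywhere, yet the baseline sits above the
# assumed expectation of at most 2 seconds per call.
EXPECTED_MAX_SECONDS = 2.0
slower_than_expected = baseline > EXPECTED_MAX_SECONDS
```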

Friday, November 18, 2011

Jammit Lessons

I just put Jammit into production in one of my Rails applications. We had... well, kind of a lot... of assets, mostly JS and CSS, and it was getting both hard to work with and rather taxing on my servers. When one page load takes 30 requests, then it's time to get some asset management in place. We're still on Rails 3.0 (a couple of gems not available for 3.1 yet), so I went with Jammit for now.

Overall, the move was easier than I expected. However, there are a few things that tripped me up. That's what today's blog post is about: the lessons I learned implementing Jammit.

Lesson 1: LOVE
First of all, I would like to compliment the Jammit guys. It's supremely easy to use, and it just plain works, once you figure out what you're looking for.

Lesson 2: Use external URLs for compiling
Many of my requests are for images, almost all of them under 10K. I embedded these images in my CSS files. Jammit makes this easy; you just set embed_assets to on, then make sure your images are in your CSS (i.e., background images instead of image tags in the source). Then you just type:
jammit --base-url "http://my-server"

I got tripped up, though: the base-url must be an external url or IP. If you specify localhost, the images don't embed. Solution: use an external IP.

Lesson 3: Embedded images look like requests
When I did get the images embedded, they still looked like they might be requests in my browser. Here's a screenshot (note: this is a work in progress).

Note that all of the "data:image/png" lines look like requests. They're not. These images are successfully embedded. I was able to confirm this using a network packet tracer. Solution: don't panic!

Lesson 4: Don't use what you don't need
This isn't related to Jammit, but as I was doing this exercise, I noticed some assets we were importing but not using. Solution: take the opportunity to clean up a little bit!

Jammit isn't the be-all and end-all; tragically, it has not yet cured world hunger. But if you're on Rails 3.0, it's a great tool, and my hat goes off to those who built and maintain it. Quick, easy, and with only a few quirks - thanks, guys!

Wednesday, November 16, 2011

The Power of Plain Text

Like it or not, email is still a common way to communicate. I get all sorts of emails, from newsletters to personal mail to diatribes (those are fun) to emails from clients or people I'm working with. Because I work with software, a decent amount of that email contains code of some sort: a method; a stack trace; an error message; a SOAP or JSON object serialized and printed. So what do I do with it? Frequently, I copy it. For example:

  • copy an id into a command prompt so I can find it in a log
  • copy a method into a class so I can try to run it
  • copy a serialized object to a buffer so I can diff it with something else

Now, we can argue about whether email is the most appropriate form for all of this, but that's a bit academic; I don't really control what other people do or send. Rather, let's look at one useful point:

This is all a lot easier if the email is plain text.

With plain text email, I don't get weird formatting issues. I don't accidentally grep for HTML. I copy-paste instead of copy-paste-delete. I can get you whatever information you need a whole lot faster.

So go ahead and send whatever you want in email. Just send it in plain text, please.

Monday, November 14, 2011

Your Test Code Should be Defensive

One of the things you learn when you first start to write code that will be used by others or with other code is this: "write defensive code." It's shorthand for not trusting inputs or external dependencies, but checking them before you try to use them, and it involves things like validating inputs, calling external services asynchronously, etc. Well, for all you test automation engineers out there:

Test code should be very defensive.

After all, we're running test code because we want to make sure that an external system (the system under test) does what we expect. If we knew beyond the shadow of a doubt that it worked, then we wouldn't bother running the test. So, we're only running the test because we don't know with 100% confidence that it will work. And that's when we have to be defensive in our coding. We're already looking for things "that should never happen", so let's write our test programs so they can handle them.

Make sure you at least consider whether you might need:

  • Asynchrony and/or timeouts. Does the system not returning mean your test will hang?
  • Null checks. "That should never be null" is a lot easier to trace if you check when you first see it rather than waiting for it to blow up somewhere later in your code.
  • Try/catches. "The program should never throw an exception" - key word: "should".
  • Results inspection. Just because the program returned something and didn't error doesn't mean that return matches what you thought.
Basically, your goal is to either produce a "pass" or a legible failure. Keep in mind that automated tests tend to (1) have a long life; and (2) be run mostly by people or systems other than the ones who wrote them. Just because you, the test author, know what it means, that doesn't mean it'll be obvious to the QA engineer evaluating the output of the results a year from now (even if that QA engineer IS the test author!).
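
Here is one way the four items above might look in a single test helper. It's a sketch, not a framework: check_system, the expected {"status": "ok"} response shape, and the stand-in systems are all hypothetical.

```python
import concurrent.futures
import time


def check_system(call, timeout_seconds=5.0):
    """Run one call against the system under test.

    Returns (passed, message) so a failure is always legible.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(call)
        try:
            # Timeout: a system that never answers must not hang the test.
            result = future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            return False, f"no response within {timeout_seconds}s"
        except Exception as exc:
            # Try/catch: "should never throw" is not "never throws".
            return False, f"system raised {type(exc).__name__}: {exc}"
    finally:
        # Do not wait around for a hung call to finish.
        pool.shutdown(wait=False)

    # Null check: report it here rather than blowing up later.
    if result is None:
        return False, "system returned None; expected a dict"
    # Results inspection: returning without error is not enough.
    if not isinstance(result, dict) or result.get("status") != "ok":
        return False, f"unexpected response: {result!r}"
    return True, "pass"


def slow_system():
    # Stand-in for a system that answers too late.
    time.sleep(0.3)
    return {"status": "ok"}


outcomes = [
    check_system(lambda: {"status": "ok"}),
    check_system(lambda: None),
    check_system(lambda: 1 / 0),
    check_system(slow_system, timeout_seconds=0.05),
]
```

Every path returns either a pass or a message that says what went wrong, which is the part the person reading the results a year from now will care about.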

Keep your tests defensive, keep your error messages useful, and let your tests have a good long life.

Wednesday, November 9, 2011

Trusting the Code

I'm working on a project now that involves writing some automated system tests. These are pretty simple, in the grand scheme of things - functional smoke tests and small load tests. I write them, run them against QA until they pass (fixing the test and/or the system along the way), and then hand them off to the QA team for future use. Pretty straightforward.

And then.... building trust begins.

You see, after the tests have been handed off to QA, they don't always work every time. Sometimes they fail because a new build broke the system. Sometimes they fail because we're pointing them at a different environment that is not configured correctly. And occasionally they fail because there's a bug in the test.

Regardless of why the test fails, when it fails, the first question is: "Is there a problem, or is the test just wrong?" This is where trusting the test enters the picture. It takes time and repeated usage to build trust in any code base - whether that's a system or a test.

So when something happens that's unexpected or undesirable - a test failure, a system response - don't be offended if the first thing you, the author, hears is, "Is it your code?". That's just trust that's not quite there yet. Figure out what's going on, and over time the test (or the system) will get better, and trust will come. Eventually, you'll stop getting asked, "Is it your code?" and start getting asked, "What went wrong?" - that's trust.

Monday, November 7, 2011

If You Had Nothing To Do Today...

If you had nothing to do today.... what would you do? If all of your assigned tasks magically disappeared, or if you couldn't get to your source tree/test system/primary work environment.... what would you do?

Give yourself 30 seconds to think about it.

This is your subconscious talking to you. If you can't come up with anything, then that's a sign that you're either in the most focused job ever, or you're bored and kind of disengaged. After all, there's the stuff you're supposed to do, sure. Those are external requirements and needs - things that "management" has decided are important.

There are also your internal requirements and needs and ideas. These are the things that don't make it onto the backlog, or that you haven't told anyone about because they're not really bothering anybody else. Those are the things you would do if you didn't have any external requirements or needs. And if there's nothing - nothing at all - that you want to do better, no itch you want to scratch, no idea you want to try out, then you are in a big rut.

The external requirements are the world talking to you about this thing you're doing. The internal requirements are the whispers that say, "I care about this thing I'm doing." They're the things that turn you from hands on the keyboard to an engaged team member. And an engaged team member is almost always a happier team member.

So get engaged. And if you're not, then if you had nothing to do today..... you should find something interesting, either where you are now or on another project.

Wednesday, November 2, 2011


Last week I went to several events, including a testing conference, a networking/demo event, an accelerator event, and a symphony performance. One thing I heard at each of these was the concept of "professional".

At the symphony: "She [the soloist] looks like a cake topper! So unprofessional!"
At the testing conference: "Well, I really think that the onus is on us to be professional, even in ethical grey areas."
At the networking event: "So, are you a professional entrepreneur, or is this a side thing?"

There is a whole lot of drama wrapped up in the term "professional." The word simply means "paid to do something", but it has a lot of underlying meaning that basically amounts to "conforms to my expectations of what a good example of this ____ would do, say or act like". To call someone professional or unprofessional is to judge them.

So before you go using this "professional" shorthand, ask yourself what you're really trying to say. There are more precise words out there. Use them.

Tuesday, November 1, 2011

Agent of Disruption

It sucks, but it happens sometimes: your client loses confidence in you. Often this is the result of an event - a production downtime, or a big bug, or an embarrassing demo, or a completely blown deadline. Sometimes, it's a compendium of little things. Regardless of how you got here, you've got a client who no longer trusts you.

You don't want to lose the customer, so what do you do? How do you win back their confidence?

Slowly. By doing trustworthy things. By not making the same mistake.

And how do you buy the time you need to restore confidence?

Enter the Agent of Disruption.

This can be someone external or internal. Most of the time it's a person, but it can also be a tool if the situation warrants it. In the end, the client has lost confidence not only in the particular situation but in the environment and the team that produced that situation. Therefore, to regain confidence, the environment and/or the team needs to change. The introduction of a disruptive agent shows recognition of that need for a systemic change, and is the first step toward restoring confidence. It is a non-verbal indicator to the client that you really understand their concern and are in alignment with the breadth of change needed. Actual change takes time, and it takes even more time to see the effects of change - which is what rebuilds confidence. A disruptive agent buys you time to make that change.

Of course, you have to let the disruptive agent actually disrupt. Simply introducing the agent doesn't solve the underlying issue; it just buys you time to do so.

If you have a client who has lost confidence, then you need only three steps:
1. Bring in a disruptive agent to show commitment to fixing the problem and restoring confidence
2. Actually fix the problem
3. Be patient. Confidence is business speak for trust, and trust is far harder to earn the second time around

None of us likes to be in this situation, but if it happens, we can earn back our client's confidence. We just need to show we want to do so, and then do so.

Friday, October 28, 2011

Geek Captcha

I got this captcha on a site the other day.

My little geeky heart went pitter patter. Too bad I couldn't find the pi key.

Wednesday, October 26, 2011

Numbers Without Comparators

I just took a quick online quiz, and I got 7 out of 10. Go me! I'm awesome!

Am I?

If the average score is 5 out of 10, then yeah, I'm pretty awesome.
If the average score is 9 out of 10, well, I'm maybe not so awesome. (At least in this one way!)

I have a number - in this case a score: 7
I even have a scale: out of 10

But without knowing about how other people did, I don't know very much about how I did. Let's generalize:

Numbers, even on a scale, are generally pretty meaningless without comparators.

If you find yourself looking at a number - whether it's a score, a latency figure, a max users figure, or whatever - ask yourself how this number compares with similar numbers. Only then will you really know what you're looking at.
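To make that concrete, here is a minimal Python 3 sketch that puts the same score of 7 next to two groups of other scores. Both groups are made-up numbers chosen to match the averages of 5 and 9 mentioned above; they are not real quiz data.

```python
def average(scores):
    """Mean of a list of scores."""
    return sum(scores) / len(scores)


def count_below(score, others):
    """How many of the other scores are lower than this one."""
    return sum(1 for other in others if other < score)


# Hypothetical results from ten other quiz takers in each scenario.
hard_quiz = [5, 4, 6, 5, 5, 6, 4, 5, 5, 5]    # averages 5 out of 10
easy_quiz = [9, 10, 9, 8, 9, 10, 9, 9, 8, 9]  # averages 9 out of 10

my_score = 7

for label, others in (("hard quiz", hard_quiz), ("easy quiz", easy_quiz)):
    print("%s: average %.1f, I beat %d of %d people" % (
        label, average(others), count_below(my_score, others), len(others)))
```

The 7 never changes. Only the comparator does, and that is what turns the score into information.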

Monday, October 24, 2011

Remote README and debugging

I've been doing a fair amount of work lately with people who are remote. We're not in the same city or even in the same time zone, but we're working on the same project and using each other's code. It's not always easy, but we're making progress.

In order to do this successfully, we have two simple rules:

  1. Write a README
  2. Show me what you see
Let's talk about a README for a minute. I know that when I'm done writing some piece of code, it's intuitive and obvious how to use it.... to me. That doesn't mean it's obvious to anyone else on the planet. So I write a README that tells people how to use this thing I've created. The more background I share with someone, the shorter the README can be. If I'm adding a small feature to a Rails application and my colleague is a Rails engineer who also knows the application, I can probably have a README that says, "I added a spec to show it - check out the commit." He'll know to go look at my latest commit, where to do that, how to understand the rspec test I wrote, etc. With less shared background, the README gets more detailed. For example, I wrote a Python script recently and handed it off to an engineer to run. The catch: the engineer doesn't know Python. So my README was very detailed, including how to set up Python, how to check out the project, how to run the exact command line he needed, and an example of a successful result. Why? There's not very much shared background, so I can't assume he would do the same things I did, have the same setup I do, or read output the same way. This is no denigration of the engineer I was working with; he simply has a different background. (You should have seen the README he sent me for the .NET utility he wrote - I had none of that background!)

All in all, it's pretty simple. Write a README telling someone what you did. The less they know about how you think and how you do things, the more detailed the README should be.

There's some bad news, though. No matter how good your README, someone's going to have trouble following it one day. This is where the second rule comes in. Always show me what you see. It's a classic user call for help: "I tried it and I got an error!". We engineers do it, too, and it really doesn't help solve the problem.

So if we run into a problem and need help, we have to show each other what we're seeing. On a command line, that means we run some basic helpful things and send the console output to whoever is helping. Let's take that Python script (which didn't go right for the user the first time despite the great README). In order to help me figure out why it was working for me and not for him, he sent me a basic console output that showed this (names changed to protect the innocent):
$ pwd
$ ls
$ python --version
Python 2.7.2
$ git status
# On branch master
nothing to commit (working directory clean)
$ git pull
Already up-to-date.
$ command -to -run the script


This way I have a lot of information. I can already tell:
  • what version of Python he's running
  • the exact error he got
  • where his source tree is located and if there's anything funny about it
  • that he's on a Mac, not Windows
  • that he doesn't have any interfering local changes
  • how he ran the script, including arguments and options
When we're remote from each other, communication is somewhat high latency; there are minutes between communication at a minimum, and sometimes more. We don't want to go back and forth many times; fewer is faster. So, he shows me what he sees and that makes it much more likely I'll figure out what's different in fewer tries. And then we can both get back to work faster!
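If you find yourself asking for the same console output over and over, one option is to hand people a small helper that collects it in one go. This is only a sketch of that idea in Python 3, not something we actually used; the function name show_what_i_see is made up, and the command list simply mirrors the transcript above.

```python
import subprocess


def show_what_i_see(commands):
    """Run each command and return one transcript of commands and their output."""
    lines = []
    for command in commands:
        lines.append("$ " + " ".join(command))
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep error text in the transcript too
                universal_newlines=True,
            )
            lines.append(result.stdout.rstrip())
            lines.append("(exit code %d)" % result.returncode)
        except OSError as error:
            # The command could not even be started (for example, git is missing).
            lines.append("could not run: %s" % error)
    return "\n".join(lines)


if __name__ == "__main__":
    print(show_what_i_see([
        ["pwd"],
        ["ls"],
        ["python", "--version"],
        ["git", "status"],
    ]))
```

Add the failing command itself as the last entry, paste the whole transcript into an email, and the first round trip already carries most of what the other person needs.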

Wednesday, October 19, 2011


For the first time in a long time I'm working on testing a system in which I can't get at the logs. All I can see is what the end user sees - what data went in, and what the results output are.

It's immensely frustrating.

It's also hugely educational.

I hate being blind to what's going on under the covers because it makes me very dependent on the developers who can get at the logs. I'm unable to be as precise as I would like to be in reporting results, simply because I can't distinguish different behaviors. For example, if I get a response from the API that contains no results, then I don't know what happened. It could be any one of the following:

  • I fed it data that shouldn't have gotten results
  • I called the API incorrectly
  • it wasn't done yet, and I just needed to be a bit more patient
  • an error or bug occurred
I have no way to tell whether there's a problem and what kind of problem it might be, which means I'm running to the developer for every test that doesn't return exactly what I expect. Codependence is not normally my style!

On the other hand....

What I see is what external consumers of the API see. The customers using the product don't get to look at logs, either. So if there's not enough information to figure out what's going on, then customers may have the same problems that I'm having. I've learned a lot about the usability of the API, which it turns out was maybe not so good. And we've been fixing it, making it more transparent what's going on - whether it's bad calls, or data that doesn't end up with results, or a bug in the underlying system.
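As an illustration of what "more transparent" can look like, here is a hedged Python sketch of a response envelope that tells the four empty-result cases apart. The status names and the build_response function are invented for this example; they are not the API I was testing.

```python
import json

# Hypothetical status vocabulary, one entry per reason a caller might see no results.
STATUSES = {
    "ok": "the call succeeded and results are included",
    "no_match": "the call was valid, but the data produced no results",
    "invalid_request": "the API was called incorrectly",
    "pending": "processing is not finished yet; try again later",
    "error": "something failed inside the system",
}


def build_response(status, results=None, detail=""):
    """Wrap results in an envelope that says why they look the way they do."""
    if status not in STATUSES:
        raise ValueError("unknown status: %r" % status)
    return json.dumps({
        "status": status,
        "detail": detail or STATUSES[status],
        "results": results or [],
    }, sort_keys=True)


if __name__ == "__main__":
    # Both responses carry an empty result list, but they no longer look identical.
    print(build_response("no_match"))
    print(build_response("pending"))
```

With something like this in place, a tester (or a customer) with no log access can still tell "nothing matched" from "not done yet" from "you called it wrong".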

So even if you're not blind, like I am, spend some time pretending you are. Ignore your logs and your code traces. Being blind is sometimes as illuminating as seeing.

Monday, October 17, 2011

Hire Slowly. Fire Fast.

There's an adage that they teach you in manager school:

Hire slowly. Fire fast.

This is a polite way of saying that a bad hire costs a whole lot of time and potentially money, so be careful about bringing them on and don't hesitate to ditch someone who "isn't working out" (translation: "He WHAT?! AGAIN!?" followed by optional weeping).

On the surface of it, this is sage advice and I completely agree with it. In practice, though, it's darn hard.

Looking at the hiring side, the safest hire is no hire at all. Obviously, that's not going to work; you need to hire someone because the current team can't do all the work all the time. You need a skillset, or at least another pair of clueful hands.

Looking at the firing side, getting someone to leave is emotionally wrenching for everyone involved: the employee, the manager, the rest of the team. Occasionally things turn truly nasty, and threats of litigation erupt. If you do it too often, too, you'll give your company a reputation for being a bad place to work. Firing is really not something you want to do.

So what's a manager to do?

There's no one answer. There are, however, partial answers:

  1. Listen to the "hire slowly" part. It's better to be understaffed than to have to fire too often.
  2. Use contracting. When a contractor leaves, that's just the end of a contract; it's not firing. It's much less wrenching on the team and doesn't lead 
  3. Contract-to-hire is a valid way to go. It's harder to hire, but it gives everyone an out if things don't work (that means the potential employee can leave, too, so make sure you have a good work environment so he'll want to stay!)
It's best to hire a great person who will complement and enhance the team, but it's impossible to really know that until you've gotten into it. In the meantime, be careful, and give yourself and the candidate as many chances as possible to decide this isn't working.... until that happy day when you both decide it is!

Friday, October 14, 2011

heroku gem readline error

This is a quick one for all the Ruby types out there. I installed the heroku gem onto a fresh Amazon Linux box today, and then this happened:

[ec2-user@newbox ~]$ heroku
:29:in `require': no such file to load -- readline (LoadError)
from :29:in `require'
from /usr/local/lib/ruby/gems/1.9.1/gems/heroku-2.9.0/lib/heroku/command/run.rb:1:in `'
from :29:in `require'
from :29:in `require'
from /usr/local/lib/ruby/gems/1.9.1/gems/heroku-2.9.0/lib/heroku/command.rb:17:in `block in load'
from /usr/local/lib/ruby/gems/1.9.1/gems/heroku-2.9.0/lib/heroku/command.rb:16:in `each'
from /usr/local/lib/ruby/gems/1.9.1/gems/heroku-2.9.0/lib/heroku/command.rb:16:in `load'
from /usr/local/lib/ruby/gems/1.9.1/gems/heroku-2.9.0/bin/heroku:13:in `'
from /usr/local/bin/heroku:19:in `load'
from /usr/local/bin/heroku:19:in `


The gem install missed a dependency. This fixes it:

gem install rb-readline

That was easy, but took a good 10 minutes to figure out. Here's hoping you don't stumble over the same thing!

Thursday, October 13, 2011

Following Convention

I was discussing a problem with an engineer I work with recently, and it came down to two ways we could solve the problem. They were roughly equal; each had benefits and drawbacks, but neither was obviously a better choice.

There is, however, one big difference: one of the ideas followed standard conventions. The other violated them.

Let's talk about conventions for a second. Conventions are the well-worn paths in software. They're the habits that engineers pick up. Creating a branch per major feature and merging into master (thus keeping master pretty stable) is a convention of a lot of developers using Git. Giving a gag (read: really ugly) desk decoration to the developer who most recently broke the build is a convention followed by many engineering teams.

Conventions are a feedback loop. Tools help create conventions, and then conventions dictate what the tools do and how they work. Over time this makes it a lot easier to follow conventions. The tools support it better (fewer workarounds needed!). New developers joining the team can get up to speed more quickly.

So if our choices are equal in every other way, then choose the one that follows convention.

"It's easier" really is a good reason.

Tuesday, October 11, 2011

The Superlative

I've been interviewing devs and testers frequently lately, and I noticed I was doing something that in retrospect seemed kind of odd:

I almost always asked about superlatives.

"What's your favorite bug?"
"Tell me the thing you most disliked about the test framework you built."
"Describe the coolest project you ever led, and who you worked with on it."

Most, biggest, best, worst, least. All of it is extremes.

Extremes are memorable, but most of life is NOT an extreme. Most of life is somewhere in the middle. You're only going to have one best and one worst at any point in time. You're going to have lots of in between.

So why don't I ask about that? After all, most of the time working with this person will be spent on the stuff in the middle.

Let's talk about the best and the worst, but let's also talk about the things that happen the most.... all the stuff in between. There's no adrenaline rush, there's no despair, there's no exultation and cheering. I want to know you and I can work effectively together when all those extremes are absent. What do we do on an average Thursday? Now THAT's interesting.

Monday, October 3, 2011

Done Right and Done Fast

As a happy homeowner, I occasionally get to do fun things like hire contractors to come in and make the place look better. (Note: Sarcasm present. This is not actually fun. It's both expensive and truly frustrating.) Our last project just finished last Friday, and involved replacing damaged sheetrock and flooring from a water leak in the upstairs, followed by painting, restaining the floors, and hanging a new light fixture. Overall we worked with a good group of people, and I'm pleased with the end result.

But so much of it was like a software project.

We basically did this:

  • demolition (okay, this part was kind of fun)
  • testing for water damage
  • new sheetrock
  • paint
  • floor sanding
  • floor dyeing/sanding/staining
  • new light fixture
  • touch ups
Fairly straightforward and almost totally linear. If this were a software project, it would be the easiest thing to plan, ever. But then life intervenes.

For each step, we'd get a call from the project manager: "I can have a guy there tomorrow, but my best guy's on another job tomorrow and he can't come until the day after tomorrow." Translation: Do you want it done fast? Or done right?

Or I'd be talking to a worker and he'd say, "I can try the patch with this [weird-looking thing that's what would happen if a drill and a blow dryer had a baby], and it'll be okay but it might warp a little, or I can just let it air dry and come back tomorrow." Translation: Do you want it done fast? Or done right?

(And no, I couldn't pick both. See? Just like software!)

Now the easy thing to say would be: "Do it right! What's another day?" That's easier to say when you're not living in a construction zone. It's also easy to say when you don't have guests coming.... real soon now!

Is this sounding like software yet? Compromises to meet a hard date, check!

So we chose. We could do demolition fast. Testing we didn't get to control. Sheetrock had to be right; that's hard to change. Painting wasn't hard, so we chose the guys who were available (and who did a perfectly acceptable job). The floor had to be right all the way - we walk on it all the time, and any variations in color or finish are both very expensive to fix and really noticeable.

In short, we did some fast, and we did some right. Just like software.

Keep in mind you don't have to make a single universal choice. You get to pick every time you have a choice. So pick frequently and pick what's right for that decision. You can't have fast and right, but you can control what's fast and what's right - and that will give you the closest to your ideal.

Wednesday, September 28, 2011

High Volume Morning

Some days I'm away from my computer for most of the day. Meetings, events, travel time, even the occasional vacation all conspire to keep me away from my computer and my inbox for a chunk of time. I'm also pretty religious about keeping my inbox small. These two things obviously do not go well together!

I get to work the next morning, or get to my desk after a morning of meetings, and I have a bunch of emails. Some just need to be read and filed, some need a lot of work, and some need quick responses or quick actions.

So I declare it a high volume morning. Here's how it works:

  • I scroll through my inbox and pick out something that looks easy and/or fast.
  • I try to do it.
  • One of two things happens:
    • I finish it (yay!) and dump the email out of my inbox
    • It's harder than I thought, and I make a task in OmniFocus for it and dump the email out of my inbox.
  • Repeat for about 3 hours, or until my inbox is empty again.
This is a little bit outside the norm of how I work, since I'm usually pretty good about putting together a task list and working my way down it, doing big hard things when I'm most productive (which for me happens to be mornings). But...

But I just plain feel better when my inbox is under control, and when people haven't had to wait for days for things that simply don't take very long to do. After all, it only takes about 15 minutes to check the receipt time for an order for a client - why should they have to wait just because it's lower priority than the large project? 15 minutes is a "slip it in between breaks" duration. Enough of these and I'm not making progress on the bigger projects I'm doing, but that's why I time box it.

Sometimes it's okay to break routine, and to get through all of the simple stuff.... after all, in the end it's about balancing the easy with the hard, the urgent with the "should happen at some point", and making it so your work environment - including your inbox - is a happy place.

Monday, September 26, 2011

The Closest Thing To

My work experience is almost entirely with smallish companies, up to about 250 people, tops. I've also had the great fortune to have very large companies as clients (10,000+ employees), with engineering teams topping 100 people for a single project.

One of the things I've learned is that when you have 100 people on a software project, you get to have specialists. You have people who are really great release engineers, or SCM management types, or build engineers, etc. On smaller teams, we don't get that.

Instead, we get the guy who is "the closest thing to" the job we need. For example, I'm no UI designer, but with one of my teams, I'm the closest thing to it. That means I get to cut the images for the website. Am I as good as someone who does this and only this? Nope, but I'm better than anyone else on the team. So I cut images, and I'm glad to do it.

It's one of the hardest things to learn your first time on a small team: you aren't perfect, but if you've got some background in the area, you may be the closest thing we've got. Congratulations, those tasks are now yours. It makes for an incredible learning opportunity, and can expose you to many different aspects of software engineering. So have fun, accept that you'll be doing things that aren't easy for you to do, and learn. Maybe you'll find a new natural talent!

Wednesday, September 21, 2011

Summary Hints

When we as an engineering team start implementing something new, we almost always start with a summary. This is often followed by discussion, explanation, UI designs, requirements specifications, stories, tasks, and a whole lot of other stuff. The summary looks something like this:

Create a performant dashboard that shows all the widgets and lets you frobble them.

Here is where you get your first hints about which parts of the system really matter, and which parts are niceties. If it's in the summary, it's important. It may be important because it's promised to a customer, or because they've been burned in the past by not specifying something, or because it's what makes your version different from a competitive product, or any one of a number of other reasons.

In our example summary, what's important?

  • Performant: This almost always just means "can't be slow" rather than implying "it's incredibly fast", but it also usually means that the person asking for the feature has been burned by something slow. Ask about this - performance will matter, and just saying, "it works on my fast connection with my pretty solid dev box" is courting rejection.
  • All: "All" rarely means "every single one all the time", but it does mean you'll be looking at showing a large chunk of data on the screen. This brings up fun UI and performance considerations. It usually also implies filtering, searching, and other "sounds easy but takes time to do" considerations.
  • Dashboard: This is a very common term for "our user is going to sit on this page and pretty much never leave it". Updates, reloads versus AJAX-style refreshes, etc., are all going to be considerations.
When you're hearing about a new feature, listen very carefully to the summary. It'll usually tell you about most of the major risk areas and time sinks, and it's the heart and soul of what the user really wants, instead of all the things they think of that sound good. The summary is the part that you need to get right.... and that'll give you a happy customer.

Monday, September 19, 2011

From Zero to Multi-Mechanize

I'm working on a new project and the preference is for Python for test code. The current piece of the project is creating some load tests. Now, there's no good reason to write my own load test framework from scratch, so I did some research and started evaluating Multi-Mechanize.

Getting it set up on my Snow Leopard system wasn't completely trivial, so here's what I had to do:

  1. Upgrade Tcl/Tk (which Tkinter uses) to 8.5.10. This is needed to get 64-bit Python to run properly with matplotlib.
    1. See http://www.python.org/download/mac/tcltk/ for background information.
    2. Download ActiveTcl 8.5.10
    3. This is a standard .dmg file, so double click and follow the bouncing prompts to install. Take all the defaults.
  2. Upgrade Python to version 2.7. This is needed to run the latest version of matplotlib.
    1. Grab Mac OS X 64-bit/32-bit x86-64/i386 Installer (2.7.2) for Mac OS X 10.6 and 10.7 from python.org.
    2. This is a standard .dmg file, so double click and follow the instructions in the installer. Again, take the defaults.
  3. Install numpy. This is a prerequisite for matplotlib.
    1. Go to the installer site. Do not click the link at the top! It's tempting, but that's for OS X 10.3 and won't work. (And by tempting, I mean I did this.) Scroll down and grab this one instead:  numpy-1.6.1-py2.7-python.org-macosx10.6.dmg
    2. Look! Another .dmg file. You got it. Double click, bouncing prompts.
  4. Install matplotlib. This one you'll have to actually compile on your own.
    1. Do not download the .dmg file; that's also for OS X 10.3 and won't work. Instead, grab the source from here:  matplotlib-1.0.1.tar.gz
    2. Unpack the source and put it in a useful (temporary) location.
    3. Open matplotlib-1.0.1/setupext.py in your favorite text editor.
    4. Line 832 looks like this:  (Tkinter.__version__.split()[-2], Tkinter.TkVersion, Tkinter.TclVersion))
    5. Remove the [-2], so that it looks like this: (Tkinter.__version__.split(), Tkinter.TkVersion, Tkinter.TclVersion))
    6. In matplotlib-1.0.1 copy setup.cfg.template to setup.cfg
    7. In setup.cfg, uncomment line 10 and change it to read: basedirlist = /usr/X11
    8. Install! python2.7 setup.py install
  5. Install mechanize. Needed for, surprise, multi-mechanize.
    1. Download it: mechanize-0.2.5.tar.gz
    2. Unpack the source and put it in a useful (temporary) location.
    3. Install: python setup.py install
  6. Install multi-mechanize. Last step, promise.
    1. Download it from the project site.
    2. Unzip it and put it in a safe location. This one isn't an installer; it just runs, so watch where you put it.
Overall the process is tweaky but not difficult. Just make sure you get the right packaging for everything (32-bit versus 64-bit, versions of OS X), and it works just fine. Now off to play with a new (to me) load testing tool!
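Before moving on, it's worth confirming that Python can actually see everything you just installed. The following is a small sketch I'd use for that, assuming the module names are numpy, matplotlib, and mechanize as in the steps above; the installed_versions helper is made up for this post and is not part of any of those packages.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string, or None if it can't be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder.
            versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions
```

Run it with the same interpreter you installed into (python2.7 in the steps above) and call installed_versions(["numpy", "matplotlib", "mechanize"]). Any entry that comes back as None points at the step to revisit.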

Friday, September 16, 2011

Pick One Measurement

I've been working with a new client who is attempting to use a variation on Scrum. They have two-week sprints, and each sprint starts with the whole team getting together and talking through the stories on the backlog, estimating them, and then committing to them. Each sprint ends with a demo and retrospective. They're using Rally, so they have a burndown chart that shows their progress through the sprint, and accepted stories, etc. This is all fairly standard.

They're also making a very common mistake: measuring the same thing two ways. They measure points and they measure hours.

Stories are sized by points: 1, 2, 3, 5, or 8 points. Points are a relative indicator of how large the story is. A 1 point story is pretty small, and a 3 point story is larger than a 2 point story but smaller than a 5 point story. An 8 point story is code for "we dunno, but it's huuuuge!"

Tasks are estimated in hours: 1 hour, 2 hours, etc. This is purely free form; any value is allowed.

Here's the thing, though. Stories are made up of one or more tasks. This effectively means that stories are estimated at some number of hours, where the number of hours is the sum of all the task estimations. (Rally, incidentally, is happy to do the summation for you.)

Now stories have both points and hours, and they're both trying to get at the same thing: "This is how much effort we need to accomplish this story".  We end up with a list like this:
Story 1: 2 points, 10 hours
Story 2: 3 points, 11 hours
Story 3: 2 points, 6 hours
Story 4: 1 point, 8 hours

That's mostly just confusing. The 1 point story is going to take longer than a two point story? Two stories that are the same number of points are going to take fairly different amounts of time (6 hours versus 10 hours)? That's a lot to reconcile.
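To see how far apart the two measurements are, here is a quick Python 3 sketch that works out hours per point for the four stories listed above and ranks them both ways. The numbers come straight from that list; the helper names are just for illustration.

```python
# (name, points, hours) for the four stories listed above.
stories = [
    ("Story 1", 2, 10),
    ("Story 2", 3, 11),
    ("Story 3", 2, 6),
    ("Story 4", 1, 8),
]


def hours_per_point(stories):
    """If points and hours agreed, every story would land near the same ratio."""
    return {name: hours / points for name, points, hours in stories}


def order_by(stories, index):
    """Story names from smallest to largest by points (index 1) or hours (index 2)."""
    return [story[0] for story in sorted(stories, key=lambda s: s[index])]


if __name__ == "__main__":
    for name, rate in sorted(hours_per_point(stories).items()):
        print("%s: %.1f hours per point" % (name, rate))
    print("smallest to largest by points:", order_by(stories, 1))
    print("smallest to largest by hours: ", order_by(stories, 2))
```

The ratio runs from 3 hours per point (Story 3) up to 8 hours per point (Story 4), and the two rankings disagree about which story is smallest. That disagreement is the thing the team ends up spending time reconciling.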

If this were source code, we'd be cringing and saying, "But it's not DRY!" It's roughly equivalent to having a do_stuff and do_stuff_2, where do_stuff_2 is a copy of do_stuff but uses integers instead of floats for the math. They're almost but not quite direct copies of each other.... confusing!

Same goes for our sprint stories. We're trying to accomplish one thing: estimating effort. Let's accomplish one thing one way. It'll cut down on the confusion, and reduce the amount of time we spend micromanaging stories and tasks in an attempt to achieve consistency.

Pick one. Points or hours, it really doesn't matter. Just pick one and only one.

Wednesday, September 14, 2011

Getting By in Portuguese

I spent the last week in Lisbon, Portugal. My husband was there on business, but I wasn't, so I was free to go wander around the city, see all the tourist traps, stuff myself silly with all sorts of delicious food, etc. All in all, a great time.

There was only one drawback: I don't speak any Portuguese at all. I did learn "obrigada", "bom dia" and a few other key phrases, but effectively I was mute. Now, I got lucky in that there were many helpful people who spoke English, and I was able to get by. I was also helped by context; for example, when I was buying a bottle of water, I knew the next phrase in a simple purchase was probably going to be the price, so I knew to get out my money. Gestures and other contextual indications let me complete transactions successfully.

Now what does this have to do with software?

It's all about language. I got by but didn't thrive in Portugal because I couldn't speak the common language. The same is true in software, and particularly in software management.

If you speak the common language - pipelines and queues, or REST APIs, or whatever your jargon is - you can thrive. If you don't speak the common language, you'll never do more than get by on context and people who are feeling helpful.

So learn the language - and thrive.

Wednesday, September 7, 2011

Weird Jenkins Thing

File this one under "I have no idea why this did what it did." Be warned, I have no explanation for this.

I have Jenkins running on a Linux box, with a Windows 7 slave box. I'm using a fairly standard (I think) setup on the slave box:
- Jenkins 1.428 (slave) running as a service as the local user
- msys-git (aka git bash) for Git
- git plugin version 1.1.12

In my general configuration, I set up "windows git", which points to "C:\Program Files (x86)\Git\cmd\git.cmd".

In my project, I specified my clone URL and branch. Everything else was set as the default. I'm using SSH, so my clone looked like ssh://git@myrepo/myrepo.git.

In Git Bash, I:
- created an rsa keypair
- added the public key to my git repo
- proved to myself that I could clone the repo

Hooray! All set up! I fired up Jenkins, clicked "Build Now", waited 0.31sec and watched it fail with this error:
ssh: myrepo: no address associated with name
The remote end hung up unexpectedly

Huh? Yeah, it looks like a DNS problem.

The solution:
  1. add myrepo to the hosts file
  2. restart the Jenkins service
I have no idea if this will work for you; I only know it worked for me. I can't say I love the solution, but for now it'll do. And to anyone else who sees this problem, let me know if you find a more elegant solution.
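If you want to check whether you're hitting the same thing, one cheap test is to ask whether the host name resolves at all from the slave box, ideally while running as the same account the Jenkins service uses. This Python sketch assumes the failure really is name resolution, as the error message suggests; "myrepo" is the placeholder host name from above, and resolves is a helper name I made up.

```python
import socket


def resolves(hostname):
    """Return True if this machine can turn the host name into an IP address."""
    try:
        socket.gethostbyname(hostname)
    except socket.error:
        # Covers lookup failures such as "no address associated with name".
        return False
    return True


if __name__ == "__main__":
    for host in ("localhost", "myrepo"):
        print("%s resolves: %s" % (host, resolves(host)))
```

If the name resolves in Git Bash but not here, that at least narrows the difference down to the environment the service runs in.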

Wednesday, August 31, 2011

Code Reviews: What I Look For

I'm currently working with a number of contractors, and all of us are working in the same code base. We each work on a separate feature branch, and when it's done, we merge to master and deploy to staging, and then to production.

One of the benefits of this is that it's easy for us to review each other's code. We just grab the feature branch and start looking. Usually we're developers reviewing each other's code, but sometimes we'll get a tester to take a look, or even one of our fearless marketing types (she's usually looking at language and naming).

Now, a lot of books and checklists will tell you useful and good things to look for:
  • relatively short methods
  • good encapsulation
  • conformance to style guidelines (get a tool to do this, if possible)
  • all the tests pass
  • error checking and input handling
  • readability and understandability
  • locations and placement (use of libraries, MVC conformance)
  • etc.
These are all great things. But wait.... there's more. We do code reviews because they sometimes offer us other benefits, like:
  • an example of an elegant way to solve a problem (if there's an elegant solution in that particular code snippet - and it's not uncommon!)
  • a basic understanding of how the new feature fits into existing features. This is useful when we go to do another new feature later on - I'll probably remember that I should go look at this feature because it might be affected.
  • an interesting library (or gem or whatever your language calls it) that's worth looking at for other things
  • common code I might not otherwise have noticed was there for me to use
  • a chance for the developer who wrote the code to ask questions or point out worrisome areas
Bob Martin's Clean Code has a lot more information, and I highly recommend it. Then go forth and actually look at your code base. There's a lot of neat stuff in there!

Monday, August 29, 2011

Budgeting for Engineers

It's budget time (oh boy!). Budgeting is its own kind of fun, because it involves not only understanding where your past and current expenses are, but also guessing what your future expenses will be.

And that's hard.

There is an entire financial discipline in forecasting and budgeting, and I'm not going to get into all of it here. For many of us, budgeting isn't our day job, but it's something that most engineering managers wind up doing at some point. It almost always comes to us phrased as, "So how much money do you need for the next 6 months?"

First response: "I dunno. What do you want us to build in the next 6 months?"
Analysis: Completely true, and totally unhelpful.

More helpful (and still true!) response: "Sure, let's talk about assumptions, priorities, and constraints."

Some tips:
  • Figure out one thing that you will use as the basis of growth. This could be number of users, number of logins, number of credit card transactions, whatever. Base everything off that one number (how many devs you'll need, how much your hosting costs will be, etc.), and your budget will stay coherent.
  • Separate assumptions. Put them in a separate tab in the spreadsheet, and calculate everything you can from those assumptions. That way when you discover a bad assumption, it's easy to fix.
  • Put in all the detail you can think of. No matter how small and silly, put in all the details. Specify your defect tracking system and source control, not a line item for "dev tools". Yes, it will be a very long list.
  • Create a summary. Take the subtotals from your detail and use them to populate a summary version of the budget. This is what you'll show people, and then use the details when they start asking questions.
  • Add 10% for unexpected things. Something's going to happen that none of us thought of. Give yourself a buffer for that.
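Most of us will do this in a spreadsheet, but the structure is easier to see in a few lines of Python. This is a sketch only: every figure below is a made-up placeholder, and the ratios (users per server, users per developer) are assumptions you would replace with your own.

```python
# Assumptions live in one place; everything else is calculated from them.
# All numbers are invented placeholders, not real prices.
ASSUMPTIONS = {
    "users": 10000,                   # the one number growth is based on
    "users_per_server": 2500,
    "server_cost_per_month": 200,
    "users_per_dev": 5000,
    "dev_cost_per_month": 10000,
    "defect_tracking_per_month": 50,
    "source_control_per_month": 25,
    "months": 6,
    "buffer_percent": 10,
}


def ceil_div(a, b):
    """Integer division that rounds up (you can't rent half a server)."""
    return -(-a // b)


def build_budget(a):
    """Return (detail, summary) dictionaries derived from the assumptions."""
    servers = ceil_div(a["users"], a["users_per_server"])
    devs = ceil_div(a["users"], a["users_per_dev"])
    detail = {
        "hosting": servers * a["server_cost_per_month"] * a["months"],
        "developers": devs * a["dev_cost_per_month"] * a["months"],
        "defect tracking": a["defect_tracking_per_month"] * a["months"],
        "source control": a["source_control_per_month"] * a["months"],
    }
    subtotal = sum(detail.values())
    buffer = subtotal * a["buffer_percent"] // 100
    summary = {"subtotal": subtotal, "buffer": buffer, "total": subtotal + buffer}
    return detail, summary


if __name__ == "__main__":
    detail, summary = build_budget(ASSUMPTIONS)
    for item in sorted(detail):
        print("%-16s %8d" % (item, detail[item]))
    for item in ("subtotal", "buffer", "total"):
        print("%-16s %8d" % (item, summary[item]))
```

Change the users assumption and every line item moves with it, which is the point of basing the whole budget on one number.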
I'm an engineer. I spend a large chunk of my day on code, not on budgets. But the budget is important, and I owe it to myself and to my team to do it properly. A budget is a negotiation, and just like any negotiation, the more prepared you are, and the more prepared you look, the better you'll come out of it.

Here's hoping these tips can help other engineering managers get through budgets with as little angst as possible.

Wednesday, August 24, 2011

Make Starting Easy

I've just started doing some work with a new client, and one of the things I'm doing is helping the team get started with some basic testing. The team consists mostly of programmers out of academia - they can write some code but have never written commercial software. This is in no way a derogatory statement; it's simply the current state.

One of the attributes of this team is that they know there is this thing called "software testing" that sounds important, and they know that there is "automated testing" that sounds like it would be really useful, but they're not at all sure how to go about doing it. Again, not a problem; that's why I'm here.

Testing software is a big problem space. People spend entire careers in software testing, or in specialties within the field. That's intimidating. For those of you non-testers in the audience, saying, "teach me to test" is something like saying, "teach me to be a developer" or "teach me accounting". This team doesn't need to know everything about testing; they're not going to be career software testers. They're just trying to make sure that they ship a product they can be proud of, and to make sure that their system does what they expect.

So how do we get a team of eager and smart people into software testing?

Make it very very easy to start.

So we started by talking about testing, and noting that there were a lot of different kinds of tests and many different terms. And then we ignored all of those terms and kinds of tests. Oh, we'll get back to many of them over time, but for now, it's not important.

Starting is important.

Doing one test, understanding why that's a test, and learning what we will get out of doing that test: that's what's important. And that's all that's important right now.

So we started with the very basic core principle of testing: you want to know something about this system. A test is a way of formulating that as a question and finding an answer. That's it. For now, it's really that simple.

So we're starting with the questions that we understand, that we're already trying to answer. And we're talking about how to answer them effectively, and how to formulate the questions so that they are answerable. You know... testing. We don't care if it's a unit test or a performance test or what technique we're using. We just care that we're getting information out of the system, and that the information is effective and useful.
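
As a minimal sketch of "a test is a question with an answer", here is what a first test could look like. The `mean` function stands in for whatever the team's system actually does, and `answer` is a helper invented for this example; neither comes from the client's code.

```python
def mean(values):
    """Hypothetical function under test: the arithmetic mean of a list."""
    return sum(values) / len(values)


def answer(question, check):
    """Run the check and report a yes/no answer to the question."""
    result = "yes" if check() else "no"
    return f"{question} -> {result}"


# The question is phrased so that it has a definite answer.
print(answer("Is the mean of 2, 4 and 6 equal to 4?",
             lambda: mean([2, 4, 6]) == 4))
```

Nothing here says what kind of test it is. It asks one answerable question and gets one piece of information back.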

When you're faced with a big learning area, whether you're the student or the teacher, don't panic, and don't try to digest it all at once. Recognize that the hardest thing is starting. Then make starting as easy as possible. Nuance, techniques, and skill can come later. For right now, just find a way to get started easily.

Monday, August 22, 2011

Jenkins and Git

I've been setting up Jenkins for a client. While setting up a project, I ran into a very strange problem in which Jenkins (or more precisely, the git plugin) couldn't find git.

Here's what I did:
1. install Jenkins
2. create a project (call it "Foo")
3. say, "whoops. forgot the Git plugin"
4. install the Git plugin through Jenkins
5. restart Jenkins
6. configure the repo in Foo
7. build

I kept getting the error:
Error trying to determine the git version: --version
Assume 1.6

It couldn't see the git binary. I set the git binary in the main Jenkins config to "git" and to "/usr/bin/git", with no change. I also confirmed that the jenkins user could run the git binary, which it could.
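
For reference, the by-hand check I did can be sketched as a small script. This is only a sketch: it assumes you run it as the same user that Jenkins runs as, and it does not touch Jenkins at all. It only confirms that the binary is on that user's PATH and can report its version.

```python
import shutil
import subprocess


def find_binary(name):
    """Return the full path to `name` if it is on this user's PATH, else None."""
    return shutil.which(name)


def binary_version(path):
    """Run `<path> --version` and return the first line of its output."""
    result = subprocess.run([path, "--version"],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()[0]


git_path = find_binary("git")
if git_path is None:
    print("git is not on PATH for this user")
else:
    print(git_path, "->", binary_version(git_path))
```

If this prints a path and a version, the problem is not the binary or the user's PATH, which is where I ended up.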

The Fix:
To fix it, I deleted the project and created a new one. After all that, apparently you have to install the Git plugin before you create the project.

Lesson learned.

Thursday, August 18, 2011

Who/What/When/How/Why of Testing

I've been working with a client who has a group of academic engineers. These are very smart people who haven't ever built a product or worked in commercial software. So they asked me, "Can you teach me testing?"

Well, yes. I can teach you about testing. It's going to take a while. Ultimately, most of these guys don't need to become software testers, though. They just need a framework for figuring out whether their system does what it ought to do. The more sophisticated test design can be handled by the actual test team. These engineers just have to get far enough that they can work effectively with the testers.

Here are the very basics of what I told them:

Who should test?
Anyone can test. QA engineers (aka testers aka QE engineers) are simply people who specialize in testing activities. That doesn’t mean they’re the only ones who can do it! Use QA engineers as sources of knowledge and ideas just like you’d use a software architect, or a build engineer. They’ll do a lot of testing, but with a little help, you can also test.

What should you test?
Since testing is about gathering relevant information about potential and/or actual behavior, many things can be tested. Frequently, portions of software and systems as a whole are tested. However, software designs, UI designs, hardware, and even requirements or documentation can be tested. In short, if you (or someone involved in a project) want information about something, test it!

When should you test?
The earlier you test, the more time you have to react to what you learn from testing. So if you can test part of something, that’s fine - it’s early feedback. Testing continues for as long as you will do something with the information. That means you can continue to monitor and test, even after you’ve shipped the software. After all, you can still use what you learn, just in the next release.

How should you test?
Pretty much anything is fair game in testing. There are some guidelines and techniques that we can cover, but if it gets the information you need in a repeatable and sound manner, then go for it.

Why should you test?
You should test to get information that you can use. If you don’t have a use for the information you’ll learn from a test, then don’t bother doing the test. If you don’t know what information you might get from a test, then you need to better define your test (translation: go talk to someone who tests for a living).

For many testers, this is "back to basics" information. For someone new to professional software engineering, though, basics are a great place to start!

Tuesday, August 16, 2011

Why Textaurant Runs Rails

Textaurant is a webapp. It displays, routes, and gathers information. It sends text messages, and we emphasize the user experience on mobile browsers. The number of technologies we could have used to build Textaurant is really quite large. We could have gone with Java, or .NET, or Python, or PHP, or any one of a number of languages and frameworks. Any one of them would have been valid and viable choices.

We went with Ruby on Rails. And here's why:
  • Speed
  • Support
  • People
Textaurant is a startup, and (surprise!) we have competitors. We beat our competitors by giving our customers better value faster. On the technology side, that means we give customers more features, more reliably, faster. Development speed is a huge consideration. Ruby on Rails is a fast development framework. My team and I can get features out there quickly. We can tweak them quickly. That makes us responsive to our customers, which helps us get and keep happy users.

There's a lot of support for Ruby on Rails; it's a rich development ecosystem. We use Heroku for hosting, for example, because it's optimized for Rails applications. That means that we can provide a stable, scalable (within reason) environment that is optimized and secured for our application. And we get it all for just the cost of hosting. There is also a strong community that makes reusable gems for everything from sending text messages (thanks, Twilio) to authentication. That means I don't have to reinvent the wheel, which makes development faster. (I should note that we also try to give back to the community by creating, maintaining, and/or contributing to gems - it's only a rich ecosystem because we all contribute to it.)

Finally, we use Ruby on Rails because of the people. There are many developers who can write Rails applications, and many testers who can test this type of application. This gives us a wide talent pool to draw from, and also lets us honestly say to people who work with us that they're developing and maintaining a useful skill set that they can use in the future, for Textaurant or wherever they end up. (Tangent: I had a job early in my career in healthcare IT. We saw a lot of developer candidates who had spent five years learning and working in a custom language called Magic for another healthcare company. They didn't have much in the way of Java skills, and that hurt them in the job market; many of these candidates got passed over, even though they had industry experience. I don't want to do that to anyone who chooses to work with me.)

Will Textaurant use Ruby on Rails forever? I don't know. Like any technology choice, Ruby on Rails has strengths and weaknesses. At some point we may run into limitations of the language and/or the framework and make a change. For now, though, we're using the technology stack that best fits our needs, and hopefully that fact will never change.

Friday, August 12, 2011

Notes on Feature-Based Releases

Let's say for the sake of argument that we would like to do feature-oriented releases.

Why We Might Do So
There are a number of reasons we might choose to do a release based on its contents (features) rather than a date. These include, for example:
1. because a customer is blocked by a bug and cannot wait until the next release
2. because a customer or potential customer is blocked by a feature that is required to complete (or continue) their development
3. because we want to show a customer or potential customer that we can turn around requests quickly
4. because a feature is likely to need change and we need to provide an early copy to customers (some or all) to collect feedback

Notes on Speed
An implication behind feature-oriented releases is that they are likely to be more frequent than date-based releases. This is mostly because date-based releases generally include several features, and are typically a larger overall change.

Number of Versions in Field
There are some downsides to doing feature-based releases. In particular, we wind up with far more releases in the field. In a situation like a hosted web application, where the company controls production, this isn't a problem. In a situation where software is shipped and upgraded by customers (consumer installed software, app-store-based applications, many enterprise applications and libraries), the number of releases in the field can be a concern. Too many releases in the field cause a major drag on future releases, mostly because upgrade and similar tests have many more variables.

There are a few ways to handle this, from aggressive release end-of-lifing to support policies forcing upgrades. This is a bit of a delicate line to draw with customers, though.

Increasing Speed
Let's assume that the time to actually implement, debug, refine, fix, test, and release any one feature is roughly the same regardless of what type of release we're doing. The way to increase the speed of releases, then, is to reduce the other work that goes into a release. This other work can be characterized as:
1. integration of other features, in particular of multiple in-progress features
2. documentation, including release notes and updates to presentations, marketing materials
3. availability notification of releases (e.g., upload to a customer portal)
4. once-per-release validations and measurements (e.g., performance tests, or packaging validation)

Of these, by far the largest are numbers 1 (integration) and 4 (once-per-release validations). Number 1 in particular is often a large unknown: two features that are fine alone may not integrate cleanly. When both are still in progress, they are likely to integrate even less cleanly. For number 4, with very frequent releases, it often makes sense to run these validations once every several releases rather than every time, depending on what they are for (e.g., do in-depth performance testing every 5th release, or whenever there is some concern about the specific feature being released).
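
The "every 5th release, or when there is concern" rule is simple enough to sketch as a function. The names `needs_full_validation`, `every`, and `risky_feature` are made up for this example; the interval of 5 is just the one used in the text above.

```python
def needs_full_validation(release_index, every=5, risky_feature=False):
    """Decide whether this release gets the expensive once-per-release checks.

    release_index counts releases starting from 1. A release is validated in
    full when it is the Nth in the cycle, or when someone has flagged the
    feature being released as a concern.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    return risky_feature or release_index % every == 0


# Which of the first ten releases get the in-depth performance run?
print([i for i in range(1, 11) if needs_full_validation(i)])
```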

Use of Branches
One of the most common problems with a feature-based release is when there are multiple features in progress. Feature A may be complete and ready to go, but feature B isn't ready and the tree is unstable as a result. Feature A is in essence held hostage by feature B. For this reason, when doing feature-based releases, we usually designate a "release branch" (typically main or head) and "feature branches" (one per feature). The workflow for a feature looks like this:
1. create a branch for that feature
2. do all the feature work
3. merge from the main release branch (to pick up any changes or other features that were finished while you were working on this one)
4. fix/test your feature
5. repeat 3 and 4 until complete
6. merge to the main branch
7. sanity check and release

This adds some branch management overhead, but pays off in the ability to keep a stable, releasable main code base.
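
Assuming git is the source control system, the workflow above maps onto a handful of commands. The sketch below only builds the command strings so the mapping is easy to read; it does not run anything. The function name and the branch names `main` and `feature-a` are placeholders.

```python
def feature_branch_plan(feature, main="main"):
    """Return the git commands for each phase of the feature workflow."""
    return {
        # step 1: create a branch for the feature, starting from main
        "start": [f"git checkout -b {feature} {main}"],
        # step 3 (repeated with step 4): pick up whatever landed on main
        "sync": [f"git checkout {feature}", f"git merge {main}"],
        # step 6: bring the finished feature back to main
        "finish": [f"git checkout {main}", f"git merge {feature}"],
    }


plan = feature_branch_plan("feature-a")
for phase in ("start", "sync", "finish"):
    for command in plan[phase]:
        print(f"{phase}: {command}")
```

The "sync" phase is the one that gets repeated: merge from main, fix and test, and merge again until the feature is complete.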

When working with multiple releases and in particular with APIs where deprecating features or APIs is a consideration, special attention should be paid to how many releases occur before a deprecated element is removed. As with all release types, the goal is to balance simplification of development (by removing support for deprecated items) with customer ease (not having to rework their integration). One approach is to do periodic "roll up" releases that remove deprecations from previous releases. These are frequently the releases to which you will push your slow-upgrading customers, who are likely to skip interim releases.

For example, we might release like this:
4.1: new feature A
4.2: new feature B
4.3: deprecate old feature X
4.4: new feature D
5.0: roll up release, remove deprecated feature X
5.1: new feature E
5.2: deprecate feature Y
5.3: new feature F
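
One way to keep an eye on how long deprecated items linger is to treat the release list as data. This sketch encodes the example above and reports, for each deprecated feature, where it was deprecated, where it was removed (if it has been), and how many releases shipped in between. The tuple layout and function name are invented for illustration.

```python
RELEASES = [
    ("4.1", "add", "A"),
    ("4.2", "add", "B"),
    ("4.3", "deprecate", "X"),
    ("4.4", "add", "D"),
    ("5.0", "remove", "X"),   # the roll-up release
    ("5.1", "add", "E"),
    ("5.2", "deprecate", "Y"),
    ("5.3", "add", "F"),
]


def deprecation_windows(releases):
    """Map feature -> (deprecated_in, removed_in, releases_in_between).

    removed_in and releases_in_between are None while the feature is
    deprecated but not yet removed.
    """
    deprecated_at = {}
    windows = {}
    for index, (version, action, feature) in enumerate(releases):
        if action == "deprecate":
            deprecated_at[feature] = index
            windows[feature] = (version, None, None)
        elif action == "remove" and feature in deprecated_at:
            start = deprecated_at[feature]
            windows[feature] = (releases[start][0], version, index - start - 1)
    return windows


for feature, window in deprecation_windows(RELEASES).items():
    print(feature, window)
```

Here X gets one interim release (4.4) before it disappears in the roll-up, and Y is still waiting for the next roll-up.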

Feature-based releases have a number of benefits. They're also a very bad idea for some situations and some groups. Take a look, figure out if it's right for you, and good luck either way!

Wednesday, August 10, 2011

Some Thoughts on Heroku

I've recently started working with Heroku for the first time; we're hosting a Ruby on Rails app on a shared database there. I've had sites hosted on Amazon EC2 for years, so I'm used to working with virtual machines in cloud environments, but Heroku is a new twist for me. What follows are some early impressions.

What I Like:
  • Easy administration. Our traffic happens to be extremely spiky (Friday's evening traffic is approximately 100x Monday's noon traffic, for example), and the ability to add and drop capacity very quickly is huge. It's literally a GUI slider, so even the non-technical business type can do it when I'm not available.
  • Low systems administration overhead. There are a lot of things involved in successfully administering a system: making sure there's redundancy, setting up and maintaining (and testing restores on) backups, keeping the machines up to date, etc. This isn't entirely gone under Heroku, but it's significantly reduced.
  • Rollback. I've only had to roll back once (to make sure I knew how!), but it was pretty darn simple. And when you're rolling back a change, you're already in a negative frame of mind, so it's really helpful to at least have an easy way out.
What I Dislike:
  • No database access. We're on the shared database, so I can't actually log in and run a SQL query. It's not a big deal most of the time, but it turns out that I'd become a little bit addicted to having this information when I look for data-related issues or want to see query plans to track down a performance bottleneck, etc. I don't like the feeling of being blind. I should note that this limitation does not exist if you're running on a dedicated database.
  • Deployment. See below.
  • Rollback. Yes, this one is on both lists! It's easy to roll back code. Rolling back things like database migrations is a whole different kettle of fish, and I have the same problems here that I do with deployment.
  • A general feeling of blindness. I always feel like I'm missing something when I use heroku logs, like it's harder to grep through. I don't know that it's actually any better or worse than less and grep on a machine, but it's taking some getting used to.
  • Worries about uptime. Just like everyone else, Heroku isn't perfect, and we have seen some downtime. It's pretty rare, but it's still worrisome.

A Note on Deployment:
With Heroku, you deploy using git. For me to deploy to staging, for example, I do this:
1. check out the staging branch (in our github account)
git checkout staging
2. merge whatever I need to
git merge blah
3. run the tests as a sanity check
bundle exec rake rcov
4. push my changes to github
git push origin HEAD
5. push my changes to heroku
git push heroku-staging staging:master

On the surface, this is pretty slick. I'm already using git for source control, so deployment is as easy as setting up a new remote and pushing to it. For anything that's more complicated than a simple git push, though, it gets a lot messier. Let's say I have a migration to run as part of this change. Then my push looks like this:
heroku maintenance:on && git push heroku-staging staging:master && heroku rake db:migrate && heroku maintenance:off

Adding other things, like jammit, gets even messier. So overall, it sounds great, but I'm not a big fan. Things like chef and puppet and capistrano exist for a reason and are great to use, so I'm not sure we needed yet another way. I've wound up writing recipes to do most of the pushing for me so I don't miss a step, though, which helps.
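
To give a flavor of the "don't miss a step" idea, here is a minimal sketch in Python. It is not the recipe I actually use. It only assembles the commands already shown above, in order, and prints them unless told to run them. The function names and the `dry_run` flag are invented for this example.

```python
import shlex
import subprocess


def staging_steps(merge_from, migrate=False):
    """Return the staging deploy commands from this post, in order."""
    steps = [
        "git checkout staging",
        f"git merge {merge_from}",
        "bundle exec rake rcov",      # tests as a sanity check
        "git push origin HEAD",
    ]
    push = "git push heroku-staging staging:master"
    if migrate:
        # Wrap the push and the migration in maintenance mode.
        steps += ["heroku maintenance:on", push,
                  "heroku rake db:migrate", "heroku maintenance:off"]
    else:
        steps.append(push)
    return steps


def run(steps, dry_run=True):
    """Print each step; when dry_run is False, execute it and stop on failure."""
    for step in steps:
        print(step)
        if not dry_run:
            # check=True raises on a non-zero exit, much like chaining with &&
            subprocess.run(shlex.split(step), check=True)


run(staging_steps("blah", migrate=True))
```

Like the `&&` chain, this stops at the first failing command, which also means a failed push or migration leaves maintenance mode on until someone turns it off by hand.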

Putting my money where my mouth is, are we sticking with Heroku? Yes, at least for now. The quick and easy scale-up and scale-down is huge, and that trumps the quibbles. Oh, and the quibbles really are pretty minor.

Overall, Heroku is not for everyone, and I encourage anyone considering it to look very hard at things like cost, uptime, service levels, backup policies and timing, etc. Make an informed decision, but after my experience so far, I would at least consider Heroku next time.