I have a couple of machines that have been hit by this latest Amazon outage. I have one machine in particular that's still out as of this writing. Now, it's not killing me; I can get around it for a few days (good thing, too!). Still, I'm grumpy about it.
And you know what I'm grumpy about? How updates are displayed. Amazon's doing a reasonable job of providing updates and estimates, which I applaud. So why am I annoyed? Every time I want an update, I do this:
1. go to the AWS Service Health Dashboard (already open in a tab)
2. hit refresh
3. scroll back up the page to the line that I'm interested in
4. click the "more" link
5. scroll down to see the latest update
It's a really small thing, and it takes maybe 10 seconds total. But the scrolling bothers me; it could be better. Granted, fixing it would shave off maybe 5 seconds, tops, but it's still just a little sloppy, and that's what tips me over into grumpy.
Now, I'm not going to argue that failures are good. I will argue that they're going to happen. My boxes at Amazon EC2 will go down occasionally. My boxes in the office will suffer a power outage (true story: someone backed into a power substation once and poof!). Services will become overloaded, response times will drag, bugs will happen. Given how much software and how many systems we encounter in a single day, failure is inevitable.
So make failure clean. Make it as enjoyable as possible.
Twitter is a great example of this. People whine and complain when Twitter goes down, but Twitter's failure message (the famous fail whale) is cute. It's so cute it's spawned a cult following. People make cakes.
People make necklaces. They make Flash animations. All for a failure message!
It shifts the conversation away from the failure and toward something more positive: your cute error message. That doesn't excuse failure, and it doesn't mean you should accept failure. It just adds a positive note to a negative experience.
And that's doing failure right.