Wednesday, January 28, 2009

Problem Types

I've been thinking about scheduling and risk lately. Taking development and test together, some features are far riskier than others, and I've been trying to figure out how to classify and explain this to others - product management, project management, etc. Most of these are things we just know - adding a field to an existing form is less risky, from a scheduling perspective, than trying to make the whole form load three times faster. That is, we can have more confidence in an estimate for the first than for the second.

I've started breaking down new features (or work efforts, to be precise) into three separate areas:
  • Fully understood problems
  • Partially understood problems that can be worked in parallel
  • Partially understood problems that are linear

Fully Understood Problems
These are the easiest and least risky. Going in, you know everything that needs to be done to get the feature out the door. It's usually in an isolated or at least well-designed piece of code, test coverage is good, etc. Often these are problems that iterate on a current feature or piece of functionality. They are well understood because we have essentially done them before.

For these, you can do an estimate and be pretty confident that you'll make it. Your risk in these scenarios comes under non-product areas (developers getting sick, build system dying, etc), and those are risks that you can quantify based on your past development history.

Examples: adding a field to a form, creating the fifth skin for an application, adding a new type or class of data, etc.

Partially Understood Parallelizable Problems
These types of issues are riskier than fully understood problems. Going in, you know that you don't have all the information you need, and lack of information increases your risk. However, these problems can be approached from multiple directions at once. This means you can increase your knowledge and decrease your risk, to a certain extent, by throwing resources at the problem.

For these, you need to add additional padding to your estimate. You can still estimate based on your history with these types of problems, but be aware that optimistic estimates are not the best route here. On top of your usual risk profile, you have the "there be dragons and I have an army" factor.

Examples: a new hardware platform, a new feature written in-house, etc.

Partially Understood Linear Problems
The last type of problem to address is a feature or work effort that you don't quite understand and that must be worked linearly. These are the features you can't attack from many different angles - you fix one problem only to uncover the next. Throwing resources at this problem is unlikely to help; it's the classic baby problem (one woman can have a baby in nine months, but nine women cannot have a baby in one month).

The best mitigation strategy here is to reduce your cycle time as much as possible; the time between uncovering a problem, identifying it, and fixing it should be shrunk as far as you can, because you don't know how many of these cycles stand between your team and their goal. These problems also respond well to a "war room" approach, in which you create a dedicated team to solve the problem with all needed resources - dev, test, IT, etc. - and (physically) isolate that team so this is all they work on.

These problems have the potential to land farthest from your original estimates, and they are the hardest to estimate well. Again, you can look at your team's history with these types of problems, but you should assume the work will take longer than you think. On top of your usual risk profile, you have the "there be dragons and I am but one man" factor.

Examples: performance improvements, build failures, etc.

So when you're creating your schedule and assessing its risk, ask yourself what kinds of problems you're facing. If you can classify the kinds of problems you need to solve, you can determine how solid your estimates are and therefore how risky your schedule is.
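One simple way to operationalize this classification is to pad a base estimate by a risk factor per problem type. A minimal sketch in Python; the multiplier values are illustrative assumptions, not numbers from this post:

```python
# Hypothetical sketch: pad a base estimate by a risk factor per problem type.
# The multiplier values are made-up illustrations, not numbers from the post.
RISK_FACTORS = {
    "fully_understood": 1.25,   # small buffer for non-product risks
    "partial_parallel": 1.5,    # "there be dragons and I have an army"
    "partial_linear": 2.0,      # "there be dragons and I am but one man"
}

def padded_estimate(base_days, problem_type):
    """Pad a base estimate (in days) by the risk factor for its problem type."""
    return base_days * RISK_FACTORS[problem_type]

print(padded_estimate(10, "fully_understood"))  # 12.5
print(padded_estimate(10, "partial_linear"))    # 20.0
```

In practice you'd derive the multipliers from your own team's history with each problem type, rather than picking them out of the air.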


  1. Hello Catherine,
    Thanks for sharing your reflections on scheduling risks. In my opinion this can be a good approach to building the considered risks into a schedule. Classifying the risks and their chance of occurrence can help you reserve time for when they happen.

    Perhaps I don't understand you that well, but what I miss in your approach is how to classify the things we don't yet understand. Classification is based on what we already know can happen, and about that knowledge we have some understanding.

    How would you deal with the unknown unknowns? Is there some space in the classification to reserve time for moving the unknown unknowns toward the known unknowns?


  2. Jeroen,

    I think of it rather simply: the less you know about how you're going to implement/test/ship something, the greater the chances that your estimate is way off. So getting your estimate to be more accurate is a matter of starting to work on the problem - through research, prototypes, starting implementation, etc.

    A non-software example (I'm a cook and it's getting on toward lunchtime here, so forgive the food analogy):

    I can roast root vegetables with a great degree of accuracy. I know it takes me 20 minutes to chop up enough root vegetables for 4 people, 15 min to preheat my oven, 5 min to make a glaze, and 1 hr 10 min to actually cook the vegetables. Further, I know what I can parallelize (chopping, preheating, and glaze making). So I can tell you that it will take 90 min to make root vegetables, and I'll be within about 5 min of that every single time.
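    The prep math above can be sanity-checked in a couple of lines (a throwaway sketch; the minutes are the ones from this comment):

```python
# The parallel prep tasks take as long as the longest one; the roast follows.
prep_minutes = {"chop": 20, "preheat_oven": 15, "make_glaze": 5}
roast_minutes = 70  # 1 hr 10 min

total_minutes = max(prep_minutes.values()) + roast_minutes
print(total_minutes)  # 90
```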

    Now let's say that instead of roasted root vegetables, I'm going to make cassoulet. I've made cassoulet... well, never. I sort of know what I'm doing - I need to cook some beans, make duck confit (or buy it), and cook the whole thing up with some sausages and other meats. I can put together an estimate - about 24 hours for the beans, plus about 6 hours for the actual cassoulet. But there are a lot of unknowns - how long it will really take to get the beans creamy, how big a pot I should be using, how long it will take to get the meats to release their juices into the sauce, etc. Higher risk, less accurate estimate.

    So how to deal with the unknowns in my cassoulet?
    - research (aka read a recipe or several)
    - prototype (aka make roasted sausages and meats separately)
    - begin implementation (aka make the beans, then revise estimates an hour or two into the second day)

    Software, I think, is much the same way. Estimation is not a one-time thing; we refine our estimates as we gain more information. The onus is on us to determine how to get that info, but that's an exercise for a later comment.

  3. Hello Catherine,
    You gave an amazingly good example. I don't think I would be able to explain it better with cars or building a house. I will certainly remember this one.

    It also supports my thoughts. As I understand your classification, you use it to define the risks, and based on the impact and classification you can assign an expected amount of time to each.

    What I didn't see is a time reservation for the unknown unknowns. You already mentioned that you have to gain more information to refine the estimates. I can imagine that you also counted this refining into the expected time you assign. In my opinion, though, that can only be done for problems you already expect.

    Normally I use some additional slack time for the unknown unknown problems/risks. Do you think reserving time for unanticipated problems might be an option?

    I think that, based on knowledge of the system or the number of questions you still have to ask of it, there is a risk that you will find problems you haven't found yet.

    Would adding a fourth category be an option: unanticipated problems?

    With regards,

  4. Jeroen,

    That is a very good point. When we can say how much we know or don't know about a certain problem X, then we can classify its risk level. When we don't know what we don't know, well, then we're in real trouble.

    I'll have to think about that one some more.