Last week, I went to a talk by Mike Cohn on “Agile Estimating and Planning”. Good timing: I’d been thinking that I should get around to reading his book on the subject. Which I won a copy of at the drawing after the talk; apparently my recent remarkable good luck has (correctly) decided that I have enough iPods and should start winning other things instead.

I’d gotten my previous take on the subject from others’ books and from a presentation by Ron Jeffries; back when (a previous incarnation of) my team used to estimate regularly, we followed Ron’s 1-point, 2-point, 3-point idea. (Well, we did until we dropped 3 and added 1/2, but the result is almost the same thing.) Mike Cohn, however, uses much larger numbers: 1/2, 1, 2, 3, 5, 8, 13, 20, 40, 100.

That’s for story points; he recommends estimating tasks within stories in terms of hours. (I can’t remember if Ron talked about estimating tasks at all.) And he made a good point about why you should use artificial points instead of real time units for your story estimations: if you use real time for both, you’ll be tempted to expect, say, the time estimated for the tasks making up a story to add up to the time for the entire story. Which makes sense, except that you estimate a story before you’ve broken it up into tasks (you don’t do the latter until somebody has decided that you’ll work on it), so when you do the task estimation, you’ll have thought much more about what’s involved in implementing the story. And you can’t convert between “hours that you’ve thought a lot about” and “hours that you haven’t thought much about”, which you’d be sorely tempted to do if you use hours in both situations.

Mike came about his expertise on the subject honestly, by the way: he was VP of engineering at a company that had adopted Scrum, and that had a fair number of teams working on not-very-long-lived projects. So teams had to estimate stuff, and he imposed the rule that, on each project, you had to do something different when you estimated. There was enough discussion going around that people had an idea of what teams with accurate estimates had done in the past, but the rule meant that they couldn’t just stop and declare victory; they had to keep on trying to find ways to improve. A nice example of evolutionary process improvement.

Anyways, after the talk, I asked him about his versus Ron’s recommended range of estimation values. Part of his answer was that maybe the right thing for somebody working in the trenches is different from the right thing for a VP of engineering. More generally, people are going to ask the team how long it takes to implement a feature that’s larger than a simple story; they need a way to answer that. Which is a good point – I don’t have that clear a view on how Ron recommends estimating features larger than a single story. (I should ask, shouldn’t I?)

There’s still a tension there that I’m not entirely comfortable with. Unless you go with long iterations (and Mike prefers two-week iterations, which already doesn’t seem long enough), I don’t see how you can fit stories that vary anywhere near a hundredfold in length into a single iteration. Now, stories at the extremes (especially the large end) are bad, but still, a 1- to 13-point range (or whatever) seems too wide to me to fit within an iteration. But a story that can’t be done within a single iteration isn’t really a story, is it?
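To put rough numbers on that worry, here is a minimal sketch of the arithmetic. The scale is the one listed above; the `capacity` figure of 20 points per iteration is purely a made-up assumption for illustration, not something from the talk, and so are the function names.

```python
# Mike Cohn's story-point scale, as listed earlier in this post.
SCALE = [0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100]


def spread(points):
    """Ratio between the largest and smallest estimate in a list."""
    return max(points) / min(points)


def stories_that_fit(points, capacity):
    """Values small enough for one story to fit in an iteration.

    `capacity` is a hypothetical number of points a team finishes per
    iteration; it is an assumption of this sketch.
    """
    return [p for p in points if p <= capacity]


full_spread = spread(SCALE)                                  # 100 / 0.5
narrow_spread = spread([p for p in SCALE if 1 <= p <= 13])   # 13 / 1
fitting = stories_that_fit(SCALE, capacity=20)

print(full_spread, narrow_spread, fitting)
```

So the full scale actually spans a factor of 200, and even the 1- to 13-point slice spans a factor of 13. Under the assumed capacity, the 40 and 100 values could never be single-iteration stories, which is consistent with reading them as estimates for something bigger than a story.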

So maybe there are three levels needed: features, stories, tasks. Each with their own (non-convertible, as above) estimations. But that’s too much estimation. Given that, I’d actually be tempted to drop the task estimation instead of the feature estimation: isn’t it kind of pointless to spend time estimating how many hours a task will take? Just implement the damn thing! In the previous incarnation of my team, we did break down stories into tasks (we should get back to doing that, it was useful), but we didn’t estimate individual tasks, and I never felt the lack. Maybe I was missing something, but it still seems funny to me.

Actually, though, it’s entirely possible that we were subtly shifting things by a level (and making them too long, to boot). Because the truth is that a lot of our stories were technical: we weren’t clever enough (and weren’t working with a Customer representative to give us a nudge) to break work up into small, customer-visible units. So maybe what we called stories were really tasks? I don’t think that’s quite accurate, but there’s enough truth to that to make me nervous; something to think about more.

Since Stuart brought it up (see his blog post on the talk if you want another take), I might as well talk about another question I had. Mike presented some very interesting examples (you can see his slides, by the way) of studies that showed that, when people were given extra, irrelevant information, their estimates for tasks increased. (My favorite example was when group A and group B were given exactly the same text, but in one case on a single piece of paper while in another case spread over seven pieces of paper.) To which I asked: that’s neat, but which estimate is more accurate?

I freely admit that I asked this solely out of methodological purity: even though the studies didn’t give any evidence about the relative accuracy, I know which way I’d bet. (Well, one of the studies sort of did give evidence: they gave three teams the same tasks, but told team A nothing about expectations, team B that the customer hoped it could be done in 500 hours and team C that the customer hoped it could be done in 50 hours. All teams insisted that the hopes had nothing to do with their estimates, but team A ended up with an estimate of 456 hours, team B with 555 hours, and team C with 99 hours! Scary, that: a trap that I fall into all too often myself.)
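For a sense of scale, here is a quick sketch of how far the two anchored estimates moved relative to the team that was told nothing. The hour figures are the ones quoted above; treating team A as the baseline is my own choice for illustration, not a claim that A's number is the correct one.

```python
# Estimates in hours, as quoted from the study above.
ESTIMATES = {"A": 456, "B": 555, "C": 99}


def relative_shift(estimate, baseline):
    """Fractional difference of an estimate from a baseline estimate."""
    return (estimate - baseline) / baseline


# Percentage shift of each team relative to team A (the unanchored team),
# rounded to one decimal place.
shifts = {
    team: round(100 * relative_shift(hours, ESTIMATES["A"]), 1)
    for team, hours in ESTIMATES.items()
}

print(shifts)
```

That works out to roughly 22% above team A for the team that heard "500 hours" and roughly 78% below for the team that heard "50 hours".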

But, the more I think about it, the less sure I am which team’s estimate is the most accurate. Take, for example, the study where team A was told to estimate requirements 1-4, team B was told to estimate requirements 1-5, and team C was told to estimate requirements 1-4 but was also given the future requirement 5 for purely informational purposes. In this case, A and B both estimated 4 hours (even though B was told to estimate strictly more work than A) while C estimated 8 hours (even though they were told to estimate the same work as A)! Looking at that, I don’t see at all why I should believe that A is the most accurate – they gave the same answer as B, which is within the margin of error but clearly odd. What seems more likely to me is that A and B are estimating in terms of “hours we haven’t thought much about” while C is estimating in terms of “hours we’ve thought more about”, which we learned earlier can’t be converted to each other!

Anyways: good talk, a good reminder that we should get back to estimating once matters get a bit more under control, and I ended up with a book and enough sets of his planning poker cards that we can use them in our future team meetings. If you have a chance to hear him, I definitely recommend it.
