One thing that’s been bothering me at work recently: our response time to bugs is absurdly slow. Even bugs that are marked as high priority take a while to get worked on; bugs that aren’t marked as high priority may well never get worked on.

Now, some of this is a classification issue: some bugs are marked as high priority when they shouldn't be, and a lot of the open bugs shouldn't be open in the first place. But many of the open bugs really do need to get fixed, and as the product gets deployed more widely, there will be times when we run into bugs that need to be fixed quickly. We're capable of doing that now (we showed as much during a recent trial, for example), but even so, shouldn't we spend more time practicing responding quickly, to make sure those skills don't atrophy?

So I think we should rethink our bug prioritization system. We should make sure that high priority really means high priority (i.e. somebody starts work on it immediately), and we should also make sure that there's a meaningful definition of medium priority (maybe it doesn't get fixed this week, but it should get fixed this month). That would be a good start.
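One way to keep those definitions honest is to write them down as a check you can run against the open bug list. What follows is only a sketch under made-up assumptions: the `Bug` record, the `overdue` function, and the specific targets (high priority work must start the same day it's filed, medium priority must be fixed within 30 days) are hypothetical, not how our Bugzilla setup actually works.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical targets paraphrasing the policy above (assumed, not official):
# HIGH: someone starts work the same day the bug is filed.
# MEDIUM: not necessarily this week, but fixed within a month (30 days here).
HIGH_START_WITHIN = timedelta(days=0)
MEDIUM_FIX_WITHIN = timedelta(days=30)


@dataclass
class Bug:
    id: int
    priority: str  # "high", "medium", or "low"
    opened: date
    started: Optional[date] = None  # when someone began working on it
    fixed: Optional[date] = None    # when the fix landed


def overdue(bug: Bug, today: date) -> bool:
    """Return True if an unfixed bug has missed the target for its priority."""
    if bug.fixed is not None:
        return False
    age = today - bug.opened
    if bug.priority == "high":
        # High priority is violated as soon as a day passes with nobody on it.
        return bug.started is None and age > HIGH_START_WITHIN
    if bug.priority == "medium":
        # Medium priority is violated once the bug is older than a month.
        return age > MEDIUM_FIX_WITHIN
    # Low priority: no response-time promise is made.
    return False


if __name__ == "__main__":
    # Made-up example bugs, checked on an arbitrary day.
    today = date(2024, 5, 2)
    bugs = [
        Bug(1, "high", date(2024, 5, 1)),    # filed yesterday, nobody started
        Bug(2, "medium", date(2024, 4, 1)),  # 31 days old
        Bug(3, "medium", date(2024, 4, 20)), # 12 days old
        Bug(4, "low", date(2020, 1, 1)),     # old, but no promise made
    ]
    print([b.id for b in bugs if overdue(b, today)])  # prints [1, 2]
```

If a check like this fires often for high priority bugs, that's a sign either that the label is being overused or that we aren't actually treating it as "drop everything".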

Once we’ve gotten that under control, though, and are disciplined enough that high priority bugs are a rare event, quickly solved, we should try to react quickly to a larger class of bugs. After all, from a value stream perspective, the time spent waiting before we start fixing a bug is pure waste. If we’re not sure whether or not it’s valuable to fix a bug, then fine: we should wait until we have more clarity. But if we are going to fix a bug, what are we gaining by waiting? Why not just fix it immediately? We’re not saving work overall by delaying the fix: all we’re doing by waiting is building more debt that we’ll either have to pay off before the next release (if we fix it in time) or after the next release (if it makes it out into the wild). Neither of those is productive in general.

Of course, there are more levels to this problem: in particular, we shouldn't be inserting defects into our code in the first place. (We're getting better at that, fortunately.) And we don't want to use Bugzilla as a substitute for our product backlog: there should be some control over how features get scheduled for implementation. And it can be hard to maintain a steady implementation pace if you're constantly interrupted by bug work. There are solutions to all these problems, however. (For the last one, for example, a two-part strategy: don't write defects in the first place, and allocate slack in your schedule for the ones that get through anyway.) For now, from my point of view, our most urgent issues are (first) reducing the defect backlog and (second) improving response time.

Fortunately, other people agree. Some of my team members have been nagging me for months (years?) to make sure we don't work on features at the expense of bug fixing, and my boss is more concerned right now with making sure we don't have problems in the wild than with getting more features added to the product. And metrics have improved over the last week: if we really devote effort to fixing bugs, they do go away. But we have had to devote the effort: maintaining a constant low-level hum wasn't good enough; we needed a medium- to high-level hum.

The real test will come once we've worked bugs down to an acceptable level. We've built up technical debt; once we've paid that off, will we switch to a high level of productivity without building up new debt, or will we backslide? I'm optimistic that we'll do the former: we're getting pretty sensitized to bugs, and we're aware of the problems that bugs cause for our normal development activities. On a basic level, they mean we have to waste time each morning investigating red bars on acceptance tests; on a slightly less basic level, the presence of those nondeterministic red bars means that, if you've implemented a new feature, it's hard to be confident that you haven't made mistakes, because you don't get a completely clear good/bad signal from the tests.

And once we get bugs down to an acceptable level (zero, say), we can try leaving some (not all, just some) of the former bug-fixing time in our schedule as slack. Now that I'm convinced (or at least strongly suspect) that slack is a good idea, I want to give it a try, but not without some external signal to let us know when we should stop slacking off! And the presence of bugs sounds like a great external signal to me.

Fun stuff, all this.
