Archive for the ‘Lean / Agile’ Category

random links: august 26, 2007

Sunday, August 26th, 2007

detailing carpets

Thursday, August 16th, 2007

I’ve been on a bit of a Christopher Alexander kick for the last couple of years. At first, I started reading his most famous books, but those were good enough to leave me curious about what else he’d written. Not all of it is great, but enough of it is to keep me going.

Still, it’s taken me a while to get around to his book on carpets. For one thing, it’s out of print, and it took me a while to find a copy at a price that I’m willing to pay. For another thing, I don’t particularly care about carpets! But I finally found a copy, which I started reading earlier this week.

And I was completely blown away right from the start. Mainly because of the pictures: I was completely unprepared for the colors that are used in the carpets that he gives as early examples. But he also starts right off with a point that is really resonating with me right now:

What is often called the “detail” of the building—its fine structure—is not some kind of icing on the cake, but … the essence of what it is, and how it makes its impact upon us. (pp. 7–8)

The small stuff matters just as much as the big stuff; the power of the big stuff comes in large part out of the small stuff.

Translated into programming terms: keep your code clean, make every line as expressive as it can be. Which is something that I really enjoy doing, and I’m quite sure that I have a lot more to learn in that area. In particular, I’m fairly sure that there’s a lot more power to the notion that unexpected structure can arise somewhat spontaneously by determined refactoring than I’ve experienced so far. I don’t really believe that determined refactoring is all you need for good structure, but there too I have a lot to learn: where do you need more design, and what sort of extra design do you need?

(Side note: if I’m remembering correctly, Lisa Crawford told me once that Gustav Leonhardt spent a lot of time working with his students on how to best play short musical phrases. This sort of ability worked great for him, but apparently left some of his students able to play pieces in ways that sounded good at the small scale but didn’t work out so well in the large.)

But I’m not getting nearly enough practice at this. In my pet project at home, I’m trying to keep the code expressive and free from duplication, but there’s only so much you can learn from a thousand lines of code. I’ve had some interesting experiences at work, but there’s a lot of legacy code there, and I can’t responsibly spend large amounts of time cleaning it up.

Having the team as a whole spend small amounts of time cleaning it up is something that would be responsible for me to push on; I haven’t been very successful in balancing that against other activities with more immediate payoff so far, but I want to keep on trying. Even if that is successful, though, what I’ll get out of it will be different: I’ll learn a lot about balancing competing short-term and long-term demands, which is great, but I won’t have the experiences of uncovering structure myself.

What to do? One possibility is to try to work on a bigger project myself at home. I don’t feel like I have any big ideas bursting out right now, though. (Do any of my readers? I’m open to collaboration…) Another possibility would be to spend some time on evenings and weekends cleaning up the work code base. That sounds more reasonable; it still raises issues as to whether or not I’d be acting responsibly in doing so (since I’d be doing so for my own benefit, and I wouldn’t want that to hurt my team, e.g. by setting a bad example or depriving them of pleasure/learning opportunities), but in the balance I think that would be okay.

Or I could find a medium-sized open source project with an interesting code base to get involved in. That’s also an intriguing idea, and one that could have other benefits (and, to be sure, down sides); I have a few ideas in the back of my mind, but nothing concrete yet. I’m certainly open to suggestions on this front.

I’ll think it over for a few months: balancing the demands of existing programming projects, learning Japanese, reading blogs, blogging, playing video games, and of course reading books is quite a challenge as is. So, realistically, something has to give, which probably means that I have to wait until I’m done with my current personal programming project, and I already have another couple sketched out after it. But maybe in three or six months I’ll have managed to carve out a bit more time.

Back to the book. I’m not as in love with the rest of the first part of the book as I was with the beginning. He’s spending a lot of time talking about centers; I’d already seen those ideas in more refined form in The Phenomenon of Life. Still, it’s interesting enough, and I’m now pretty curious about the catalog of carpets that makes up most of the book.

I now have all of his earlier books except for The Linz Cafe: that one I have been unable to find at a price that I’m willing to pay. I just put in an interlibrary loan request for it, though. Once I’m done with it, I guess I’ll reread The Phenomenon of Life, and then move on to the rest of The Nature of Order.

learning japanese: a month and a half in

Wednesday, August 15th, 2007

I’m on the fourth chapter of my Japanese textbook now, enough for a new set of difficulties to surface. All of which ring vague bells from a decade ago; I’m trying to do things right this time, which means that I need better strategies for facing these difficulties than I had last time.

One problem: when I claim I know a vocabulary word, when I move it from the “review regularly” stack of flash cards to the “mastered” stack of flash cards, I want that to mean that I really do know the word! But, for an uncomfortable number of flash cards, what is really going on is that I can reliably, upon seeing the front of the flash card, recite what is on the back of the card. Which isn’t the same thing.

Some aspects of that problem show up no matter what language you’re learning. For example, I usually only do my cards in one direction, so I regularly drill going from megane to “glasses” but not in the other direction. Also, there are grammatical issues: to really know a verb, you should be able to conjugate it at will, and recognize it in any of its forms.

Those particular problems aren’t that big a deal for me yet. I haven’t learned too much grammar, and I’m doing a pretty good job so far in being able to go from English to Japanese even though I’m drilling Japanese to English.

What is a big deal is the presence of kanji. This increases complexity in a few different ways. For one thing, I have to go between three forms of the word (kanji, pronunciation, and English) instead of just two forms (Japanese and English). And, of course, a single kanji character can have multiple pronunciations, which may or may not have multiple readings, and which may or may not be signalled by adding some kana at the end. (After some experimentation, I’ve decided to exile all the extra kana to the back of the card, instead of leaving it on front.)

That’s the obvious problem, but there’s also a more subtle one. When I see a vocabulary card, I see something I wrote by hand, taken from a limited number of other vocabulary cards that I’ve written. So when I see, say, the kanji for bijutsukan, what I really see is a card with three kanji characters on the front, where in this case I happen to have written the kanji characters a little smaller than would be ideal, and a little bit off center. And, honestly, that alone is almost enough to allow me to uniquely identify the vocabulary card from among my current set, especially if one of the radicals in one of the kanji seems familiar for some reason.

But, of course, that doesn’t mean that I know the word at all: if I saw those same three characters in a Japanese book, I would have almost zero chance of recognizing them as bijutsukan, and for that matter I’d be equally likely to mistakenly think that some other sequence of three characters might represent bijutsukan. I now appreciate what kids learning to read and write English are going through when they see a sequence of letters and guess that it’s some other word that happens to start with the same letter or two and is more or less the same length: they don’t have any deeper grasp of the phonetics of written English than I do of the radicals that make up a kanji character, and in both cases we quickly get overwhelmed by the task of really understanding how a word is written.

So what do I do about this? Part of my solution is to simplify the problem. I can adopt a classic agile planning technique: recognize that there isn’t a strong correlation between the difficulty of a task and its business value, and that, when choosing between two tasks of equal business value, you’ll get the quickest bang for the buck by doing the easier one first. What that translates to in this case is that, all things being equal, I should try to memorize words made up of as few kanji characters as possible. So one is best, two might be okay, especially if I’ve seen one of them before, three is unlikely to be a good idea. And not all kanji characters are created equal: given a choice, I should choose characters made up of as few radicals as possible, to increase the chance that I’ll be able to really know the whole character. (As opposed to, say, having the left side of the character trigger a memory in me.)

That alone isn’t good enough, though: it doesn’t leave me with a strategy for dealing with important but more complicated characters/words, and doesn’t directly address the complexity of what it means to learn a character. To really learn a character, I should be able to write it out myself, and be able to reliably tell it apart from similar-looking characters, characters with, say, the same radical on the left and on the upper-right but a different one on the lower right.

The answer to both of these aspects of knowledge is, for me, the same: I need to learn to love radicals. Once I really know the radicals, I won’t have to, say, recognize and reproduce the thirteen strokes making up a complicated character, I’ll just have to recognize and reproduce the three radicals making it up. That’s not a simple problem, given that there are about 200 radicals to grapple with, but it’s at least a tractable problem. Especially since the radicals in a character aren’t chosen arbitrarily: radicals have meanings on their own, so you can frequently build up the meaning of a larger character out of the meanings of its radicals, and radicals can at times lend their pronunciation to the pronunciation of the entire character. So there’s real structure to work with here; as I buff up my radical credentials, it should become easier and easier for me to learn more and more complex characters.

And, fortunately, I’ve recently acquired an excellent book on the subject. It does a great job of showing how the characters evolved (and is historically accurate, as far as I can tell), and of gradually introducing radicals and showing how they add meaning in more and more contexts. So I’m gradually adding characters from that book into my stack of cards to memorize, even if I haven’t run into those characters in my textbook, and trying to remember the evolution of those characters in the bargain. Should make learning characters more fun, and easier.

That’s the main problem; there are a couple of other problems that I’m running into as well, though. One is that there are too many new words in each chapter for me to be able to memorize. I was worried about this three weeks ago: it seemed like my stack of unmemorized cards was getting longer and longer. Since then, I’ve been doing a pretty good job of moving cards into the memorized stack, but I don’t want to ignore the problem. (Especially since I’m now adding vocabulary cards from a source other than my textbook!)

Part of the solution is to simply not memorize every new word in each chapter. Each chapter introduces maybe 80-100 new words; I’m pretty sure that I can get away with only learning 40 or 50 of them right then. So I’m picking the ones that seem particularly likely to be important, or particularly likely to be easy to learn, and I don’t sweat the other ones for now. And if, in subsequent chapters, I keep on encountering a word that I didn’t memorize when it first showed up, then I can always learn the word later. It’s not completely clear that this is a scalable strategy - maybe, once I get to chapter 15, I’ll have to memorize 5 new words from each of the previous 15 chapters along with an extra 50 words from that chapter, which would suck - but I think it’s worth giving a try.

The second part of the solution is basic queue management: the problem here is an unbounded queue. And if you don’t want to have an unbounded queue, then put a cap on it! So I could adopt a rule that I can never have more than, say, an inch of unmemorized vocab cards in the box. Once I reach an inch, I have to do something else until the stack goes down: some combination of memorizing a smaller proportion of words in each chapter, taking longer to go through each chapter, and learning to be more effective at memorizing words. I don’t have an exam schedule or anything that I’m working towards: I want to do this right, and to do this right I need to balance my capacities, my time, and the number of words that I’m attempting, instead of letting artificial pressures skew my attempts at the cost of a loss of effectiveness.
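A minimal sketch of that capped queue, in Python. The class name `CappedStack`, the cap counted in cards rather than inches, and the placeholder words "hon" and "inu" are all assumptions made for illustration; nothing here comes from the textbook or the actual card box.

```python
class CappedStack:
    """A stack of unmemorized cards that refuses new cards once it is full."""

    def __init__(self, cap):
        self.cap = cap      # maximum number of unmemorized cards (assumed unit: cards)
        self.cards = []

    def add(self, card):
        # Refuse the card instead of letting the stack grow without bound.
        if len(self.cards) >= self.cap:
            return False
        self.cards.append(card)
        return True

    def memorize(self, card):
        # Moving a card to the "mastered" stack frees a slot in this one.
        self.cards.remove(card)


stack = CappedStack(cap=3)
results = [stack.add(word) for word in ["megane", "bijutsukan", "hon", "inu"]]

# The fourth card was refused; memorizing one card makes room for it.
stack.memorize("megane")
added_after_memorizing = stack.add("inu")
```

The point of returning `False` rather than silently growing is that a refusal is the signal to do something else for a while: slow down, pick fewer words, or get better at memorizing.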

So far, all the problems I’ve talked about have been about memorizing words, but it’s also starting to get a little harder to put everything in the chapter together. In the fourth chapter, for the first time, I had a bit of trouble doing all the exercises in the chapter the first time through, because of a combination of not having all the grammatical details, the usage details, and the words at my fingertips. I think that, for now, the best approach is to acknowledge that this is a potential issue, and be alert for warning signs. So I’m planning to go through the exercises in this chapter until I can do them all easily; if that means it takes three weeks to get through the chapter instead of two, that’s fine.

I imagine that further non-vocabulary issues will crop up as I go along: needing to memorize conjugations, for example. It’s been a while (almost 15 years! Ouch) since I’ve had to deal with that sort of thing, but I was once adequate at memorizing grammar, so I assume I’ll be able to do it again, and I don’t think Japanese holds any particular horrors in that area. And further holistic issues will appear: getting practice in reading actual books (and finding a suitable gradual series of books to practice that), practicing spoken Japanese. I imagine that, once those become urgent problems, outside guidance will be essential; fortunately, outside guidance shouldn’t be hard to find around here.

Fun stuff.

weinberg on incremental construction

Sunday, June 24th, 2007

I’m a fan of authors on construction whose works I can read in a programming context. On a related note, here’s a bit from Gerald Weinberg with a building/programming analogy that I like. (Quality Software Management, v. 4: Anticipating Change, pp. 216–217):

Imagine building a house by bringing all the parts to the lot, then having everybody run to the foundation and put their part in place, after which people walk around and see if the lights work or the floor collapses. There is no house test in house building to compare with the system test in system building. There are, instead, many incremental, intensive tests all throughout, especially when something is added that

  • other people will depend on
  • will be invisible (like wires and pipes in walls)

At every stage, the house must be stable. When it may not be, scaffolding is added so that the system of partially completed house plus scaffolding is stable. When the house becomes stable on its own, the scaffolding is taken away. Examples of scaffolding include concrete forms, extra framing, power brought to the site, and portable toilets.

Using the Stability Principle, we see that testing is not a stage, but a part of a control process embedded in every stage. What is often called system test is not a test at all, but another part of system construction, perhaps better named “system integration.” People are reworking errors in previous parts, and building the systems as they do.

Don’t get me wrong, all analogies are suspect, and I’m sure you would run into problems if you probed this one too far, but I liked it nonetheless. Incidentally, he uses “test” in a much broader sense than I normally do, including activities such as code and design reviews in the name.

I like the format of the book: it’s fairly free-form, but he frequently sprinkles in “Phrases to listen for” and “Actions to take”. The phrases in this example:

The following phrases warn a manager that the process of building while using stable phases has been or is about to be violated:

  • Just wait till it’s all done, then you’ll be surprised.
  • We’ll clean that up in system test.
  • The testers will fix that.
  • Of course we don’t have what we need, but get started anyway.
  • They can clean up the design when they write the code.
  • Ship it. The customers will tell us if anything is wrong.

My favorites among the phrases to listen for are those with a parenthetical note saying something like “(Warning: you may be saying this)”, as in this example from a section on fear:

  • You will do this. It’s nonnegotiable. (Listen carefully: This may be coming out of your mouth.)

The point, or at least one point, of the phrases is that people’s actions are often incongruent with their beliefs and/or with stated plans and goals, and that people have a way of making statements designed to lull the listener into not realizing that. So what you should be alert to are frequently statements that are soothing on the surface, instead of statements that are alarming on the surface.

I won’t give the complete list of actions from this example; an excerpt:

DO NOT allow tests to be skipped or postponed to later stages. Whatever is pushed to the end of the cycle will be sacrificed to the schedule.

DO be aware that tests take many forms. …

In general, reasonable practical advice.

finished book queue; rorty

Thursday, June 14th, 2007

Looking back, I had my lean book-buying revelation more than a year ago. As I said at the time, “right now, I have … lots of books to read before I can start buying again”, and while I have hardly sworn off from buying books since then, I have made an effort to read down my stack of unread books.

And now I’m done: there are no books left in the stack of unread books that I’d bought before that time. Well, the main stack: there are still some French-language books from my last trip to Paris, not to mention the complete Pali canon (in a 45-volume edition with elephants on the spines) and maybe a hundred or so other unread books firmly entrenched on my bookshelves. Those latter stacks had already been filed away as “sunk costs” for some time, but the main stack contained books that I really did intend to read.

If I’d been asked to predict at the time what the last book in the stack was that I’d get around to, I probably would have guessed correctly. (I hope that the shock of my finally getting around to reading the book wasn’t the last straw for its author.) Which raises a question: am I ever going to get around to reading the third volume of Rorty’s philosophical papers? If not, is that a good thing or a bad thing? In the old days, I would have tossed it onto my next Amazon order along with 20 or 30 or 40 other books, and gotten around to it eventually. Now, though, I won’t get it until I feel like I really want to read some more Rorty right now. (At which point I’ll probably check it out of the library rather than buy it, actually.)

My guess is that I will read the third volume eventually: I’ve read enough philosophy over my life that I think I’d be a bit sad if I stopped completely, and Rorty’s quite interesting and readable as philosophers go. And I’m enjoying the second volume; it’s not life-changing or anything, but pragmatism is interesting enough to me that I really would like to dig into it somewhat more, because I get the impression that Rorty is saying lots of things that I really do agree with. Also, it’s complicated enough that I’d need to revisit his writings eventually, to test my vague memories with what he actually says, and to see where I should think more.

So that’s half of the question. What about the other half? I think that there, my answer is: if I don’t ever read it, that’s okay. I’m happy and confident enough with how I spend my time these days that I don’t think I need to worry about feeling insecure about not getting enough highbrow culture.

We’ll see how it plays out. While typing this, I’ve been browsing his works at Amazon, and there are a few interesting looking ones there that I didn’t know existed. (And I didn’t know a fourth volume of his philosophical papers was published earlier this year.) So maybe the answer is: I’ll actually read more Rorty sooner rather than later. Still not sure when I’ll get around to the next volume of his papers: it looks like it will take me about three weeks to finish this volume, and I don’t lightly spend that much time on a single book these days. But some of his other books are less dense.

We’ll see how it all plays out: in the meantime, I’m just happy that I have one fewer big queue in my life, and I’m also happy that I’ve stuck with this queue-removal plan for more than a year and it’s turned out well. Though at least one decent-sized queue remains, but that’s the topic for another post…

rejection in person; printf debugging

Wednesday, May 16th, 2007

One of the least pleasant aspects of hiring is rejecting candidates. (More actively unpleasant for them than for me, to be sure.) It’s something which, until recently, I did almost exclusively over e-mail.

Sometimes, rejection over e-mail makes sense. I typically put candidates through up to three stages of filters. (Not counting the initial resume screen.) The first stage is a sanity check over the phone: is there some obvious reason why this situation is a misfit that I didn’t figure out from the resume? Almost everybody makes it through this stage; in the few cases where a candidate doesn’t make it through this stage, it’s because it’s clear to both of us that their goals aren’t a fit for this position, so we agree that it’s not a match, and it’s simply not a matter of me rejecting them. (One could argue that I should make my phone interview stricter and reject more candidates at this stage; for better or for worse, I’m not doing that yet, and it’s tangential to the issues in this post.) The third stage is a half-day team interview; in this situation, my team discusses the candidate as a group after the interview is over (usually the next day), so I’m simply not in a position to deliver an immediate verdict to the candidate.

The second stage is the tricky one. This is a one hour in-person interview, where I ask some questions and have the candidate do a bit of programming. Sometimes, after the end of the interview, I’m genuinely not sure whether or not I want to bring the candidate back. More frequently, however, I am sure, and the answer is “no”. I can’t remember having second thoughts about my initial no reaction at this stage, so why not deliver the news then?

I think my actions started to bother me because of some blog posts that I’ve read recently, but I just looked at the usual suspects, and I couldn’t find any relevant posts. Certainly one reason why I’m thinking about this, though, is that I’m in the middle of rereading Gerald Weinberg’s Congruent Action. I can’t claim that I’m acting congruently in this case: I don’t believe that it makes job candidates any happier to have to wait and wonder for a day or two. Nobody likes being rejected, but you might as well get the news as soon as it’s available. The only reason why I’m delaying is to avoid in-person awkwardness; it’s just a sign of lack of courage on my part, with no obvious benefit to anybody.

So today I rejected two candidates in person. And, actually, it turned out rather well: both were disappointed, but both took it well, and both asked for advice about what they could have done better. In retrospect, it seems that I haven’t been saving myself any grief: all I’ve been doing is eliminating a potential learning opportunity for the job candidates. (Which raises the question: what learning opportunities might my actions in this regard be costing me?) Nice to have immediate positive reinforcement like that: I will try to continue to behave this way in the future. (And I should spend some time thinking about how to politely reject people in person.)

To be sure, I don’t claim to have any great pearls of wisdom on what the candidates could have done better. But, in time-hallowed blogger tradition, I won’t let that stop me from sharing my thoughts on the subject. The interesting thing about this case was that both candidates failed for the same reason: neither was completely hopeless, their initial attempts at a solution were both reasonable, but both floundered significantly when debugging. (Learning about people’s debugging skills is actually my goal in the programming question: I’m always a little disappointed when people solve the problem without making a mistake, because I know that, no matter whom I hire, they’ll make programming mistakes with some regularity, and I want to see how they deal with that.)

And, in both cases, it seemed like they didn’t know what they were looking for when debugging, or indeed when testing the program before discovering that debugging is warranted. (I tell them to implement a function and do enough testing to convince themselves that it was correct.) I suspect that both of them could be helped by taking a more scientific approach to debugging.

Sometimes, scientists are just gathering data: poking around, seeing if they see anything interesting. But science really progresses when people are making and testing hypotheses. Make a prediction that is concrete enough to be testable, to be falsifiable, and then test it. If the test results match your prediction, that’s always fun; if not, though, you’ve still learned something concrete, and can use that to make further hypotheses.

And this works for debugging, too. At the very basic level: if you think something is wrong with your code, you should know what you expect to have happened, so you can tell whether or not something really did go wrong! But it helps to make concrete predictions at a more refined level than that: “we know that something went wrong overall because we saw X instead of Y. If the problem isn’t in this part of the code, then expression E should have value V, whereas if it is there, then E should have a different value”. If you do this a few times, you should be able to zero in on the problem quite quickly, much faster than you could have by just looking at the code and hoping for inspiration.
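To make the "E should have value V" idea concrete, here is a small Python sketch. The functions `parse_scores`, `mean`, and `report`, and the planted bug, are invented for illustration; they are not the interview question.

```python
def parse_scores(text):
    # Turn "1,2" into [1, 2].
    return [int(field) for field in text.split(",")]


def mean(values):
    # Planted bug for the example: integer division truncates the result.
    return sum(values) // len(values)


def report(text):
    return mean(parse_scores(text))


# Observation: report("1,2") gives 1 where 1.5 was expected (X instead of Y).
observed = report("1,2")

# Hypothesis 1: the parser is at fault.
# Prediction if it is NOT at fault: parse_scores("1,2") == [1, 2].
parsing_ok = parse_scores("1,2") == [1, 2]

# Hypothesis 2: the averaging is at fault.
# Prediction if it is NOT at fault: mean([1, 2]) == 1.5.
mean_ok = mean([1, 2]) == 1.5
```

Each check has a predicted value written down before it runs, so either outcome rules something out: here the parser is cleared and the failed prediction points straight at `mean`.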

This helped crystallize one thing that I think is wrong with printf debugging. Many people’s first reaction, when confronted with misbehaving code, is to sprinkle printouts throughout the code and hope that enlightenment results. This can be great, if you have specific hypotheses about what the output of those messages should be. I think that many people, though, don’t really have specific hypotheses in mind, just a vague feeling of values that they’re interested in. When this happens, printf debugging doesn’t lead to answers very quickly: the messages give you a lot of data, data that can easily lead you down all sorts of unproductive paths, data that can fool you into thinking that you’re learning something when you don’t really know what that data represents. (I suspect that debugging with a debugger is less likely to lead to this problem, if only because it takes more effort to generate lots of values, so you’re more likely to spend time thinking about what values you want to generate.)

Maybe I’m unfairly characterizing printf debugging, but I will stand by the value of concrete hypotheses when debugging. Which raises the question: if it’s so good when debugging, can we use the same ideas when writing new code, code which we don’t yet have reason to believe is incorrect? The answer is yes, of course: that’s exactly what test-driven development is all about.
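A tiny sketch of what that looks like in Python, with the predictions written before the code they describe. The function `word_count` and its test cases are hypothetical, chosen only to keep the example short.

```python
def test_word_count():
    # Written first: concrete, falsifiable predictions about code
    # that does not exist yet.
    assert word_count("") == 0
    assert word_count("fun stuff") == 2
    assert word_count("  fun   stuff  ") == 2


def word_count(text):
    # Written second: just enough code to make the predictions hold.
    return len(text.split())


# Raises AssertionError if any prediction fails; returns quietly otherwise.
test_word_count()
```

In a real project the test would live in a test runner rather than being called by hand, but the order of work is the part that matters here.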

groovelily; regret

Monday, May 7th, 2007

I learned about the band GrooveLily from an episode of Next Big Hit. I wasn’t paying too much attention when the song, “No Room In Your Bag”, started: a patter song over a drum backing. But then some chords on the piano came in, the instrumentation started getting richer (electric violin, yay), and I started realizing that I rather liked the lyrics. (Not to mention the singer’s sliding into falsetto.)

Quite a song; it’s been stuck in my head since. I hesitate to link to a myspace page, but it seems to be the best place to listen to the song. (There’s also a live version available on the band’s web site, but the instrumentation is worse, so I won’t link to it here.) About marriage, gender structures, jobs, academia (in part), art, kids: all things that are dear to my heart.

And about making choices, choices with serious consequences, yet not being paralyzed by the consequences of those choices. The title of the song comes from the chorus:

You make a choice, you make a call.
You may rise, you may fall.
You will pay for what you get.
You’ve got no room in your bag for regret.

So: you make choices. They have consequences, potentially serious ones, and won’t always turn out the way you expect. But, if they don’t, wishing you’d chosen differently isn’t going to do squat for you.

I’m not sure why this is rattling around in my head so much right now. I don’t want to give my readers the impression that my life was full of bad choices, choices with unpleasant consequences, because that simply isn’t true: I’m quite happy with the way that basically all of my major life choices have turned out. (And I don’t spend time worrying about the minor ones, either!) But the meme does seem to be showing up in my environment a fair amount; one bit I may post about later, but I’m also thinking of a discussion on the XP mailing list about the prime directive for retrospectives.

Some people like the directive, some people don’t, and I’m not sure myself which side I come down on, but we can all agree that certain aspects are positive. It’s not that you don’t look at the effects of your past choices - if you’re not going to do that, then you’re not holding a retrospective! But that doesn’t mean that you should spend your time beating yourself up over bad consequences. (Or beating other people up, which seems to be more the thrust of the prime directive: don’t waste your energy on blame.) Make a good effort to learn what you can from the past, see if you can come up with a strategy to do better in some way next time, and leave it at that.

Anyways, enough on regret. I got GrooveLily’s album Are We There Yet? on the strength of that song. The album doesn’t live up to its promise, but there are several nice moments. One of my favorites, on the first track: the lyric “I’m feeling paranoid” coming up in a context where you think we’re going to get a Freud rhyme, but no: “Hanging like Harold Lloyd”. They seem to be turning to the stage more these days (or maybe they have a long stage history - I should look into their back catalog); I just finished listening to Striking 12, a charming musical based on “The Little Match Girl”. (I was amused by this Anime Music Video based on the song “Screwed-up People Make Great Art” from the album.)

Good stuff; I plan to listen to more of their music.

random links: april 21, 2007

Saturday, April 21st, 2007

random links: april 8, 2007

Sunday, April 8th, 2007

schiphol queues

Friday, March 30th, 2007

For the non-EU flights in Schiphol (at least where we were), they place a metal detector at each gate, instead of having a central bank of metal detectors that everybody goes through. And I can’t figure out why. This seems like the worst possible solution from a queuing theory point of view: you get your best utilization when you spread out the arrival of people into queues, but what they do instead is to artificially delay people from entering the queue and then process everybody all at once. Which means that it takes ages to actually get onto the plane once they start calling rows.

So what’s going on here? Imagine if they took those metal detectors and put them all in a big, shared bank: you’d show up, there would be a hundred or so metal detectors that you’d go through, and you’d never have to wait in line for more than a minute or two, I’d imagine. So what benefit do they get out of their current arrangement? Does it somehow use staff more efficiently? If there is any gain there, I don’t think it’s a huge one, and I’m sure they could improve both time and staffing with a smaller number of shared metal detectors. Is there some benefit in having most of the airport before the metal detectors instead of after metal detectors? Yes, I suppose, since you can actually see your friends and family off at the gate; I’ve gotten so used to not being able to do that that it hardly registered, but I guess that’s a good idea. Is there something about the construction of the airport that makes a shared bank (or multiple shared banks) of metal detectors impractical? Not clear to me one way or another. Something else I’m missing? Or is it just a mistake?
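To put rough numbers on the shared-bank intuition, here’s a minimal sketch using the textbook M/M/c queueing model (Poisson arrivals, exponential service times). The gate count, arrival rate, and service rate are made-up figures, not Schiphol data, and the function names are mine. The model also ignores the everybody-at-once boarding rush, which would presumably make the per-gate arrangement look even worse.

```python
from math import factorial


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival has to wait in an M/M/c queue (Erlang C)."""
    if offered_load >= servers:
        raise ValueError("unstable queue: offered load must be below server count")
    utilization = offered_load / servers
    top = offered_load ** servers / factorial(servers) / (1.0 - utilization)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_wait(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Mean time spent waiting in line (not being screened) in an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    return erlang_c(servers, offered_load) / (servers * service_rate - arrival_rate)


# Assumed figures: 10 gates, 3.2 passengers/minute arriving per gate,
# and each metal detector screening 4 passengers/minute.
GATES = 10
ARRIVALS_PER_GATE = 3.2
SERVICE_RATE = 4.0

# One detector per gate: ten independent single-server queues.
dedicated_wait = mean_wait(1, ARRIVALS_PER_GATE, SERVICE_RATE)

# Same ten detectors in one shared bank, fed by all the passengers.
pooled_wait = mean_wait(GATES, GATES * ARRIVALS_PER_GATE, SERVICE_RATE)

print(f"dedicated: {dedicated_wait:.2f} min, pooled: {pooled_wait:.2f} min")
```

Same detectors, same passengers, same utilization; under these assumptions the only thing that changes is whether an idle detector at one gate can help a line at another.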

mike cohn on estimating and planning

Monday, March 26th, 2007

Last week, I went to a talk by Mike Cohn on “Agile Estimating and Planning”. Good timing: I’d been thinking that I should get around to reading his book on the subject. Which I won a copy of at the drawing after the talk; apparently my recent remarkable good luck has (correctly) decided that I have enough iPods and should start winning other things instead.

I’d gotten my previous take on the subject from others’ books and from a presentation by Ron Jeffries; back when (a previous incarnation of) my team used to estimate regularly, we followed Ron’s 1-point, 2-point, 3-point idea. (Well, we did until we dropped 3 and added 1/2, but the result is almost the same thing.) Mike Cohn, however, uses much larger numbers: 1/2, 1, 2, 3, 5, 8, 13, 20, 40, 100.

That’s for story points; he recommends estimating tasks within stories in terms of hours. (I can’t remember if Ron talked about estimating tasks at all.) And he made a good point about why you should use artificial points instead of real time units for your story estimations: if you use real time for both, you’ll be tempted to expect, say, the time estimated for the tasks making up a story to add up to the time for the entire story. Which makes sense, except that you estimate a story before you’ve broken it up into tasks (you don’t do the latter until somebody has decided that you’ll work on it), so when you do the task estimation, you’ll have thought much more about what’s involved in implementing the story. And you can’t convert between “hours that you’ve thought a lot about” and “hours that you haven’t thought much about”, which you’d be sorely tempted to do if you use hours in both situations.

Mike came about his expertise on the subject honestly, by the way: he was VP of engineering at a company that had adopted Scrum, and that had a fair number of teams working on not-very-long-lived projects. So teams had to estimate stuff, and he imposed the rule that, each project, you had to do something different when you estimated. There was enough discussion going around that people had an idea of what teams with accurate estimates had done in the past, but the rule meant that they couldn’t just stop and declare victory, they had to keep on trying to find ways to improve. A nice example of evolutionary process improvement.

Anyways, after the talk, I asked him about his versus Ron’s recommended range of estimation values. Part of his answer was that maybe the right thing for somebody working in the trenches is different from the right thing for a VP of engineering. More generally, people are going to ask the team how long it takes to implement a feature that’s larger than a simple story; they need a way to answer that. Which is a good point - I don’t have that clear a view on how Ron recommends estimating features larger than a single story. (I should ask, shouldn’t I?)

There’s still a tension there that I’m not entirely comfortable with. Unless you go with long iterations (and Mike prefers two-week iterations, which already doesn’t seem long enough), I don’t see how you can fit stories that vary anywhere near a hundredfold in length into a single iteration. Now, stories at the extremes (especially the large end) are bad, but still, a 1- to 13-point range (or whatever) seems too wide to me to fit within an iteration. But a story that can’t be done within a single iteration isn’t really a story, is it?

So maybe there are three levels needed: features, stories, tasks. Each with their own (non-convertible, as above) estimations. But that’s too much estimation. Given that, I’d actually be tempted to drop the task estimation instead of the feature estimation: isn’t it kind of pointless to spend time estimating how many hours a task will take? Just implement the damn thing! In the previous incarnation of my team, we did break down stories into tasks (we should get back to doing that, it was useful), but we didn’t estimate individual tasks, and I never felt the lack. Maybe I was missing something, but it still seems funny to me.

Actually, though, it’s entirely possible that we were subtly shifting things by a level (and making them too long, to boot.) Because the truth is that a lot of our stories were technical: we weren’t clever enough (and weren’t working with a Customer representative to give us a nudge) to break work up into small, customer-visible units. So maybe what we called stories were really tasks? I don’t think that’s quite accurate, but there’s enough truth to that to make me nervous; something to think about more.

Since Stuart brought it up (see his blog post on the talk if you want another take), I might as well talk about another question I had. Mike presented some very interesting examples (you can see his slides, by the way) of studies that showed that, when people were given extra, irrelevant information, their estimates for tasks increased. (My favorite example was when group A and group B were given exactly the same text, but in one case on a single piece of paper while in another case spread over seven pieces of paper.) To which I asked: that’s neat, but which estimate is more accurate?

I freely admit that I asked this solely out of methodological purity: even though the studies didn’t give any evidence about the relative accuracy, I know which way I’d bet. (Well, one of the studies sort of did give evidence: they gave three teams the same tasks, but told team A nothing about expectations, team B that the customer hoped it could be done in 500 hours and team C that the customer hoped it could be done in 50 hours. All teams insisted that the hopes had nothing to do with their estimates, but team A ended up with an estimate of 456 hours, team B with 555 hours, and team C with 99 hours! Scary, that: a trap that I fall into all too often myself.)

But, the more I think about it, the less sure I am which team’s estimate is the most accurate. Take, for example, the study where team A was told to estimate requirements 1-4, team B was told to estimate requirements 1-5, and team C was told to estimate requirements 1-4 but were also given the future requirement 5 for purely informational purposes. In this case, A and B both estimated 4 hours (even though B was told to estimate strictly more work than A) while C estimated 8 hours (even though they were told to estimate the same work as A)! Looking at that, I don’t see at all why I should believe that A is the most accurate - they give the same answer as B, which is within the margin of error but clearly odd. What seems more likely to me is that A and B are estimating in terms of “hours we haven’t thought much about” while C is estimating in terms of “hours we’ve thought more about”, which we learned earlier can’t be converted to each other!

Anyways: good talk, a good reminder that we should get back to estimating once matters get a bit more under control, and I ended up with a book and enough sets of his planning poker cards that we can use them in our future team meetings. If you have a chance to hear him, I definitely recommend it.

codification of experience

Friday, February 16th, 2007

Another quote from The Toyota Product Development System (p. 102), in the section on checklists:

A company that cannot standardize will struggle to learn from experience and is not truly engaged in lean thinking. Indeed, any company that simply tries new things without standardizing along the way is “randomly wandering through a maze,” repeating the same errors, relying on little more than undocumented hearsay and a wide range of opinions among its employees only to eventually discover that “it has been here before.”

A little more context:

Though based on science, the real world practice of engineering is an art form that relies on tacit knowledge gained through experience and judgment in considering multiple variables that interact in complex ways. As a result, a best solution cannot necessarily be predicted in advance. It is learned over time through experience and is guided by the spirit of kaizen, which postulates that there is always an opportunity to learn more and that learning is an ongoing process. This spirit of engineering kaizen is driven by the never-ending pursuit of technical excellence that underlies consistent checklist utilization, validation, and improvement.

A company that cannot standardize will struggle to learn from experience and is not truly engaged in lean thinking. Indeed, any company that simply tries new things without standardizing along the way is “randomly wandering through a maze,” repeating the same errors, relying on little more than undocumented hearsay and a wide range of opinions among its employees only to eventually discover that “it has been here before.” Toyota uses a systematic and scientific approach to product development. It tests, evaluates, standardizes, improves, and retests, scrupulously following the Plan-Do-Check-Act cycle that was introduced to the company decades ago by Deming. It then standardizes “today’s” best practice. As it accumulates new information and new experiences, these are used to modify current shared standards and reborn as a future “today’s” best practice.

Go experimentation, both trying it and taking the results seriously. We came to a similar conclusion at work last week; we’ll see how our experiment of experimenting turns out.

(I should read some Deming, shouldn’t I?)

don’t broadcast information

Thursday, February 15th, 2007

A quote from Morgan and Liker’s The Toyota Product Development System:

Toyota does very little “information broadcasting” to the masses. Instead, it is up to the individual engineer to know what he or she is responsible for, to pull what is needed, and to know where to get it.

Here’s the full context (pp. 95-96; italics in original):

Pulling Knowledge Through the [Product Development] System

In lean manufacturing, pull production eliminates overproduction by having downstream activities signal their needs (demand) to upstream activities. Kanban cards usually signal (control) production in a pull system. In product development, knowledge and information are the materials that are required by the downstream activity. The speed at which technology delivers information in automotive product development is overwhelming. However, not all information is equal to all people. The lean [Product Development] System uses “pull” to sort through this mass of data to get the right information to the right engineer at the right time. Knowledge is the fundamental element (material) in product development.

Toyota does very little “information broadcasting” to the masses. Instead, it is up to the individual engineer to know what he or she is responsible for, to pull what is needed, and to know where to get it. Individual engineers are expected to locate and extract needed information, whether this be design data residing in the data collector, a product performance experience, or a perspective from a senior executive. This policy holds true for everyone, from the most junior design and release engineer to the chief engineer. The key underlying principle that makes this work is that everyone has access to both the design data and the [Chief Engineer].

For an example from the opposite end of the program hierarchy, all engineers are responsible for creating benchmarks for their respective components. They are expected to gather relevant information and understand the latest technological developments, industry trends, and supplier and competitor products that affect their designs. Once the execution phase begins, manufacturing engineers pull design data from data collectors as they need it to start working on die or fixture designs. All engineers pull requirements from checklists, which are updated at the end of each program.

The supplier mentioned earlier (a company that had an unacceptable management cycle time) illustrates how the [Product Development] system links processes. This seat maker through value stream mapping identified that they were batch dumping information onto the next process (design sent hundreds of drawings to purchasing and ordered hundreds of parts prototyping, to build the hundreds of different variations of prototype seats, etc.) After moving to a staggered release system where a subset of seat designs were released on a preplanned schedule, weekly reviews of progress were set up and the supplier set up status boards at each functional area within the value chain. A key purpose of the status boards (referred to as “pull boards”) was to signal the need for information from other functions. Once the status board was in place, it was easy to spot when key information was needed. When key information was delayed, it was identified within a week rather than months later. The example clearly shows that in a lean [Product Development] process, a key enabler for pull knowledge systems is reducing management cycle time.

At first, I was thinking of “don’t broadcast information” in terms of “I don’t like being lectured at”, which made me happy. But I do like a general low level of chatter among team members - e.g. at our daily standup this morning, two of us were talking about what we did yesterday, and another team member mentioned an old bug report that turned out to be relevant, quite possibly saving us a day of time. And if we hadn’t been broadcasting information, that wouldn’t have happened. Then again, some XP sources suggest that the need for a daily standup is a sign that your process isn’t working well enough, but then yet again that’s because they want information to radiate even more (by everybody working in a big, open room, constantly interacting). But that only works with teams below a certain size; eventually, you have to cut off universal chatter to save your sanity.

When I balance all that, I tend to think that I like broadcasting information within a group of a certain size. And I bet Toyota does, too, I just need to find the right section of the book. I could be wrong, though; it’s certainly something to think about. That’s the problem with reading books like this: I’m missing so much of the context, so many details, so much of the gestalt.

Moving along, we see that individuals are apparently able to pull the information they need at will. Which is certainly a problem that we have: we have various “data collectors” (especially our wiki), but they’re not well-organized, consistently maintained, or up to date.

One interesting thing about Toyota product development is apparently that, on the one hand, they’re quite good at consistently storing information, e.g. about results of experiments (whether positive or negative). But they also manage to do this in a terse, accessible form: I’m pretty sure they save the lab notebooks, but they also put their results on a standardized single piece of paper, the “A3 form”. I haven’t yet gotten to the section of the book where they talk about that in detail, but it sounds like it could be a really useful idea, and if done right a very useful antidote to wiki chaos.

following distances in traffic

Monday, January 15th, 2007

When I mentioned my earlier post about questions I had about driving in traffic, Jordan pointed me at this article that claims that a single driver, by leaving a large amount of open space while entering a traffic jam, can actually (at times) break up the jam. Which is pretty amazing, if true. The author claims that he’s repeatedly seen the effects himself; some experimental verification on my commutes home (or, for that matter, to work) is clearly in order.

I should also spend some time digging up animations of this effect. The author in question gives a nice start with this page showing how typical merge patterns are inefficient: both sides look pretty plausible to me, and if you count the cars per flashing arrow, the stop-and-go side has half the throughput of the smoothly flowing side. Of course, that’s somewhat tangential to the claim at hand: it’s nice to know that smoothly flowing traffic leads to higher throughput, but what I’d really like to see is an illustration of his claim that a single car can, at times, create a smoothly flowing merge situation. Lots of links left for me to follow; maybe I’ll find something else fun.

curious about queueing theory

Friday, January 5th, 2007

Now that I’m seeing queues everywhere, I’m getting curious about both the underlying math and the underlying pragmatics. Take a highway, for example: say you want to get the most use out of one. What does that mean? I guess it means maximizing total throughput, or more specifically the car miles driven on the road in some time period. Take the limit as time goes to zero, and the instantaneous version is the sum of the speeds of the cars. Or: the number of cars times the average speed of the cars.

Question 1: Have I missed anything yet? I’m probably near the edge of missing something by doing a measurement at one instant in time: apparently a lot of the fun has to do with the distribution of entry times into queues. Let’s put that in the back of our head for now and continue.

So we’re worried about the number of cars times the average speed of the cars. The second factor sounds easy to deal with: floor it! What about the first factor? We have a fixed amount of road space (another simplification, but one I’m happy with for now); and the number of cars is the road space divided by the distance between (fronts of) cars. (We could separate “distance between fronts of cars” into “length of cars” plus “following distance”; for now, let’s not worry about the length of the cars themselves.) So now we want to floor it while tailgating! (In our Mini Coopers, if we’re worrying about lengths of cars.)

Which I would, actually, rather not do. Nor would all of my fellow drivers, though some don’t seem to mind. In general, the faster we go, the larger a following distance we like to maintain. So the two components are fighting against each other. (Good thing, otherwise this problem would be pretty boring.)

Question 2: What’s the relationship between driving speed and following distance for your average driver?

If your average driver’s brain is concerned with being able to react to events in a fixed amount of time, then following distance would vary linearly with speed. (So the throughput wouldn’t vary with the average speed, once you get dense enough!) If your average driver’s brain is concerned with being able to come to a complete stop, then the following distance actually varies quadratically with speed, so the slower the highway, the higher throughput. (I don’t really believe that, though, and here we do start having to worry about the lengths of the cars themselves.) In either case, I’m sure there are boundary effects. And brains are complicated things and developed in an environment where people normally travel at single-digit miles per hour, so probably neither model is particularly accurate.
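Here’s a quick sketch of those two models, to see what they do to throughput (cars per hour past a fixed point, which is speed divided by the distance between fronts of cars). The car length, the two-second reaction time, and the braking deceleration are numbers I’ve assumed for illustration, not measurements of real drivers.

```python
def flow_per_hour(speed_mps: float, spacing_m: float) -> float:
    """Cars passing a fixed point per hour in one lane: speed / front-to-front spacing."""
    return 3600.0 * speed_mps / spacing_m


def reaction_time_spacing(speed_mps: float, car_length_m: float = 4.5,
                          reaction_s: float = 2.0) -> float:
    """Following distance grows linearly with speed (fixed reaction time)."""
    return car_length_m + reaction_s * speed_mps


def full_stop_spacing(speed_mps: float, car_length_m: float = 4.5,
                      decel_mps2: float = 6.0) -> float:
    """Following distance grows quadratically with speed (room to stop: v^2 / 2a)."""
    return car_length_m + speed_mps ** 2 / (2.0 * decel_mps2)


for speed in (5.0, 10.0, 20.0, 30.0):  # metres per second
    linear = flow_per_hour(speed, reaction_time_spacing(speed))
    quadratic = flow_per_hour(speed, full_stop_spacing(speed))
    print(f"{speed:4.0f} m/s  reaction-time model: {linear:6.0f} cars/h"
          f"  full-stop model: {quadratic:6.0f} cars/h")
```

With these assumptions, the reaction-time model levels off as speed rises (it can never exceed 3600 divided by the reaction time), while the full-stop model peaks at a fairly low speed and then falls. Car length is what keeps the slow end from looking artificially good.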

Now I really do want to start worrying about variations in the instantaneous behavior: now that my commute has me driving home on 101 at 5:45 pm, I can assure you that the speed of traffic varies from second to second. (Those annoying other people who also want to get onto the highway have something to do with this.) This raises so many questions that I don’t even know where to start:

Question 3: Where should I start when thinking about differences in traffic speed?

Let’s try: what affects variations in traffic speed? When traffic speed varies a lot, is there some sort of pattern in the chaos that ensues, or is every traffic mess different? What effects do variations in traffic speed have on throughput? In particular, what can we do to maximize throughput? What happens when we get close to maximum throughput?

And now we turn to my own behavior. I confess that there are times when I don’t maximize my speed, even when I could do so safely and legally. For example, if I see a slowdown ahead of me, I tend to take my foot off the gas and coast, rather than, say, first maintaining speed until I get nervous and then braking more sharply.

Question 4: Compared to a leadfoot, am I hurting, helping, or neither?

Not clear. I’m not affecting my average velocity: I’m decelerating more gradually than our hypothetical leadfoot, but you get the same total deceleration in either situation. Having said that, my position on the road is never ahead of where the leadfoot would be, so maybe I’m delaying not only myself but an entire column of cars behind me. That would be unfortunate.

On the other hand, while I don’t know what the causes and effects of variation in velocity are, I have a hard time believing that yo-yoing speeds really maximize throughput, or even don’t have a negative effect. So maybe, by providing some modest dampening effect on the system, I’m actually helping throughput? It would be nice, but I wish I could point to a concrete mechanism here, could present a model where my behavior helps instead of hurts.

Lots of questions I don’t understand.

Question 5: Any good books on the subject?

Question 6: Is this traffic situation a good analogy for any aspect of software development, or is the behavior of queues that I run into at work different from the behavior of queues that I run into on the way home from work? (Encounter on my way home, I should say - I try hard to not actually run into the queues on the road, because I’m quite sure that exchanging insurance information would not help throughput in any way.)

podcast queue management

Monday, December 18th, 2006

Sorry for the lack of posts. I might have a post stuck in me, or I might just be getting lazy, or might not be thinking enough; hard to say. Maybe I’ll get unstuck over the holidays. Anyways, I present another banal application of lean to everyday life:

Using my mad queue-management skillz, I’ve finally gotten caught up on my podcast listening: for every podcast that I regularly listen to, I now have either no episodes waiting or less than one week’s worth, and have been in that blessed state for about a month by now. It doesn’t hurt that, since changing to the Menlo Park office, my new commute is a bit longer, especially going home - grr 101 grr - but I was heading in that direction even before the office move. In fact, some days, I don’t have any podcasts stored up to listen to, which leaves me at a bit of a loss. Especially with the holidays coming up, when some podcasters have the nerve to take a bit of time off.

Which means that I should find more podcasts to listen to, right? No! (This is how you can tell that I have mad queue-management skillz.) You see, I know about the virtues of maintaining a bit of slack in the schedule. I’ve reached a level where I can predictably listen to all of my podcasts almost every week - some episodes might stick around for two weeks, if a bunch of podcasts that publish on irregular averaging-to-about-once-a-month schedules all happen to arrive at about the same time, but it doesn’t get any worse than that. But I’m not too far away from my listening capacity; and, once I edge up to that capacity, my response time will go through the roof. I’m at that state on my magazine subscriptions, and it isn’t pretty - who knows when I’ll get around to reading those saved up NYRSF issues? And bye bye Granta subscription. So I don’t want that to happen with podcasts, and for all I know the next podcast could turn out to be one too many.
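The “through the roof” part is easy to see in the simplest textbook queue. This is only a sketch: it assumes an M/M/1 model (random arrivals, one listener, random episode lengths), which podcasts certainly don’t follow exactly, and the ten-episodes-a-week capacity is a number I made up.

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Average time an item spends queued plus being served in an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue grows without bound at or above capacity")
    return 1.0 / (service_rate - arrival_rate)


LISTENING_CAPACITY = 10.0  # assumed: episodes per week I can get through

for episodes_per_week in (5.0, 8.0, 9.0, 9.5, 9.9):
    weeks = mean_time_in_system(episodes_per_week, LISTENING_CAPACITY)
    utilization = episodes_per_week / LISTENING_CAPACITY
    print(f"utilization {utilization:.0%}: average episode waits {weeks:.1f} weeks")
```

Going from half capacity to 90% multiplies the wait by five in this model, and the last few percent before full capacity are far worse than that, which is why one more podcast could be one too many.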

To be completely honest, that wouldn’t be the end of the world - I could occasionally delete an episode without listening to it, if my queue got bad. But that’s just not the way my psychology works: I’m a completist at heart. Also, I like the podcasts I’m listening to now, and I don’t want to delete any episodes of them. And they’re a nice mix: a little more than half music, of various sorts, but also several interesting non-music ones. At the very least, now that I’ve gotten my queue well under control, it’s time to re-evaluate the situation, and see if adding more podcasts is the best thing to do with the gaps that are opening up in my listening schedule.

And I’m pretty sure it’s not. If we think of this in terms of competing queues, then I’ve gotten the highest-priority queue in this area under control. But doing so makes me aware of two other queues that I can now consider dealing with in the same area. Namely:

  • Listening to music that isn’t from podcasts. The Naxos classical music podcast is an incredibly effective advertising tool: something like a third of the episodes make me want to go out and buy the album in question. I’ve gotten other interesting music suggestions from other podcasts, too, and I wouldn’t mind going back and listening to some of my CD collection again.
  • JapanesePod101. You see, I lied above when I said that I’d caught up on all of my podcasts: I’ve caught up on all but one, but on that one I’m 9 months behind. Which wouldn’t be too bad if they published once a month, but since they publish every day, I have my work cut out for me. I am consistently managing to not fall further behind on it now, but I would like to eat into the backlog. And it’s a really good podcast, so worth spending some effort on. I am a bit worried about burning out, but I have enough experience with (effectively) force-feeding myself knowledge in the past that I think I’ll be able to catch the warning signs before things get too bad.

So: one queue under control, two other queues revealed. All good fun, no? In fact, I think I’ll attack one of them right now by ordering a CD to listen to. (Update: no, I’ll break my lean vow and order two CDs. But one’s an EP, and they’re not available from Amazon, so it’s a bit easier for me to batch the order.)

If only queues at work were getting under control so well. Actually, several are, but there are two that are bedeviling me right now. I hope that we’ll make some progress on those during January…

exploratory testing

Friday, December 1st, 2006

The Poppendiecks’ latest book gives an interesting analysis of types of testing. (Taken originally from Brian Marick’s blog.) They propose that you divide testing up in two different ways: on the one hand, you can classify tests as either intended to support programming or to critique the product. On the other hand, you can classify tests as either intended to be business facing or technology facing.

This gives you four quadrants. Tests that are technology facing and supporting programming are unit tests. As my loyal readers know, those are the best thing ever, so I won’t go into details.

Tests that are business facing and supporting programming are acceptance tests, or story tests. It took me a little while longer to appreciate these - I looked at tests initially largely through defect prevention goggles, and surely there couldn’t be any bugs left after my unit tests? Well, actually, it turns out that there could be: there are (many) fewer than if I hadn’t been doing pervasive unit testing, but many fewer than a lot is some, not none. Some of those defects are due to legacy code issues, but by no means all. And it’s not like I have a magic wand to get rid of legacy code, anyways.

In both cases, tests have more virtues than just preventing defects. They establish a contract, for one thing. In the unit test case, it might be a contract between programmers, or it just might be a contract between a single programmer’s fingers and the part of the programmer’s brain that cares about things working properly, but it’s a contract either way. In the story test case, it’s ideally a contract between programmers and business types; I still haven’t reached that world (it’s probably the area at work where we’re least agile), alas, but at the least it’s a contract between code and an imagined outsider. And they promote communication (between programmers, between programmers and business, between a programmer and the same programmer years or months or weeks later). And they promote design. In both cases, they’re automated, to make it as easy as possible for the programmer to run as many tests as possible.

Which is all great: better code, fewer defects, shorter debugging cycles, on and on. With all of that goodness, what more could you want?

Quite a bit, it turns out. There are people who say that it’s okay to have a testing department going through manual tests of your product: programmers have a conflict of interest which prevents them from seriously scrutinizing their code, so the only remedy is to have an army of testers to click through your interface to make sure it all works. Those people are wrong on a bunch of levels: for one thing, clicking through interfaces takes forever; for another thing, the programmers are the only people who know the corner cases; for a third thing, programmers aren’t so irresponsible as this suggests; for a fourth thing, the ways having a fast, comprehensive test suite improves your programming are so varied and positive that you’d be crazy to give it up for a slow external test cycle. It is true that having extra eyes doesn’t hurt; that’s why we would like to bring in business types to help with the acceptance tests, that’s why we pair program and have collective code ownership. Surely all that is good enough?

Well, no: even with a good set of acceptance tests, you’ll still find problems the first time you plop your product in front of a user poking around. A lot of that (at least in my case) can be chalked up to inadequate acceptance testing and inadequate business involvement in test design; still, if you’re like me, it takes a while to learn how to do good acceptance tests, and you’re probably dealing with legacy code which didn’t have proper acceptance tests to start with, and you need some way to learn where your acceptance testing skills need improvements. Playing with the product is a great way to do that.

Which brings us to the business facing / critique product quadrant: exploratory testing. (And usability testing.) People just poking around with your product, seeing what it does, pushing areas that might be limits. Not following a script: if you can script a test, you should work hard to automate it, to help support programming. (And if you find a defect during your exploratory testing, please do automate what you just did, so programmers can learn!) Just trying to look at the product with users’ eyes, seeing how it feels.

Like the earlier categories of tests, exploratory tests have virtues beyond finding defects. Even if you aren’t inserting defects into your code, you may have specification errors: your design may not work as well as you’d hoped when confronted with users. Or, for that matter, you may be playing around with various designs, trying to decide which is best. Or you may just need to communicate to somebody else in a visceral way what your product really does.

At work, we’d been slacking off on exploratory testing until recently: we were very engineering-focused, and the few people on the business side were too busy selling our product to have much time to play around with it. We’re doing better now (learning from our experiences), but we still have a ways to go.

So now I’m happy with three of the quadrants, though I still have a lot to learn. Which strongly suggests that my next revelation will be on the virtue of the fourth quadrant: property testing, from a technology facing point of view. This covers performance testing, security testing, and combinatorial error testing. Actually, maybe I got that revelation a year or so ago: we’d been doing performance testing for a while, which was all well and good and helped us catch a few performance regressions. But what was really eye-opening was when we started inserting random errors (deterministically, starting from a seed which changed every night but which allowed us to rerun the tests if problems arose) into the input of one component of the product, which did a lovely job of uncovering defects. Again, if we’d been doing better in our unit testing, we wouldn’t have inserted the defects in the first place, but we’re not perfect, and we need ways to learn how to improve our testing skills.
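
(A minimal sketch of the seeded error insertion idea, in Python. This is not the actual harness: the names nightly_seed and corrupt, the date-based seed, the byte-level corruption, and the error rates are all assumptions made up for illustration. The only point is that a seed which changes every night still lets you replay exactly the same damaged input later.)

```python
import random


def nightly_seed(run_date: str) -> int:
    """Turn an ISO date such as '2006-11-21' into an integer seed.

    Hypothetical scheme: the seed changes every night, and logging the
    date is enough to replay a failing run later.
    """
    return int(run_date.replace("-", ""))


def corrupt(data: bytes, seed: int, error_rate: float = 0.01) -> bytes:
    """Return a copy of data with some bytes overwritten at random.

    The same (data, seed, error_rate) always yields the same output,
    so a failure found overnight can be reproduced the next morning.
    """
    rng = random.Random(seed)  # private generator; global state untouched
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < error_rate:
            out[i] = rng.randrange(256)
    return bytes(out)


if __name__ == "__main__":
    seed = nightly_seed("2006-11-21")
    clean = b"well-formed input for the component under test" * 20
    damaged = corrupt(clean, seed, error_rate=0.05)
    changed = sum(a != b for a, b in zip(clean, damaged))
    print(f"seed={seed} bytes changed={changed}")
```

If a nightly run fails, feeding the logged seed back into corrupt regenerates the identical input, which is what makes a random test debuggable rather than just alarming.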

We still have room for improvement on this front, though. We should write random error tests for more components. Our load tests take too long to calibrate, so we haven’t always kept them up to date as we use faster hardware.

A useful analysis; I wish I’d seen it a couple of years ago. (But, if I had, I probably wouldn’t have been able to appreciate it.) I like how it divides up the virtues of a traditional testing group: some of those virtues can better be gotten in other ways, indeed maybe all of them can. But the virtues are real and varied, so there are several kinds of blind spots you should work to avoid.

random links: november 21, 2006

Tuesday, November 21st, 2006

I should really catch up on my blogging; in the meantime, some random links:

response time

Sunday, November 5th, 2006

One thing that’s been bothering me at work recently: our response time to bugs is absurdly slow. Even bugs that are marked as high priority take a while to get worked on; bugs that aren’t marked as high priority may well never get worked on.

Now, some of this is a classification issue: maybe a bug was incorrectly marked as high priority, and there are a lot of bugs open that shouldn’t be there in the first place. But a lot of the bugs that are open really do need to get fixed, and, as the product gets deployed more, there will be times in the future when we’ll run into bugs that really need to get fixed quickly. We’re capable of doing that now (we showed that during a recent trial, for example), but, even so, shouldn’t we spend more time practicing responding quickly, to make sure those skills don’t atrophy?

So, I think, we should rethink our bug prioritization system: we should make sure that high priority really means high priority (i.e. somebody starts work on it immediately), and we should also make sure that there’s a meaningful definition of medium priority (maybe it doesn’t get fixed this week, but it should get fixed this month). That would be a good first step.

Once we’ve gotten that under control, though, and are disciplined enough that high priority bugs are a rare event, quickly solved, we should try to react quickly to a larger class of bugs. After all, from a value stream perspective, the time spent waiting before we start fixing a bug is pure waste. If we’re not sure whether or not it’s valuable to fix a bug, then fine: we should wait until we have more clarity. But if we are going to fix a bug, what are we gaining by waiting? Why not just fix it immediately? We’re not saving work overall by delaying the fix: all we’re doing by waiting is building more debt that we’ll either have to pay off before the next release (if we fix it in time) or after the next release (if it makes it out into the wild). Neither of those is productive in general.

Of course, there are more levels to this problem: in particular, we shouldn’t be inserting defects into our code in the first place. (We’re getting better at that, fortunately.) And we don’t want to use Bugzilla as a substitute for our product backlog: there should be some control over how features get scheduled for implementation. And it can be hard to maintain a steady implementation pace if you’re getting constantly interrupted by bug work. There are solutions to all these problems, however. (E.g. for the latter, a two-part strategy of not writing defects in the first place and allocating slack in your schedule in the second place.) For now, from my point of view, our most urgent issues are (first) reducing the defect backlog and (second) improving response time.

Fortunately, other people agree. Some of my team members have been nagging me for months (years?) to make sure we don’t work on features at the expense of bug fixing, and my boss is more concerned right now with making sure we don’t have problems in the wild than with getting more features added to the product. And metrics have improved over the last week: if we really devote effort to fixing bugs, they do go away. But we have needed to devote the effort: maintaining a constant low-level hum wasn’t good enough, we needed a medium- to high-level hum.

The real test will be once we’ve worked bugs down to an acceptable level. We’ve built up technical debt; once we’ve paid that off, will we switch to a high level of productivity without building up new debt, or will we backslide? I’m optimistic that we’ll do the former: we’re getting pretty sensitized to bugs, and we’re aware of the problems that bugs cause to our normal development activities. On a basic level, they mean we have to waste time each morning investigating red bars on acceptance tests; on a slightly less basic level, the presence of those nondeterministic red bars means that, if you’ve implemented a new feature, it’s hard to be confident that you haven’t made mistakes, because you don’t have a completely clear good/bad signal from the tests.

And once we get bugs down to an acceptable level (zero, say), we can try to still leave some (not all, just some) of the former bug-fixing time in our schedule as slack. Now that I’m convinced (or at least strongly suspect) that slack is a good idea, I want to give it a try, but not without some external signal to let us know when we should stop slacking off! And the presence of bugs sounds like a great external signal to me.

Fun stuff, all this.

lean thinking, shared purpose

Monday, October 16th, 2006

I just finished Lean Thinking; it’s my current favorite lean book. One thing that made me jealous: they give several (to me) convincing examples of companies that wanted to try out lean and brought in some people who really knew how lean worked. After doing what those people said, the companies immediately got some fairly impressive improvements. And they then managed to keep improving on this continually over the next few years.

Which, among other things, serves as a counterpoint to some thoughts I’ve been having, and that seem in the air in general. (See, for example, Martin Fowler on agile imposition.) It’s been clear to me for a while that my team’s agile adoption would have been vastly improved by bringing in an outside expert some time ago. (It would probably still be vastly improved by that.) But, among other things, doing so would be tantamount to my saying “we’re going to do XP, plain and simple”. People may hear me as saying that already, but I don’t intend to be saying that. What I would like to be saying is “here are some things that are really important to me” (high quality standards, sharing knowledge, responding quickly) and “I haven’t heard of anything that sounds as effective as XP to reach that goal”.

So one aspect is that I’m jealous of people who have built up the support where bringing in an outsider to show them what to do is effective. But another aspect is that I’m also jealous of people who have concrete touchstones that they can use to continually improve. This is something that, perhaps, XP isn’t so helpful at. The concreteness of the practices can be very useful if you need something specific to try. But they have a finality about them that (to me) makes it hard to use them as guideposts for continuous improvement. For example, we don’t pair program all the time. I’m willing to believe that we’d be more effective if we did, but I don’t have any great way to convince people (even to convince myself) that doing so would be a good idea, and taking it on faith will only go so far. In a current thread on the XP mailing list, Ron Jeffries proposed telling people to find a way to “deliver working software monthly”; a lot of people are willing to believe that that’s a noble goal, but getting from there to XP is a pretty big step.

So what are intermediate goals that can help you see ways to continually improve? (Through which you might end up at XP or might end up somewhere else; if we can continually improve, I really don’t care about anything else.) Here, I think lean manufacturing has a leg up on agile software development: they have goals at a similar level to agile goals (single piece flow or just in time / pull are probably at a similar level to incremental development and no bugs), but I get the feeling that they have more ways to see flaws and to translate those flaws into concrete improvements. (Categories of waste, or stop the line when you see a bug, combined with root cause analysis, for example.)

Or maybe agile is just as good at that; it could (in all honesty, I’m not being facetious) just be my lack of knowledge combined with my lack of skills in the appropriate areas.

Something to work on.