small steps in haskell

One of my biggest surprises when learning Haskell has been how my typical test-driven development steps fail: it’s easy to write a couple of tests and get them to pass gracelessly, but surprisingly quickly I run into a test that I can’t get to pass without actually being smart, forcing me to make a leap that’s uncomfortably large from my previous position.

I ran into a situation like that last night, and I decided to try to take that big step apart; here’s what I ended up with. The problem in question was exercise 3 on page 84: it told me to print out the first word of each line of its input. My first two tests were as follows:


firstWordsTests =
    TestList["empty" ~: "" @=? firstWords "",
             "one line" ~: "first\n" @=? firstWords "first words\n"]

which I got passing with this implementation:


firstWords "" = ""
firstWords line = head (words line) ++ "\n"

The third test was where I ran into trouble, though:


firstWordsTests =
    TestList["empty" ~: "" @=? firstWords "",
             "one line" ~: "first\n" @=? firstWords "first words\n",
             "two lines" ~: "one\ntwo\n" @=?
                         firstWords "one line\ntwo lines\n"]]

How can I write a pattern which will match the third case? Not at all clear to me, but at least it points out the direction I’m going in: I want to add lines to the test cases until some sort of looping construct falls out. Given that, let’s try to refactor against one red bar in a way that brings out the decomposition into lines that’s latent here.

Fortunately, Haskell has a function lines that transforms a string into a list of the lines making up that string; let’s rewrite firstWords to use it. The smallest step that I managed to come up with to do so was this:


firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = head (words line) ++ "\n"

Which is a larger step than I’m comfortable with, but at least it’s small enough conceptually that I don’t feel like I’ve leapt into the unknown. I’m not sure what to call this sort of transformation—maybe “Insert Intermediate List”?

(Incidentally, the observant reader will note that this transformation isn’t a refactoring: it preserves the red and green bars, but the nature of the red bar goes from an unexpected result to an exception being thrown. That’s okay with me; what I’m doing is still useful as an implementation pattern. Hmm, maybe I should try writing these transformations up in an Alexandrian style?)

And, after that step, it’s now obvious how to get my test to pass:


firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = head (words line) ++ "\n"
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        (head (words line1) ++ "\n") ++ (head (words line2) ++ "\n")

At this point, we have some obvious code duplication; Extract Method turns it into


firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = firstWordLine line
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        firstWordLine line1 ++ firstWordLine line2
    firstWordLine line = head (words line) ++ "\n"

At which point it’s pretty obvious that we’re doing something quite similar to all elements on the list, so we want to transform this into a map plus a subsequent operation. And, in fact, the subsequent operation is concatenating all of the list elements together; keeping that in mind, we do an Insert Intermediate List on the third case, giving us:


firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = firstWordLine line
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        concat [firstWordLine line1, firstWordLine line2]
    firstWordLine line = head (words line) ++ "\n"

That makes the structure clear for the third branch, and also makes it obvious that we can write the first two branches the same way:


firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = concat []
    concatFirstWordsOfLineArray [line] = concat [firstWordLine line]
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        concat [firstWordLine line1, firstWordLine line2]
    firstWordLine line = head (words line) ++ "\n"

So now we have our map operation:


firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray lines = concat (map firstWordLine lines)
    firstWordLine line = head (words line) ++ "\n"

(These last three steps seem like they should all go together: Extract Identical List Operation?)

Now the code is looking nice (and, in addition, would pass more tests should we choose to write them), and the challenge turns towards expressing it as tersely and clearly as possible. First, Replace Unaltered Parameter with Composition to get:


firstWords = concatFirstWordsOfLineArray . lines where
    concatFirstWordsOfLineArray lines = concat (map firstWordLine lines)
    firstWordLine line = head (words line) ++ "\n"

Do it again (throwing a bit of currying into the mix):


firstWords = concatFirstWordsOfLineArray . lines where
    concatFirstWordsOfLineArray = concat . (map firstWordLine)
    firstWordLine line = head (words line) ++ "\n"

And, by now, concatFirstWordsOfLineArray clearly isn’t pulling its weight, so we Inline Method:


firstWords = (concat . (map firstWordLine) . lines) where
    firstWordLine line = head (words line) ++ "\n"

I still want to make this shorter, which in Haskell land frequently seems to mean using Replace Unaltered Parameter with Composition; to that end, we rewrite the latter definition as


firstWords = concat . (map firstWordLine) . lines where
    firstWordLine line = (++ "\n") . (head (words line))

That lets us turn it into:


firstWords = concat . (map firstWordLine) . lines where
    firstWordLine = (++ "\n") . head . words

at which point we can Inline Method again, and get:


firstWords = concat . (map ((++ "\n") . head . words)) . lines

Which is short, but you have to think a little bit as to what it means; what I’d like to do is find a way to get the unlines function in there, which is a function that takes a list of strings and concatenates them with newlines between. The next step in that direction is to realize that map distributes over function composition; so we Distribute Map, giving us


firstWords = concat . (map (++ "\n")) . (map (head . words)) . lines

and, indeed, the first half of that is exactly unlines:


firstWords = unlines . (map (head . words)) . lines

Phew! The code is now about as terse as I can think of while passing those three tests, and it passes several other new tests that I might think of to boot. And, as a bonus, this version is clearer than any of its predecessors: we split the input into a list of lines (lines), then we grab the first word out of each of those lines (map (head . words)), then we smoosh all of those lines back together (unlines). Though, as Bryan pointed out to me, I forgot to write one test (which I’ll leave as an exercise for the reader), but getting the code to pass that last test was a lot easier in this form than it would have been in forms further up the blog page. (Bryan also had some suggestions for how I might use QuickCheck instead of HUnit to test this, which I hope to be able to follow up over the coming months.)

If this sounds interesting (or if it sounds bizarre but if the idea of learning Haskell sounds interesting despite my peculiar approach), it’s not too late to join the reading group: none of us are moving at a very fast pace, so it shouldn’t be much trouble for a newcomer to catch up.

Post Revisions:

This post has not been revised since publication.

Published 12/7/2009 & Filed in Programming

rock band 2 »
« random links: december 6, 2009

2 Comments

Comments closed

Comment by dfan

What’s really cool about Haskell is how it changes the way your mind works. I am not at all an experienced Haskell programmer, but I would dive directly into your final solution, as that is clearly the operation being performed on the input (in fact, that line of code is basically a literal translation to Haskell of the problem statement).

It is cool how one gradually becomes more comfortable with writing recursive functions “by hand” and then graduates to writing folds and maps immediately. After a while you will automatically skip over those explicit concatFirstWordsOfLineArray steps without thinking.

This does make me wonder if test-first development is less suited to Haskell than to more imperative languages. It seems like the procedure of arriving at a correct solution by gradually transforming failing solutions isn’t as appropriate when the final solution is going to be a one-line jewel instead of a paragraph of imperative code. The long transformation to one-liner you outlined here is instructive but not one that an experienced Haskell programmer would travel. I have even less experience with test-first development than with Haskell, though, so who knows.

12/8/2009 @ 6:17 am
Comment by David Carlton

Right, this example is a total retcon: I pretty much dived directly into my final solution, too, and when doing the next exercise, I started directly from the structure “unlines . transposeWithPadding . lines” and tried to figure out what transposeWithPadding looked like, instead of pretending to discover the lines/unlines part.

The more serious part is this, though: when going through Haskell exercises, I’ve had an uncomfortable number of times where I’ve depended on a flash of insight, and also an uncomfortable number of times where I wrote down something that seemed reasonable, had it prove incorrect, fiddled with it a few times, and had one version that looked a lot like its predecessors randomly work. I certainly don’t want to discount the value of being either smart or randomly lucky as a programming strategy, but I’d like to have a larger toolbox available for me.

One analogy here is with imperative programming: an experienced imperative programmer will bang out a for loop in the same way that an experienced functional programmer will use map. But, every once in a while, there’s a situation where the exact looping structure isn’t obvious; I’ve learned that, whenever that happens, I should write out an unrolled version of the loop that works for enough special cases that it becomes obvious what the underlying loop should look like. I’d like to have a variant of that technique that works in the functional case; some of the steps in the first half of this example are an attempt to figure out what that might feel like.

The second half of this example is different – that’s more taking the code and shaking it back and forth in different ways until I find the way of grouping it that’s clearest. The part I liked the most there was noticing that map distributes over function composition – I’m sure I’ll use that in the future (probably in both directions), because it helps make clear what all my different grouping options are so I can choose the clearest one.

I am willing to believe that TDD and Haskell aren’t a good fit, however. That would be interesting if true (and also unfortunate if I can’t find a technique that works in Haskell that has the virtues that TDD has); right now, my working hypothesis is that TDD still works, but that the details of the refactorings / implementation patterns that are an important part of TDD are significantly different for Haskell than for other languages that I’m used to. (Even at a micro level, as here – I’m sure the macro-level transformations look even more different.) That’s only a working hypothesis, though, it could easily be incorrect.

12/8/2009 @ 10:09 am

malvasia bianca