One of my biggest surprises when learning Haskell has been how my typical test-driven development steps fail: it’s easy to write a couple of tests and get them to pass gracelessly, but surprisingly quickly I run into a test that I can’t get to pass without actually being smart, forcing me to make a leap that’s uncomfortably large from my previous position.

I ran into a situation like that last night, and I decided to try to take that big step apart; here’s what I ended up with. The problem in question was exercise 3 on page 84: it told me to print out the first word of each line of its input. My first two tests were as follows:

firstWordsTests =
    TestList["empty" ~: "" @=? firstWords "",
             "one line" ~: "first\n" @=? firstWords "first words\n"]

which I got passing with this implementation:

firstWords "" = ""
firstWords line = head (words line) ++ "\n"

The third test was where I ran into trouble, though:

firstWordsTests =
    TestList["empty" ~: "" @=? firstWords "",
             "one line" ~: "first\n" @=? firstWords "first words\n",
             "two lines" ~: "one\ntwo\n" @=?
                         firstWords "one line\ntwo lines\n"]]

How can I write a pattern which will match the third case? Not at all clear to me, but at least it points out the direction I’m going in: I want to add lines to the test cases until some sort of looping construct falls out. Given that, let’s try to refactor against one red bar in a way that brings out the decomposition into lines that’s latent here.

Fortunately, Haskell has a function lines that transforms a string into a list of the lines making up that string; let’s rewrite firstWords to use it. The smallest step that I managed to come up with to do so was this:

firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = head (words line) ++ "\n"

Which is a larger step than I’m comfortable with, but at least it’s small enough conceptually that I don’t feel like I’ve leapt into the unknown. I’m not sure what to call this sort of transformation—maybe “Insert Intermediate List”?

(Incidentally, the observant reader will note that this transformation isn’t a refactoring: it preserves the red and green bars, but the nature of the red bar goes from an unexpected result to an exception being thrown. That’s okay with me; what I’m doing is still useful as an implementation pattern. Hmm, maybe I should try writing these transformations up in an Alexandrian style?)

And, after that step, it’s now obvious how to get my test to pass:

firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = head (words line) ++ "\n"
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        (head (words line1) ++ "\n") ++ (head (words line2) ++ "\n")

At this point, we have some obvious code duplication; Extract Method turns it into

firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = firstWordLine line
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        firstWordLine line1 ++ firstWordLine line2
    firstWordLine line = head (words line) ++ "\n"

At which point it’s pretty obvious that we’re doing something quite similar to all elements on the list, so we want to transform this into a map plus a subsequent operation. And, in fact, the subsequent operation is concatenating all of the list elements together; keeping that in mind, we do an Insert Intermediate List on the third case, giving us:

firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = ""
    concatFirstWordsOfLineArray [line] = firstWordLine line
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        concat [firstWordLine line1, firstWordLine line2]
    firstWordLine line = head (words line) ++ "\n"

That makes the structure clear for the third branch, and also makes it obvious that we can write the first two branches the same way:

firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray [] = concat []
    concatFirstWordsOfLineArray [line] = concat [firstWordLine line]
    concatFirstWordsOfLineArray (line1 : line2 : []) =
        concat [firstWordLine line1, firstWordLine line2]
    firstWordLine line = head (words line) ++ "\n"

So now we have our map operation:

firstWords input = concatFirstWordsOfLineArray (lines input) where
    concatFirstWordsOfLineArray lines = concat (map firstWordLine lines)
    firstWordLine line = head (words line) ++ "\n"

(These last three steps seem like they should all go together: Extract Identical List Operation?)

Now the code is looking nice (and, in addition, would pass more tests should we choose to write them), and the challenge turns towards expressing it as tersely and clearly as possible. First, Replace Unaltered Parameter with Composition to get:

firstWords = concatFirstWordsOfLineArray . lines where
    concatFirstWordsOfLineArray lines = concat (map firstWordLine lines)
    firstWordLine line = head (words line) ++ "\n"

Do it again (throwing a bit of currying into the mix):

firstWords = concatFirstWordsOfLineArray . lines where
    concatFirstWordsOfLineArray = concat . (map firstWordLine)
    firstWordLine line = head (words line) ++ "\n"

And, by now, concatFirstWordsOfLineArray clearly isn’t pulling its weight, so we Inline Method:

firstWords = (concat . (map firstWordLine) . lines) where
    firstWordLine line = head (words line) ++ "\n"

I still want to make this shorter, which in Haskell land frequently seems to mean using Replace Unaltered Parameter with Composition; to that end, we rewrite the latter definition as

firstWords = concat . (map firstWordLine) . lines where
    firstWordLine line = (++ "\n") . (head (words line))

That lets us turn it into:

firstWords = concat . (map firstWordLine) . lines where
    firstWordLine = (++ "\n") . head . words

at which point we can Inline Method again, and get:

firstWords = concat . (map ((++ "\n") . head . words)) . lines

Which is short, but you have to think a little bit as to what it means; what I’d like to do is find a way to get the unlines function in there, which is a function that takes a list of strings and concatenates them with newlines between. The next step in that direction is to realize that map distributes over function composition; so we Distribute Map, giving us

firstWords = concat . (map (++ "\n")) . (map (head . words)) . lines

and, indeed, the first half of that is exactly unlines:

firstWords = unlines . (map (head . words)) . lines

Phew! The code is now about as terse as I can think of while passing those three tests, and it passes several other new tests that I might think of to boot. And, as a bonus, this version is clearer than any of its predecessors: we split the input into a list of lines (lines), then we grab the first word out of each of those lines (map (head . words)), then we smoosh all of those lines back together (unlines). Though, as Bryan pointed out to me, I forgot to write one test (which I’ll leave as an exercise for the reader), but getting the code to pass that last test was a lot easier in this form than it would have been in forms further up the blog page. (Bryan also had some suggestions for how I might use QuickCheck instead of HUnit to test this, which I hope to be able to follow up over the coming months.)

If this sounds interesting (or if it sounds bizarre but if the idea of learning Haskell sounds interesting despite my peculiar approach), it’s not too late to join the reading group: none of us are moving at a very fast pace, so it shouldn’t be much trouble for a newcomer to catch up.

Post Revisions:

This post has not been revised since publication.