generating html output

One decision that I had to make when doing the HTML output part of my book database: should I roll my own HTML generator, or use somebody else’s? I ended up going the ‘roll my own’ route, partly because it sounded like more fun, and partly because it would be easier to get the acceptance tests working.

As written, the acceptance tests do a strict textual comparison, and it seemed unlikely that I would easily find a library that generates code indented exactly the way I want. Admittedly, that’s a sign that the acceptance tests are overly strict, so the right thing to do would be to find a way to relax that validation, and in fact I think I already have code lying around for that. (I use it when validating that my output is legal XHTML.) But that, combined with laziness and a desire for fun, was enough to sway me.
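For illustration, a relaxed comparison might look something like the sketch below. It is not the code mentioned above: the helper names `normalize_markup` and `same_markup?` are made up, and the approach is deliberately crude, since it also throws away whitespace between inline elements, which can matter in real pages.

```ruby
# Hypothetical helper: strip the whitespace that indentation introduces, so
# two documents that differ only in layout compare equal.
def normalize_markup(text)
  without_gaps = text.gsub(/>\s+</, "><")   # drop whitespace between adjacent tags
  without_gaps.gsub(/\s+/, " ").strip       # collapse any remaining whitespace runs
end

# Hypothetical helper: compare two chunks of markup, ignoring layout.
def same_markup?(expected, actual)
  normalize_markup(expected) == normalize_markup(actual)
end

indented = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>\n"
flat     = "<ul><li>One</li><li>Two</li></ul>"
puts same_markup?(indented, flat)   # prints: true
```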

And it has been fun! I started out with one unit test that checks for the output of a page for an author who hasn’t written any books. The easiest way to get that to work was to hardcode the expected output. So, at this point, my implementation was a function that spit out a really long string.
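A minimal sketch of that starting point, with a made-up page standing in for the real one (the function name `empty_author_page` and the markup are assumptions, and a bare `raise` stands in for a proper unit test framework):

```ruby
# First implementation: return the whole page as one hardcoded string.
# The markup is an invented stand-in for the real author page.
def empty_author_page
  <<~HTML
    <html>
      <head>
        <title>Jane Doe</title>
      </head>
      <body>
        <h1>Jane Doe</h1>
        <p>No books yet.</p>
      </body>
    </html>
  HTML
end

# The test is a strict textual comparison, indentation included.
expected = [
  "<html>",
  "  <head>",
  "    <title>Jane Doe</title>",
  "  </head>",
  "  <body>",
  "    <h1>Jane Doe</h1>",
  "    <p>No books yet.</p>",
  "  </body>",
  "</html>",
  ""
].join("\n")

raise "author page output changed" unless empty_author_page == expected
```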

And then it was time to start refactoring. A single long string is hard to work with, so I broke it up into separate functions for the different parts of the page. Which, actually, I haven’t done much with; it will help me in the future, though. But next came a series of more immediately useful refactorings:

I had the output function generate an array of lines instead of a single multiline string.
I noticed that the lines in the arrays almost all started with whitespace, so I added another argument which is an amount of whitespace to add to all the lines in the array.
The indentation changes in predictable ways, so rather than pass in “indent 6” here and “indent 8” later on, I gave the HtmlOutput class a member variable holding the current indentation, and provided open and close member functions which add or subtract 2 from the indentation level.
The adding and subtracting happen in conjunction with text, so let’s pass that text in as an argument to open and close.
The opening and closing text consists of opening and closing tags: so let’s keep a stack of the elements that are in the current scope, and have close generate the closing tag automatically. (And provide a way to pass in attributes to the opening tag.)
Having to explicitly type open/close pairs violates my RAII instincts; in Ruby land, that means that we should just have an element function which generates the tags itself, and which takes a block as an argument to fill in the middle.
But what about elements with no body, where the opening tag is the closing tag? No problem: I just won’t pass in a block there, and element can alter its behavior based on block_given?.
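The steps above could end up somewhere like the following sketch. Only the names HtmlOutput, open, close, element and block_given? come from the post; the `lines` reader, the `line` method, the attribute hash and the `<name />` form for empty elements are assumptions, and attribute values are not escaped.

```ruby
class HtmlOutput
  attr_reader :lines

  def initialize
    @lines = []    # output is an array of lines, not one long string
    @indent = 0    # current indentation, in spaces
    @stack = []    # names of the elements currently open
  end

  # Emit one line of text at the current indentation (assumed helper).
  def line(text)
    @lines << (" " * @indent) + text
  end

  # Emit an opening tag, remember the element, and indent by 2.
  def open(name, attributes = {})
    line("<#{name}#{format_attributes(attributes)}>")
    @stack.push(name)
    @indent += 2
  end

  # Dedent by 2 and emit the closing tag for the innermost open element.
  def close
    @indent -= 2
    line("</#{@stack.pop}>")
  end

  # With a block: open, run the block, close. Without one: a single tag.
  def element(name, attributes = {})
    if block_given?
      open(name, attributes)
      begin
        yield
      ensure
        close   # the closing tag is emitted even if the block raises
      end
    else
      line("<#{name}#{format_attributes(attributes)} />")
    end
  end

  private

  # Render a hash as ' key="value"' pairs (values are not escaped here).
  def format_attributes(attributes)
    attributes.map { |key, value| %( #{key}="#{value}") }.join
  end
end

out = HtmlOutput.new
out.element("body") do
  out.element("div", "class" => "author") do
    out.line("No books yet.")
  end
  out.element("hr")
end
puts out.lines
```

The caller never types a closing tag or an indentation level; both fall out of the block nesting, which is the Ruby counterpart of the RAII idea.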

That’s where I am now. The next step is to handle elements that I put in the middle of a line (<cite>, <a>, etc.); I think I have a scheme that will work for that, but we’ll see where the refactorings lead me.
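The post does not say what that scheme is. One possibility, offered purely as a guess, is to build inline elements as strings that get spliced into a line of text rather than emitted as lines of their own. The `inline` helper and the sample title, name and URL path below are hypothetical.

```ruby
# Hypothetical helper, not the author's scheme: return an inline element as a
# string so the caller can embed it in the middle of a line.
def inline(name, content, attributes = {})
  attrs = attributes.map { |key, value| %( #{key}="#{value}") }.join
  "<#{name}#{attrs}>#{content}</#{name}>"
end

title = inline("cite", "Emma")
link  = inline("a", "Jane Austen", "href" => "/authors/austen")
puts "#{title}, by #{link}"
# prints: <cite>Emma</cite>, by <a href="/authors/austen">Jane Austen</a>
```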

I’ve never programmed this way before, refactoring a class into existence based solely on a complicated chunk of expected output. I highly recommend the experience; it’s lots of fun, and has a rather unusual flavor. I’m being good and adding unit tests for all the methods I create; the thing is, though, that each method seems to last for about half an hour before it and its unit test get refactored out of existence, replaced by the next refactoring! For the longest time, HtmlOutput had just two tests: one for the result of the previous refactoring step, and one for the next step, which I was in the process of converting the existing AuthorPrinter object (the user of HtmlOutput) to use. Recently, though, more tests have been coming into existence, which I hope is a sign that I’m settling into a more usable and powerful interface.

My only regret is that I did most of this refactoring without an internet connection, so I couldn’t check all of the intermediate steps into my Subversion repository and get a good view of the differences between steps. Ah well; maybe I’ll switch over to a more distributed version control system for my next project.

Published 7/20/2007 & Filed in Programming

4 Comments

Comment by John Cowan

Hacking, they say, is debugging a blank screen until it does what you want. Most Agile concepts are reinventions/descriptions of practices that hackers (not crackers) already knew about, but hadn’t articulated. See ESR’s paper Hacking and Refactoring.

7/20/2007 @ 10:34 pm
Comment by david carlton

Thanks for the reference! I’d missed that one somehow.

7/21/2007 @ 3:47 am
Pingback from malvasia bianca » Blog Archive » xml, html output

[…] malvasia bianca « generating html output […]

7/21/2007 @ 7:05 pm
Comment by Brian Carlton

I had the occasion to write two generators in the last couple of weeks, one for VHDL, one for a shell script. It is so nice for a large number of signals (VHDL) or tedious repetition (sh). And the VHDL one couldn’t have been done from a configuration file.

4/4/2008 @ 11:58 am

malvasia bianca