I’m now generating my HTML pages based on the data in the SQL database. I’m still first assembling all the data in memory using Java method calls, and writing that out to an SQL database, but then I throw away the in-memory copy and regenerate it from the database.

I can’t really say that that step is done, though. There are a couple of things wrong:

  1. I’m generating the various indices (e.g. the list of books by a given author, or the list of recently read books) in a very bad way: basically, I read all the data from the database into memory and generate my own lists out of Java collections, instead of using SQL queries. Given that I’m generating all the web pages statically at once, this is fine from an efficiency point of view, but it’s lousy from a “learn about SQL” point of view.
  2. The separation of responsibilities isn’t nearly as clean as I’d like: in particular, I’m using proxy objects which get lazy-initialized from the database (which is fine, though I’m thinking of getting rid of that eventually), but the code gets confused at various points between the proxy and the object that the proxy creates underneath it.

Don’t get me wrong: it works, which is good. And to make sure it works, I even ran the one acceptance test that really matters, namely generating all 344 web pages corresponding to the current list of books/games/authors/etc., and checking that none of them changed betweend the old and new ways of generating them. I was pretty sure that my unit and acceptance tests were good enough that there wouldn’t be any bugs remaining; I was wrong. To be specific, I was generating links for volumes in a series incorrectly in index pages (point 1 above), where the bug was masked in the tests by point 2 above. Sigh, but at least it means that I have reasonable instincts for bad spots in the code.

Fortunately, attacking the first item will greatly ease the second, and I have a pretty good idea how some of this will work. The one problematic aspect is exactly how I’ll generate the indices, in a way that is nice SQL but still easily testable without finding an in-memory SQL database. Take, for example, the list of books by an author: if it were as simple as select * from books where author_id = AUTHOR; then that would be easy to test. But that doesn’t work for two reasons:

  • If a book is by Fred and George, then I want it to be mentioned in Fred’s page, George’s page, and Fred and George’s page. Which means I have to do that query for various values of AUTHOR.
  • If a book is a volume in a series, then its author_id field is NULL: the value is taken from the series’s author_id field.

So the right SQL query to use is reasonably complicated; and I certainly don’t want my testable SQL-like interface to handle that level of complexity. I think I’ll probably add some sort of ‘indexing hook’ that can generate a more complicated SQL query in the SQL version and evaluate custom Java code in the test version; not my favorite solution, but I can’t think of any other ideas offhand.

I’ll do that next. (Which will take a while, I think.) And then I’ll write a CLI tool to add stuff to the SQL database and modify it. At that point, the SQL transition will be more or less complete, and I’ll be able to get back to making changes that actually affect the HTML I’m generating.

Which I devoutly want to do, but this time hasn’t been wasted. For one thing, I’m learning about SQL, which was after all the goal of the whole exercise. For another thing, it’s very interesting watching the changes that the code undergoes as a result of this annealing, and trying to guess how the code will want to change next. I get the feeling that, when I’m done with the SQL transition, I’ll have the same number of lines of code as when I started, but 75% of them will be different and the structure will be somewhat improved. We’ll see.

Incidentally, while I was at it, I finally got around to installing viewvc (a pity there isn’t a released version with Subversion support, but the daily snapshot I grabbed seems to work fine), and tweaking my acceptance test framework to, among other things, print the time that each test takes.  By far the longest test is the one that connects to Amazon web sites in the US, UK, and France, to verify that my links work.

Post Revisions:

There are no revisions for this post.