Archive for the ‘dbcdb’ Category

updating web pages dynamically

Monday, December 24th, 2007

I’ve now written my first AJAX code: if you go to a random web page in my book/game database, you should be presented with a list of blog posts that refer to that item. At least assuming that I haven’t accidentally used functionality that your favorite browser doesn’t support, which I hear is easy to do with JavaScript; fortunately, Internet Explorer seems to be the most likely candidate, and my CSS is already broken there, so I should be safe enough. (I’ve only tested under Firefox and Safari 3.)

It was fun and not too hard, all things considered. I didn’t have any prior JavaScript experience, but I figured that googling would quickly turn up instructions for how to do what I wanted. Which didn’t seem to be the case in the first 15 minutes or so of searching, but I remembered getting the idea from an example in the REST book, so I looked that up, googled some of the more specific language constructs I had questions about, and had something working in another hour or so.

Aside from stupid mistakes, the main initial difficulty that I ran into was in the guards against cross-site scripting vulnerabilities: my blog has its own domain, while my database is in bactrian.org. (And I was doing my initial prototyping on my home computer, which is a different physical machine.) The easiest way to get around that seemed to be to set up a proxy (or rather, two of them, one on my home computer and one on bactrian.org); a bit of mod_proxy configuration later and my prototype worked on my home computer.

I copied the prototype over to bactrian.org, and updated paths; it stopped working, again giving me a frustrating error message related to cross-site scripting. I couldn’t figure out what was going on, and spent a quite frustrating hour or so alternating between googling for help and trying to install and run a JavaScript debugger. (For some reason, the Venkman package in Ubuntu didn’t work for me.) Eventually, though, I remembered one more path that I would need to translate when moving the prototype; I changed it, and the prototype worked in the deployment environment.

After which it was a simple matter of programming to make the change and update the tests. Most of the work was in the latter: I have the web page skeleton abstracted fairly well out of the tests, but even so I have to modify a few different places if I change the output in a way that affects all pages. And I ran into one more road bump along the way, where inserting some white space in the HTML turned out to require me to change my JavaScript, but I figured that out pretty quickly.

So here we are. I’m really pleased with the results: there’s a big difference between being invited to click on a link to search for related blog posts (which may not even exist) and having a list of posts appear in front of you. This is the last change that I plan to make to the database for the time being (well, I might do a bit of trivial tweaking); a good change to go out on.

Random JavaScript-related thoughts:

  • Based on my limited experience, JavaScript is pleasant enough. I wasn’t impressed with its collections, but other than that the language behaved in the way I wanted, and it was quite easy to search through XML and pop some data into view.
  • I’m still mostly at sea if something goes wrong with my JavaScript. The console gives some basic help, but there were a couple of instances where I ran into a more serious problem and either wished that I could get more information out of the error or poke around data structures or something. Probably the debugger would have helped, if I’d gotten farther with using it.
  • I don’t have any acceptance tests for this, which makes me sad. It’s little enough JavaScript code that I’m happy to skip unit tests for it, but I really would like to be able to push a button and have reasonable confidence that I haven’t broken anything. (Especially since the functionality depends both on my JavaScript code and on WordPress’s behavior, so I’m going to have to manually test this every time I upgrade WordPress.) Some people on the XP list suggested some tools (Selenium in particular); maybe I’ll give that a try at some point.

finished converting dbcdb to ruby

Sunday, October 28th, 2007

I’ve finally finished converting dbcdb from Java to Ruby. I’ve been using the Ruby version of the tool to write the database for about four months, but I’d still been using the Java version to write the web pages.

Nothing too deep going on here; I was actually done with everything but the indexes as of the middle of September, but I hadn’t gotten around to generating the indexes until this weekend. (Or do people prefer that I spell it ‘indices’?) We’ve been busy with some extra event every single weekend for about the last two months; combining that with wanting to learn Japanese, working through Metroid and Picross, and occasionally working on the game with Miranda means that, unless I’m feeling extraordinarily disciplined, dbcdb falls by the wayside. But we had nothing planned this weekend, so I seized the opportunity.

The new code is a little more than half as long; the acceptance tests also run almost twice as fast. (All that JVM startup takes time, I guess? I don’t think there are significant algorithmic performance variations in the two versions.) Go Ruby, though I’m sure it would be very easy to find situations where the performance goes the other way. Both generate the exact same output, as manifested by running the same acceptance tests on both versions and, ultimately, by doing a diff -r on both outputs from the current live database contents.

What next? There are some cosmetic tweaks I may or may not get around to making; I’m not feeling any urgency on that score right now. I had planned to next convert this from generating static web pages offline to generating them dynamically via mod_ruby; now I’m feeling distinctly less interested in that idea. (Partly because the REST book reminded me of some of the benefits of static web pages, ironically.) I still want to experiment with that at some point, but now I’m thinking I’ll just do that by coming up with a Rails project instead of doing everything from scratch.

So it looks like it might be time to declare this a success and move on. And it has been a success, no question: I’ve brushed up on my Java a bit, dabbled with SQL, learned Ruby, and basically enjoyed myself. So, from a purely didactic standpoint, I’m quite happy.

There is one thing that I’m not happy with, though. I’d originally envisioned the generated web pages as actually being useful in that they’d provide an index into my blog posts: they would give an easy way for people to find all the web pages where I write about a given game, say. And they do provide an index, but it’s not as easy as I’d like: people have to click on the link to the database and then click from there to a search link, and that’s expecting quite a bit from my readers. (Especially since there’s honestly nothing of particular interest on the database web page itself.)

So I’d like to remove one of those links, to compress it down to one level. In this AJAX-aware world, the mechanisms for doing that are pretty well-trodden: write some JavaScript to do the query in the background, and then stick the results in the database web page. And, in fact, it turns out that WordPress can generate an RSS feed of query results, so I don’t have to worry about page scraping and having details change as I upgrade my WordPress installation. (Which I should really do one of these days - I’m still on 2.0…)

One last task, then. Which is made a bit harder than it would otherwise be by the fact that I don’t know how to write JavaScript, I’m not familiar with the DOM model (if indeed that’s the right term to use), and I don’t know how to acceptance test AJAX. But I’m not particularly worried about any of these: like I said, this is a well-trodden path, so it shouldn’t be very hard to find examples that do pretty much exactly what I want to do.

xml, html output

Saturday, July 21st, 2007

My HTML output class is now at what I expect to be a reasonably stable state. It’s not by any means a perfect solution for the world’s HTML needs, but it can generate the output that I want without much excess typing, which is all that matters.

Actually, it divided into two classes this morning. First, XmlOutput:

  class XmlOutput
    def initialize(io)
      @io = io
      @indentation = 0
      @elements = []
    end

    def element(*element_and_attributes)
      if (block_given?)
        open_element(element_and_attributes)
        yield(self)
        close_element
      else
        write_indented_element(element_and_attributes)
      end
    end

    def inline_element(*element_and_attributes)
      "<#{element_and_attributes.join(" ")}>" +
        yield +
        "</#{element_and_attributes[0]}>"
    end

    def line
      if (block_given?)
        indent
        @io.write(yield)
      end

      @io.write("\n")
    end

    # FIXME (2007-07-21, carlton): Can I use define_method to
    # construct a method taking a block?
    def self.define_element(element, *attributes)
      module_eval element_def("element", element, attributes)
    end

    def self.define_inline_element(element, *attributes)
      module_eval element_def("inline_element", element, attributes)
    end

    def self.element_def(method, element, attributes)
      %Q{def #{element}(#{attr_args(attributes)} &block)
           #{method}("#{element}", #{attr_vals(attributes)} &block)
         end}
    end

    def self.attr_args(attributes)
      # Join explicitly: since Ruby 1.9, interpolating an array no longer
      # concatenates its elements.
      attributes.map { |attribute| attribute.to_s + "_arg, " }.join
    end

    def self.attr_vals(attributes)
      attributes.map do |attribute|
        '"' + attribute.to_s + '=\\"#{' + attribute.to_s + '_arg}\\"", '
      end.join
    end

    def write_indented_element(element_and_attributes)
      line { "<#{element_and_attributes.join(" ")} />" }
    end

    def open_element(element_and_attributes)
      line { "<#{element_and_attributes.join(" ")}>" }
      @indentation += 2
      @elements.push(element_and_attributes[0])
    end

    def close_element
      element = @elements.pop
      @indentation -= 2
      line { "</#{element}>" }
    end

    def indent
      @io.write(" " * @indentation)
    end
  end

I’ve given up on the whole public/protected/private distinction, for now: I don’t see much point in it for programming that I’m doing by myself. But I suppose it does have uses when explaining code to others: if you were to use the class directly, then you’d use element, inline_element, and line. The first is for an XML element that you deem important enough to put the opening and closing tags on their own lines (perhaps head and body for HTML); inline_element is for XML elements that you want to stick in the middle of lines (perhaps cite and a for HTML). And line is for text that you’re inserting, either passed as a string or generated via inline_element. They all take blocks, to either fill in the middle of the elements or the lines; two of them do something useful if not given a block, and the third could easily enough if I need that functionality. Oh, and the element functions have a crappy way of specifying attributes.

Which works well enough, but still requires more typing (in my case, manifesting itself as > 80 column lines) than would be ideal. Which is where the class functions define_element and define_inline_element come in. Here’s HtmlOutput:

  class HtmlOutput < XmlOutput
    define_inline_element :a, :href

    define_inline_element :span, :class

    define_inline_element :li
    alias_method :inline_li, :li

    define_inline_element :title

    define_inline_element :h1
    define_inline_element :h2

    define_element :head
    define_element :body

    define_element :div, :id

    define_element :ul, :class
    alias_method :ul_class, :ul
    define_element :ul

    define_element :li

    define_element :link, :rel, :type, :href

    def html(&block)
      line { "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"" }
      line { "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">" }
      element("html", "xmlns=\"http://www.w3.org/1999/xhtml\"",
              "xml:lang=\"en\"", "lang=\"en\"", &block)
    end
  end

This lets me create methods corresponding to the elements that I care about. If those elements take attributes (as in <a href=...>), I pass them as extra arguments (define_inline_element :a, :href), and the generated methods take arguments that are the values for the attributes. So, if I want to generate the following:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
      <title>The Title</title>
      <link rel="stylesheet" type="text/css" href="styles.css" />
    </head>

    <body>
      <h1>Main Header</h1>
      <ul>
        <li><a href="http://site/page/">link text</a></li>
      </ul>
    </body>
  </html>

then I can write this:

  o.html do
    o.head do
      o.line { o.title { "The Title" } }
      o.link("stylesheet", "text/css", "styles.css")
    end

    o.line

    o.body do
      o.line { o.h1 { "Main Header" } }
      o.ul do
        o.line do
          o.inline_li do
            o.a("http://site/page/") { "link text" }
          end
        end
      end
    end
  end

Admittedly, this isn’t the eighth wonder of the world or anything, but I do think the interface will work pretty well for the specific uses that I have in mind. Or maybe not - I read the relevant chapter in the Pickaxe book this morning; they describe a library with an interface basically identical to what I ended up with, but then comment that people almost never use it, typically preferring to use some sort of HTML template with embedded Ruby instead. And maybe I’ll switch to a solution like that as I get more used to the area.
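For comparison, here is a minimal sketch of the template-with-embedded-Ruby approach, using ERB from Ruby's standard library. The template text, the `render_page` helper, and its arguments are made up for illustration, and the `trim_mode:` keyword assumes Ruby 2.6 or later.

```ruby
require "erb"

# Hypothetical template for the body of the page shown above.
# With trim mode "-", a line of the form <%- ... -%> leaves no blank line behind.
PAGE_TEMPLATE = <<~HTML
  <body>
    <h1><%= ERB::Util.html_escape(header) %></h1>
    <ul>
  <%- links.each do |href, text| -%>
      <li><a href="<%= ERB::Util.html_escape(href) %>"><%= ERB::Util.html_escape(text) %></a></li>
  <%- end -%>
    </ul>
  </body>
HTML

# Renders the template; header and links are visible to it through binding.
def render_page(header, links)
  ERB.new(PAGE_TEMPLATE, trim_mode: "-").result(binding)
end

puts render_page("Main Header", [["http://site/page/", "link text"]])
```

The trade-off is roughly that the template keeps the markup readable as markup, while the block-based interface keeps indentation and tag closing under the program's control.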

However that turns out, there are two bits that I want to talk about. One is what I discussed in my previous post, that it was a lot of fun starting with a complex bit of output and refactoring my way into a class that generated it. I won’t yet propose that as the way to go in all situations, and I’m not even sure it actively helped me here: if I’d started out wanting to build up a solution from scratch instead of decompose one out of a monolithic print statement, I don’t see any reason to believe it would have turned out differently or gone any slower. But it was a very pleasant way to develop code, I’m confident it didn’t slow me down at all, and I only spent about 10 minutes of development time wondering what was the best thing to do next. If nothing else, it will give me further motivation to write my acceptance tests early: currently, I have them in mind from the start of a task, but I don’t usually actually write them until the code that they’re testing is finished. That delay isn’t usually for any good reason, it’s simply because I don’t yet like writing acceptance tests as much as I like doing other things, but if I can start to see real effects out of writing the acceptance tests earlier, I’d probably switch to doing so. (It would help if I started using Fit, too; for now, though, I’m not convinced I’m working in areas where that is an obvious win.)

The second bit I want to emphasize is that I love the way the definition of HtmlOutput looks. This is the second time in this project that I’ve done something like that: there’s a base class that implements class functions designed to let you provide functionality in a subclass without writing explicit method definitions in that subclass! Much more fun than sticking in protected hooks here and there, and when it works the subclass definitions are dramatically shorter (and freer of boilerplate repetition) than they would be if I were, say, programming in Java. As the FIXME comment shows, I’m not entirely comfortable with the implementation in this particular case, and now that I think about it, I’m not entirely comfortable with my implementation in the other case either, but the fact that I can do it at all pleases me greatly.
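On the FIXME question: in Ruby 1.9 and later, a block passed to define_method can itself declare a `&block` parameter, so the string evaluation is avoidable there. A minimal sketch, assuming such a Ruby; the `InlineOutput` class is a cut-down stand-in for XmlOutput, not the real thing.

```ruby
# Cut-down stand-in for XmlOutput, just enough to show the technique.
class InlineOutput
  def inline_element(*element_and_attributes)
    "<#{element_and_attributes.join(" ")}>" +
      yield +
      "</#{element_and_attributes[0]}>"
  end

  # Defines a method named after the element. The generated method takes one
  # value per declared attribute, plus a block for the element's contents.
  def self.define_inline_element(element, *attributes)
    define_method(element) do |*values, &block|
      pairs = attributes.zip(values).map { |name, value| "#{name}=\"#{value}\"" }
      inline_element(element.to_s, *pairs, &block)
    end
  end

  define_inline_element :a, :href
  define_inline_element :title
end

o = InlineOutput.new
puts o.a("http://site/page/") { "link text" }
puts o.title { "The Title" }
```

One difference from the string-based version: the generated methods here accept any number of arguments, so a missing attribute value shows up as an empty attribute rather than an ArgumentError.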

So: I can generate one particular piece of HTML. Now I just have to have that HTML vary based on the contents of a database. Shouldn’t be too hard; I hope I’ll find a few more ways in which the implementation improves upon its Java counterpart.

generating html output

Friday, July 20th, 2007

One decision that I had to make when doing the HTML output part of my book database: should I roll my own HTML generator, or use somebody else’s? I ended up going the ‘roll my own’ route, partly because it sounded like more fun, and partly because it would be easier to get the acceptance tests working.

As written, the acceptance tests do a strict textual comparison, and it seemed unlikely that it would be easy to find another library that would generate code indented exactly the way I want. Admittedly, that’s a sign that the acceptance tests are overly strict, so the right thing to do would be to find a way to relax that validation, and in fact I think I already have code for that around. (I use it when validating that my output is legal XHTML.) But that combined with laziness and a desire for fun was enough to sway me.
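One way to relax the comparison, as a minimal sketch: normalize both sides before comparing. The helper names `normalize_markup` and `same_markup?` are hypothetical, and this only forgives indentation and blank lines, not tags that are broken across lines differently.

```ruby
# Strip indentation and trailing space from every line, and drop blank lines.
def normalize_markup(text)
  text.lines.map(&:strip).reject(&:empty?).join("\n")
end

# True when the two documents differ only in indentation and blank lines.
def same_markup?(expected, actual)
  normalize_markup(expected) == normalize_markup(actual)
end

expected = "<ul>\n  <li>one</li>\n</ul>\n"
actual   = "<ul>\n\n      <li>one</li>\n</ul>"
puts same_markup?(expected, actual)
```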

And it has been fun! I started out with one unit test that checks for the output of a page for an author who hasn’t written any books. The easiest way to get that to work was to hardcode the expected output. So, at this point, my implementation was a function that spit out a really long string.

And then it was time to start refactoring. A single long string is hard to work with, so I broke it up into separate functions for the different parts of the page. Which, actually, I haven’t done much with; it will help me in the future, though. But next came a series of more immediately useful refactorings:

  • I had the output function generate an array of lines instead of a single multiline string.
  • I noticed that the lines in the arrays almost all started with whitespace, so I added another argument which is an amount of whitespace to add to all the lines in the array.
  • The indentation changes in predictable ways: so, rather than pass in “indent 6” here and “indent 8” later on, I had the HtmlOutput class have a member variable with the current indentation, and provided open and close member functions which added or subtracted 2 from the indentation level.
  • The adding and subtracting happen in conjunction with text, so let’s pass that text in as an argument to open and close.
  • The opening and closing text consists of opening and closing tags: so let’s keep a stack of the elements that are in the current scope, and have close generate the closing tag automatically. (And provide a way to pass in attributes to the opening tag.)
  • Having to explicitly type open/close pairs violates my RAII instincts; in Ruby land, that means that we should just have an element function which generates the tags itself, and which takes a block as an argument to fill in the middle.
  • But what about elements with no body, where the opening tag is the closing tag? No problem: I just won’t pass in a block there, and element can alter its behavior based on block_given?.

That’s where I am now. The next step is to handle elements that I put in the middle of a line (<cite>, <a>, etc.); I think I have a scheme that will work for that, but we’ll see where the refactorings lead me.

I’ve never programmed this way before, refactoring a class into existence based solely on a complicated chunk of expected output. I highly recommend the experience; it’s lots of fun, and has a rather unusual flavor. I’m being good and adding unit tests for all the methods I create; the thing is, though, that each method seems to last for about half an hour before it and its unit test get refactored out of existence, replaced by the next refactoring! For the longest time, the HtmlOutput tests consisted of two tests, one of which was the result of the previous step of my refactoring and the other of which was the next step in my refactoring, which I was in the process of converting the existing AuthorPrinter object (the user of HtmlOutput) into using. Recently, though, more tests have been coming into existence, which I hope is a sign that I’m settling into a more useable and powerful interface.

My only regret was that I did most of this refactoring without an internet connection, so I couldn’t check all of the intermediate steps into my Subversion repository, and get a good view of the differences between steps. Ah well; maybe I’ll switch over to a more distributed version control system for my next project.

array.join

Saturday, June 30th, 2007

I was missing Array.join:

class Array
  def process_and_interpose(initial, middle, last)
    initial + (map { |i| yield i }).join(middle) + last
  end
end
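
A quick usage sketch of the join-based version, with made-up field names. One behavioral difference worth noting: for an empty array this version still appends the closing text, whereas the earlier inject-based version returned only the opening text.

```ruby
class Array
  # Transform each element with the block, join the results with middle,
  # and wrap the whole thing in initial and last.
  def process_and_interpose(initial, middle, last)
    initial + map { |i| yield i }.join(middle) + last
  end
end

puts %w[id name age].process_and_interpose("(", ",", ")") { |field| "`#{field}`" }
puts [].process_and_interpose("(", ",", ")") { |field| field }
```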

switched over to ruby version of the cli tool

Saturday, June 30th, 2007

I’ve switched over to using the Ruby version of the CLI tool for editing my book database; works great, as far as I can tell.

Short, too:

panini$ wc -l *.rb
    9 author_writer.rb
   18 book_writer.rb
   11 closeable.rb
   24 compound_author_writer.rb
   21 connected_database.rb
   30 connected_insert_row.rb
   24 connected_result.rb
   36 connected_result_row.rb
   37 connected_table.rb
   26 connected_write_row.rb
   60 date.rb
   21 decoder.rb
    9 developer_writer.rb
   85 editor.rb
   17 enumerable_helper.rb
   16 game_writer.rb
   23 link_writer.rb
   38 object_name.rb
   45 row.rb
   11 series_writer.rb
    9 system_writer.rb
   16 table.rb
  100 writer.rb
  686 total

(That’s only the production code; the unit tests add another 941 lines.) Hard to believe how long it’s taken to write, given the number of lines of code; I guess that’s what happens when you only work for an hour or two a week, don’t do that every week, are using a new language, and are working with a technology (SQL) that you’re not completely comfortable with. I hope the “generating HTML” part will go faster; I don’t see why not, since I should be able to mitigate all of those problems except for “only work for an hour or two a week”.

I did the refactorings I had in mind after last time, and went and reread all the code looking for more. I found a few more areas for improvement, but in general I’m happy with how clean it’s been staying. I should write a tool to calculate lengths of methods: I’m curious what the proportion of one-line methods is.
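A naive sketch of such a method-length tool, under loudly stated assumptions: every method starts on its own `def` line, ends with an `end` at exactly the same indentation, and is not written entirely on one line. The `method_lengths` helper is hypothetical and does no real parsing.

```ruby
# Returns [name, body_line_count] pairs for each method found in the source.
# Relies purely on indentation: the first later line that starts with the
# def's own indentation followed by "end" is taken as the method's end.
def method_lengths(source)
  lines = source.lines.map(&:chomp)
  lengths = []
  lines.each_with_index do |line, start|
    match = line.match(/\A(\s*)def\s+([^\s(]+)/)
    next unless match
    indent, name = match[1], match[2]
    closing = /\A#{Regexp.escape(indent)}end\b/
    finish = ((start + 1)...lines.length).find { |i| lines[i] =~ closing }
    lengths << [name, finish - start - 1] if finish
  end
  lengths
end

sample = <<~RUBY
  class Foo
    def one
      1
    end

    def three(x)
      if x
        2
      end
    end
  end
RUBY

lengths = method_lengths(sample)
one_liners = lengths.count { |_, length| length == 1 }
puts lengths.inspect
puts "#{one_liners} of #{lengths.length} methods have a one-line body"
```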

go refactoring!

Sunday, June 24th, 2007

In our last installment, we had this code:

  def parenthesized_list(array)
    array.process_and_interpose("(", ",", ")") { |element| yield element }
  end

  class Array
    def process_and_interpose(initial, middle, last)
      inject_with_index(initial) do |memo, element, i|
        memo + yield(element) + (i != length - 1 ? middle : last)
      end
    end
  end

I’d extracted the latter method not because I thought I was likely to need it, but because I thought the original implementation of parenthesized_list was insufficiently evocative.

But then today I was finishing off the Ruby version of my CLI tool, so I needed to update entries in existing rows in SQL tables, instead of just adding new rows. And the syntax is different: instead of

INSERT INTO people (id, name, age) VALUES ('256', 'Fred', '25');

the syntax is

UPDATE people SET name = 'George', age = '36' WHERE id = '256';

Which seems like a rather gratuitous difference to me, though I admittedly don’t know SQL nearly well enough to know if there’s a good reason for it.

No parenthesized lists in sight, but that’s okay: my newly extracted process_and_interpose function does great!

  def update_string
    "UPDATE `#{@table.name}` SET #{assignments} WHERE `id` = #{id}"
  end

  def assignments
    @updates.to_a.process_and_interpose("", ",", "") do |assignment|
      "`#{assignment[0]}` = #{quote(assignment[1])}"
    end
  end

The Ruby version of the CLI tool seems to work fine now, incidentally. I haven’t flipped the switch yet and started using it for real, but as far as I can tell there’s no reason not to: it passes all the acceptance tests. (And they run faster than they did under Java; no idea why, but I’m pleasantly surprised.) There’s a bit of refactoring to do, and at some point I might want to think about what the implementation is telling me about my class hierarchies (or, indeed, about the differing importance of class hierarchies in dynamic and static languages), but all in all I’m quite happy.

parenthesized_list revisited

Saturday, June 23rd, 2007

I previously lamented this code:

  def parenthesized_list(array)
    list = "("
    first = false

    array.each do |element|
      if (first)
        list += ","
      else
        first = true
      end

      list += yield element
    end

    list + ")"
  end

I still haven’t found a magic bullet in Enumerable or Array which will let me dramatically shrink it. But I have at least teased out some of the components; this is what I’m using for now:

  def parenthesized_list(array)
    array.process_and_interpose("(", ",", ")") { |element| yield element }
  end

  class Array
    def process_and_interpose(initial, middle, last)
      inject_with_index(initial) do |memo, element, i|
        memo + yield(element) + (i != length - 1 ? middle : last)
      end
    end
  end

  module Enumerable
    def inject_with_index(initial)
      result = initial

      each_with_index { |element, i| result = yield(result, element, i) }

      result
    end
  end

inject_with_index doesn’t seem like a crazy idea; process_and_interpose is a bit specialized, but that’s fine.

Is there some way I can shrink the implementation of inject_with_index? I get the feeling that there’s some sort of generalization staring at me there, but I can’t quite figure it out. If I’m just shrinking code, I could keep on storing in initial instead of introducing a new variable result; I’d want to rename the variable, though. Maybe this?

  module Enumerable
    def inject_with_index(memo)
      each_with_index { |element, i| memo = yield(memo, element, i) }

      memo
    end
  end

I don’t think I like that so much, though: naming the (non-block) argument memo instead of initial makes it harder to figure out how it gets used. So I kind of prefer calling the argument initial at the start, and then renaming it to result in the body to reflect the implementation.
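Another possible shrink, as a sketch that assumes a Ruby where each_with_index called without a block returns an enumerator (Ruby 1.9 and later do): chain inject onto that enumerator and destructure the element/index pair in the block parameters.

```ruby
module Enumerable
  # each_with_index without a block yields [element, index] pairs to inject;
  # the parenthesized block parameter unpacks each pair.
  def inject_with_index(initial)
    each_with_index.inject(initial) do |memo, (element, i)|
      yield(memo, element, i)
    end
  end
end

puts %w[a b c].inject_with_index("") { |memo, element, i| memo + "#{i}#{element}" }
```

This keeps the argument named initial and drops the separate result variable entirely.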

And of course parenthesized_list is funny in that it just wants to pass along the block that it’s been given, but has to create a new block to do that. That, I think, reflects one of Ruby’s warts: there’s this weird block/procedure distinction that doesn’t, as far as I can tell, buy you much. It’s nice to be able to write blocks on the fly, but why not require functions taking one to make the block argument explicit and get rid of yield? I’m not sure of all the implications, but I don’t think that Ruby’s current choice is the best.
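For what it's worth, Ruby does let a method name its block explicitly with a `&block` parameter, which can then be forwarded or called like any other object. A minimal sketch, with `apply_twice` as a made-up example and `map`/`join` standing in for process_and_interpose:

```ruby
# Forwarding: the caller's block is captured as a Proc and handed on to map,
# without wrapping it in a new block.
def parenthesized_list(array, &block)
  "(" + array.map(&block).join(",") + ")"
end

# Explicit style: no yield, the block is an ordinary object that responds to call.
def apply_twice(value, &block)
  block.call(block.call(value))
end

puts parenthesized_list([1, 2, 3]) { |n| (n * 2).to_s }
puts apply_twice(3) { |n| n + 1 }
```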

ruby talking to mysql

Saturday, June 16th, 2007

My current programming project at home is to port my dbcdb code from Java to Ruby. So far, I’m working on porting over the CLI tool, which lets me update the database to add books that I’m reading, update information about them, etc.

Until today, I’d been using a fake database abstraction that I made up; today, I started plugging in the real MySQL stuff. Looking at my svn commit history, I see it took me an hour and 20 minutes to get the first bits working with MySQL, which I think is pretty good given my vast ignorance of SQL. It would have been faster if I’d had an interface to work with that was closer to the JDBC interface, because I’m a little familiar with the latter (and in particular it had affected my fake database abstraction), while I have to look up the syntax every time that I have to write raw SQL to add rows / modify rows in a table. I had a particularly fun 15 minutes where I was getting an SQL syntax error stemming from the fact that I’d used “order”, which is a reserved word in SQL, as one of my column names. Eventually I noticed that I’d enclosed the column name in backticks elsewhere, at which point that mystery got resolved.
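A minimal sketch of guarding against the reserved-word problem by quoting every identifier unconditionally. The helpers `quote_identifier` and `insert_statement` and the table and column names are hypothetical, and the `?` placeholders assume the values are bound separately by whatever driver runs the statement.

```ruby
# MySQL quotes identifiers with backticks; a literal backtick inside an
# identifier is written as two backticks.
def quote_identifier(name)
  "`" + name.to_s.gsub("`", "``") + "`"
end

# Builds an INSERT with every table and column name quoted, so a column
# called "order" no longer collides with the keyword.
def insert_statement(table, columns)
  fields = columns.map { |column| quote_identifier(column) }.join(", ")
  placeholders = (["?"] * columns.length).join(", ")
  "INSERT INTO #{quote_identifier(table)} (#{fields}) VALUES (#{placeholders});"
end

puts insert_statement(:books, [:title, :order])
```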

I had plans to unit test my SQL layer, using an in-memory database, but I couldn’t find a convenient way to do that, so I ended up leaving it without unit tests. There’s a good set of acceptance tests, so I’m not particularly worried about things breaking; for now, typing things in by hand is working fine in getting me to a state where I can run the acceptance tests. The problem with running the acceptance tests right now is that they’re mostly an all-or-nothing thing; I decided to implement the SQL glue necessary to add entries before implementing the SQL glue necessary to modify entries, and unfortunately that’s all jumbled together in the acceptance tests.

It was really a lot of fun: once I had things to a state where I was ready to write a command-line script, I spent maybe 15 minutes correcting stupid syntax problem after stupid syntax problem just to get as far as issuing the first SQL command, but when I got that far, magically all sorts of things just worked, and I could go over to the mysql command line and see the data just sitting there in the table! Way cool.

I suppose I might as well share a bit of code. The string for inserting data into a table is this:

    def insert_string
      "INSERT into `#{@name}` #{@row.fields} VALUES #{@row.values};"
    end

Here are the definitions of the fields and values methods on the row class:

    def fields
      parenthesized_list(@values.keys) { |key| "`#{key.to_s}`" }
    end

    def values
      parenthesized_list(@values.values) do |value|
        "'#{@connection.quote(value.to_s)}'"
      end
    end

Which is nice and pretty. The definition of parenthesized_list, though, I’m not so thrilled about:


    def parenthesized_list(array)
      list = "("
      first = false

      array.each do |element|
        if (first)
          list += ","
        else
          first = true
        end

        list += yield element
      end

      list + ")"
    end

I looked through the Array and Enumerable interfaces, but I didn’t see any way to really improve that. Which seems odd - am I missing something? If not, I should do some refactoring: it wouldn’t surprise me if that’s the longest method in my code base right now. (When the body of a Ruby method gets above 3 lines, I usually start to get nervous…)

I’m really excited about this. I’d been putting this step off for a while, and it was by far the biggest unknown in my current work. So for me to have made a concrete step towards the SQL integration in a single not particularly long programming session was a very pleasant surprise indeed. If I can find time tomorrow (which I may or may not be able to do - we’re going to a performance of H.M.S. Pinafore), I should be able to finish off the CLI tool: the remaining step should be a little smaller than this one.

After which I’ll have to start thinking about the other part of the project, namely the part that generates web pages. I suppose one big decision is whether to roll my own XML creation library or to use an existing one. The former sounds a little more fun, and will probably make it easier to generate output that matches my acceptance tests, but I certainly want to stop reinventing the wheel at some point.

i love ruby

Saturday, June 2nd, 2007

Recent non-work programming projects: I’ve been getting back to working on dbcdb, converting the database editing part from Java to Ruby. And, last Tuesday, BayXP had a hands-on session where we all did some pair programming getting us exposed to Behavior-Driven Development in Ruby. (See the RSpec web page.)

I don’t have much to say about the details of this (BDD in particular I can take or leave), but I really enjoy programming in Ruby. I missed the previous month’s BayXP meeting, where they did BDD in Java, but I can’t imagine that the code they came up with was as simple as what we came up with this month. (Even setting aside the fact that Ruby makes implementing BDD easier by letting you add methods to Object.)

And the more I program in Ruby, the more I wonder how I could have been so blind to the fact that internal iterators are so much better than external ones. Or, in general, that I forgot how wonderful creating functions on the fly could be; while I still maintain that destructors are a fine thing, I don’t feel their lack in Ruby the same way I do in Java because I can write functions like

    def perform_and_close
      begin
        yield self
      ensure
        close
      end
    end

which I then use as in

    def edit(id, db)
      db[table_name].perform_and_close do |table|
        write_row(table.find_id(id)[0])
      end
    end

(And yes, you can put the body on one line, with slightly different syntax: I only wrote it that way to avoid making the lines too long.) Which really isn’t any more typing than creating and using an RAII class in C++. (For this example, at least; I don’t have enough experience yet to make general statements.)
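For anyone who wants to run the idiom above, here is a minimal self-contained sketch. FakeTable and its rows are hypothetical stand-ins invented for illustration, not the actual dbcdb table class; the point is only that close runs whether or not the block raises.

```ruby
# Hypothetical stand-in for a database table handle; not part of dbcdb.
class FakeTable
  attr_reader :closed

  def initialize(rows)
    @rows = rows
    @closed = false
  end

  # Return every row whose :id matches.
  def find_id(id)
    @rows.select { |row| row[:id] == id }
  end

  def close
    @closed = true
  end

  # Yield the table to the block, then close it even if the block raises.
  def perform_and_close
    yield self
  ensure
    close
  end
end

table = FakeTable.new([{ id: 1, title: "The Arcades Project" }])
title = table.perform_and_close { |t| t.find_id(1)[0][:title] }

failing = FakeTable.new([])
begin
  failing.perform_and_close { |_t| raise "boom" }
rescue RuntimeError
  # The exception still propagates; the table was closed on the way out.
end
```

The value of the block is also the value of perform_and_close, since the ensure clause does not change the method's result.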

I need to get back to programming on the weekends. I’ve done it twice in a row, which is a start; a streak worth continuing.

ruby notes 5: sql libraries

Monday, January 1st, 2007

One of the things I need to do in Ruby is read and update data stored in an existing SQL database. Not wanting to reinvent the wheel, I thought I’d look at existing libraries that provide this functionality. The pickaxe book didn’t give anything useful, but I saved some posts in a newsgroup thread on the topic a couple of months ago, and looked through them.

The libraries that the thread referenced largely fell into two categories. Some of them, like Ruby MySQL, provide quite low-level access to a database. There are classes to wrap some basic constructs (result sets in particular), but you’re pretty much grubbing with the SQL code and data directly.

Other libraries were polar opposites. The most extreme there was Active Record, from Ruby on Rails. In that library, not only is everything abstracted away into Ruby objects, but the Ruby objects themselves actually drive the database structure.

Active Record sounds great to me: everything is defined in one place, and what could be nicer than having refactorings in or changes to your code propagate themselves to your database layer? It is famously known as “opinionated programming”, meaning that its author has ideas about how to structure your database, and if you don’t agree with those ideas, the library isn’t for you. I have no problem with that: I approve of prioritizing clean code over unneeded generality, and I have no reason to believe that I would disagree with the author’s opinions, were I better informed.

Having said that, I decided not to go with Active Record for now. For one thing, I have an existing database schema that I don’t want to modify right now. I don’t know for sure if Active Record would get along with it well, but I suspect that it wouldn’t, and I’m afraid that it would be easy to screw up my database (which has other users that I want to preserve for now) by going with Active Record. Also, I’m doing this for fun and for didactic reasons; reusing others’ code isn’t essential for either of those, and it wasn’t clear to me that going with Active Record would further those goals in this particular case. (I am keeping my ears out for new programming projects that would give me an excuse to use Rails, because I do want to learn about that at some point.)

I think that the level of abstraction that I really want is something more along the lines of Java’s JDBC. To be sure, there were many things about JDBC that annoyed me, but at least it was database-independent and had a few useful abstractions built on top of the bottom level. (Using updatable result sets to add new rows and modify existing rows, for example, which I would have to do by hand-crafted SQL code with the Ruby MySQL libraries, as far as I can tell.) I wouldn’t mind something a little more abstract than that, but I still have more to learn about SQL, and I’m in the mood to not have too much going on behind my back, so this level of abstraction seems appropriate.

Maybe there’s a Ruby library out there that can do what I want at that level of abstraction, but I didn’t see one. And, the more I thought about it, the more it seemed like fun to build a level of abstraction like that (narrowly tailored to my needs, of course) on top of a low level library. It seemed like the sort of project that might help me get my hands dirty with aspects of Ruby that I wouldn’t learn about by, say, implementing algorithms, and getting my hands dirty with SQL occasionally wouldn’t hurt, either. So I think I’m going with rolling my own.

One thing that bothers me, though, is being tied to MySQL by my choice of low-level library. That just seems wrong from a philosophical point of view; from a practical point of view, it could interfere with testing. One problem I’ve had when writing code interfacing with SQL is that it largely restricts me to acceptance tests instead of unit tests for certain kinds of code, because running tests takes so long. (Seconds instead of milliseconds.) The obvious solution to that would be to connect to a faster database (hopefully a purely in-memory one) for purposes of unit testing. So it would help a lot if there were some intermediate layer that gave me database-independence. I guess I should look into ODBC to provide that intermediary? And maybe SQLite can be the fast database that I need? I’ll do some research.

In the mean time, I’ve started writing some client code for my hypothetical SQL abstraction layer, to give me an idea about what sort of interface I’d like.

ruby notes 3

Saturday, December 30th, 2006

[I suspect I'll be writing a fair amount about Ruby, and am too lazy to come up with clever names. And I don't want to rename old posts, so I'm retroactively declaring this to be Ruby notes 1 and this to be Ruby notes 2.]

I just learned about creating arrays of strings using %w{}. Handy, that. I mean, it’s only a few keystrokes, but why not save keystrokes?
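As a quick illustration (the field names here are made up for the example), the two arrays below are equal:

```ruby
# %w{} splits its contents on whitespace and builds an array of strings,
# so no quotes or commas are needed.
fields = %w{title author isbn}
same   = ["title", "author", "isbn"]
```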

I also got around to running my tests with warnings turned on. The warnings suggested that I add parentheses in one place, which I was happy to do. The other thing the warnings suggested, though, annoyed me. As I mentioned yesterday, I wrote a mixin module which cached a value in an instance variable. When I turned on warnings, though, it complained about my referencing an uninitialized instance variable. I assumed (and still assume) that testing an uninitialized variable against nil is safe, but I like having my code warning-free. So what am I to do?

What I ended up doing is not caching that value: it was an untested optimization to begin with, and once I took the time to benchmark it (and a few variants which the interpreter was happier with), it proved to be useless. (At least it wasn’t a pessimization…) So now my code is a bit cleaner, and I was wrong to stick in that optimization.

Having said that, there are other situations in which such an optimization would be the correct thing to do. And this does point at something I’m not thrilled with about Ruby: inheritance and initializing state. For one thing, the whole inheritance picture is a bit muddled: mixin modules have their uses, but I’m not at all convinced that plain old inheritance isn’t a better idea. (Having said that, I also suspect that mixin modules enable useful hacks that can’t be done with some variety of multiple inheritance; I’m just not sure what those hacks are.) Setting that aside, in either case the superclass (or module) can’t set up its state appropriately. In the module case, it’s hard to set up the state at all, as I’ve learned; in the superclass case, constructors (a.k.a. initialize) don’t automatically chain. And while I’m willing to trade off destructors for garbage collection, not having proper constructors bugs me.
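A small sketch of the constructor-chaining point, using hypothetical Entity and Book classes invented for the example: a subclass that defines its own initialize has to call super explicitly, or the superclass state is silently never set.

```ruby
# Hypothetical classes for illustration only.
class Entity
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

class Book < Entity
  attr_reader :title

  def initialize(id, title)
    super(id) # explicit chaining: without this, Entity#initialize never runs
    @title = title
  end
end

class ForgetfulBook < Entity
  def initialize(id, title)
    @title = title # no super call, so @id is never assigned
  end
end

chained   = Book.new(3, "This Book")
forgetful = ForgetfulBook.new(3, "This Book")
```

Here chained.id is 3, while forgetful.id is nil: nothing fails at construction time, which is what makes the missing call easy to overlook.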

Oh well; something would be wrong with me if I couldn’t quickly find something to be annoyed by in any programming language.

Another thought that my first optimization suggests: I could probably write a generic attr_cached function that would turn any function that always returns the same value (for a given instance) into one which caches that value after the first time. That would be fun. Now I’m sad that I don’t have a reason to do that. :-( I suppose I don’t really need a reason, though, do I?
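One possible shape for such an attr_cached, as a rough sketch rather than a finished library: the AttrCached module, the @cached_ naming scheme, and the Counter class are all assumptions made for this example. It only handles zero-argument methods whose names are valid instance variable names (so no trailing ? or !), and redefining the method this way will itself produce a "method redefined" message when warnings are on.

```ruby
# Hypothetical helper: extend a class with this to get attr_cached.
module AttrCached
  # Replace the named zero-argument method with a version that computes
  # the value once per instance and returns the stored result afterwards.
  def attr_cached(name)
    original = instance_method(name)
    ivar = "@cached_#{name}"
    define_method(name) do
      # Checking whether the variable is defined (rather than testing it
      # against nil) avoids touching an uninitialized instance variable
      # and also caches nil or false results.
      unless instance_variable_defined?(ivar)
        instance_variable_set(ivar, original.bind(self).call)
      end
      instance_variable_get(ivar)
    end
  end
end

class Counter
  extend AttrCached

  attr_reader :calls

  def initialize
    @calls = 0
  end

  def expensive
    @calls += 1
    42
  end
  attr_cached :expensive
end

counter = Counter.new
first  = counter.expensive
second = counter.expensive
```

After both calls, counter.calls is 1: the original method body ran only once.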

reflection

Friday, December 29th, 2006

I was going to write about Ruby and SQL, but I’m having fun doing other Ruby-related stuff this afternoon, so I’ll write about that instead.

I was writing this unit test, for a class DeveloperWriter. And I got tired of typing DeveloperWriter.new("arg") all the time. (Actually, I got tired of typing new DeveloperWriter("arg") and then being reminded that that isn’t valid Ruby, but never mind that.) So I added a create function to my test class which calls the appropriate new for me: I save fifteen keystrokes each time, and after a few object creations, I’m ahead.

And then I wrote tests for another class, and did the same thing. The third time, this got sort of boring. Hey, maybe I can use reflection to do this for me in a generic fashion? I’m using consistent names for my tests, I should be able to get my hands on the class under test?

So I needed some sort of testcase helper class. Or helper module? Let’s go with class: it can inherit from Test::Unit::TestCase as well, getting rid of that duplication as well. Alas, a bit of experimentation showed that that doesn’t work, or at least not easily: if your actual tests inherit from some intermediate class which inherits from TestCase, then the magic test runner stuff doesn’t work. I took a look through the source code, but that didn’t enlighten me: I couldn’t figure out how the magic stuff works at all, and to the extent that I can figure things out, it doesn’t seem like there’s an obvious workaround. So I’ll save that for a later possible improvement.

Mixin module it is, then. I hope I’ll figure out soon when you put utility functionality in a class (to be inherited from) and when you put it in a module (to be mixed in). The next step: can I get my hands on the Class object representing the class under test? A bit of playing around with irb (yay interactive interpreters, it’s been a while since I used one for anything other than Emacs Lisp) suggests that I should be able to find the name of the class under test by doing self.class.name.sub(/^Dbcdb::Test/, ""), and get the test class itself by doing ::Dbcdb.const_get(self.class.name.sub(/^Dbcdb::Test/, "")). (The regexp usage seems a bit overkill, but it actually paid for itself pretty quickly by catching some errors that would have been more confusing if I hadn’t had a nil floating around. Still, I’m surprised there isn’t another way to remove an initial substring; I’m probably missing something.)

By then, I was getting into the spirit of things: new isn’t anything magic, it’s just a method on the corresponding Class object. And I’ve just found that object, so I can write my generic create function. And hey, it works!
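Put together, the lookup and the generic create might look roughly like this. The module name TestHelper and the method name class_under_test are assumptions for the sketch, and TestDeveloperWriter is a plain class standing in for a real Test::Unit::TestCase subclass so that the example runs on its own.

```ruby
module Dbcdb
  # Minimal stand-in for the class under test.
  class DeveloperWriter
    attr_reader :name

    def initialize(name)
      @name = name
    end
  end

  # Hypothetical helper, mixed into test classes named Test<ClassUnderTest>.
  module TestHelper
    # Strip the "Dbcdb::Test" prefix from the test class's name and look
    # up the remaining constant inside the Dbcdb module.
    def class_under_test
      ::Dbcdb.const_get(self.class.name.sub(/^Dbcdb::Test/, ""))
    end

    # new is just a method on the Class object, so forward the arguments.
    def create(*args)
      class_under_test.new(*args)
    end
  end

  # Stand-in for a Test::Unit::TestCase subclass.
  class TestDeveloperWriter
    include TestHelper
  end
end

writer = Dbcdb::TestDeveloperWriter.new.create("arg")
```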

In a fit of doubtless premature optimization, I decided it seems a bit silly to look up the class object over and over again: can’t I stash that in a member variable? It’s apparently considered a bit gauche for modules to have member variables, but it is possible. But where do I initialize it? An initialize method doesn’t seem to do the right thing. I could just initialize it in the body definition, but when I tried that, the wrong class got looked up. I ended up with an accessor function that caches the value; not my favorite style, but it works. Again, maybe there’s something I’m missing.

And, now that I’ve got this helper module, I can throw in other stuff, too. For example, my tests also each have an assert_bad function (speaking of bad, that’s a lousy name, I should rename it) that tries to construct an object and tests that an ArgumentException is thrown. (It’s one line long, as opposed to its five-line Java variant.) So now I can pull that up to my helper class, too. And I have more such ideas in mind.

That was fun, and my tests are more expressive: the test code is almost all assertions, instead of generic helper functions.

Not much else to report yet today. I did get to use regular expressions one other time, replacing 26 lines of Java code with 3 lines of Ruby. (Hmm, maybe I’ll eliminate one of those lines by doing a parallel assignment.) Poking around, there seems to be a java.util.regex package that could have helped slim down the Java code, but it still wouldn’t have been as nice. And Java just doesn’t encourage you to use regexps with the same abandon. (Not that this is unique to Ruby by any means - it’s just a scripting language thing.)
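The post doesn't show the code in question, so the following is only an illustration of the general pattern, with a made-up input format: a regexp pulls the pieces out of a string, and a parallel assignment names them in the same line.

```ruby
# Hypothetical input; the real dbcdb data format is not shown in the post.
line = "author: Walter Benjamin"

# captures returns the parenthesized groups as an array, which the
# parallel assignment spreads across the two variables.
key, value = line.match(/\A(\w+):\s*(.*)\z/).captures
```

If the line doesn't match, match returns nil and the call to captures raises, which at least fails loudly rather than passing bad data along.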

And I can feel the static typing habits just drain away…

first ruby experiments

Wednesday, December 27th, 2006

I wrote my first Ruby code yesterday. It was a port of a date wrapper that I wrote in Java for dbcdb: its only job is to convert to/from written representations, and to have some special dates representing “I read this once, but I don’t remember exactly when” and “I’m in the middle of reading this now”.

It took me an embarrassing amount of time (10-15 minutes?) to get the first do-nothing unit test working, but after a bit of flipping back and forth in what I’m apparently supposed to refer to as the pickaxe book, the test worked fine. From then on it was smooth sailing.

Initial reactions:

  • Unlike Java, there was a builtin date class that did almost everything I wanted to, so no need to hand-write parsing code. (And, as a bonus, I get a rather more flexible parser.)
  • The builtin date class’s output functionality didn’t quite do what I wanted, so I took that as an excuse to write that functionality by hand. I think I wrote it four separate ways, as I found different formatted output mechanisms; for now I’m doing "#{::Date::MONTHNAMES[month]} #{day}, #{year}", but I may well change that again. (Hmm, one bit of that is calling out for an Extract Method, isn’t it? Let me make that change right now…)
  • It was very nice to have the source code for builtin classes lying around, both to find what method to call on Date to do what I want and to give examples of how to write something. (E.g. formatted output.)
  • Mixins are cool: I just had to add four lines of code to get all sorts of comparison functionality.
  • The builtin unit test framework is pretty cool, too: I like how it, by default, runs tests for you when you load a test’s source code, and how you can aggregate your test classes into a suite by just requiring one test class after another.
  • Having private mean “private to this instance” as opposed to “private to this class” is an interesting design choice. So far, the only ramification is that I’ve had to make one method protected; that seems like a reasonable tradeoff.
  • My Ruby date wrapper is a third the length of my Java date wrapper. (The Java version has one or two minor pieces of functionality that the Ruby version doesn’t have, admittedly.) Having said that, I’m not sure this is a very good comparison, because it says more about the vagaries of the respective languages’ builtin date classes than anything else. And I’m not using much of the real power of Ruby - no blocks yet, for example. Still, shorter is better.
  • What is perhaps more interesting is that every method in the Ruby version is either one line long or consists of a case statement where every branch is one line long. (Should I Replace Conditional with Polymorphism? Not yet in this case, I think, but it’s a thought.)
  • I’m still trying to get a feel for where it’s most stylish to include (or not include) parentheses. For now, I’m not including very many.
  • The choice of M-backspace in Emacs ruby mode to do something other than delete the previous word is, shall we say, idiosyncratic. In general, ruby mode isn’t that great, but I can live with it.
  • The documentation that I’ve found can most charitably be described as haphazard. I managed to get the gem stuff set up in my local directory, and the gems documentation was reasonably helpful in that regard. Then I wanted to install the mysql gem. It gave me a choice of five versions; the latest one (2.7.1) mentioned Windows, which seemed strange, given that it knew that I was running under Linux. So I skipped that one, and went to the previous version. (2.7 - but the project’s web page says it’s up to 2.7.4?) It wanted to compile some C code (pretty cool, nice to have that automated), but couldn’t find the appropriate libraries. It did give me an error message listing possible configuration options I might want to give, one of which looked right, but didn’t tell me how to pass that option to the install script! (Easy enough if I were doing it by hand, but I wanted to do it within the gem framework.) And the gem documentation didn’t tell me how to do that, either. Eventually, a random web page turned up the trick, but the experience didn’t impress me too much.
  • My favorite documentation, um, quirk, though, is the Og rubydoc page saying “You can find a nice tutorial at www.rubygarden.com”. I’m sure I can, but how about a direct link to help me along? (Searching on rubygarden for ‘og tutorial’ didn’t help, though ‘og sql tutorial’ did the trick.) It then continues “Be warned that this tutorial describes an earlier version of Og. A LOT of new features have been added in the meantime.” Yes, well, then maybe you guys could help out a bit? The rubydoc also had a lovely list of features, with no guidance as to how we might use them: I might want something that “Can ‘reverse engineer’ legacy database schemase[sic]” but I’m not in the mood to troll through random crap to try to figure out how.
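Several of the points above (the builtin Date parser, the hand-written output format, the Comparable mixin, and the protected method) fit into one small sketch. ReadDate is a hypothetical name, and this is a guess at the general shape rather than the actual dbcdb wrapper; it leaves out the special "don't remember when" and "reading now" dates entirely.

```ruby
require "date"

# Hypothetical date wrapper for illustration.
class ReadDate
  include Comparable # supplies <, <=, ==, >, >=, between? from <=>

  def initialize(date)
    @date = date
  end

  # Lean on the builtin parser instead of hand-written parsing code.
  def self.parse(text)
    new(::Date.parse(text))
  end

  def <=>(other)
    date <=> other.date
  end

  def to_s
    "#{month_name} #{@date.day}, #{@date.year}"
  end

  protected

  # Protected rather than private: private methods can't be called on
  # another instance, and <=> needs to read other.date.
  attr_reader :date

  private

  # The extracted method hinted at in the formatting bullet.
  def month_name
    ::Date::MONTHNAMES[@date.month]
  end
end

earlier = ReadDate.parse("2006-12-27")
later   = ReadDate.parse("December 29, 2006")
```

With this in place, earlier < later holds, and earlier.to_s gives "December 27, 2006".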

In general, I’m happy, but I’m hardly using the power of the language yet. The community around the language feels a bit raw, for better or for worse - little documentation, many abandoned libraries, no clear winners in many important spaces. That’s okay, though; given what I’ve seen of source code, I should be able to figure stuff out myself, and I’m happy to contribute documentation improvements, too.

I have more thoughts on Ruby and SQL, but this post is already long enough, so I’ll start a separate post for that.

what to do next?

Sunday, August 13th, 2006

I’ve finished the last important code cleanups from my dbcdb code: I removed some proxy objects that had been used for lazy loading. I was really surprised to see how much that cleaned up certain aspects of the code: my Entity objects’ constructors got a lot cleaner, useless attribute setters/getters were removed, and in general responsibilities were greatly clarified: the Entities’ only job is to convert from SQL to HTML.

Which brings me to a pause point. I’ve met some of my objectives: gotten a little more practice with Java and HTML, learned a little about SQL and CSS, and provided an alternate linking structure to use in the blog. Nothing earthshattering, but it’s been of some modest use to myself. And there aren’t any obvious gaping holes to be filled.

So it’s time to take stock and figure out what to do next. For a while, actually, I was considering taking the time I’d been spending on this and using it to learn Japanese instead. (Which would actually take rather more time, but never mind that.) With some regrets, though, I’ve decided that isn’t the best course of action right now. My best guess is that I’ll be looking for another job in about two years from now. (With a huge margin of uncertainty, of course.) And, while I’m not sure what I’ll target in my search, I would like my options to be as many as possible; to that end, spending more time broadening my skills could be of some use. Exactly how much use isn’t clear - having been on the other end of the resumes, I realize how easy it is to reject candidates whose professional experience isn’t exactly what you’re looking for - but it’s worth a shot. So I’ll want to keep this up for another year or so. (After which, I hope to have enough time to take a break and learn Japanese. But who knows what the future will bring.)

So, given that I’m not going to stop now, what next? Rewrite it in Ruby, for one. I’m starting to chafe at Java more and more: just today I ran into a few places where I could use lambda, a few places where static typing was being mildly annoying. So I’ll start by rewriting the CLI tool in Ruby, then rewrite the HTML conversion part in Ruby. After that, I’ll generate the web pages on the fly instead of statically, using mod_ruby. (I don’t plan to learn Rails for now: I don’t have any good applications for that in mind.) After which, who knows; maybe I’ll stop there, maybe I’ll convert the editing tool from a CLI application to a web application. Maybe I’ll play around with web services, scraping book information from Amazon. Hard to say.

The immediate next step isn’t entirely clear. I’ve read/skimmed the Ruby book, but it hasn’t all sunk in; clearly I need to get my hands dirty. And I need to learn how to use Ruby to interface with a database. (Maybe the book talked about that; I skimmed the library section.) It’ll probably take a few months to have anything to show there; I also have a bit of unit-test library cleanup that I’ve been putting off. So don’t be surprised if I go quiet on the programming front for a little while.

dbcdb: improved compound author links

Saturday, August 12th, 2006

I’ve deprecated the old compound author pages - they’re still there, but now nobody links to them. Instead, pages for books written by multiple authors link directly to the individual authors’ pages.

A matter of a change of a couple of lines of code. (Though all of my acceptance tests passed unchanged after that - oops. That has now been fixed.) I’ll probably eventually improve the database design as a result of this, but doing so is hardly urgent.

random dbcdb tweaks

Sunday, August 6th, 2006

Today’s dbcdb projects:

  • Improve the appearance of pages with long fields: now the long field doesn’t get forced to start on the next line. I’d hoped this would fix the Internet Explorer problem, but it doesn’t (though it improves it): for reasons that I haven’t yet investigated, I have to have the key float: left to get its width to stick, which triggers the IE bug.
  • I removed the ‘own’ field. Which was my first bit of database tweaking; yay.
  • I downgraded all ratings by one. Which was my second bit of database tweaking; yay. (Though I didn’t tweak the database in the most stylish way possible.) Definitely a good idea: it turned out that I hadn’t rated any books as 1, and I’d only rated one game as 1.

No particular lessons that I learned; it was all pretty straightforward.

dbcdb: link changes

Monday, July 31st, 2006

I’ve made a couple of dbcdb changes. Now every page contains a link to let you search for all blog posts mentioning it. Which required a bit of WordPress futzing: it turns out that WordPress doesn’t let you search for double quotes by default. Also, I replaced the ISBN and ASIN fields that were really Amazon links by an explicit Amazon link - it’s not like anybody needs to know ISBNs these days, after all.

There are a few more issues I want to deal with. A while back, I noted that the pages looked like crap in Internet Explorer due to a bug in their CSS handling. That didn’t really bother me too much, but now I realize that pages for, say, books with long titles don’t get wrapped in the way I’d prefer. Since the same fix should handle both issues (stop using definition lists), I’ll take care of that.

The other issue that I want to fix is the handling of books by multiple authors. When I first started this, I had an author class, with a compound author subclass; each of those generated a web page. So you could, for example, see a page for all books by Christopher Alexander, Howard Davis, Julio Martinez, and Don Corner. When I started using SQL, though, the resulting SQL got a bit messy; Per pointed out a more natural table structure, which would (as a side effect) make it rather harder to generate those multiple-author pages.

So: do I go with the pages suggested by the original class hierarchy, the pages suggested by the SQL structure, or neither? (I certainly shouldn’t let implementation details unduly hijack my page layout.) After thinking about it for a bit, I think that the current compound author pages are hurting more often than they’re helping. I basically never want to see all books by a given set of authors, while it’s not infrequently the case that, when presented with a book by multiple authors, I want to see all the other books in the database that one of the authors has written. So the multiple author pages aren’t helping, and are actually forcing an unnecessary level of clicks.

So I’ll probably deprecate the multiple author pages. (And eventually make the corresponding SQL change, though that’s not particularly urgent.)

Also, I’ll probably get rid of the ‘Own’ link (who cares what books I own?) and reduce my ratings by one. After all, while I would probably rate most books in the world as 1’s (= gave up in disgust), the truth is that I’m quite good at avoiding such books, and the world really doesn’t need to know which books I actively disliked versus which books I didn’t particularly care for.

After which I have further plans, about which more later.

dbcdb: links!

Saturday, June 10th, 2006

My dbcdb pages now can have a list of external links attached to them. This is a feature that I’d been wanting to add for a while - until now, the links from within these blog entries probably served as more of an annoyance to my readers than anything else, since the information on those pages was unlikely to be of interest to you. (Except maybe the hidden Amazon link, about which more later.) But now they can potentially serve as a way to make my blog a bit richer - if I refer to, say, a book from within a blog post, then they provide an easy way to find other blog posts where I talk about that book in more detail. (At least once I go and add more links into my dbcdb database.)

The issue of how to handle related posts is, I think, not an uncommon problem for bloggers to run up against; cf. Tim Bray on a variant of the problem. There are lots of solutions, each suited for different manifestations of the problem, but I like this idea of adding an extra layer of indirection by sticking a mediating web page (my dbcdb pages) in the middle.

Once I decided to take that approach, though, there were two different implementation strategies: I could either add links manually to the database every time I posted about a book/game/whatever, or I could let a search engine find all blog entries that refer to that book/game/whatever. I decided that the latter wasn’t completely satisfactory: not all posts that refer to a given book are created equal. For example, half of my dbcdb blog entries link to the page for The Arcades Project not because they have anything to say about that book but because I’m using that as a default example. Also, one main argument for search, namely that manual indexing isn’t scalable, doesn’t apply here: manual indexing should scale just fine in this case.

Having said that, search also has its value for this: you might (or might not!) be interested in all blog entries where I, say, make an offhand mention of a certain video game, one that I might not choose to put in the index. So my next story will be to add an automatic “search my blog for references to this” link to all entities. With luck, that will give me the best of both worlds. Also, while I’m adding links, I’ll get rid of the ISBN/ASIN fields and replace them with an explicit Amazon link.

Before I started doing this, I realized that, if I didn’t first get rid of MemoryCollection in favor of SqlCollection, I’d have to do a tiny bit more useless typing to implement this. Since I’d been planning to do that soon anyways, I figured this was the time; it turned out to be quite pleasant, using (the classes that implement) the CLI tool’s interface. Yay. And, in doing so, it increased my appreciation for dynamic typing; I might go on about that later, but if nothing else it emphasized that I really do need to spend some time soon playing around with a dynamically typed language.

books with more than two authors

Monday, May 29th, 2006

I can now handle books with more than two authors. Yay. Nice to be able to spend ten or fifteen minutes making a change that actually affects what books I can handle instead of spending months making a change whose effects nobody else can see.

Not that I’m done with the behind-the-scenes changes yet - I still have some amount of cleanup to do, getting rid of no-longer-necessary code, so I’m trying to combine each new feature with a bit of cleanup. And there was one unpleasant surprise when doing that: I was shuffling responsibilities between acceptance tests, when all of a sudden one of them stopped working. And stopped working in a quite strange way - it was updating a row in a database, and some of the columns got the new value while others had their old values. I don’t quite see how this could be any sort of obvious bug in my code - tests are pinning down the behavior pretty tightly, and if anything goes wrong an exception should be thrown that would make it up to the top. And my code should be pretty deterministic, yet the problem in question magically cured itself a few minutes later.

So I’m pretty disturbed. I have no idea if it’s related to my other current mystery, the random hangs I mentioned earlier. Just to be on the safe side, I read through the JDBC Javadoc a bit more, to look for places where I’m not releasing resources; I’m now calling close() on my java.sql.Connection objects, just in case. (I was already calling it on my ResultSet objects, though that is one place where I could imagine there being gaps in my unit test coverage.) But this latest problem doesn’t feel like a resource leak to me…

Much of the future cleanups are pretty obvious, but there is one place where I have a question. Right now, I have an abstract interface Collection with two implementations, MemoryCollection and SqlCollection. The former is the original implementation, from the days when the list of books was represented by Java code; the latter is the implementation I’m using now. So I could delete the former, were it not for the fact that it’s being used in unit tests in a fair number of places.

So: do I keep it around, or do I retrofit the unit tests? It’s not clear to me that keeping it around is going to pose that much of a burden; and if it becomes a burden at some point, I can always retrofit the tests then. (Which will, admittedly, take a while, which is one reason to think about starting on it now.) So if keeping it around is the easiest way to write unit tests, then that’s what I’ll do.

One aspect of keeping it around is this: to create, say, a new Book object using MemoryCollection, I call a factory function, passing it the title and author, and then call various write accessors to update other attributes that I feel like setting. (E.g. the ISBN, or when I read it.) Which is an okay way to do things; it’s certainly better than creating an SQL row by hand. But now that I have this CLI tool to create and edit entities, I could use its interface instead, which is pretty easy to use: something like

add book author Fred title “This Book” isbn 1-234-56789-x

is easy enough to write, probably even easier than calling various write accessors.

There are a couple of issues with that. For one thing, I can’t just pass in a single string, I have to pass in an array of Strings, and it’s a little more verbose than I’d like to create that on the fly in Java. Maybe I could work around that by writing a little function to split the single string into an array, though - that would be easy enough to do, and might save me a fair amount of test-writing time. The other issue is that the above isn’t actually the syntax that I use for the current SQL interface - instead of passing in the name of the author, I have to pass in the entity ID of the author, something like “author 3” instead of “author Fred”. So there’s some fiddly counting to do, and I lose type safety, neither of which thrills me. (It’s a problem with the CLI tool as well.)
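The code under discussion here was Java, but to stay consistent with the other examples on this page, here is the string-splitting idea sketched in Ruby, where the standard library's Shellwords module already does the quote-aware splitting. The command string is the example from above, with the entity ID form of the author argument and plain double quotes.

```ruby
require "shellwords"

# Split a single command string into the argument array the CLI code
# expects, keeping the quoted title together as one argument.
command = 'add book author 3 title "This Book" isbn 1-234-56789-x'
args = Shellwords.split(command)
```

The quoted title comes through as the single element "This Book", with the quotes themselves removed, so a test can pass one readable string instead of building the array by hand.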

Another possible issue is that I was worrying about the tests being too incestuous, not being tied to the external realities of the database format enough. But, thinking about it more, I’m not worried about that any more: my tests for the Editor object (which writes out the SQL database) really do test the SQL that is generated. So it’s fine for me if my tests for Book, for example, whose main job is to generate HTML, aren’t so closely tied to the details of the SQL: as long as they accept whatever Editor outputs, it should be fine. In fact, it’s arguably superior to have that linkage be enforced in the unit tests. (This is all being enforced in the acceptance tests, too.)

I’ve got enough other cleanups to do that I don’t have to decide this one way or another for a few more weeks. I’m leaning towards getting rid of MemoryCollection and using SqlCollection exclusively, but I could still change my mind.