Libraries, Interfaces, and Dependencies

The code base for my new job is rather larger than the code base at my last job, which means that I have to start thinking about the build process more than I’ve been in the habit of doing recently. I know how to set up a good build system for C-based languages (hint: gcc -MD is your friend), but I’ve never felt as comfortable with the options for Javaish environments, even for relatively straightforward projects that are just building a single program.

And we’re not building a single program: it’s a distributed system, with a collection of programs using some common libraries and some distinct libraries. Which is quite similar to the situation at Kealia, but the build system feels rather different. Part of that is related to the source code layout (we’re putting each module in its own git repository), but part of that is due to differences in the Java/Scala way of compilation and the C-language way of compilation.

At Kealia, our product code was spread across three directories: interface, which had one subdirectory for each library, containing the public header files for that library; lib, which had one subdirectory for each library, containing the .cpp files and private header files for that library; and prog, which had one subdirectory for each program, containing the code specific to that program that we didn’t feel like pulling out to a library. So the build process was in two phases: it built all the library code in parallel, and then all the program code in parallel. (If I’m remembering it correctly, actually, the build system was happy to start working on program code even if some of the libraries weren’t done building, but at any rate dependency chains at the library/program conceptual level were never more than two deep.)

In Javaish languages, though, that doesn’t work, for the simple reason that the language doesn’t have a notion of a header file: if library A depends on library B, then you have to build library B before library A. Java programmers like to talk about how, in C++, header files mean that you have to write your class interfaces twice; that’s a quite valid complaint, but it bites you in situations like this. Though the situation isn’t quite as simple as that description makes it sound: in a lot of situations, Java interfaces end up acting a lot like C++ header files, giving you some amount of separation of compilation while also requiring you to write some of your class interfaces twice. Certainly proper use of interfaces is important in reducing long dependency chains; I don’t think we’ll be able to get down to two levels of dependencies, but what is the shortest chain that we can reasonably get to?

One answer might be three: break each library up into an interface part and an implementation part, like the include and lib directories from my C++ example. Programs should only depend on libraries and interfaces, and library implementations should only depend on other libraries’ interfaces, never on their implementations. That sounds plausible to me, both in terms of being achievable and in terms of being a good idea, but I’m not sure that it’s enough to get us to three levels of dependencies. The problem is that one library’s interface can reasonably depend on another library’s interface: e.g. if you happen to be writing a video server, you might reasonably have some generic MPEG classes that are referred to in the interfaces of multiple libraries, which means that they themselves have to be in a separate library. So that’s four levels of dependency; I can even imagine that it would be hard to avoid introducing a fifth level, because you might have some fundamental data structures (e.g. some sort of widely used numeric abstraction or custom collection) that shows up in the interface of shared classes that are a bit more domain specific.

So: five levels of dependency. That doesn’t sound like a very good rallying cry, does it? Still, at least it’s a stake in the ground, and having those five layers be well defined (programs, library implementations, library interfaces, shared interfaces, and a handful of core interfaces) is something.

Is even that achievable, though? C++ header files define a few different sorts of concepts. There are abstract interfaces; these are a lot like Java interfaces. There are ADTs; these belong in my interface layer above, but contain actual implementation as well. And there are methods that you call without having an object instance at hand: static methods, but also constructors.

What do you do about this latter class of functions? In C-family languages, the linker happily resolves them for you; what’s the analogue for that in Java? I guess the answer there is: if you have a static method (including class constructors) associated to a class that’s not plausibly an ADT, then you have to make sure that there’s an extra layer of indirection involved to let you do the wiring at the program level instead of at the cross-library level. In other words, you have to get serious about dependency injection: library A shouldn’t depend on the implementation of library B, even to do the basic setup for that library. Instead, library B should abstract all those static methods out into an interface, library A should be written in terms of that interface, and every program that uses library A should pass an implementation of that interface to library A. (It can get that implementation from a static method in library B, but the point here is that it’s the program layer that’s grabbing the static method from library B, not library A’s implementation layer.)

At first blush, I think that should work. Actually, it even seems plausible to me that it would lead to better structured programs than you would get if you didn’t worry about rigorously adhering to such standards. Five layers of dependencies: it’s good for your software! (Not the best rallying cry I’ve ever heard…)

Maven

The other aspect of our setup at work that’s new to me is Maven. I’m really not sure what I think about Maven yet: I’m willing to believe it’s better than Ant, but I’m also willing to believe that that is not much of an accomplishment. I like its emphasis on convention, I like the way it’s good at pulling in known versions of external resources (your Scala compiler, your jUnit version, etc.). But I’m less sold on its insistence in separating every artifact into a separate project: what happened to Recursive Make Considered Harmful? And it boggles my mind that, in this day and age, a build system would be written that only supports parallel builds as an experimental feature in its as-yet-unreleased third major revision number bump! We’ve known for most of a decade now that many cores are the way forward, yet I’m using a build system that can’t even keep two cores on my machine busy? That is ridiculous.

Another part of the picture is Hudson: our build system at work is set up so that you’re encouraged to have Hudson build artifacts that you’re not actively touching. (And Hudson is capable of doing those builds in parallel. This perhaps makes it possible to live with the aforementioned Maven limitation, but in no sense excuses it: if I’m starting up a project, I absolutely should not be required to set up a continuous integration server just to get non-braindead compilation performance. I realize that the Java world is not known for its fondness for lightweight systems, but that is… Sigh. I will stop ranting now.) I haven’t yet figured out what the Maven way is here: is Maven perfectly happy for you to have all your build artifacts under your own control, is Maven actively hostile to you having all your build artifacts under your own control, or is Maven agnostic about that question?

This all also bears on the question of source code repositories that I mentioned above. At first, once I got over my surprise, I kind of liked the idea of having each module in its own git repository: let’s take the idea of separate modules seriously. But once we start breaking up one conceptual module into an interface part and a library part, I’m less happy about having them in separate repositories: you’ll want to change an interface and the code that implements that interface in lockstep, which means that changes to them should be part of a single commit. And I’m also not sold on this idea that we should have Hudson in charge of building all these different artifacts for us: maybe I’m conservative, but I still like the idea of checking out a known version of the entire project’s source code, doing a build of it, and getting a reliable, reproducible result.

Lots of interesting questions to think about, lots that I still have to learn. (And that’s without going into my discomfort about dependency info generation for Javaish languages!) Probably the best thing for me to do would be to start a project at home: I should come up with something to write (probably purely in Scala, I don’t think that mixing Java in will either make me happier or clarify the situation) that I can plausibly break out into multiple libraries, so I can really grapple with these dependency issues.

Post Revisions:

This post has not been revised since publication.