literate programming

Prompted by Knuth’s delightful article “The Errors of TeX”, I just read his collection Literate Programming. (Which contains the aforementioned article, among others.) A fascinating read, for multiple reasons: Knuth is a really smart guy, whose opinions I very much respect, but he’s writing from a context that I frequently find very hard to understand.

The most dramatic example of this is the first article, “Structured Programming with go to Statements”. It was written in 1974, three years after I was born, and I never learned Algol 60 (which I think is more or less the language that the article uses). I understand that goto once was used much more, and I can imagine a world in which people had yet to decide that a few iteration constructs (for, while, and the occasional (in my experience very occasional) do…while), combined with a sprinkling of break and continue, were good enough for 99.9 percent of your needs. And while I suspect that some of the other solutions he discusses (Zahn’s iteration constructs) are too complicated to be a part of a well-designed programming language, I can imagine an alternate history in which they would have a more prominent role.

But what I can’t imagine is a world without functions, yet that seems to be the world that this paper was written in. Did programming languages of the time really not have functions? (Certainly some of them did.) Or did they have functions but only allow a single exit from those functions, in which case they were irrelevant to the question at hand? That must have been the case, but I still find it hard to wrap my brain around. Compiler technology must have played a big role, too: the article is very concerned with optimization, and I doubt compilers were too good at inlining at the time.

So I felt like I was in bizarro world when I was reading the article. But it’s always fun reading visiting other worlds, and there were some hints of worlds that are dear to my heart in the article. At the end, he hints at object-oriented programming (or at least modules). Much more interestingly, earlier in the article he discusses the possibility of automated refactoring tools (without using the term “refactoring”, of course). But the use he proposes for those tools is completely different than the current uses of those tools: these days, we want to apply behavior-preserving transformations to our code in order to make it clearer, while Knuth wanted to start from clear code and apply behavior-preserving transformations to make the code less clear, but more optimized!

A decade later come the articles on literate programming. I knew that this was a way to embed source code and a description of what the code does in a same file, to enable you to present a nicely-typeset, richly-commented version of the code. I hadn’t realized quite how lengthy the comments were that Knuth proposed, however; I also hadn’t realized that literate programming divides code up into fragments that can be portions of functions, instead of entire functions.

At least functions play more of a role here than in the earlier paper. But they still don’t play enough of a role. Over and over again, I had to ask: why aren’t these fragments of code functions in their own right? There are probably a few answers to this. For one, I suspect that short functions still weren’t in the air much at the time. For another thing, I suspect that compiler and programming language support for inlining wasn’t very good, so it would have been an unacceptable performance hit. And a third thing is that the code fragments wouldn’t have always worked on their own because they referred to variables external to the code fragment: you need objects for that style of programming to come into its glory.

So it’s pretty amazing to see how Knuth comes towards modern best practices without modern programming languages and tools. And it’s embarrassing to realize that Knuth probably understood the need for Compose Method two decades ago better than I do now. He has also responded to my objects to the paper I discussed above: his literate programming discussions are in the context of Pascal, which apparently doesn’t allow multiple exits from a function, so Knuth provides a return macro to get the same effect (using goto).

[Side note: I never learned Pascal, for no particular reason. (I’m the right age to have learned it, but I was cool enough to jump straight to C. Or something.) I was aware of its pointless procedure/function distinction; I was not aware that it distinguished between calculating the return value for a function and actually returning from a function. Weird.]

Getting back to the “literate” part of literate programming, the length and quality of exposition of the comments in the code samples that are given is pretty stunning. Looking at this through XP glasses, I suspect that the comments are rather too long (too many artifacts not pulling their weight), but it would be interesting to try it out once or twice. (At the very least, I should get around to reading TeX: The Program: it’s only been sitting on my bookshelf for a decade and a half by now…) My first reaction was that lengthy comments could be good for presenting a finished, polished program to the outside world, but not so great for a live program that is constantly being modified, because the comments are an extra burden and could get out of date too easily: far better to spend your time on making the code itself clear. But, in “The Errors of TeX”, Knuth talks about how the comments made it much easier for him to perform some serious surgeries on the code, so I could easily be wrong there.

Some more cultural shifts that you see in the book: at the start of the book, he’s talking seriously about mathematical correctness proofs of programs. By the time he gets to “The Errors of TeX”, though, he’s using test input to exercise every aspect of his code. A big improvement; personally, I’d rather lean more on unit tests, but he’s working in a context where that isn’t so realistic. Also, data structures sure have changed over the years. When I think of, say, a binary tree, I think of a node as being represented by a chunk of memory with two pointers and a data value in it. But in this book, a binary tree is three global arrays (data, left, and right), with a node being an index into those arrays. (So many global variables!)

I’m definitely putting more of his books on my “to read” list.

Post Revisions:

There are no revisions for this post.

Published 7/24/2005 & Filed in Books,Programming

One Comment

Comments closed

Comment by Mark Gritter

The global variables are a Pascal-ism, too, because there is no heap in Pascal. The specialized memory management is very odd to see these days, though.

A lot of the literate programming stuff does really appear to be working through the shortcomings of Pascal. :) I played around with it a while in college; it worked pretty well for writing the report at the same time as the code for projects.

I was not all that impressed reading TeX: the Program, though, because the 1st chapter introduced a bunch of global variables and initialization that were pointless out of context— I thought they really belonged in the situations where they were actually used!

7/25/2005 @ 10:48 am

malvasia bianca