**Warning:** While I don’t intend for anything here to be a spoiler for the “primes and fractions” post I wrote before it (the post following this if you’re looking at the front page), it’s possible that you’d have more fun working out that brainteaser if you didn’t read this post first.

At the time when I wrote my last post, I’d gotten as far as figuring how the fractions in question turned into a computational machine of a much more traditional form than the form in which it’s stated there. After writing the post (and eating lunch), I wrote down that machine on paper, and went through a series of simplifications. (Refactorings, basically, with the extra benefit that I could increase the power of my computational machine, because I didn’t have to restrict my actions to those that could be carried out by the original computational model.)

After an hour or so of that, I had a much simpler model; I couldn’t yet completely see how the prime numbers were going to appear, but I’d gotten a lot closer, and in particular the concept of divisibility had appeared. Hoping to get a bit more insight, I decided to run a few dozen iterations of my new, simpler model.

And, of course, I started getting bad answers; if I’m remembering correctly, I ran into an endless loop or something, and at any rate I certainly wasn’t computing prime numbers. Oops.

Which was frustrating. I mean, I’d been pretty careful, breaking the work up into explicit steps, double-checking my work after each one (and frequently finding mistakes, which in retrospect should have raised warning signs). But I was pretty obsessed by then, so I wasn’t going to let a little problem like having a mistake somewhere within the last hour or two of tedious work stop me!

To recap my progress so far: I started with a computation, I transformed it to simplify it and to improve my understanding, and I made mistakes. Which is something I have experience with when programming, and I already know the answer in that context: I should be doing formal refactorings backed up by running tests after every step. I’d been doing an okay good job of the former, but the latter was too laborious to carry out frequently by hand. (For that matter, it’s just as easy to make mistakes when running through the algorithm by hand to test it as it is when transforming it by refactoring!)

So I brought my work over to the computer: I wrote a Ruby version of the initial state machine, with a driver program that ran through as many iterations of the steps as I wanted. And then I refactored the state machine, reloading it into `irb`

after every change, and saving it into a new file when my little changes added up to a more significant one (e.g. eliminating a state), just so I could look back on what I was doing. I didn’t have formal unit tests: I just ran the program for a few thousand steps after every change and checked that the primes were appearing, on the theory that I was unlikely to make a mistake that let the algorithm still work. (In retrospect, maybe that was a mistake: it would have been easy enough to turn that manual test into an automated test.)

And it went a *lot* faster. I didn’t have to spend as much time double-checking my work: I could just make changes and run the test. I could work in smaller steps, so where on paper I would have composed two transformations in my head (involving changes to multiple variables at once), I could just write them out as a program one after the other and then merge changes to variables one at a time. If I wanted to eliminate a state, I’d just paste the code for that state into every place where it was used, and then do transformations after that. If I wanted to reorganize the control flow by moving a block into all the branches of a conditional, I’d just paste it into all the branches, then simplify the branches one at a time.

I still made mistakes, but even without the tests I made fewer mistakes than I’d been making on paper: it’s really a lot easier to avoid mistakes if you just cut and paste whenever you’re moving operations around, and then break up your subsequent simplifications into smaller steps. (At some point I cross-referenced my work with my paper copies; I’d made two important mistakes pretty early on, and in retrospect I’m surprised that so much of the mathematics managed to survive in the paper version even with those mistakes.)

So now I have a proof; yay. But I’m actually thinking that there are some lessons here that the mathematics community could learn from: I worked faster when I started using the same techniques that I use to write correct, readable code as quickly as possible, and I made a lot fewer results at the same time. And who wouldn’t want that? In particular, I think an automated engine that lets you correctly carry out refactorings on steps of your proof could help a lot, done right: it could let you do the relatively mindless work faster, so you could spend more time thinking at a higher level, and you could spend more time quickly exploring different paths in that mindless work to see if any of it leads somewhere unexpected.

And you would make fewer mistakes. My impression is that most math papers contain mistakes in their proofs; this doesn’t necessarily mean that the results are wrong, but sometimes it does, and even if they aren’t, it’s still good to get the details right. It would also make it easier to be honest with yourself about when you’re handwaving and when you’re being precise; I certainly wouldn’t say that handwaving is inappropriate in all circumstances, but you want to take care when doing it. (What I’d really like is for papers to be published in a format where the printed version contained the steps that are most useful for expository purposes but where there’s a more complete computer-verifiable version of much or all the proof.)

This isn’t a new idea, of course; I worked on a project called “VLISP” the summer after my freshman year of college that, among other things, had a theorem verification engine, and the idea wasn’t new then, either. (The proof of the four color theorem was almost a decade and a half before that.) So why hasn’t it taken off? I’m not sure, but here are some ideas I have:

- It’s hard to translate even relatively formal human reasoning into something that a computer can work with.
- Partly because of that, and partly because mathematicians are frequently reductionists at heart, people have been tempted to concentrate on the oldest, most well-trodden parts of mathematics, which means that their engines aren’t going to be of use to the vast majority of working professional mathematicians.
- People have conflated the concepts of “automated refactoring” with “a complete computer-verified proof (from first principles)” and (worse yet) “computer-generated proof”. I don’t think the latter is any more likely to be useful (right now) than having a computer automatically simplify the former for me; as for the former, a computer-verifiable proof starting from first principles is a nice jugendtraum, but you’d be a lot more likely to make an impact if you could produce a tool that could automate certain parts of proofs in ways that helped actual working mathematicians, even if it couldn’t automate anywhere near all of it.
- Automated refactoring tools only showed up in programming settings a decade and a half ago, and still aren’t widely used. At least I think they aren’t; I’ve never used one, and while most IDEs these days contain a few refactorings, I don’t get the impression that most programmers use them a lot. (Which is one reason why I lean so much on automated tests; in mathematics, I think the automated tests would be a good deal harder to carry out, but fortunately I think automated refactorings would be easier.) So it’s not surprising that they haven’t made an impact in other fields yet.
- Writing a tool like this that is easier to use than paper and pencil would be a
*lot*of work, and require very good judgment.

For all I know, people are making a lot of progress in this area, and I’m just not aware of it. But I think there’s something there: I think our understanding of these problems in the programming realm has advanced enough that we probable have something to teach mathematicians. One of these decades, somebody will push the tools far enough to where using them starts to become almost as fast as working by hand, at which point people will get a lot more interested.

#### Post Revisions:

There are no revisions for this post.

“Almost all theorems are true, but almost all proofs have bugs.” –Paul Pedersen, mathematician and computist.

3/3/2008 @ 9:32 am