Archive for the ‘Computers’ Category

erik ray, r.i.p.

Sunday, June 1st, 2008

I was very sad to learn that Erik Ray died on May 14, after being hit by a car while riding his bicycle. He was more of a friend-of-a-friend than a direct friend, but I certainly enjoyed the time I spent with him when we were both living in the Boston area.

For those of you who come here for video-game-related reasons, he worked on the excellent System Shock in a variety of roles, including doing some of the level design. He also wrote a pleasant introduction to XML.

But the main reason why I miss him is Lambda Expressway. It’s a fabulous quirky audio novella, mixing a sort of adventure story with mentions of the virtues of building your own backhoe, a character with a theme song immortalizing the virtues of buns, and, well, just go listen. (I do every year or so.)

I think I have another tape or two of his around somewhere; I really need to go and get them digitized. I hope they haven’t fallen apart…

paris 2008

Monday, May 5th, 2008

As I have, perhaps, alluded to previously, we spent the second half of April in Paris. Notes:

  • It’s the most wonderful place in the world, but I’m actually not feeling particularly compelled to visit it again any time soon. Some of this has to do with the fact that I’ve been there eight times; some of this has to do with the fact that I rather enjoyed spending the week between Christmas and New Year’s at home, and am not sure how much I want to do any vacationing for the sake of vacationing. Of course, this is all subject to change at any time, and Liesl and Miranda may have different opinions.
  • We’ve had bad hotel luck in the past; based on recommendations from comments on this blog post, we decided to try renting an apartment this time. We went with absoluliving; not as cheap as a cheap hotel, but for the same price as a decent hotel, we could get two bedrooms and a living room, with a clothes washer, a stove (not that we used it), a fridge. Or at least we thought that’s what we were getting; the day before we were supposed to leave, they e-mailed us to tell us, with no explanation whatsoever, that they were changing apartments on us; we ended up in a one-bedroom apartment, which they had the gall to call an upgrade because it was in a trendier neighborhood. To be fair, the apartment wasn’t a complete unknown, since we’d marked it as acceptable from the list of apartments they’d initially proposed to us, but I still didn’t appreciate the bait-and-switch, or whatever it was, at all. (Also, to be fair, I’m happy enough with the area we ended up in, and will consider staying near République in the future, but I didn’t like being in the middle of a very long block on a side street.) The other problem with the apartment was that one window kept squeaking open and closed all night when it got really windy; I’m not really mad at them about this, because I’m not sure how they would have discovered it by inspection, but it does point out a problem with an apartment agency that you don’t have with a hotel, namely that you can’t just complain about a maintenance problem and have them move you, because they might, say, be closed on the weekend. (Fortunately, it happened on a Thursday, and they managed to get somebody in on Friday who eventually stopped the squeaking by duct-taping it shut.) Anyways, one separate bedroom (Miranda was in a sofabed in the living room) is vastly better than everybody sharing a bedroom, so the general idea was a good one.
  • Poor Liesl was sick some of the time; fortunately, it wasn’t nearly as bad as when we were in Amsterdam, but she stayed in the apartment for three (two?) of the days because of that. Partly because of that, we didn’t go to as many restaurants as we might, but we still got some good food out of the trip (including one from a restaurant that apparently changed hands about a week after our last trip and was completely, surprisingly different this time); visiting salons de thé in the afternoon may have been my favorite part of the trip. (The pizza at decent Italian restaurants in Paris is quite nice, too.)
  • Why had I never heard of Lovis Corinth before? My first reaction is that I’d rather look at his art than, say, that of Van Gogh or Gauguin or Seurat. Looking at labels suggested that part of the reason is that his art is scattered around museums in Germany instead of clustered in museums in Paris; glad I’m aware of him now.
  • The baboons at the zoo in the Bois de Vincennes are a hoot.
  • Having internet access in your apartment is a good thing. And no, this is not a sign that I need to relax and tear myself away from the internet: this is a sign that I don’t feel compelled to spend every vacationing hour traipsing from site to site and can, instead, spend time in my hotel just enjoying myself without feeling guilty that I should be doing more on vacation.
  • Having a washing machine in your apartment is also a good thing. And points out another benefit to the internet: if your washing machine is refusing to wash and just blinking when you hit a number, you can google the model name and get a manual. (Answer: you accidentally hit the child lock button; hold it down for four seconds to unlock, and what you thought was the off button is actually the start button.)
  • Miranda’s favorite museum turned out to be the sewer museum.
  • Sacré Coeur is distinctive to look at from a distance but boring on the outside. Not so Notre Dame: there’s something to be said for thousands of people working for hundreds of years to produce something glorious.
  • I really am not impressed by the current Orangina ad campaign: large-breasted zebras just don’t do it for me. Sex, fine; animals, fine; combining the two, ick.
  • We forgot to buy a power converter; fortunately, the basement of BHV had them for sale. (They had one that went both directions, 110-to-220 and 220-to-110.)
  • Traveling with several puzzle books from Nikoli was an excellent idea: not only are the puzzles top-notch, but the narrower-than-US form factor meant that I could slip one into my jeans pocket, which is very useful when walking through museums where I’ve had to check my backpack, finding myself a room or two ahead of Liesl and Miranda because we go through them at a different pace, and needing to amuse myself. I’m getting a bit burned out on Nurikabe (though I still think they’re an excellent puzzle variant), and Number Link isn’t my fave (once the puzzles get out of the easy range, I have a hard time proving my solution is unique, which frustrates me), but I’m still a big fan of Masyu and Slitherlink. I’ll have to try some of their other puzzle types.
  • I really can dial down the number of books that I take on a trip these days: I have enough other entertainments that I don’t need to carry nearly as many to avoid running out of them. (And there are always bookstores if I guess wrong.)
  • Heavy curtains are great for the first night or two after getting off the plane, but in retrospect I should have stopped closing them completely after that: I never really got my clock adjusted to Paris time. The flip side of which was that lying awake at night gave me lots of practice in going over my Joyo kanji…

types of actions

Sunday, May 4th, 2008

Another thing that I’d forgotten since the first time I read the GTD book: not everything that advances a project is a Next Action. Some actions are for the future (and hence belong on your calendar or tickler file); some actions need to be carried out by other people.

One concrete effect of this realization is that it gave me a way to flag the current status of all of my projects. I have a list of projects; each project has to have at least one item associated to it with the label NEXT, WAITING, or SCHEDULED. I may have multiple such actions, if I’m proceeding along multiple fronts; I may also have items on the project that don’t have any of those labels. (Those items might be ideas for future actions or reference materials.) But I have to have at least one item that’s flagged with one of those labels: if I don’t, that’s either a sign that it’s really a someday/maybe item, not a project, or that I need to sit down and come up with a next action on the project.
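The "at least one labeled item per project" rule is easy to check mechanically. Here is a minimal Python sketch of that check; the data layout (a dict of project names mapping to (label, text) pairs) and the sample project names are made up for illustration, not how the actual project list is stored.

```python
# Labels that establish a project's current status.
STATUS_LABELS = {"NEXT", "WAITING", "SCHEDULED"}


def projects_missing_status(projects):
    """Return names of projects with no NEXT/WAITING/SCHEDULED item.

    `projects` maps a project name to a list of (label, text) pairs;
    label is None for unlabeled ideas or reference material.
    """
    return [
        name
        for name, items in projects.items()
        if not any(label in STATUS_LABELS for label, _ in items)
    ]


# Hypothetical sample data: the second project has only an unlabeled idea.
projects = {
    "fix laptop latch": [("WAITING", "repair depot to ship it back")],
    "vocabulary program": [(None, "read up on the scheduling algorithm")],
}
print(projects_missing_status(projects))
```

Anything this reports is either a someday/maybe item in disguise or a project that still needs a next action thought up.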

This also applies to e-mails. Some e-mails, even e-mails that I have flagged as active instead of archived, aren’t associated to a project; I stick these in a folder called ‘conversations’. But lots of my active e-mails are associated to a project. So I have folders ‘actions’, ‘waiting’, and ‘scheduled’, corresponding to the labels above. (As well as another folder, ‘projects’, for reference material that I don’t want to archive just yet.) (Actually, not every e-mail in actions/waiting/scheduled is associated to a project in my formal project list: some of them are single-action projects that I don’t feel compelled to capture elsewhere. Though some may be few-action projects that really should be captured elsewhere? I don’t think it’s hurting me yet, though.)

The problem is that this requires too much work for some common operations. Say that an e-mail comes in that I’m waiting for. Then it’s a response to something that’s currently in my ‘waiting’ folder; to avoid forgetting that I’ve gotten the response, I typically move the response to ‘waiting’ as well, then (once I’ve finished clearing out my inbox), go to ‘waiting’, look for e-mails that have gotten responses, and characterize them accordingly. Another difficult issue is when an e-mail requires some amount of context to respond to entirely: do I just have a single message in my actions folder, or the whole thread?

I’m starting to think that gmail has gotten it right by replacing folders with (per-thread) tags. But I’m not willing to move even my personal e-mail usage to gmail’s web interface, and I certainly can’t move my work e-mail there. Does Thunderbird use tags, and make it easy to restrict your view to only messages with a certain tag? (Looking at the web page, I think so, but I’m not completely sure.)

For the time being, I am one of the eccentrics who reads e-mail using Gnus. I assume I’ll move off of it one of these years, but that time hasn’t yet come, and (despite Gnus’s folder-centric nature) I don’t think this will push me off of Gnus, either. I spent a few hours digging through the source code and asking questions of the newsgroup; Gnus doesn’t have tagging support, but it looks like it should be workable to add an extra header to saved e-mails and tell Gnus to limit its view to headers matching a certain value on that header. (A nice benefit of having a mail reader written in a scripting language.) I haven’t yet found the time to implement this, so there might be something that I’m missing, but I’m optimistic.
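To make the header idea concrete, here is a rough Python sketch of "limit the view to messages whose tag header matches". It is not Gnus code (Gnus is Emacs Lisp), and the header name X-GTD-Tag is a hypothetical choice, not something Gnus defines; only the standard library's email parser is used.

```python
from email import message_from_string

# Hypothetical header name; nothing in Gnus defines it.
TAG_HEADER = "X-GTD-Tag"


def has_tag(raw_message, tag):
    """True if the message carries `tag` in its tag header."""
    msg = message_from_string(raw_message)
    values = msg.get_all(TAG_HEADER, [])
    return any(v.strip().lower() == tag for v in values)


def limit_to_tag(raw_messages, tag):
    """Keep only the messages tagged with `tag`, preserving order."""
    return [m for m in raw_messages if has_tag(m, tag)]


# Made-up sample messages.
waiting = "Subject: quote for repair\nX-GTD-Tag: waiting\n\nStill no reply.\n"
action = "Subject: book flights\nX-GTD-Tag: next\n\nCheck dates first.\n"
untagged = "Subject: hello\n\nJust saying hi.\n"

narrowed = limit_to_tag([waiting, action, untagged], "waiting")
print(len(narrowed))
```

With tags living in a header, a message changes state by rewriting one line rather than by moving between folders, which is the property that makes the single projects folder plausible.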

Once I’ve done that, I can get rid of the separate action/waiting/scheduled folders: those messages can all be in my projects folder, and I can add keystrokes to narrow my view to messages with a certain tag. Of course, this doesn’t solve the ‘response to waiting’ problem listed above; I may actually have my inbox be the same as my project folder. (I’m not sure what the effects of that will be.)

Even the current system is a big improvement over what my inbox used to look like. My actions folder never gets very big; when I got back from vacation, I had 50 e-mails in there when I was done with my inbox scanning, but that was an exception, and having those e-mails all in one place was very useful. (In particular, it allowed me to get it down to the normal 5-or-so level by the next day.) And the waiting and scheduled folders are useful views for periodic reviews. But it’s clearly an area where improvement is possible.

wozniak the memorious

Saturday, May 3rd, 2008

Jim pointed me to this article a few weeks ago, and I’m annoyed to say that I can’t get it out of my head. It’s about a guy who claims to have an algorithm (implemented by a computer program) to help you remember a lot more stuff a lot more solidly than you can with other methods, and it strikes just the right balance of potential importance and buy-in required to get me thinking about it more than I’d like.

The basic idea is this: if you want to remember something, you have to practice remembering it periodically. So it’s not enough to cram facts for an exam and then pretend that you know something: a few months later, you won’t consciously remember most of it. (Which is one reason why I question significant parts of our educational structure, but that’s a separate rant.) Instead, you have to periodically refresh your memory of the facts; fortunately, you can refresh less and less frequently over time and still remember those facts. Basically, the optimal time to refresh each fact is right before you’re about to forget it; this guy claims that he has a computer program that will serve up facts to you at the appropriate time for optimal practice.

This would be very useful to me (and, for that matter, to Miranda) right now: while he will happily apply it to anything, it’s clearly extremely applicable to learning foreign-language vocabulary. (And grammar!) And the theory is also obviously quite plausible (and apparently supported by the empirical psychological literature): I’ve spent a lot of time memorizing facts over the years (and in particular over the last year), and I can testify that this phenomenon of memorizing a word, and then not quite having it at the tip of your memory (or barely still having it at the tip of your memory) some time later is quite correct, and I’m quite willing to believe that there’s some optimal decay pattern for the refreshes.

But I also have a system for memorizing vocabulary that works moderately well right now: not perfectly, by a long shot, but I’ve gotten a lot of use out of it. In particular, right now I have 1200 or so vocabulary cards written down; I’m not about to sit down and digitize them all (which isn’t really necessary), but I’m also nervous about switching to another system which may or may not work, and (if I decide to switch back) to then deal with having some of my vocabulary on a computer and some on physical cards.

Also, to make matters worse, the software is basically Windows-only. So using it isn’t a realistic possibility for me. (It does seem like the sort of software that would strike a chord among Mac geeks, but who knows…)

But then I was idly thinking about it some more over the last day or two. Just how hard could it be to whip together a version of the software myself? The basic infrastructure is pretty straightforward: I need a way to save questions and answers, I need it to display questions to me, and I need to tell it whether or not I’ve answered the questions correctly. Then the software could save my history of when I’ve answered each question successfully (or unsuccessfully), and, based on his magic curves, figure out when it should next offer that question up to me. I’d never written a Rails app (a deficiency that I’d like to remedy), but all the data entry/display sounded like it should be very easy to whip up using Rails; I didn’t know what the magic sauce was, but it’s probably some sort of exponential decay curve, so I should be able to just look up his algorithm and implement it, right?

So I spent some more time at his web site, looking up his algorithm. And, at first, I was pretty disappointed. The most obvious place to start was with the paper version, but it had a few glaring deficiencies. The main one is that it had you work on groups of items all at once, treating each group as equally difficult (i.e. with the same decay curve). (Both the grouping and the equal difficulty seemed wrong to me.) Also (and this is, of course, just a minor annoyance, easily tweaked around), having the first review come four days after you’ve written down a group seemed way too long to me.

Reading that, I was pretty let down. After more poking around, though, it turns out that the algorithm has changed a fair amount over the years; I believe this is the most recent version of the algorithm listed on the website, and that page gives links to earlier historical versions. I haven’t tried to fully understand the most recent version (and, as far as I can tell, there’s not enough information there to reconstruct it: some of the constants apparently need to be determined empirically), but there are enough ideas to try to remedy the above flaws. It seems like the current version doesn’t always use exponential decay, but I believe earlier intermediate versions did (version 4 seems a particularly useful touchstone), so I could easily start with that; there is a per-item difficulty factor, and there’s some idea that you can calculate the difficulty factor by counting the number of times you’ve gotten the item wrong.

Based on that, it sounds plausible that I could hallucinate an algorithm that probably wouldn’t do any worse than my current method for learning vocabulary. (My current method wastes too much time up-front in going over words that I would ideally review in intervals longer than a day, while at the same time not doing enough review of old words.) And I don’t think it would be too much work to whip up a program to implement it, and I’d get some practice with Rails to boot.
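For the record, here is the sort of thing I mean, as a minimal Python sketch. It is not his algorithm: it only assumes that review intervals grow by a per-item multiplier after each correct answer, and that each wrong answer shrinks that multiplier and resets the interval. The constants (starting multiplier 2.5, penalty 0.2, floor 1.3) are placeholder assumptions, and the Card fields are invented for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MIN_EASE = 1.3      # assumed floor for the per-item multiplier
EASE_PENALTY = 0.2  # assumed cost of each wrong answer


@dataclass
class Card:
    question: str
    answer: str
    interval_days: float = 1.0   # current gap between reviews
    ease: float = 2.5            # per-item difficulty factor (assumed start)
    due: Optional[date] = None   # next scheduled review, once reviewed


def review(card, correct, today):
    """Update a card after a review and schedule the next one."""
    if correct:
        # Exponential growth: each success multiplies the gap.
        card.interval_days *= card.ease
    else:
        # A miss makes the item "harder" and starts the gaps over.
        card.ease = max(MIN_EASE, card.ease - EASE_PENALTY)
        card.interval_days = 1.0
    card.due = today + timedelta(days=math.ceil(card.interval_days))
    return card


card = Card("inu", "dog")
review(card, True, date(2008, 5, 3))
print(card.due, card.interval_days)
```

The data entry and display are the bulk of a real program; the scheduling itself is only these few lines, which is why the constants are the part worth experimenting with.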

So: would doing that be a good idea? I’m still not sure: if I ultimately decide that I don’t like the results (whether because I don’t think it works well or because I don’t want to be tied to a computer when doing vocab review or because of some other reason), then there would be a real cost in switching back. And it may turn out that this is all really a side-issue: maybe it would be more effective than my current system, even significantly so, if I wanted to memorize a dictionary. But I don’t want to memorize a dictionary, I want to be able to, say, read Japanese, and doing so would probably give me frequent enough review of the words I was actually using to make a program like this superfluous.

Not sure where I’ll go with this yet; for now, I’m too busy, so it’s on the someday/maybe stack. But it’s surprisingly close to the top of that stack; we’ll see where I am in a couple of weeks.

just signed up for twitter

Thursday, April 24th, 2008

I just signed up for Twitter. (Should I capitalize the T or not? Hmm, looks like I should.) I mostly did that not because I want to start using it now but rather because I can imagine wanting to use it in the future, and, if I do so, I’d prefer to have a relatively readable URI. But some experiences recently in the red-bean IRC room got me thinking that I like getting little status updates from my friends; that IRC room is far too high a volume (at times) for subscribing to it to be a good idea for me, but Twitter could be a good alternative.

So I’m open to the idea of using it. Which means that I need two things:

  • Friends who use it, so I have somebody to follow.
  • Good clients for both Linux and Mac.

Presumably I can figure out the latter easily enough myself (though I’m open to suggestions, especially on the Linux front); can my blog readers provide me with a critical mass of the former?

comment management in wp 2.5

Friday, April 18th, 2008

I’m not convinced I like the comment management UI in WordPress 2.5: I accidentally marked several previously-approved comments as spam and then deleted them. Fortunately, I noticed my mistake within a couple of hours, quickly enough that the most recent nightly backup was still around, but it was still an easier mistake to make than I would have liked.

Lessons:

  • Other WordPress users, beware.
  • If you’re saving something you care about, practice restoring from the backup! (Some googling showed me how to restore from a mysqlhotcopy backup, but I still had one heartstopping moment where the initial restore didn’t work because of a file ownership/permissions problem.)
  • If you’re saving something you care about, keep multiple nights’ worth of backups.

So what’s the best way to address that last issue? We have enough disk space lying around that I could just keep around tens of nightly snapshots. But snapshots from night to night are, in general, not very different, and the process (other than transient spam comments) is largely additive; I don’t see why a complete revision history should take up that much more space than two full backups. (Hmm, maybe that’s a bit optimistic, because of all those spam comments.) But tools like that that I’m aware of are all line-based, and I don’t know offhand of a good way to do a mysql database backup that puts different rows in different lines. So I might have to use a binary diff program, which I’m not that familiar with; maybe bsdiff?

Maybe I’m going at this in the wrong direction - maybe I should approach the problem at a different level, backing it up using some sort of xml format that WordPress can handle through its management interface? If I did that, I’d have more confidence that I’d be able to manipulate the resulting backup files in a non-harmful fashion that’s more amenable to line manipulation.

For now, though, I think I’ll just throw disk space at the problem…
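Throwing disk space at it still needs a rule for when old snapshots go away. A minimal Python sketch of that rule, assuming snapshots are named with an ISO date suffix such as blog-2008-04-18 (a made-up naming scheme) so that sorting the names sorts them by age:

```python
def snapshots_to_delete(names, keep=30):
    """Return the snapshot names that fall outside the newest `keep`.

    Assumes names end in an ISO date (e.g. 'blog-2008-04-18') and share
    a prefix, so plain string sorting is chronological.
    """
    ordered = sorted(names)
    if keep <= 0:
        return ordered
    return ordered[:-keep]


# Hypothetical snapshot names, deliberately out of order.
names = ["blog-2008-04-16", "blog-2008-04-18", "blog-2008-04-17"]
print(snapshots_to_delete(names, keep=2))
```

The function only decides what to remove; actually deleting the directories is left to the caller, which makes it easy to dry-run before trusting it with real backups.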

upgraded to wordpress 2.5

Saturday, March 29th, 2008

I’ve just upgraded to WordPress 2.5. Seems okay so far; the administrative interface looks pretty different, and I’m not entirely convinced by it (the blue and orange clash, maybe a bit too much whitespace). And there’s now a space for commenters’ pictures; not sure what I think about that, I’m tempted to remove it because I doubt it will get used much. I’ll think about it for a while, I guess.

i guess they did something else after all

Monday, March 24th, 2008

A postscript to my recent mac repair: when I got my computer back, it seemed that the DVD drive sounded a bit different, and now I’m 80% sure that they replaced it without my asking them.

I can’t figure out if this goes in the good column or the bad column. It is true that the drive wasn’t performing as well as I would have liked (though it was probably better than the original drive I got with the machine). Given that, bully for them for noticing it, and I suppose it’s good that they replaced the drive while they were working on the machine.

The flip side is that this machine is on its third optical drive in just under two years; a reminder that the drive quality on these machines isn’t very good, and it raises the possibility that the replacement drive may in fact not turn out to be better than its predecessor.

Still, it did manage to burn a DVD on its first try, which at least suggests that the drive has some basic competency in its anointed function.

A note to myself, should I ever wonder in the future about whether or not the drive has silently been replaced: the current (probably new) drive identifies itself as a “MATSHITA DVD-R UJ-857”.

random links: march 23, 2008

Sunday, March 23rd, 2008

A bit video-heavy today.

version control systems and filesystems

Wednesday, March 19th, 2008

I seem to have version control systems on the brain somewhat these days. Until not very long ago, I didn’t think much of them: they’re obviously important to have around if you’re collaborating on software, but other than that, who cares? But then I started getting addicted to looking at diffs after having checked in code, and thought that, perhaps, they might be important if you’re working on software even if you aren’t collaborating. And then I started getting more involved in the administration of the server hosting this blog, and began to appreciate that they’re useful for storing other kinds of information: in particular, we store all the configuration information in Subversion repositories.

And then Jim Blandy pointed me at the figure in this discussion of ZFS and this figure explaining Subversion’s “bubble-up method”. (Hey Jim, start updating your blog and I’ll actually link to it! :-) ) And I started using Time Machine; as John Cowan reminded me, I’m using it in ways that are seriously lacking from the “catastrophic recovery” viewpoint of backup (if my house burns down, I lose that backup, too), but of course backups are useful for other reasons, namely to save you if you make a stupid mistake and accidentally delete something you wish you hadn’t.

So now I’m starting to wonder: are we moving to a world where VCS-like functionality will be considered a part of basic filesystem functionality? At the very least, every time I create a new directory, I now ask myself if I should place it under revision control; frequently the answer is “yes”. For example, I’ve now placed my GTD files under version control; this seems like a particularly good fit, because GTD is all about removing subconscious worries, and now I don’t have to worry that I’m deleting information as I check off steps in a project that I might want to refer to later! (I’m using Mercurial for that repository, initially for a change of pace and because it’s one of the cool new kids on the block, but after thinking about it more, I can easily imagine wanting to clone that repository to this laptop and make commits while I’m off the net.)

There are some interesting design decisions here. On a basic level: do I want to back up everything (with a revision history), or do I want some files/directories to be excluded? (Browser caches are one candidate for exclusion.) And do I want to be able to rewrite history? In particular, do I want to have the ability to remove confidential information from all backups?

How intentional should the creation of new revisions be? I can imagine a filesystem that stores every single change as a separate revision, or one that takes revisions periodically (as Time Machine does). But, in a VCS, revisions generally serve some sort of communication purpose: if I’m working on source code, I don’t want every time I save a file to be reflected in the revision history, I want control over when I commit. Not so clear in other instances; I’m thinking of setting up a cron job to do a commit on my GTD info every night, and reserving manual commits for special occasions.
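The nightly commit could be as small as the following Python sketch, meant to be run from cron. It assumes Mercurial's hg is on the PATH; the function names and the commit message format are invented for illustration.

```python
import subprocess
from datetime import date


def nightly_commit_commands(today):
    """Build the hg commands for one automatic snapshot."""
    message = f"nightly snapshot {today.isoformat()}"
    return [
        ["hg", "addremove"],               # track new files, forget deleted ones
        ["hg", "commit", "-m", message],
    ]


def run_nightly_commit(repo_dir, today=None):
    """Run the snapshot commands inside the repository at `repo_dir`."""
    for cmd in nightly_commit_commands(today or date.today()):
        # hg commit exits with status 1 when nothing changed, so a
        # non-zero exit is not treated as an error here.
        subprocess.run(cmd, cwd=repo_dir, check=False)


print(nightly_commit_commands(date(2008, 3, 19))[1])
```

Giving the automatic commits a recognizable message keeps them easy to tell apart from the manual commits reserved for special occasions.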

I’m also wondering what other lessons filesystems could learn from VCSes. For example, distributed VCSes are all the rage these days; could networked filesystems learn something from that? (Though I can’t quite envision how the merge problem will get solved there.) Are there situations (cloud computing?) where we can get some mileage from relaxing consistency guarantees and viewing different filesystems as repositories with a genetic relationship, with changes periodically pushed/pulled in one direction or another? Maybe we should pay more attention to asynchronous notifications, either in the filesystem world or the VCS world; I spend enough time looking at networking problems through a RESTful lens that I get the feeling that asynchronous notifications are a bit out of style, but I’m appreciating them at work more and more these days. (Hmm, Twitter suggests that they’re not out of style at all, doesn’t it?)

No big conclusions, and I’m sure other people have been aware of some of these issues for decades, but I’m enjoying noticing these things.

macbook pro latch repair

Tuesday, March 18th, 2008

A week and a half ago, I noticed that the latch on my MacBook Pro had stopped reliably holding the lid shut. Which isn’t a huge deal—I mostly use it at home, and turn it off between uses—but at some point it would annoy me. My new organized self recognized that the most likely reason for me to not get it repaired immediately was plain old procrastination, so I looked at my calendar, decided now was as good a time as any to be without it for a week, loaded up a few dozen podcasts that I was curious about to tide me through my iTunes-less period, and prepared to have it repaired.

I wasn’t sure if the Apple store could fix it on the premises, but I figured I’d give it a try, so I made an appointment there. As always, they weren’t ready to help me when my appointment time arrived, but at least this time they were ready to help me only 20 minutes or so after the time instead of most of an hour later. (Though even 20 minutes late on a random Thursday morning suggests that their scheduling algorithm could be improved.) They said it had to be shipped away to be fixed; I figured I might as well let them take care of the shipping instead of calling to ask for a box, so I put it in their hands. (This was on a Thursday.)

I’d been under the impression that they used fast shipping, but when I checked Apple’s web site on the weekend, they said they still hadn’t received it. On Monday, same story. On Tuesday, a (quite pleasantly) different story: not only had they received it but they’d already fixed it and were preparing to have it shipped back. On Wednesday, it was being shipped back, there was a tracking number, the shipper claimed it was out for delivery, and late that morning it had been delivered.

So I figured I’d get a phone call from the store soon. By 3pm, though, no phone call; I called them, they confirmed they’d gotten it, but they wanted to run some tests on it to make sure it worked, and couldn’t tell me how long it would take. I questioned this; the person on the other end noted the annoyance in my voice, went away for a minute, and came back and told me that I could just pick it up without them testing it if I wanted. Which I planned to do; an hour later, I got a phone call saying that they’d finished testing it.

I showed up at the store at 5:20 or so; somebody came over to help me, typed on a computer, and told me to wait a minute. No problem, that’s why I downloaded all those podcasts. (In an amusing bit of synchronicity, I was listening to a portion of a Retronauts episode where they were talking about the recent XBLA port of Marathon 2. Sounds like they’ve fixed the nausea issues, maybe I should give it a try.) Five or ten minutes later, somebody else checked on me, but wasn’t concerned that I was still waiting. Another five minutes, and I talked to the original person, and he was more concerned. Eventually, twenty minutes or so after I’d entered the store, somebody emerged with my laptop. And, as it turned out, with one of its DIMMs removed and placed in a separate bag. Another three minutes or so and I was out, barely in time to pick up Miranda.

Thoughts:

  • Ironically, some of the things that annoyed me were a result of systems helpfully providing information. If Apple hadn’t given me a mechanism for tracking where my laptop was, I wouldn’t have known that it apparently took four and a half days to get to the repair depot, or was sitting in their store for five hours before they let me know.
  • Some of the employees did go out of their way to try to be helpful. Some didn’t, but none were actively annoying.
  • The system they work within, however, is actively annoying. I’m tired of Genius Bar delays. (I’m tired of the name Genius Bar.) I don’t want them taking extra time in the store doing checks: if the repair depot can’t repair things properly, they should fix the problem at the repair depot instead of trying to inspect in quality after the fact.
  • And I really don’t want them removing my memory when I ask them to fix a hardware problem that is clearly unrelated. Fortunately, I installed the memory myself, so I had the correct screwdriver to put it back in; otherwise, I would have been a lot more annoyed at the situation. (Especially given that they’d already come close to making me miss a daycare pickup time, so I didn’t have time to wait around for them to undo their work.)
  • Next time, I’ll just avoid the store and send it in myself: that part of the process seems to be nice and speedy. (Unless the store really did send it in fast and Apple’s computers erroneously claimed that it was in transit when it had been received and was waiting for repairs.) Actually, next time I’ll ask around first and see if there’s a local Mac shop that can repair it under warranty on the premises, so I don’t have to wait for a few days.

I’m not too annoyed at the whole situation; it’s not going to significantly decrease the chance that I’ll buy another computer from Apple (if for no other reason than that I don’t trust other manufacturers to do any better), and there were some actively good aspects of the situation. But they definitely missed some simple ways to make a good impression on me, or at least to fail to annoy me.

resume formats

Sunday, February 24th, 2008

I’m trying to hire right now. Which means that I get to read lots of resumes, mediated by various pieces of technology. Which is annoying, among other things because the format in which the resumes are most easily read isn’t necessarily preserved by those mediating technologies.

Specifically, Sun’s internal tools only accept resumes in either text format or variants of Microsoft Word. Lots of people apparently don’t have a resume natively available in one of those formats, which means that their resume gets cut-and-pasted from another format into text, with the result that it looks like crap and is a pain in the butt to read. (In particular, resumes are typically full of bullet points and indentation, and neither of those reliably survives that journey.) I literally spend most of five minutes going through a typical resume changing the formatting so it doesn’t get in the way of my reading the resume; it isn’t a complete waste of time, because I’m doing a first pass at skimming the resume while reformatting it, but it also isn’t much fun.

So: what is a resume writer to do? It’s been ages since I’ve updated my resume (and no, I am not looking for a job: I just think it’s wise to update your resume every year or so, since I can’t reliably remember what I was doing much farther back than that); the last time I applied for jobs, I used a LaTeX file which I converted into PDF. (Which took a surprising amount of care to get looking right, if I remember correctly: some sort of font problem in the conversion.) Which means that hiring managers at Sun probably wouldn’t like me!

So what are good resume formats these days? Based on my experiences, it’s essential to have a good-looking text version: text is easy to e-mail, it’s a fallback that will always be available. You also want a version that’s nicely formatted; presumably PDF is the format of choice there. And you may want to put your resume on your personal web page, so HTML is probably a good third option.

But, of course, you only want one source representation. I used LaTeX for this in the past, and I’m still not convinced it’s a crazy idea: I think there are probably decent tools to go from LaTeX to HTML or text. Having said that, there’s also nothing about LaTeX that makes it uniquely well suited to the task. A resume is a lightly formatted extremely hierarchical document; any sort of markup language that lets you easily express that hierarchy while giving a reasonable amount of control over formatting should do the trick.

In particular, HTML should probably do the trick. You’d want to take a bit of care over the CSS that you use to style it, but I don’t think resumes put any excessive demands on styling. You’d especially want to take care when converting it to PDF; PrinceXML seems to be getting a fair amount of buzz these days, so I’d be tempted to play around with that, despite its closed-source nature. Though my first line of attack would just be to provide a print-specific version of my CSS file; among other things, that would improve the way it looks to people who are printing out the resume from my web page. Were I to choose to put my resume there; not sure what I feel about that yet.

What’s the best way to convert HTML to text in a way that works well for resumes? I could have fun with XSLT, but that’s probably overkill. Honestly, maybe just loading the web page with Lynx would be good enough; I’d have to try it and see, once I get around to actually updating my resume. If not, there must be hundreds of other options.
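As a rough illustration of one of those other options, here is a minimal sketch using only Python's standard `html.parser` module. It assumes the resume uses nothing beyond headings, paragraphs, and nested `<ul>` lists; the class name `ResumeTextRenderer` and the function `html_to_text` are made up for this example, and a real resume would likely need more tags handled.

```python
from html.parser import HTMLParser


class ResumeTextRenderer(HTMLParser):
    """Render a small subset of HTML (headings, paragraphs, nested lists) as text."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished output lines
        self.current = []  # text fragments of the line being built
        self.depth = 0     # current <ul> nesting depth
        self.prefix = ""   # indentation and bullet for the line being built

    def _flush(self):
        # Collapse runs of whitespace, emit the line if non-empty, then reset.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(self.prefix + text)
        self.current = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "p"):
            self._flush()
        elif tag == "ul":
            self._flush()
            self.depth += 1
        elif tag == "li":
            self._flush()
            # Two spaces of indentation per nesting level, then a bullet.
            self.prefix = "  " * (self.depth - 1) + "* "

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "p", "li"):
            self._flush()
        elif tag == "ul":
            self._flush()
            self.depth -= 1

    def handle_data(self, data):
        self.current.append(data)


def html_to_text(html):
    """Convert an HTML fragment to indented, bulleted plain text."""
    renderer = ResumeTextRenderer()
    renderer.feed(html)
    renderer.close()
    renderer._flush()
    return "\n".join(renderer.lines)
```

The point of the sketch is that bullets and indentation, the two things that tend not to survive cut-and-paste, are generated explicitly from the list nesting rather than left to chance.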

One other tip for job applicants: when you attach your resume to something, the original file name of the resume will be available to the person receiving the resume, and it will probably be given as a default option for that person to save the resume under. So realize that, when you name your resume, you aren’t the main client of that name, the hiring managers are. In particular, if you want to give your hiring manager warm fuzzies, don’t call it “resume.pdf” or “David C Alternate Resume.pdf”: call it “DavidCarlton.pdf” or “DavidCarltonResume.pdf”. The details aren’t important—different hiring managers have different conventions about the names they’d use to save resumes under—but make sure that your full name is there and that there isn’t other extraneous garbage in the name. If you need to store metadata like that in your local copy, put it in the directory hierarchy, not in the filename.

front row

Saturday, February 23rd, 2008

Recently, we’ve been watching some videos on our TV via our Mac, using an appropriate video adapter and Front Row. Which works pretty well; I don’t plan to make a habit of it, but it’s nice to know that the option is available for the times when I want it.

Actually, I take that back: it works okay, but there are some flaws. Front Row works badly with multiple monitors: it’s unwilling to display on only the second monitor (or rather, I’ve gotten it to do that twice, but I can’t repeat that, and suspect that what I ran into was a bug rather than a feature). This means that, to watch stuff on TV, I have to have the laptop set to a small resolution, which is ugly. (And which takes more clicking than I’d like to switch out of, and it also rearranges my desktop icons.) Also, the laptop goes to sleep if the lid is shut, even if another monitor is connected (cooling issues?), which means that I leave it mostly closed and watch the movie on the TV while weird flickering shadows are mimicking it on the part of the laptop that I can see. And the remote is cute, but I think I’d want a few more buttons if I were using the computer as a DVD player. (Maybe not, though.)

Still, it’s good enough that I can easily imagine not buying another standalone DVD player: a Mac mini (or new Apple TV model?) that was plugged in directly to our TV would probably do just fine, and could also serve double-duty for other purposes. (E.g. iTunes storage.) I’m not sure Apple has just the product I want yet, but they’re fumbling around in that area, and maybe they’ll produce the machine/software I want soon enough.

Which raises the question: what is it that I want? I certainly need something that can work as a DVD player (which you would think is a no-brainer, but rules out the current iteration of Apple TV). My current DVD player is also a DVR, and I would prefer to keep that functionality. (Such as it is: the machine is complete crap, and while it has deigned to start reading DVDs again, it still frequently forgets to record TV shows.) The digital transition is happening in under a year, though, and I assume Comcast will use that as an excuse to force me off of analog, at which point my current DVR won’t work anyways, and I’ll probably have to go through Comcast to get that functionality. A lower price than the current Mac mini would be nice, too.

And then there’s always Blu-Ray: maybe I’ll actually care about that a year from now? And maybe Apple will have released a computer with a Blu-Ray drive? Which I have decidedly mixed feelings about: there seem to be significant technical costs to supporting it in your operating system, and while Apple might do a better job of navigating those costs than Microsoft did, I’m not convinced they’ll be able to really avoid them. So it looks more likely to me that I’d want to go the PS3 route if I wanted a Blu-Ray player. (And eventually there will be PS3 games that I feel compelled to play, won’t there?) Honestly, though, I feel not the slightest urge to switch off of DVDs right now, and I don’t see that changing for several years.

No clear conclusions, and I’m pretty sure we’ll stick with our current setup through 2008. But maybe we’ll fiddle around with this in 2009, depending on the effects of the digital transition and on what hardware is available.

random links: february 18, 2007

Monday, February 18th, 2008

time machine

Sunday, February 3rd, 2008

I hooked up a spare USB drive to my Mac a couple of weeks ago and turned on Time Machine. Seems to be very easy to use; certainly the process of making backups is painless, and while I haven’t tried a restore yet, the GUI looks easy to manage. My only complaint so far is that it doesn’t seem focused on people like me who are only plugging in a backup drive at sporadic intervals: it wants to back up every hour, at 48 minutes after the hour, and insists on that schedule even if I just plugged in the backup drive for the first time in a few days. (So if I attach the backup drive at, say, 9:02, then I have to wait 46 minutes for my next backup, even if I haven’t backed up the computer for days.)

If you have a Mac, I highly recommend it: spend 100 bucks (or whatever) on a USB drive, set up your environment so that drive is near a location where you frequently use your laptop, and get in the habit of connecting the two periodically. (If you have a desktop, then just leave it plugged in all the time.)

living in the cloud

Sunday, February 3rd, 2008

The server hosting this blog had some troubles recently, caused (probably) by spikes in e-mail and web traffic happening at the same time. It’s under control now, but that got me wondering: it’s unfortunate that we have a single, not particularly scalable box that we’re depending on to carry that load. Wouldn’t it be nice if we could move to a setup where resources could expand (or contract) as necessary? In particular, cloud computing is all the rage these days (my own employer has a grid, and Amazon has its S3 storage and EC2 compute; as will doubtless become clear, I know essentially nothing about any of those!); what are the barriers between us and that world?

What are the basic requirements? At a minimum, we can’t be tied to a single machine (or assume that we are running on a single machine at any given point in time, since we want to be able to scale up), so we need to diverge the notions of compute and storage. My guess is that we’ll want multiple incarnations of at least one of those concepts; let’s run through some examples and see.

One test case: can we come up with a server that feels pretty much like the one we’re running now, with the exceptions that there could potentially be multiple live instances (with some shared storage) and that we shouldn’t count on long uptimes? (Hmm, how long should the uptimes be? It’d be nice if instances disappeared entirely when nobody was logged in: maybe provide a mechanism where an ssh connection triggers an instance of the server appearing.)

I guess the compute here would look like an OS instance running on a ram disk. For persistent storage, you’d want to provide some sort of NFS-like view of a subset of the cloud’s storage pool. I guess it would be okay if that persistent storage was somewhat slow; you could have a fast ram disk for situations where you needed to have temporary local data. (Hmm, what about situations like compiles? That could get a bit sticky.)

Those abstractions would provide the basic generic server infrastructure plus home directories and shared storage areas. One question is: what parts of the generic server infrastructure change on a somewhat frequent basis? At least frequent enough that you don’t want to spin a whole new ram disk image just for those changes: e.g. when I change my password, I’d rather not have to create a new image.

Or rather, I don’t want to have to create a new image if that’s a heavyweight operation, so let’s try to make it lightweight instead. To do that, I think you’d want to separate off configuration information from other sorts of information. So you’d want to have a mechanism where you can create a base image, e.g. by picking a set of packages from your favorite Linux distro. And then basically apply a diff to that by adding/editing configuration files. If there are easy ways to manage that diff and add/remove packages, it should be easy to tailor the image that you’re using.
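A minimal sketch of that "base image plus configuration diff" idea, in Python. Everything here is hypothetical: `BaseImage`, `ConfigOverlay`, and `effective_files` are illustrative names, and files are modeled as an in-memory dictionary rather than a real disk image.

```python
from dataclasses import dataclass, field


@dataclass
class BaseImage:
    """A heavyweight, rarely rebuilt image: packages plus their default files."""
    packages: frozenset
    default_files: dict  # path -> contents


@dataclass
class ConfigOverlay:
    """A lightweight diff applied on top of a base image."""
    changed: dict = field(default_factory=dict)  # path -> new contents
    removed: set = field(default_factory=set)    # paths to delete


def effective_files(base, overlay):
    """Return the file view an instance would boot with: base plus overlay."""
    files = dict(base.default_files)  # copy, so the base image is never mutated
    for path in overlay.removed:
        files.pop(path, None)
    files.update(overlay.changed)
    return files
```

Under this model, changing a password only touches the overlay; the base image stays as it was and can be shared between instances.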

One question: is this case of mimicking shell access to a single Unix box actually useful? It doesn’t solve big compute problems; and if you have a small compute problem, it’s cheap enough to get compute power that you can run at home. So maybe there’s no real need for traditional shell access in a cloud-based world. I’m not convinced, though: e.g. if you want to run custom software in the cloud, you need to be able to compile it (assuming it’s in a compiled language…), and I don’t want to require my home Linux distro (or, for that matter, my home machine’s architecture) to match whatever’s running on the cloud.

Enough about traditional shell access (which is, after all, the most boring use case); what about other services? Recently, we’ve had mail problems; what does a mail server look like in the cloud?

It mostly has a separate pool of storage: I guess it might be accessible from the shell view (you might want to mount your mail spool in your home directory?), but there’s not a lot of overlap there. There also might be some amount of per-user customization (.procmailrc files or their moral equivalent) that you’d want to view from your home directory; not a lot, though. You’d want the potential to farm out incoming SMTP connections to one or several compute servers, growing on demand. And you’d want to divorce receiving mail from reading mail: the SMTP server and the IMAP server have to access the same pool of storage, but there’s no reason why you’d want them to run in the same compute environment. And then there’s sending mail and managing mailing lists.

Hmm, mail is pretty boring. Honestly, I’m not sure why we run our own mail server any more: it seems like all the customization issues are solved well enough, so what I really want is somebody to accept mail, filter it for spam, and store it until I retrieve it, and lots of people can do that fine for me. The only hard part is spam filtering, and that’s a hard enough problem that doing it as a hobbyist effort is doomed to failure.

So let’s move on to the web server, which is the most interesting problem. Different people want to install different packages on their web server: different programming languages, different publishing platforms, different platforms for viewing data that may not have originated from that web server, and people are writing new code in all these areas all the time. So it’s nothing like the mail server situation from that point of view.

How does the configuration look? And to what extent can/should compute instances be shared? Right now, we have one web server with all sorts of stuff installed on it, and a large amount of configuration inside the Apache configuration dir. Some of that configuration information is global, some of it is on a per-vhost basis; there’s also configuration information in individual users’ directories (in the form of .htaccess files). I tend to think that, in a cloud view, that’s the wrong way to slice-and-dice: on a package level, the fact that I want, say, access to Ruby doesn’t mean that everybody does. So probably each vhost gets its own compute configuration? (Of which there could be one or many or even zero instances running at any given time, depending on the traffic load.)
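To make the "one or many or even zero instances" idea concrete, here is a small sketch of how the instance count for a single vhost might follow its traffic. The function name and all three parameters are assumptions for illustration; a real scheduler would also need to smooth out short spikes.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance, max_instances):
    """Return how many instances of one vhost's configuration should be running."""
    if requests_per_second <= 0:
        return 0  # no traffic: the vhost can disappear entirely
    wanted = math.ceil(requests_per_second / capacity_per_instance)
    return min(max_instances, wanted)
```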

Besides the configuration information, you obviously need storage, for the files that you’re serving up. Right now, that storage is sitting inside my home directory, and in general it would make sense for the storage that’s used by the web server to also be mounted from the shell account view of the cloud. Though, these days, thinking about editing files in your home directory is perhaps a bit passé. In fact, what I typically do is edit files on my home computer and then push them to the server via rsync. (Well, what I typically do is write blog posts, which are stored in a completely different location, about which more later.)

Maybe we should take that idea and run with it: there’s a repository that the web server dishes up to the outside world, but I want to be able to edit it from a remote computer. Using rsync is okay for that, but we have better tools now for managing remotely modifiable repositories: what we really want is a version control system, so I can see a diff before committing, so I can commit from multiple locations, so I can roll back my mistakes. That seems like a useful abstraction that a cloud-based environment should provide: its storage layer should have version control primitives that can be implemented efficiently and are strong enough to, say, let you write a subversion filesystem that works off of it. (I’m under the impression that Google has a subversion filesystem that works with their own internal storage cloud.) That would also be useful for the configuration part of my earlier view that the compute environment consists of a set of packages plus a diff giving configuration information.
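The kind of version control primitives in question (commit, diff, rollback) can be sketched as a toy in-memory store. This is an illustration under stated assumptions, not how Subversion or any real storage cloud works: `VersionedStore` is a made-up class, and each commit is simply a full snapshot mapping paths to content hashes.

```python
import hashlib


class VersionedStore:
    """Toy store: every commit is an immutable snapshot of path -> content hash."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes, shared across versions
        self.commits = [{}]  # snapshots in order; version 0 is the empty tree

    def commit(self, changes):
        """Record a new snapshot; `changes` maps path -> bytes, or None to delete."""
        snapshot = dict(self.commits[-1])
        for path, data in changes.items():
            if data is None:
                snapshot.pop(path, None)
            else:
                digest = hashlib.sha256(data).hexdigest()
                self.blobs[digest] = data
                snapshot[path] = digest
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def read(self, path, version=-1):
        """Return the contents of a path at a version (default: the latest)."""
        return self.blobs[self.commits[version][path]]

    def diff(self, old, new):
        """Return the sorted paths whose contents differ between two versions."""
        a, b = self.commits[old], self.commits[new]
        return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))

    def rollback(self, version):
        """Make an old snapshot current again by committing a copy of it."""
        self.commits.append(dict(self.commits[version]))
        return len(self.commits) - 1
```

Because rollback is itself a new commit, history is never rewritten, which is what makes it safe to commit from multiple locations.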

Also, to the extent that we’re serving up flat files, we’d like to have as much of that be handled directly by the cloud’s storage abstraction, instead of having it done by the compute layer that’s running the web server. Ultimately, I think more and more of the web server’s job will be just to act as a proxy. If it figures out that it wants to serve up a file that’s sitting in the storage cloud at some address, it shouldn’t look up the contents of that file through the NFS view of the storage cloud that I hallucinated above, it should simply forward that web request to a web front end of the storage cloud, and let the bits flow. Ideally, the bits wouldn’t even be flowing through the web server at all, once the web server has identified the correct bits; cf. Van Jacobson’s Google talk.
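One way to keep the bits from flowing through the web server is an HTTP redirect to the storage front end. The sketch below assumes that mechanism (it is only one possible reading of "forward"), and the function `respond`, its parameters, and the storage URL used in the example are all hypothetical.

```python
def respond(path, static_paths, storage_base_url, render):
    """Return (status, headers, body) for a request path.

    Flat files are answered with a redirect to the storage front end, so the
    web server never reads or relays their contents; everything else is
    rendered by the web server itself via the `render` callable.
    """
    if path in static_paths:
        location = storage_base_url.rstrip("/") + path
        return 302, {"Location": location}, b""
    return 200, {"Content-Type": "text/html"}, render(path)
```

For example, `respond("/img/logo.png", {"/img/logo.png"}, "https://storage.example.invalid/", render)` would answer with a 302 whose `Location` header points at the storage front end, without ever calling `render`.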

Of course, these days a lot of the content on a web server (e.g. this immortal prose) isn’t sitting in some directory hierarchy mirrored by the web server: it’s sitting in a database. So any web server platform needs to provide that; that’s an important enough concept (and one that’s different enough from a traditional filesystem) to deserve its own separate abstraction. Not sure what to say here, other than that it would be nice for query results to be RESTful enough that, whenever possible, we can forward them directly from the database cloud to the user, with the web server cloud doing as little work as possible. (Which requires restructuring your web pages; see Tim Bray’s “The Real AJAX Upside”.)

Hmm, so a web server needs:

  • A set of packages that you can select to provide the basic functionality.
  • Configuration that’s easy to modify. (Ideally with a VCS view.)
  • Storage that’s easy to modify. (Ideally with a VCS view, ideally massaged as little as possible by the web server.)
  • A database abstraction. (Ideally with a RESTful view, massaged as little as possible by the web server.)

What am I leaving out? Logging, I guess: you want to be able to figure out who is accessing your data. Which might need a different twist on the storage abstraction: you want fast appends, possibly with indexing to make data analysis easier. Hmm, there’s probably something there that could be shared with the mail server concepts.
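A minimal sketch of "fast appends, possibly with indexing", assuming an in-memory log and an index keyed by client address. The class `AccessLog` and its methods are invented for illustration; a real logging service would persist entries and probably index on more than one field.

```python
from collections import defaultdict


class AccessLog:
    """Append-only log with a per-client index built as entries arrive."""

    def __init__(self):
        self.entries = []                   # arrival order, never rewritten
        self.by_client = defaultdict(list)  # client address -> positions in entries

    def append(self, client, path):
        """Add one entry; cost does not grow with the size of the log."""
        self.by_client[client].append(len(self.entries))
        self.entries.append((client, path))

    def paths_for(self, client):
        """Return the paths one client requested, in order, without a full scan."""
        return [self.entries[i][1] for i in self.by_client.get(client, [])]
```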

Enough about the web server: any other big concepts? We host subversion repositories at red-bean; probably if we can get the above working right, we can get that working right, too. And then there’s big compute projects; I’m sure the cloud has something useful for HPC-style applications, I’m sure the above abstractions aren’t good enough for it, I’m sure my employer has lots of productive ideas in that vein, I just don’t have enough first-hand experience to say anything in particular.

I wonder what any of this has to do with existing cloud abstractions? I’ve read a bit about S3 (it was mentioned as an example in the REST book), but I don’t know squat about EC2, and I don’t even remember the name of Amazon’s database abstraction. I don’t see anything in the picture that I sketched above that should be too difficult to bring to fruition: you can start by providing a traditional view of cloud resources, and then do a few targeted replacements to make it significantly more efficient. (E.g. teaching Apache how to interact directly with the storage server, instead of routing requests through a file system layer.) And then you’ll want to start rethinking your designs, increasing the range of components that can be addressed directly as resources (and hence provided in an optimized way by the cloud core abstractions), stitching them together on the client whenever possible.

Should be fun. I suspect that the next red-bean upgrade will be to another physical machine, but for the one after that, who knows?

random links: january 26, 2007

Saturday, January 26th, 2008

stupid gmail

Saturday, January 26th, 2008

I do not understand the way Google handles their accounts. I have (well, had) two Google accounts: a gmail account (david.b.carlton) that I never used and another account (associated to my public e-mail address) that I use all the time for reading blogs. On the recommendation of some friends, I decided to start using Google as a spam filter, forwarding my mail through their servers; to that end, the natural thing to do would be to unify those accounts, tell the gmail account to forward non-spam mail to my public e-mail address, and bask in the drastically reduced volume of spam that I receive. (Along with some procmail rules on my public account to route e-mail through gmail unless gmail has seen it already.)
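The "route through gmail unless gmail has seen it already" rule is essentially a loop check. Here is a sketch of that logic in Python rather than procmail, using the standard `email` module. The test it applies, looking for `google.com` in the `Received` headers, is purely an assumption about what mail that has passed through gmail looks like; the actual procmail rules may well key off something else.

```python
from email import message_from_string


def should_route_through_gmail(raw_message, marker="google.com"):
    """Return True if the message shows no sign of having passed through gmail.

    Assumption: a message that gmail has already handled carries at least one
    Received header mentioning `marker`. Sending such a message back to gmail
    would create a forwarding loop, so it is delivered locally instead.
    """
    message = message_from_string(raw_message)
    hops = message.get_all("Received", [])
    return not any(marker in hop for hop in hops)
```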

Well, no. Some issues that turned up:

  • You can’t unify an existing gmail account and an existing Google account associated with a non-gmail e-mail address.
  • You also can’t do that indirectly by deleting the gmail account and then creating a new gmail account with the same e-mail address as the previous one but with the new gmail account linked to the existing Google account: even if you’ve deleted a gmail account, you (or anybody else) still can’t create a new gmail account with the same name.
  • If a gmail account is linked to an external e-mail address, gmail gets extraordinarily possessive of the latter e-mail address: it refuses to forward e-mail to that address, and it also refuses to forward e-mail that was originally sent from that address.

The upshot is that, after an hour and a half of frustration, I ended up where I started: I still have a Google account that I use all the time linked to my public e-mail address, I have a separate gmail account (which is now forwarding mail to my public e-mail address), but that separate gmail account has a name that I like somewhat less than the name of my first gmail account. (Or, for that matter, than the name of a second gmail account that I created but was then unable to use for the purposes that I wanted.)

I’m actually a little sympathetic to their behavior on the first two issues: the first smells to me like legacy implementation headaches, and I can see how their decision on the second issue avoids a certain class of problems. But their behavior on the third issue just seems like a conscious choice of bizarreness: why refuse to forward to the one external address that I’m guaranteeing is mine? Just because I have an account with Google to use their services doesn’t mean that I’m handing all control of my e-mail over to them…

random links: december 31, 2007

Monday, December 31st, 2007

careful with your layouts

Friday, December 7th, 2007

I recently turned on “fast user switching” on the Mac, and just discovered that the login dialog keeps the previous user’s keyboard layout, instead of reverting to the system default. Which is a problem if the previous user uses Dvorak, the new user doesn’t, and the new user is typing in a password so she can’t even see that something’s gone wrong by looking at the characters that appear.

In fact, switching layouts and then switching users doesn’t work, either: it goes back to Dvorak! Weird. Changing to the Finder, then switching layouts, then switching users works.

To be fair, I can see how this sort of usability bug could slip through testing…