Comments on: living in the cloud

By: david carlton

david carlton — Wed, 06 Feb 2008 05:52:37 +0000

Well, today I’m just running a blog, so it doesn’t matter so much. I don’t know how often contents get synced to disk, but that should be, what, at least every 30 seconds? And we do offsite backups nightly. Good enough for a blog, given that the server only goes down unexpectedly once every few months; if I were doing something more serious, I’d want to RAID the disks, and perhaps back up more frequently.

Then again, maybe I could get close enough to that with the Amazon solution: dump the db contents to S3 every 10 minutes or so? I guess it depends on how often the compute instances go down, and what the costs would be for frequently backing up to S3. Still feels wrong, but maybe that’s partly because I’m not used to thinking in those terms.
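
A rough way to put numbers on that trade-off is sketched below. It is a back-of-the-envelope model, not a measurement: the function name and parameters are made up for illustration, and it assumes an instance failure is equally likely at any moment between two backups, so the average loss per failure is half the backup interval. S3 pricing is deliberately left out, since the cost per upload is exactly the unknown here.

```python
def backup_exposure(backup_interval_min, uptime_days):
    """Estimate data-loss exposure for periodic database dumps.

    backup_interval_min: minutes between dumps (e.g. 10).
    uptime_days: assumed average days between unexpected instance failures.
    Returns (uploads_per_day, failures_per_year,
             worst_case_loss_min, expected_loss_min_per_year).
    """
    uploads_per_day = 24 * 60 / backup_interval_min
    failures_per_year = 365 / uptime_days
    # Worst case: the instance dies just before the next dump runs.
    worst_case_loss_min = backup_interval_min
    # Assumption: failures fall uniformly within the interval,
    # so each failure loses half an interval of writes on average.
    expected_loss_min_per_year = failures_per_year * backup_interval_min / 2
    return (uploads_per_day, failures_per_year,
            worst_case_loss_min, expected_loss_min_per_year)


if __name__ == "__main__":
    for uptime in (30, 60):
        print(uptime, backup_exposure(10, uptime))
```

With a 10-minute interval that is 144 uploads a day, so the per-upload cost matters far more than the amount of data at risk.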

By: John Cowan

John Cowan — Tue, 05 Feb 2008 13:44:58 +0000

Well, what do you do with your database today? Do you have a full two-phase-commit distributed database for reliability, or do you take frequent backups and risk the loss of a certain number of transactions?

If the latter, then backing up the database to S3 is perfectly plausible, using its native backup tools.

I don’t know much about Amazon’s database, except that it’s generally similar to Google’s BigTable: a three-dimensional spreadsheet using arbitrary row and column names, with the update timestamp as the third dimension. In BigTable, at least, we can further say that the row is the unit of atomicity, that iterations are over rows whose names share a prefix, that the column (or group of columns) is the unit of access control, and that older versions of data can be garbage collected.
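
To make that description concrete, here is a toy in-memory sketch of the data model as described above: cells addressed by row, column, and timestamp, iteration over rows sharing a prefix, and garbage collection of old versions. It is only an illustration. The class and method names are invented, it is not the API of BigTable or of Amazon’s database, and it ignores atomicity and access control entirely.

```python
from collections import defaultdict


class ToyTable:
    """Illustrative (row, column, timestamp) -> value store."""

    def __init__(self):
        # row name -> column name -> list of (timestamp, value), oldest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, [])
        for ts, value in reversed(versions):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan_prefix(self, prefix):
        """Yield row names that share `prefix`, in sorted order."""
        for row in sorted(self._rows):
            if row.startswith(prefix):
                yield row

    def gc(self, keep=1):
        """Drop all but the newest `keep` versions of every cell."""
        for columns in self._rows.values():
            for column, versions in columns.items():
                columns[column] = versions[-keep:]


if __name__ == "__main__":
    t = ToyTable()
    t.put("com.example/a", "contents", "v1", timestamp=1)
    t.put("com.example/a", "contents", "v2", timestamp=2)
    t.put("org.other/c", "contents", "y", timestamp=1)
    print(t.get("com.example/a", "contents"))
    print(list(t.scan_prefix("com.example/")))
```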

By: david carlton

david carlton — Tue, 05 Feb 2008 05:05:48 +0000

John: thanks for the info. Good point about the local disk; that would solve the compile issue nicely. I’d be leery about using that method for a database, though, and clearly S3 isn’t the solution there, either. How does Amazon’s database abstraction work? I’d heard some iffy things about it.

Brian: yeah, getting a second machine might make sense. I hadn’t thought of putting svnbook on S3, but that could be a good idea, too. Hmm.

By: Brian Carlton

Brian Carlton — Tue, 05 Feb 2008 02:14:14 +0000

You are making it too hard, I think. Just move half the domains to a second machine. Or a few of the high-usage ones. Or move mail to the new machine, if the load is split evenly between web and mail. Or put the few files that account for most of the usage on S3, if it is just people getting the SVN book.

By: John Cowan

John Cowan — Mon, 04 Feb 2008 00:19:50 +0000

Well, you’ve pretty much characterized EC2 correctly, except that even the ten-cents-an-hour compute instance is fairly decent: it’s a (virtual) single-core 1 GHz 32-bit system with 2 GB of memory and 160 GB of local disk, which you lose when the instance goes down. Fortunately, you can trivially get remote access to S3 (no charge) for persistent storage, though you don’t want to keep your database there because writes to S3 are slow. (Instead, you back up your database there using whatever backup mechanism your database engine provides.) And of course you can run tens or (if you ask nicely) hundreds of these things.

If your job is memory- or compute-intensive, there’s also the forty-cents-an-hour model, with two 2 GHz 64-bit cores, 7.5 GB of memory, and 850 GB of local disk; the eighty-cents model has four cores and twice the memory and disk. So provided you don’t have a monstrously large server, and provided you can adapt to semi-offline storage for most of your storage needs, you can just move everything there and apply the $585/mo for the big system against whatever it’s costing you to own your system now.
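
The monthly figure is just the hourly rate run around the clock. A small sketch of the arithmetic, using the three hourly rates quoted above and assuming an average month of 730 hours (24 × 365 / 12); the function name is illustrative, and storage and bandwidth charges are not included:

```python
# Average hours in a month, assuming a 365-day year.
HOURS_PER_MONTH = 24 * 365 / 12


def monthly_cost(hourly_rate_usd, instances=1):
    """Cost in USD of running `instances` machines nonstop for an average month."""
    return hourly_rate_usd * HOURS_PER_MONTH * instances


if __name__ == "__main__":
    for label, rate in (("small", 0.10), ("large", 0.40), ("extra large", 0.80)):
        print(f"{label}: ${monthly_cost(rate):.2f}/mo")
```

For the eighty-cents model this comes to roughly $584 a month, which matches the figure above after rounding.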

A friend of mine makes a living doing made-to-order Web crawls for customers who are looking for specific kinds of things on the Web. He uses S3 and EC2 exclusively, so he had no capital cost to start up his company at all. (The crawler is a command-line application, so ssh access is what he needs and gets.) He typically experiences instance uptimes in the general range of 30-60 days.

It’s true that EC2 doesn’t provide the kinds of things you spend most of the post talking about, but then it doesn’t need to. As this model becomes more popular, people will begin to provide userland software that is intended to run on tens of instances rather than on a server, and that can be streamlined appropriately.