A few months back, I noticed that my desktop machine was using its drive a lot. (The machine, sadly, still has a magnetic disk; I’m just waiting for new iMacs to be released to replace it.) Poking around, I found that it was the backups (Time Machine and Backblaze, which I highly recommend). So something had caused a lot more files on my machine to get modified on a regular basis; I’d signed up for Apple Music recently, so I was afraid it was doing something, and, sure enough, I saw music file names show up in the Backblaze upload list.
I grabbed one of those files from an earlier backup, in order to compare it with the current version. And, of course, the obvious way to compare two binary files is: open them up in Emacs. Specifically, open them up in two buffers, then do
It turned out that the two files differed in three locations; all those locations were in the first few hundred bytes. So that’s good: at least Apple Music hadn’t replaced my file from some version in their library that they’d decided matched my file, they were just changing metadata. (At least I hope it hadn’t: I don’t have any reason to believe that the file I downloaded was from before I turned on Apple Music, and, in fact, as it turns out below: I have an active reason to believe that I didn’t grab the original.)
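If you'd rather not do the comparison in Emacs, a few lines of Python will find the same thing. This is only a minimal sketch: the function name `differing_runs` is made up for illustration, the sample bytes are a tiny stand-in rather than a real music file, and it only compares the part that both inputs have in common.

```python
def differing_runs(old: bytes, new: bytes):
    """Return (offset, old_bytes, new_bytes) for each contiguous run of differing bytes.

    Only the common prefix length is compared; a length mismatch is not reported.
    """
    n = min(len(old), len(new))
    runs = []
    start = None
    for i in range(n):
        if old[i] != new[i]:
            if start is None:
                start = i  # a new run of differences begins here
        elif start is not None:
            runs.append((start, old[start:i], new[start:i]))
            start = None
    if start is not None:  # a run that extends to the end of the common prefix
        runs.append((start, old[start:n], new[start:n]))
    return runs


# Stand-in data: identical except for three bytes near the end.
old = bytes.fromhex("00000000bdfc5dedd4b212a2")
new = bytes.fromhex("00000000bdfc5dedd4f2ae01")
for offset, was, now in differing_runs(old, new):
    print(f"offset {offset:#x}: {was.hex()} -> {now.hex()}")
```

For real files you would read both copies with `open(path, "rb").read()` and pass the results in; the paths are whatever your backup tool and music library use.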
That raises the question, though: what’s changing in the metadata? I wanted to understand the bytes a little better; so I put both buffers into hexl-mode. Here’s what it looked like:
The first difference is highlighted; one copy has the bytes f2ae01 while the other copy has b212a2. And, actually, all three differences were the same change: the same three bytes in the old version, and the same three bytes in the new version.
The other thing that you can see (either in the hexl-mode version or the original version) is that there’s actually some ASCII around there; in particular, before the modified bits, you’ll see strings like mdhd. So there are four-character tags in this metadata; if I can figure out what those tags are, maybe I can figure out the meaning of the bytes that changed.
After some poking around, I found a QuickTime File Format Specification. Here’s what it says about mdhd:
The bytes after mdhd are 0000 0000 bdfc 5ded d4f2 ae01, with the last three bytes being the ones that changed. Comparing that with the layout diagram, we see that 00 is a version, 000000 is flags, bdfc5ded is a creation time, and d4f2ae01 is a modification time.
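The same field-by-field reading can be written down with Python's `struct` module. This is a sketch under a couple of assumptions: it handles only the layout described above (a one-byte version of 0, three bytes of flags, then two 32-bit big-endian times), it finds the tag with a naive byte search that could in principle match unrelated data, and the name `parse_mdhd_prefix` is just something I picked for illustration.

```python
import struct


def parse_mdhd_prefix(data: bytes):
    """Decode the version, flags, creation time, and modification time after 'mdhd'."""
    tag_at = data.find(b"mdhd")  # naive search; assumes the first hit is the real tag
    if tag_at < 0:
        raise ValueError("no mdhd tag found")
    # ">" = big-endian, no padding; B = 1-byte version, 3s = 3 flag bytes,
    # I = 32-bit unsigned creation time, I = 32-bit unsigned modification time.
    version, flags, created, modified = struct.unpack_from(">B3sII", data, tag_at + 4)
    return version, flags, created, modified


# The twelve bytes quoted above, preceded by the tag itself.
sample = b"mdhd" + bytes.fromhex("00000000bdfc5dedd4f2ae01")
version, flags, created, modified = parse_mdhd_prefix(sample)
print(version, flags.hex(), hex(created), hex(modified))
```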
So the modification time seems like it’s changed. (And, looking up the other two tags, the bytes changed there were also modification times.) As a sanity check, let’s try to decode it. At first, I assumed it was a unix timestamp, but in base 10 that’s 3572674049, which doesn’t look like a unix timestamp to me. (Turns out that it would be a date in 2083.)
Looking further in the documentation, it says that the modification time is “A 32-bit integer that specifies the calendar date and time (in seconds since midnight, January 1, 1904) when the movie atom was changed. It is strongly recommended that this value should be specified using coordinated universal time (UTC).” Googling a bit, I found a Mac HFS+ Timestamp Converter which seemed to expect those; 3572674049 translates to Sat, 18 Mar 2017 09:27:29 GMT, while the time in the other file, 0xd4b212a2 = 3568439970 translates to Sat, 28 Jan 2017 09:19:30 GMT.
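The conversion is easy to check without an online converter, since it's just a different epoch. Here's a small sketch using Python's standard `datetime` module; the helper names are made up for illustration, and it assumes the stored values are in UTC, as the specification recommends.

```python
from datetime import datetime, timedelta, timezone

MAC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def mac_time_to_utc(seconds: int) -> datetime:
    """Interpret a count of seconds since midnight, January 1, 1904 (UTC)."""
    return MAC_EPOCH + timedelta(seconds=seconds)


def unix_time_to_utc(seconds: int) -> datetime:
    """Interpret the same count as seconds since January 1, 1970, for comparison."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


print(mac_time_to_utc(0xD4F2AE01))        # 2017-03-18 09:27:29+00:00
print(mac_time_to_utc(0xD4B212A2))        # 2017-01-28 09:19:30+00:00
print(unix_time_to_utc(0xD4F2AE01).year)  # 2083, the misreading from earlier
```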
And that all makes sense: those timestamps must represent when iTunes was last scanning the file for some Apple Music-related reason.
Stepping back, though: iTunes / Apple Music is modifying the file to update a modification time; and that results in about a gig and a half of backups happening on my computer every day. And, when I write it that way, that’s ridiculous: maybe don’t modify the file, and then you won’t have to update the modification time?
Of course, Apple Music must be using the modification time for some other purpose, as some sort of scan time; it’s not a literal modification time. But it would be far better if that scan data were stored in a separate file, instead of modifying the music file itself: on a conceptual level, the music hasn’t changed, it’s just bookkeeping information that has changed, while on a pragmatic level, it causes a ton of extra backups to be generated. I’m mostly noticing it with Backblaze, but the consequences for Time Machine are equally bad: it means that my backup disk gets full with multiple versions of the same music file, so my backup history gets cut off more quickly than it should be.
This post has not been revised since publication.