TL;DR version: my iMac’s fusion drive “lost its marbles” right before I went on vacation. This has delayed cutting an 11.05 release candidate 2 with a few scenery fixes, but we should get to it next week. In parallel, we’re working furiously to get all of the code locked down for 11.10.
Everything else that follows is really, really, really, really boring. I’m writing it only because some of my co-workers watched this slow motion car crash and tightened up their backup game a bit. If my drive fail can shake you out of complacency, read on.
Basically: my iMac is my main development machine, and the data is backed up and/or duplicated in a bunch of different places: a USB time machine archive, a Backblaze cloud backup (both are “full machine”), DropBox for virtually all of my documents, and my work for Laminar is kept on Laminar’s source control servers. Data loss was never a huge risk here.
Time loss, however, is a real risk! My goal was to lose as little work time to fixing my machines as possible. So my plan was: restore from time machine disk backup, request a cloud backup restore via hard drive, return the hard drive. The total cost would be a few hours of disk copying and less than an hour of my time. My development machine would be usable for new work while waiting for the cloud backup to arrive.
This has not gone as well as I had hoped! You can learn from my fail here — a few notes.
- Your backup might as well not be a backup if you have not checked that the backup contains the data you think it contains. It turns out that both the cloud backup and time machine backup were missing files! I’m very lucky that they weren’t missing the same files.
- Time machine sometimes decides not to back stuff up. OS X has a hidden per-file/directory attribute that can exclude a file from backup without showing it in the Time Machine UI! Once you check your time machine backup and find a folder is missing, from terminal you can do tmutil isexcluded <file path> to see if the file has been explicitly excluded. If it is, tmutil removeexclusion <file path> fixes this.
- Backblaze ships with a bunch of file exclusions too – mostly designed to not archive stuff that isn’t your data. But beware – stuff you care about might not be on the list. (For example, virtual disks in a virtual machine are excluded by default.) I had to add back .iso files to the backup list. Backblaze backups are also not bootable. This is something I can live with, but always read the fine print about what’s in the backup.
- The Backblaze data restore has been very slow – over ten days for less than half a terabyte and it’s still “in progress”.* While they haven’t exceeded the maximum restore time they advertise, it’s slow enough that the delay matters.
- One other note on Backblaze: I saw major performance problems on my iMac while Backblaze was running, even when a backup was not running (since they were scheduled for overnight). I do not think this is necessarily Backblaze’s fault – it may be a problem with CoreStorage (which “runs” the fusion drive) or even a fault with my drives. From what I can tell, cloud backup exacerbated it by putting a lot more file traffic on my system.
- A possible danger if (like me) you keep documents on DropBox to have them everywhere: when I restored my iMac from Time Machine, I was exposing DropBox to my data from a week ago. I didn’t wait to see if DropBox would figure out what happened; I unlinked my iMac while it was offline after the restore, then re-established DropBox and let it download my data. Better safe than sorry.
- I have been backing up to portable 2.5″ USB drives because they’re cheap and really convenient, but they have a down-side: the mechanisms can easily fail and take your whole backup down. I have five of these drives and one has failed in a three year period.
- I’m really unhappy with CoreStorage, to the point where I would not recommend a fusion drive anymore. CoreStorage is an Apple virtual-volume technology (similar to soft-RAID) that makes one small SSD and one large HDD look like a single unified volume, with some of the data “cached” on the SSD for performance. CoreStorage is a lot newer than HFS, so when things go wrong, most disk utilities you would go to just don’t work.
I actually ended up in a state where (after wasting almost an entire day) I could see my data, but only in single-user mode with a read-only file system. I might have been able to directly copy the data, but I picked to format the drive and restore from the backup to save more of my time and get back to coding X-Plane. My suggestion for developers getting iMacs: get an internal SSD (whatever storage size you can afford) and supplement with a fast external hard drive over Thunderbolt.
Going forward, I am replacing the portable backup drives with a Synology NAS RAID device – this gets me high performance, high capacity backup (about 10 TB) with redundant drives. I picked HGST drives because they’ve had a good track record for reliability. With a large network attached storage server, I can have all of my machines backing up in the house all of the time, and have that be the primary way of getting my data back. I’m keeping cloud backup as a last-resort-the-house-burned-down kind of thing.
If my cloud backup hasn’t shipped Monday, I will rebuild the setup I use to cut builds by hand (it’ll take a few hours but it’s doable) and we’ll cut 11.05r2 that way. If the drive comes, I can get the last of my data back and we’ll get to 11.05r2 the easy way. Either way, we’ll get things moving again.
* I opted for a hard drive restore, which should have one day of shipping time, instead of a download; a smaller restore based on download made clear that the transfer speeds would be slower than FedEx for that quantity of data.