TL;DR version: my iMac’s fusion drive “lost its marbles” right before I went on vacation. This has delayed cutting an 11.05 release candidate 2 with a few scenery fixes, but we should get to it next week. In parallel, we’re working furiously to get all of the code locked down for 11.10.
Everything else that follows is really, really, really, really boring. I’m writing it only because some of my co-workers watched this slow motion car crash and tightened up their backup game a bit. If my drive fail can shake you out of complacency, read on.
Basically: my iMac is my main development machine, and the data is backed up and/or duplicated in a bunch of different places: a USB time machine archive, a Backblaze cloud backup (both are “full machine”), DropBox for virtually all of my documents, and my work for Laminar is kept on Laminar’s source control servers. Data loss was never a huge risk here.
Time loss, however, is a real risk! My goal was to lose as little work time to fixing my machines as possible. So my plan was: restore from time machine disk backup, request a cloud backup restore via hard drive, return the hard drive. The total cost would be a few hours of disk copying and less than an hour of my time. My development machine would be usable for new work while waiting for the cloud backup to arrive.
This has not gone as well as I had hoped! You can learn from my fail here — a few notes.
- Your backup might as well not be a backup if you have not checked that the backup contains the data you think it contains. It turns out that both the cloud backup and time machine backup were missing files! I’m very lucky that they weren’t missing the same files.
- Time Machine sometimes decides not to back stuff up. OS X has a hidden per-file/directory attribute that can exclude a file from backup without it ever showing in the Time Machine UI! If you check your Time Machine backup and find a folder missing, from Terminal you can run tmutil isexcluded <file path> to see whether the path has been explicitly excluded. If it has, tmutil removeexclusion <file path> fixes this.
- Backblaze ships with a bunch of file exclusions too – mostly designed to not archive stuff that isn’t your data. But beware – stuff you care about might not be on the list. (For example, virtual disks in a virtual machine are excluded by default.) I had to add back .iso files to the backup list. Backblaze backups are also not bootable. This is something I can live with, but always read the fine print about what’s in the backup.
- The Backblaze data restore has been very slow – over ten days for less than half a terabyte and it’s still “in progress”.* While they haven’t exceeded the maximum restore time they advertise, it’s slow enough that the delay matters.
- One other note on Backblaze: I saw major performance problems on my iMac while Backblaze was running, even when a backup was not running (since they were scheduled for overnight). I do not think this is necessarily Backblaze’s fault – it may be a problem with CoreStorage (which “runs” the fusion drive) or even a fault with my drives. From what I can tell, cloud backup exacerbated it by putting a lot more file traffic on my system.
- A possible danger if (like me) you keep documents on DropBox to have them everywhere: when I restored my iMac from Time Machine, I was exposing DropBox to my data from a week ago. I didn’t wait to see if DropBox would figure out what happened; I unlinked my iMac while it was offline after the restore, then re-established DropBox and let it download my data. Better safe than sorry.
- I have been backing up to portable 2.5″ USB drives because they’re cheap and really convenient, but they have a downside: the mechanisms can easily fail and take your whole backup down. I have five of these drives, and one has failed in a three-year period.
- I’m really unhappy with CoreStorage, to the point where I would not recommend a fusion drive anymore. CoreStorage is an Apple virtual-volume technology (similar to soft-RAID) that makes one small SSD and one large HDD look like a single unified volume, with some of the data “cached” on the SSD for performance. CoreStorage is a lot newer than HFS, so when things go wrong, most disk utilities you would go to just don’t work.
I actually ended up in a state where (after wasting almost an entire day) I could see my data, but only in single-user mode with a read-only file system. I might have been able to copy the data off directly, but I chose to format the drive and restore from backup to save time and get back to coding X-Plane. My suggestion for developers getting iMacs: get an internal SSD (whatever size you can afford) and supplement with a fast external drive over Thunderbolt.
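To make the tmutil check above concrete, here is a rough sketch that scans a directory for paths Time Machine has been told to skip. The directory is just a placeholder, it assumes macOS’s tmutil, and on other systems it simply reports that tmutil is absent:

```shell
# Rough sketch: list paths under a directory that Time Machine will skip.
# Assumes macOS's tmutil; elsewhere it just notes that tmutil is missing.
DIR="${1:-$HOME}"
if command -v tmutil >/dev/null 2>&1; then
  # tmutil isexcluded prints "[Excluded] <path>" for anything backups skip
  find "$DIR" -maxdepth 2 2>/dev/null | while read -r p; do
    tmutil isexcluded "$p" | grep '\[Excluded\]' || true
  done
  # To pull an excluded path back into the backup set:
  #   tmutil removeexclusion "<path>"
else
  echo "tmutil not available on this system"
fi
```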
Going forward, I am replacing the portable backup drives with a Synology NAS RAID device – this gets me high performance, high capacity backup (about 10 TB) with redundant drives. I picked HGST drives because they’ve had a good track record for reliability. With a large network attached storage server, I can have all of my machines backing up in the house all of the time, and have that be the primary way of getting my data back. I’m keeping cloud backup as a last-resort-the-house-burned-down kind of thing.
If my cloud backup hasn’t shipped Monday, I will rebuild the setup I use to cut builds by hand (it’ll take a few hours but it’s doable) and we’ll cut 11.05r2 that way. If the drive comes, I can get the last of my data back and we’ll get to 11.05r2 the easy way. Either way, we’ll get things moving again.
* I opted for a hard drive restore, which should have one day of shipping time, instead of a download; a smaller restore based on download made clear that the transfer speeds would be slower than FedEx for that quantity of data.
31 comments on “X-Plane 11.05, 11.10, and My Mostly Dead Hard Drive”
Servers, backups, and things like that are my trade.
I’m baffled by the complexity of how your data is spread around. The one positive thing in your story was the decision to use a NAS. For your external backup in the future, you could use the NAS’s rsync functionality. That way you will always be sure that your internal data and external data are the same.
But why should I worry. You seem to be on the edge of things, I’m just looking forward to the next release of X-Plane – Keep up the awesome work!
I’m going to keep a cloud backup even with the NAS for off-site…because I work at home, there are limits to the physical security of my data. E.g.
– I go on vacation – if someone decides to steal my equipment, they’d have an easy time taking ALL of it.
– The house burns down – no fire suppression system in the home office.
That’s why you should make the NAS rsync to an external location. We do that all the time with company server backups. The NAS should be perfectly able to synchronize your backups externally. rsync technology is your friend!
Yeah, rsync = greatest thing ever. We used to use it for X-Plane’s demo installs, and never had a problem. We now have “cloud storage” (via OpenStack) and the Swift container protocol and it is…just not nearly as good.
Always test the backup method before you actually need it!
I hear you Ben, I’ve been down this road before, so now I have a NAS plus a second NAS for redundancy. I also keep cloud storage with unlimited space for a small fee each month. I believe this is by far the most secure setup, but you are never safe.
Cloud storage is said to be safe, but is data completely safe these days?
Oh well, that is another discussion for sure.
So I know how much you hate bug reporting in this thread, but I did it anyway, hehe. Kidding aside, please do watch the ones I sent in. The bug report includes a Dropbox link with three short mp4 videos that illustrate some major issues with 11.05r1 <<– is this correct? Anyway, I hope you get that into r2, hence my urgent reply. I would send an email, but I think your backlog is getting larger day by day.
Oh, before I close – Good luck with the NAS, you will not regret it. It is pretty much a MUST HAVE ITEM.
No code changes in r2. It’ll have to wait for 11.10.
After reading your story and this one:
I wonder how Windows developers handle disk failures…
I can tell you how I do it: I restore a fresh Windows system from a block-level copy.
Re: the comment system – it’s working fine, but it’s moderated, so your comments don’t appear immediately.
Yeah, but the link I posted in my comment disappeared, and after a page reload the message that my comment awaits moderation disappeared together with my comment.
FYI, here is the link I was trying to post:
Weird – there was no sign of the original link in any of the posts.
I’m on my 3rd Mac now using TM and Migration Assistant to restore the new box from the old one. The downtime between two installs is limited only by the disk speed. After copying the data I can immediately continue working on the new machine. I wonder if Windows users can do something similar after getting new hardware.
I used to do that, but after migrating a laptop forward several times, I reached a point where the fonts in Xcode were misaligned (and no longer monospaced), apparently unique to that one franken-forwarded machine. Because any Google search about tabs and spaces just finds holy-war arguments about code formatting, I was never able to resolve it. The next machine got a clean install and no migration.
I bought a new iMac recently, and the Apple rep strongly advised avoiding fusion drives if running VMs – ‘they don’t play well’ or something suitably non-specific.
Glad I asked! But good USB 3 speed from an external SSD means there’s a choice.
Didn’t know about some files avoiding the backup – I run a Mac mini with OS X Server and networked Time Machine, and I’ve run a couple of restores through it and all *seemed* well…
I didn’t like Fusion either. As you recommend, I installed a Samsung Evo 512 GB SSD, and it was simply the best investment I ever made – it made my Mac super fast. Storage is the original 1 TB hard drive plus an external 3 TB for backup, though I need nothing like the storage you do. That machine is for creative work, as I still prefer a Mac for its ease of use; X-Plane 11, however, runs on a Windows box – best of both worlds.
Yeah the Evos are fantastic – I have one for my Linux setup and it’s so fast it’s surreal. I have one reserved for my Windows box when I can find the time to migrate to Win10 + the full tool chain. (We used it as a scratch setup for the Hartford show.)
Carbon Copy Cloner works a treat for me. //bombich.com/
CCC is great – I used to use it. But the restriction that you have to use a disk image to talk to a NAS isn’t fantastic.
Ah, bummer – good luck Ben. I’ve never had an issue with my Time Machine or backup drives with my iMac, and I’ve used iMacs for years. All files are always there.
hope to see all the landmarks in 11.10
I won’t say that you ought to switch to a PC. No… I won’t say it!
You can’t…I _already_ have a PC dev machine. 🙂
I’m quite egalitarian here…I have dev setups based on OS X, Windows, and Linux, and all three annoy the hell out of me. 😉 😉
VR please 😉
Totally unrelated, but… have you guys checked out FSW’s trueSKY? I switched from FSX:SE to X-Plane 11 since trying the demo even before release, and I was part of the public betas.
One thing that disappoints in X-Plane is the weather engine and depiction. Admittedly, I put up with it merely because the physics and performance in X-Plane 11 are so superior to FSX:SE.
But seeing the new updates in FSW… I begin to wonder if they will be able to compete and maybe even take over X-Plane as my everyday sim.
I’d rather stay with X-Plane as the physics engine really makes a difference, but I can’t turn a blind eye to developments in other sims.
Hello Ben, when will XP 11.10 be released? This weekend?
No, not this weekend.
Roughly when will XP 11.10 be released?
When will XP 11.05 r2 be released?
Hi Ben. I am an enthusiastic X-Plane user and I work for Backblaze. I checked with the engineer who is responsible for creating the hard disk restores and sending them to customers. He told me that your restore is in the last stages of completing. He explained to me that while your restore is not huge, it contains a very high number of files (over 1.4 million), and more files means more time to copy from our data center to our restore servers and then to the USB HD. Once that happens, we run some verifications then encrypt the drive for shipment. Following that we will overnight the HD to you. I’ll stay on top of this and let you know if there are any issues that arise.
Another note: our engineer suggests that CoreStorage could add overhead managing the SSD/HDD pair, which could possibly conflict with our backup processes on your computer. Your plan to move to a Synology NAS could be a good one: you could use Cloud Sync with our cloud object storage service Backblaze B2, which runs its own process of indexing and uploading data to B2, meaning your computer would do less work to back up, only needing to back up its local drive(s). — Roderick Bauer, Content Director, Backblaze
Thanks for the info — I pinged support today and they told me the same thing about the large file count…I’m not surprised at all, as large file count is also the biggest bottleneck in our installer, something we’ve worked around there by batching files into zip archives. Unfortunately, a large file count is sort of the cost of doing business as a developer – an unpacked git repo, source code, art files; it all adds up. I appreciated the candid response from support, though – it’s useful for planning purposes in the future.
(Even the NAS lives by these rules – backup of a machine with larger video and photo files went a lot faster than the loose collection of source code.)
Cloud Sync would be a fantastic option…as a developer, the cost of ANY background process while working is pretty irritating…better to let the NAS talk to the servers and leave my Macs alone.
Re: CoreStorage, I was wondering about that…the particular symptom I saw was large numbers of “stuck” processes…from what I can tell (and it’s not very well documented), there are still blocking states in the kernel where interruption isn’t possible. My wild guess is that CoreStorage has driver paths that don’t support interruption. The problem was that Xcode would get stuck behind a big pile of other IO and I couldn’t even kill -9 and restart it.
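The file-count effect described above can be seen in miniature with a toy sketch: the same 100 KB stored as one file versus one hundred files. Per-file overhead (metadata, round trips) is why the second tree backs up and restores more slowly.

```shell
# Toy sketch: equal bytes, wildly different file counts.
ROOT="$(mktemp -d)"
mkdir "$ROOT/few" "$ROOT/many"
# One 100 KB file...
dd if=/dev/zero of="$ROOT/few/one.bin" bs=1024 count=100 2>/dev/null
# ...versus one hundred 1 KB files.
i=1
while [ "$i" -le 100 ]; do
  dd if=/dev/zero of="$ROOT/many/part$i.bin" bs=1024 count=1 2>/dev/null
  i=$((i + 1))
done
echo "few:  $(find "$ROOT/few" -type f | wc -l) file(s)"
echo "many: $(find "$ROOT/many" -type f | wc -l) file(s)"
```

Backup tools pay the metadata and transfer-setup cost per file, so the “many” tree is the slow one even though the payload is identical.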
I also had a recent hard disk problem with an older 320 GB WD 2.5″ disk: the disk had more than 10,000 defective sectors, but not a single sector was remapped, and the S.M.A.R.T. status said the disk was healthy (even though every SMART self-test had failed)…
My guess is WD left no spare sectors in order to squeeze the maximum capacity from the platters.
My solution was to copy all my data off that disk to a newer one, and use a RAID for backup. I decided on a Drobo 5C with USB 3, but then discovered that a 1.8-meter cable only runs at reduced speed (about 40 MB/s).
In my desktop I have a RAID 1, and I periodically run SMART self-tests on the disks in addition to monitoring their SMART status (smartmontools does that for me automatically once configured).
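The manual version of what smartmontools automates looks roughly like this. /dev/sda is a placeholder device; the commands need root and real hardware, so the sketch skips itself anywhere that isn’t the case:

```shell
# Sketch of manual SMART checks (what smartd automates once configured).
# /dev/sda is a placeholder; smartctl needs root and a real disk, so we
# skip gracefully when those aren't available.
DISK="/dev/sda"
if command -v smartctl >/dev/null 2>&1 && [ -e "$DISK" ] && [ "$(id -u)" -eq 0 ]; then
  smartctl -H "$DISK" || true           # overall health verdict
  smartctl -t short "$DISK" || true     # queue a ~2-minute self-test
  smartctl -l selftest "$DISK" || true  # review the self-test log afterwards
else
  echo "smartctl, $DISK, or root privileges unavailable; skipping"
fi
```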
Also, most of the data loss I’ve ever had was due to my own stupidity (like erasing the wrong directory, or adding a partition at the end of an 18 GB disk under Windows 98, which put the new partition right over existing ones; I only realized after the other data were gone).
I bought an external RAID device from OWC/macsales, IIRC. It holds 4 WD Red 4 TB drives in RAID 5, though IIRC you have about 7 RAID modes to choose from. It is faster than my Synology and much easier; I connect through eSATA on my computer. You, using an iMac, should be able to get a Thunderbolt device, which would be the fastest of the external drive protocols. While it did cost more than my two-drive NAS, it’s something you might want to look into.