When it turns out that a bug that we thought was in an OpenGL driver is actually in X-Plane, I try to make a point of blogging it publicly; it’s really, really easy for app developers to blame bugs and weird behavior on the driver writers, who in turn aren’t in a position to respond. The driver writers bust their nuts to develop drivers quickly for the latest hardware that are simultaneously really fast and don’t crash. That is not an easy task, and it’s not fair for us app developers to blame them for our own bugs.
So with that in mind, what did I screw up this time? A bug in the framerate test caused me to misdiagnose the performance of hardware instancing on NVidia drivers on OS X. Thanks to Rob-ART Morgan of Barefeats for catching this; Rob-ART uses the X-Plane framerate test as one of his standard tests of new Macs.
Here’s the TL;DR version: hardware instancing is actually a win on modern NVidia cards on OS X (GeForce 4nn and newer); I will update X-Plane to use hardware instancing on this hardware in our next patch. What follows are the gory (and perhaps tediously boring) details.
What Is Hardware Instancing
Hardware instancing is the ability to tell the graphics card to draw a lot of copies of one object with a single instruction. (We are asking the GPU to draw many “instances” of one object.) Hardware instancing lets X-Plane draw more objects with lower CPU use. X-Plane’s rendering engine will use hardware instancing for simple scenery objects* when available; this is what makes possible the huge numbers of buildings, houses, street signs, and other 3-d detail in X-Plane 10. X-Plane has supported hardware instancing since version 10.0.
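To make the "one instruction, many copies" idea concrete, here is a toy model in Python. It is not X-Plane code and it does not talk to a GPU; the scene contents and function names are made up for illustration. It only counts how many draw commands the CPU would have to issue with and without instancing.

```python
from collections import defaultdict

# Hypothetical scene: (mesh name, position) pairs. All names are made up.
scene = [
    ("house", (0, 0)),
    ("house", (10, 0)),
    ("street_sign", (5, 2)),
    ("house", (20, 0)),
    ("street_sign", (15, 2)),
]

def draw_individually(scene):
    """Without instancing: one draw command per object."""
    return [(mesh, [pos]) for mesh, pos in scene]

def draw_instanced(scene):
    """With instancing: one draw command per distinct mesh,
    carrying the positions of every copy of that mesh."""
    batches = defaultdict(list)
    for mesh, pos in scene:
        batches[mesh].append(pos)
    return list(batches.items())

print(len(draw_individually(scene)))  # 5 draw commands
print(len(draw_instanced(scene)))     # 2 draw commands
```

The same five objects get drawn either way; what shrinks is the number of commands the CPU has to issue, which is where the CPU savings come from.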
The bug is pretty subtle: when we run the framerate test, we do not set the world level of detail explicitly; instead it gets set by X-Plane’s code to set up default rendering settings for a new machine. This “default code” looks at the machine’s hardware capabilities to pick settings.
The problem is: when you disable hardware instancing (via the command line, explicit code in X-Plane, or by using really old hardware) X-Plane puts your hardware into a lower “bucket” and picks lower world level of detail settings.
Thus when you disable hardware instancing, the framerate test is running on lower settings, and produces higher framerate numbers! This makes it look like turning off instancing improves fps, when really it’s just doing better at an easier test. On my RetinaBook Pro (650M GPU) I get just over 20 fps with instancing disabled vs 17.5 fps with instancing enabled. But the 20 fps is due to the lower world LOD setting that X-Plane picks. If I correctly set the world LOD to “very high” and disable instancing, I get 16.75 fps. Instancing was actually a win after all.
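The shape of the bug can be sketched in a few lines of Python. This is a simplified model, not the actual X-Plane code: the function names are hypothetical, and the name of the lower setting ("high") is an assumption, since all I've said above is that it is lower than "very high".

```python
def pick_default_world_lod(instancing_enabled):
    """Stand-in for the 'default settings for a new machine' code.
    Hardware without instancing lands in a lower bucket.
    The "high" label for the lower bucket is an assumption."""
    return "very high" if instancing_enabled else "high"

def fps_test_settings(instancing_enabled, world_lod=None):
    """Settings the framerate test ends up running with.
    If world_lod is not given, it silently falls back to the default."""
    if world_lod is None:
        world_lod = pick_default_world_lod(instancing_enabled)
    return {"instancing": instancing_enabled, "world_lod": world_lod}

# Buggy comparison: toggling instancing also changes the world LOD,
# so the two runs are not measuring the same workload.
buggy_on = fps_test_settings(instancing_enabled=True)
buggy_off = fps_test_settings(instancing_enabled=False)

# Fair comparison: pin the world LOD so only instancing differs.
fair_on = fps_test_settings(instancing_enabled=True, world_lod="very high")
fair_off = fps_test_settings(instancing_enabled=False, world_lod="very high")

print(buggy_on["world_lod"], buggy_off["world_lod"])
print(fair_on["world_lod"], fair_off["world_lod"])
```

The fair version is the one that produced the 16.75 fps number above: same world LOD on both runs, with instancing as the only variable.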
Was Instancing Always a Win?
No. The origin of this mess was the GeForce 8800, where instancing was being emulated by Apple’s OpenGL layer. If instancing is going to be software emulated, we might as well not use it; our own work-around when instancing is not available is as fast as Apple’s emulation and has the option to cull off-screen objects, making it even faster. So I wrote some code to detect a GeForce 8800-type GPU and ignore Apple’s OpenGL instancing emulation. Hence the message “Disabling instancing for DX10 NV hw – it is software emulated.”
I believe the limitations of the 8800 are shared with the subsequent 9nnn cards, the 1nn, 2nn and 3nn, ending in the 330M on OS X.
The Fermi cards and later (4nn and later) are fundamentally different and can hardware instance at full power. At the time they first became available (as after-market cards for the Mac Pro) it appeared that there was a significant penalty for instancing with the Fermi cards as well. Part of this was no doubt due to the framerate test bug, but part may also have been a real driver issue. I went back and tried to re-analyze this case (and I revisited my original bug report to Apple), but X-Plane itself has also changed quite a bit since then, so it’s hard to tell how much of what I saw was a real driver problem and how much was the fps test.
Since the 480 first became available on the Mac, NVidia has made significant improvements to their OS X drivers; one thing is clear: as of OS X 10.9.5 instancing is a win and any penalty is an artifact of the fps test.
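For the curious, the card families above boil down to a small classification rule. The Python sketch below is illustrative only and is not the check X-Plane actually performs. It assumes the GPU is identified by a renderer string such as "NVIDIA GeForce GT 650M OpenGL Engine" (the example strings are assumptions), and it only covers the model numbers mentioned in this post, so it returns None for anything else.

```python
import re

def nvidia_instancing_is_hardware(renderer):
    """Classify a renderer string by GeForce model number.

    Returns False for the families I believe are software emulated
    on OS X (8nnn, 9nnn, 1nn, 2nn, 3nn), True for 4nn and later,
    and None when the string is not one this sketch understands.
    """
    match = re.search(r"GeForce\D*(\d{3,4})", renderer)
    if match is None:
        return None
    digits = match.group(1)
    if len(digits) == 4:
        # Four-digit models: only the 8nnn and 9nnn families are covered.
        return False if digits[0] in "89" else None
    # Three-digit models: 1nn to 3nn are emulated, 4nn and up are not.
    return int(digits) >= 400

print(nvidia_instancing_is_hardware("NVIDIA GeForce 8800 GT OpenGL Engine"))  # False
print(nvidia_instancing_is_hardware("NVIDIA GeForce GT 330M OpenGL Engine"))  # False
print(nvidia_instancing_is_hardware("NVIDIA GeForce GT 650M OpenGL Engine"))  # True
```

Returning None for unrecognized strings is a deliberate choice in this sketch: it is safer to say "don't know" than to guess about hardware the rule was never written for.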
What About Yosemite?
I don’t know what the effect of instancing is on Yosemite; I wanted to re-examine this bug before I updated my laptop to Yosemite. That will be my next step and will give me a chance to look at a lot of the weird Yosemite behavior users are reporting.
What Do I Need To Do?
You do not need to make any changes on your own machine. If you have an NVidia Mac, you’ll get a small (less than 5%) improvement in fps in the next minor patch when we re-enable instancing.
* In order to draw an object with hardware instancing, it needs to avoid a bunch of object features: no animation, no attributes, etc. Basically the object has to be simple enough to send to the GPU in a single instruction. Our artists specifically worked to make sure that most of the autogen objects were instancing-friendly.