X-Plane 11.02 should be out this week; we’re down to one bug, whose fix I am verifying now. There have been a number of questions about performance, so to start, here is some info on three things we’ve done to make 11.02 faster than 11.01.
8-bit Water. The dynamic FFT-based ocean wave textures we stream in X-Plane 11 are floating point textures in 11.01 (F32 on the CPU, F16 on the GPU). This was an easy decision for getting the tech working, but as it turns out, transferring the textures to the GPU is slow, particularly on the NVidia drivers.*
For 11.02, Sidney has rewritten the shaders to cope with 8-bit waves. The results look almost the same, but the amount of data transferred is 4x smaller, and more importantly, 8-bit RGBA is the path most likely to be handled well by the driver, so this should be a win.
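To make the size difference concrete, here is a minimal Python sketch of packing float wave data into 8-bit channels and unpacking it again. It is an illustration only, not the actual X-Plane shader math: the function names, the sample texel, and the assumed value range of -1 to 1 are all hypothetical.

```python
import struct

def quantize_8bit(values, lo, hi):
    """Map floats in [lo, hi] to integers 0..255 (one byte per channel)."""
    scale = 255.0 / (hi - lo)
    out = bytearray()
    for v in values:
        clamped = min(max(v, lo), hi)
        out.append(int(round((clamped - lo) * scale)))
    return bytes(out)

def dequantize_8bit(data, lo, hi):
    """Recover approximate floats, as a shader might after sampling."""
    step = (hi - lo) / 255.0
    return [lo + b * step for b in data]

# One hypothetical RGBA texel of wave data.
texel = [0.25, -0.5, 0.75, 1.0]
LO, HI = -1.0, 1.0  # assumed value range; real wave data would need its own

f32_bytes = struct.pack("<4f", *texel)    # 16 bytes per RGBA texel as F32
u8_bytes = quantize_8bit(texel, LO, HI)   # 4 bytes per RGBA texel as 8-bit
decoded = dequantize_8bit(u8_bytes, LO, HI)
max_error = max(abs(a - b) for a, b in zip(texel, decoded))
```

The 8-bit version is a quarter of the size of the F32 version, at the cost of a rounding error of at most half a quantization step per channel.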
Sidney also wrote some code to transfer the textures asynchronously, but we’re holding off until 11.10 for that, as it may require debugging or behave weirdly on some drivers.
Faster Car Bucketing. The cars have always cost more CPU than they should, and profiling indicated that 90% of the work was in moving the cars around in our scenery system as they drove. The code to “rebucket” them has been modified and is now significantly cheaper. We are not turning the car density up yet (it’s not that fast), but at this point with the cars at the highest setting we ship, they now take 2-3 ms total to compute, which means they have no frame rate impact. I’d like to bring the density up in the future if we can get further performance wins, which I think we can.
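For readers unfamiliar with the term, "bucketing" generally means filing moving objects into the cells of a spatial grid so nearby objects can be found quickly. The sketch below shows one common way to keep that cheap: touch the grid only when a car crosses a cell boundary. This post does not describe the actual change made in 11.02, so treat the class, the names, and the 100 m bucket size as assumptions for illustration.

```python
from collections import defaultdict

BUCKET_SIZE = 100.0  # hypothetical bucket edge length in meters

def bucket_of(x, y):
    """Return the grid cell that contains the point (x, y)."""
    return (int(x // BUCKET_SIZE), int(y // BUCKET_SIZE))

class CarGrid:
    def __init__(self):
        self.buckets = defaultdict(set)  # cell -> ids of cars in that cell
        self.cell_of_car = {}            # car id -> current cell
        self.rebucket_count = 0

    def add(self, car_id, x, y):
        cell = bucket_of(x, y)
        self.buckets[cell].add(car_id)
        self.cell_of_car[car_id] = cell

    def move(self, car_id, x, y):
        """Update a car's position; touch the buckets only on a cell change."""
        new_cell = bucket_of(x, y)
        old_cell = self.cell_of_car[car_id]
        if new_cell == old_cell:
            return  # common case: the car stayed in its bucket
        self.buckets[old_cell].discard(car_id)
        self.buckets[new_cell].add(car_id)
        self.cell_of_car[car_id] = new_cell
        self.rebucket_count += 1

grid = CarGrid()
grid.add("car-1", 10.0, 10.0)
for step in range(1, 21):
    grid.move("car-1", 10.0 + 10.0 * step, 10.0)  # drive 10 m per step along x
```

In this toy run the car moves 20 times but changes bucket only twice, so most updates take the early-return path.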
Better Core Scheduling. If you’ve been reading carefully, you should be shouting at the screen right now: how the hell is something that takes 2-3 ms total “free”? An extra 3 ms per frame means that if you were running at 60 fps, you’re now down to about 50. That’s not free!
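The arithmetic behind that objection is easy to check. This small Python helper (the function name is made up for illustration) converts a frame rate to a frame time, adds the extra work serially, and converts back:

```python
def fps_after_extra_work(base_fps, extra_ms):
    """Frame rate after adding extra_ms of serial work to every frame."""
    base_frame_ms = 1000.0 / base_fps
    return 1000.0 / (base_frame_ms + extra_ms)

# 60 fps is about 16.7 ms per frame; 3 ms more gives about 19.7 ms per frame.
fps_with_cars = fps_after_extra_work(60.0, 3.0)  # about 50.8 fps
```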
I am declaring the cars free because they now run in parallel to the flight model, and it’s very likely that the total flight model work takes at least 2-3 ms, even without AI planes. The third optimization is a big cleanup of the multi-core scheduling that we do within a frame.
X-Plane uses multiple cores both to load background scenery as you fly and to speed up some calculations within a frame. As of now, the major “per frame” multi-core computations are:
- The flight model (if you have more than one aircraft – we can’t multi-core a single plane).
- DSF scenery maintenance (not super expensive, but does get multi-core acceleration).
- Car computation (typically uses 1-2 cores at most).
- FFT water calculations for the next frame (uses up to four cores).
X-Plane 11.01 was not scheduling these particularly well – here’s a picture.
What you’re seeing is X-Plane kicking off the FFT water too early, and that work blocks X-Plane from completing AI aircraft calculations. The big red bar up top is the sim waiting (and FPS dying) because the AI planes weren’t done in time.
(The bottom ‘track’ with nothing on it is an IO thread that’s waiting in case we need to do UDP I/O. Since I had IO off, it is efficiently sleeping. This profile is on a 4-core machine so we couldn’t have stuck work down there.)
In 11.02, we start the (newly optimized) cars as early as possible so they complete at about the same time as the flight model; we get all DSF work done immediately, and we don’t start water until the very end. In the meantime, the main thread is free to go do the actual frame rendering.
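The ordering idea can be sketched with a plain thread pool. This is a simplified Python model, not X-Plane's scheduler: the job names, the pool size, and the explicit wait before queuing water are all assumptions made to keep the example small and deterministic.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def run_frame(pool, num_ai_planes=3):
    """Run one frame's worker jobs and return the order in which they ran."""
    started = []

    def job(name):
        started.append(name)  # list.append is atomic in CPython
        return name

    # 1. Work the current frame depends on goes out first, so it overlaps.
    critical = [pool.submit(job, "cars"), pool.submit(job, "dsf")]
    critical += [pool.submit(job, f"flight_model_{i}")
                 for i in range(num_ai_planes)]

    # 2. Simplification: block here until that work has finished.
    wait(critical)

    # 3. Only now queue the water FFT, which is needed for the next frame.
    water = pool.submit(job, "water_fft")

    # 4. The main thread "renders" while the water job runs on a worker.
    started.append("render")

    water.result()  # a real engine would collect this at the next frame
    return started

with ThreadPoolExecutor(max_workers=3) as pool:
    order = run_frame(pool)
```

Because the water job is submitted only after the current frame's work is done, it can never occupy a worker that the flight model still needs, which is the stall described for 11.01.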
This is just an incremental step for multi-core use; we have been steadily adding more multi-core work for the last few years, and we’ll be adding more in future X-Plane 11 updates. For example, X-Plane 10.50 restructured the renderer, separating the work of discovering what to draw (“culling”) from the work of actually drawing. In X-Plane 11, we can do that culling on multiple cores, improving total frame rate.
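As a rough illustration of why separating culling from drawing helps, the Python sketch below splits a visibility test across workers and leaves the "draw" step on one thread. The distance test stands in for whatever visibility test a real renderer uses, every name here is hypothetical, and Python threads will not show a real speedup for this kind of work; the point is only the structure.

```python
from concurrent.futures import ThreadPoolExecutor

def cull_chunk(objects, max_distance):
    """Return the objects in this chunk that are close enough to draw."""
    return [o for o in objects if o["distance"] <= max_distance]

def cull_parallel(objects, max_distance, workers=4):
    """Split the object list into chunks and cull each chunk on a worker."""
    chunk = max(1, (len(objects) + workers - 1) // workers)
    chunks = [objects[i:i + chunk] for i in range(0, len(objects), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(cull_chunk, chunks, [max_distance] * len(chunks))
        # map yields results in chunk order, so the output order is stable
        return [o for part in results for o in part]

def draw(visible):
    """Stand-in for issuing draw calls; this stays on a single thread."""
    return [o["name"] for o in visible]

scene = [{"name": f"obj{i}", "distance": 100.0 * i} for i in range(10)]
drawn = draw(cull_parallel(scene, max_distance=450.0))
```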
I don’t have great numbers on what kind of performance change you’ll see in 11.02; it’s actually hard to measure the improvements here with the FPS test because the FPS test runs a replay (and not the actual flight model) and doesn’t run long enough to generate car traffic. But we think it should be a good incremental improvement.
* It is not a bug that this case was slow for the NVidia driver; no OpenGL driver is contractually obligated to do anything in a particular time frame. It was slightly surprising in that NVidia seems to go farther than other GL vendors to optimize less common and less efficient code paths.
NVidia does normally allow for complete threading of CPU-side driver work, so it’s possible they thought there was no need to optimize this case directly since it would be on a worker thread; by comparison, Apple does not use a general worker thread for their driver but does use a worker thread for all CPU-based texture transfers, at least as far as we can tell by profiling X-Plane.