I have spent almost the entire last week looking at ATI performance on Windows in depth; this post summarizes some of my findings and what we are now working on.  Everything in this post applies to ATI hardware on Windows; the ATI Mac driver is a completely separate chunk of code.

Forgive me for quoting an old post, but:

As with all driver-related problems, I must point out: I do not know if it’s our fault or ATI’s, and the party at fault may not be the party to fix it.  Apps work around driver bugs and drivers work around illegal app behavior all the time…that’s just how it goes.  Philosophically, the OpenGL spec doesn’t require any particular API calls to be fast, so you can’t really call a performance bug a bug at all.  It’s ATI’s job to make the card really fast and my job to speculate which subset of OpenGL will cause the card to be its fastest.

This proved to be true – most of the ATI performance problems on Windows involve us leaning heavily on an API that isn’t as fast as we thought it was.  That’s not really a bug; it’s just the particular performance of one driver we run on.  The solution is to use a different path through the driver.

Cloud Performance

I’m going to try to keep this post non-technical; if you are an OpenGL nerd you can read more than you ever wanted to know here.

With 100,000 cloud puffs (a typical number for one broken layer when looking at an area of thicker clouds), we were seeing a total cost of about 9 ms to draw the clouds if we weren’t GPU bound, compared to about 2 ms on NVidia hardware in the same machine.

Is 7 ms a big delta?  Well, that depends on context.  For a game developer, 7 ms is a huge number.  At 15 fps, saving 7 ms gets you to 16.7 fps, but at 30 fps it takes you up to about 37 fps.  That’s one of the crazy things about framerate – because it is the inverse of how long things take, you get bigger changes when the sim is running faster.  For this reason I prefer to think in milliseconds.  If we can get a frame out in 20 ms we’re doing really well; if it starts to take more than 50 ms, we’re in real trouble.  You can think of 50 ms as a budget, and 7 ms is 14% of that budget – a number you can’t ignore.
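
If you want that arithmetic spelled out, here is a tiny back-of-the-envelope sketch (my own illustration, not sim code) of why a fixed 7 ms saving moves the fps number more at higher frame rates; the rounding differs a hair from the figures above:

    #include <cstdio>

    // Frame time in milliseconds is the inverse of fps: ms = 1000 / fps.
    // A fixed saving (7 ms here) moves the fps number more when frames are
    // already short, which is why thinking in milliseconds is less confusing.
    int main() {
        const double saving_ms = 7.0;
        const double rates[] = { 15.0, 30.0 };
        for (double fps : rates) {
            double before_ms = 1000.0 / fps;
            double after_ms  = before_ms - saving_ms;
            printf("%4.1f fps = %4.1f ms/frame -> %4.1f ms = %4.1f fps\n",
                   fps, before_ms, after_ms, 1000.0 / after_ms);
        }
        return 0;
    }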

The ATI guys pointed us to a better way to push the cloud data through to the card, and the results are much improved – about 3 ms for the same test case.  That should make things a bit better for real use of the sim, and should get clouds out of the “oh sh-t” category.
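
For the curious, the technical details are in the OpenGL post linked above.  Purely as an illustration of the kind of change involved (this is a generic sketch, not X-Plane’s actual code, and the names are made up), one pattern that drivers generally digest well for per-frame geometry is streaming through a VBO with buffer orphaning:

    // Generic sketch, not X-Plane code: stream per-frame "puff" geometry
    // through a VBO, orphaning the old storage each frame so the driver never
    // stalls waiting for the GPU to finish with the previous frame's data.
    #include <GL/glew.h>   // assumes GL 1.5+ entry points are loaded (e.g. via GLEW)

    struct PuffVertex { float x, y, z, u, v; };

    static GLuint s_puff_vbo = 0;

    void draw_puffs(const PuffVertex* verts, int count)
    {
        if (!s_puff_vbo)
            glGenBuffers(1, &s_puff_vbo);

        glBindBuffer(GL_ARRAY_BUFFER, s_puff_vbo);

        // "Orphan" the buffer: asking for fresh storage lets the driver hand us
        // new memory instead of blocking until the GPU is done with the old data.
        const GLsizeiptr bytes = count * (GLsizeiptr)sizeof(PuffVertex);
        glBufferData(GL_ARRAY_BUFFER, bytes, nullptr, GL_STREAM_DRAW);
        glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);

        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_TEXTURE_COORD_ARRAY);
        glVertexPointer(3, GL_FLOAT, sizeof(PuffVertex), (const void*)0);
        glTexCoordPointer(2, GL_FLOAT, sizeof(PuffVertex), (const void*)(3 * sizeof(float)));

        glDrawArrays(GL_QUADS, 0, count);

        glDisableClientState(GL_TEXTURE_COORD_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
    }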

Now there is one bit of fine print.  Above I said “if we weren’t GPU bound”.  I put the sim through some contortions to measure just the cost of the geometry of clouds, because that’s where ATI and NV cards were acting very differently.  But for almost anyone, clouds eat a lot of fill rate.  That fill rate cost is worse if you crank the rendering settings, run HDR, run HDR + 4x SSAA, have a huge monitor, or have a cheaper, lower-compute-power card.  So if you are CPU bound, this change will help, but if you don’t have enough GPU power, you’re just going to be blocked on something else.

(A good way to tell if you are fill rate bound: make the window bigger and smaller.  If a smaller window is faster, it’s GPU fill rate; if they’re the same speed it’s your CPU or possibly the bus.)

At this point I expect to integrate the new cloud code for ATI Windows into the next major patch.

Performance Minus Clouds

I took some comprehensive measurements of framerate in CPU-bound conditions and found that, with the “penalty” for the existing clouds subtracted out of the numbers, my machine was about 5% faster with NV hardware than ATI hardware.  That may represent some overall difference in driver efficiency, or some other less important hardware path that needs tuning.  But the main thing I would say is: 5% isn’t that much – we get bigger changes in performance from routine whole-sim optimization, and they don’t affect all hardware in the same way.  I still have a number of to-do items on my performance list, so overall performance will need to be revisited in the future.

The Cars

The other code path in the sim that’s specifically slower on ATI cards is the cars, and when I looked there, what I found was sloppy code on my part; that sloppy code affects the ATI/Windows case disproportionately, but the code is just slow on pretty much any hardware/OS combination.  Propsman also pointed me at a number of boneheaded things going on with the cars, and I am working to fix them all for the next major patch.

So my advice for now is to keep the car settings low; it’s clear that they are very CPU expensive and it’s something I am working on.

Fill Rate

One of the problems with poor CPU performance in a driver is that you never get to see what the actual hardware can do if the driver can’t “get out of the way” CPU-wise; with clouds carrying a CPU penalty, it was impossible to see what the Radeon 7970 could really do compared to a GTX 580.  Nothing else creates that much fill rate use on my single 1920 x 1200 monitor.*

I was able to synthesize a super-high fill-rate condition by enabling HDR and 4x SSAA, running full screen in the 747 internal view.  This setup pushes an astonishing number of pixels (something I am looking to optimize inside X-Plane).  I set the 747 up at KSEA at night so that I was filling a huge amount of the screen with a large number of flood lights.  This causes the deferred renderer to fill in a ton of pixels.
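
To put a rough number on “astonishing” (this is my own back-of-the-envelope arithmetic, not a measured figure):

    #include <cstdio>

    // Back-of-the-envelope only: how many samples the deferred renderer has to
    // shade per frame in this test setup, before any per-light overdraw.
    int main() {
        const long long width  = 1920;
        const long long height = 1200;
        const long long ssaa   = 4;   // 4x SSAA = 4 shaded samples per pixel
        printf("%lld samples per frame\n", width * height * ssaa);  // ~9.2 million
        // Every flood light that covers a sample shades it again, so the real
        // fill cost of the KSEA night scene is some multiple of that number.
        return 0;
    }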

In this “no clouds, killer fill” configuration, I was able to see the 7970 finally pull away from the 580 (a card from a previous generation).  The 7970 was able to pull 13.4 fps compared to 10.6 fps, a 26% improvement.  Surprisingly, my 6950, which is not a top-end card (it was cheaper than the 6970 that was meant to compete with the 580), was able to pull 10.2 fps – only 4% slower for a significantly lower price.

In all cases, this test generated a lot of heat.  The exhaust vent on the 7970 felt like a hair dryer and the 580 reached an internal temperature of 89C.

CPU Still Matters

One last thing to note: despite our efforts to push more work to the GPU, it’s still really easy to have X-Plane be CPU limited; the heavy GPU features (large format, more anti-aliasing, HDR) aren’t necessarily that exciting until after you’ve used up a bunch of CPU (cranking autogen, etc.).  For older machines, the CPU is still a big factor in X-Plane.  One user has an older Phenom CPU; it benches 25-40% slower than the i5 in published tests, and the user’s framerate tests with the 7950 were 30% slower than mine.  This wasn’t due to the slightly lower GPU kit; it’s all in the CPU.

The executive summary is something like this:

  • We are specifically optimizing the cloud path for ATI/Windows, which should close the biggest performance gap.
  • We still have a bunch of performance optimizations to put in that affect all platforms.
  • Over time, I expect this to make ATI very competitive, and to allow everyone to run with “more stuff”.
  • Even with this tuning, you can max out your CPU, so careful adjustment of rendering settings really matters, especially with older hardware.

* As X-Plane has become more fill-rate efficient, it has become harder for me to really max out high-end cards.  It looks like I may have to simply purchase a bigger monitor to generate the kind of loads that many users routinely fly with.

About Ben Supnik

Ben is a software engineer who works on X-Plane; he spends most of his days drinking coffee and swearing at the computer -- sometimes at the same time.

24 comments on “ATI Performance on Windows”

  1. Very informative Ben.

    Hope you’ll be able to use some of your magic on the Mac ATI drivers. On a nice day I get 40+ fps, but then the clouds roll in and it’s 18 or less.
    Good luck with it.

    Dom

  2. Unfortunately a lot of us serious XP + ATI users are using at least 3x1920x1080 (6 Mpix) screens for better peripheral vision… that’s a 2.7x factor (4.6x for a 5-monitor setup)… still with no SLI/CrossfireX option.

    1. XP in 2012. For performance.
      OK, I guess if it’s OK for you to miss a SUBSTANTIAL set of performance optimizations and new technologies that come both from newer versions of Windows and from drivers for newer Windows, then fine.
      I thought this philosophy was dead for good. Wrong.

        1. I am laughing now in front of my monitor. 😀
          A spike of stupidity (for an otherwise quite intelligent person). 😉
          Mistook XP for the Windows version instead of shorthand for X-Plane. 😛
          Hahaha sorry.

        2. I’m trying hard too – maybe something having to do with Windows? x64 philosophy? No, I don’t get it.

          As already mentioned, nice article Ben – looking forward to seeing more of this detailed info about core features/problems etc. Some relevant developer screenshots/illustrations would be nice to have too. They really make difficult things easier to understand!

  3. Great news, looking forward to this. I can already fly with 30+ FPS with scattered clouds, so hopefully I can start using more weather with my AMD system.

  4. Hi Ben,

    did you have time to look into Nvidia’s new TXAA? In my understanding it could be a solution for the quality problems of FXAA without being very expensive on performance. But I’m a layman…
    I know that there is almost no experience with this new tech, but I wonder if it would be possible to make xp10 capable of using it, or if that would mean a major code change.

    Thanks
    Flo

    1. TXAA: unknown. First, it depends on NV’s terms of use. The GPU vendors are in a sort of weird position: by making the GPU abstraction lower level (moving from fixed function to shaders), they have made it a lot harder for themselves to provide “finished” effects. In the old days of fixed function, FSAA worked on all games – the GPU vendors knew it would because the pipeline couldn’t be used in that many ways. So they could compete not just on FSAA speed but also on quality, and use creative filters and algorithms.

      Now look at post-processed anti-aliasing (this is what TXAA, FXAA, and MLAA all are). To use them right, the game has to drop them in at just the right time – hence the control panel versions trash text, and hence NV provides FXAA in source. (That’s a big reason why we use it – because it’s a very straightforward integration.)

      But this means that NV’s FXAA, which they funded development for, is now running on ATI hardware in X-Plane! So the question is whether going forward the GPU makers will tend to “give out” such effects or try to wrap them into the driver so that they only run on their hardware.

      Second, TXAA apparently (I have only read the very brief Kepler white paper – if there’s an SDK for TXAA post a link, but I think it’s early) has an optional temporal input; there _may_ be reasons why we can’t do this in X-Plane. Temporal anti-aliasing is something that not all engines can “just do” – there is fine print.
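
      To make the “right time” point concrete, here is a schematic sketch of where a post-process AA filter has to sit in a frame (every name below is a made-up placeholder, not our actual code): it runs after the 3-D scene but before text and UI, otherwise the filter smears the text.

          // Schematic only - every function and variable here is a hypothetical
          // placeholder, not X-Plane API.
          void bind_framebuffer(int fbo);
          void draw_3d_scene();
          void run_fxaa_pass(int color_texture);
          void draw_text_and_ui();
          extern int scene_fbo, back_buffer, scene_fbo_color_tex;

          void hypothetical_draw_frame()
          {
              bind_framebuffer(scene_fbo);         // render the 3-D world off-screen
              draw_3d_scene();                     // terrain, clouds, aircraft, lights...

              bind_framebuffer(back_buffer);
              run_fxaa_pass(scene_fbo_color_tex);  // full-screen AA filter over the 3-D image only

              draw_text_and_ui();                  // UI is drawn after the filter, so text stays crisp
          }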

    1. Well this particular optimization _is_ AMD-specific. Generally: we look at all optimizations based on:
      – How much perf boost.
      – How many of our users will see it.
      – Which set of users will see it. (We care more about mid-range optimizations than high-end because if you have a GTX 680 or 7970 you ALREADY have the potential for high perf.)
      – How much dev time to implement.
      – How invasive on the engine (e.g. if we have to turn the whole sim upside down to get a small boost, that’s not good.)

      In this case the AMD optimization fits well: 33% of our user base, quick integration, big perf boost, and no high level change to the sim. All other optimizations get analyzed along the same criteria.

      1. This leads me to another question… Do you have any good hardware stats on your user base? (And are they publishable? Why wouldn’t they be?)

        1. Only the OS split and 32-64 bit split mentioned previously – we can infer that the 64-bit folks have 64 bit CPUs. We do not currently have a GPU or memory survey.

          1. I think it would be very useful (many devs do it) to actually make a program that extracts the needed info in a uniform way – possibly to a text file, so that users know that nothing personal is retrieved – so that interesting statistics can be extracted; something like the sketch below.
            OS is vital, CPU type also, GPU(s), RAM size and speed, disk sizes, network capabilities.
            It would be interesting to know whether you still develop (and “drive around obstacles”) for a limitation that affects maybe 0.5% of your customer base, or whether you aren’t actually making use of something 90% of your users already have.
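
            A rough sketch of what I mean (hypothetical – not an existing X-Plane feature – and it assumes an OpenGL context is already current so the GPU strings can be read):

                // Hypothetical sketch: dump a few non-personal facts to a plain
                // text file the user can inspect before sending it anywhere.
                #include <fstream>
                #include <thread>
                #if defined(_WIN32)
                #include <windows.h>
                #include <GL/gl.h>
                #elif defined(__APPLE__)
                #include <OpenGL/gl.h>
                #else
                #include <GL/gl.h>
                #endif

                void write_hardware_survey(const char* path)
                {
                    std::ofstream out(path);
                #if defined(_WIN32)
                    out << "os: windows\n";
                #elif defined(__APPLE__)
                    out << "os: mac\n";
                #else
                    out << "os: linux\n";
                #endif
                    out << "cpu_threads: " << std::thread::hardware_concurrency() << "\n";
                    // These require a current OpenGL context:
                    out << "gl_vendor: "   << glGetString(GL_VENDOR)   << "\n";
                    out << "gl_renderer: " << glGetString(GL_RENDERER) << "\n";
                    out << "gl_version: "  << glGetString(GL_VERSION)  << "\n";
                    // RAM, disk and network details need per-OS APIs, so they are omitted here.
                }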

            Anyway, you know better.

  5. “Nothing else creates that much fill rate use on my single 1920 x 1200 monitor.*”

    Someone buy that man a bigger monitor! A nice 30″ 2560×1600.

    – CK.

  6. “…leaning heavily on an API that isn’t as fast as we thought it was…”

    So the most interesting bit is missing, what API is this all about, and which one is faster ?

    PhM

  7. I’ve been doing some observations of X-Plane performance on my fairly-new ATI 6950.
    My CPU & RAM: Intel i7 2600K overclocked to 4.1 GHz, 16 GB RAM (sorry, no clock speeds)
    Video card clock speeds: 840 MHz core clock, 1325 MHz RAM clock.

    My observations:
    There are 3 things slowing me down. 2 are mentioned in this post.
    1. Clouds / Weather
    2. Cars
    3. Objects

    I have also noticed that my graphics card may not be able to “handle” the fill rate.
    I have adjusted options such as trees, etc., and have not noticed much of a difference in performance with these ON/OFF. I also have 8 AI planes on, which I have tried toggling, with no performance issues. I believe objects are one of the main killers here. I have also noticed that the utilization of my resources is not close to full (36% of my graphics card and 16% of my CPU, heh). Also, X-Plane is not giving jobs to all of my cores, as some are not in use AT ALL. I am pretty sure this may be due to a problem within OpenGL, as I have noticed the same with other OpenGL-based games; however, I hope there is possibly a way to bypass such a limitation. Also, as Ben has stated in his post, I believe the fill rate does need to be optimized in X-Plane 10.
    Those are my current observations, and I will be looking into this more in the future. Something to note is that while observing, I was using the beta CCC driver 12.4, which I forgot to mention above. Also, a question if anyone here is a graphics guru: while in CrossFire, would it be possible to give separate jobs to each graphics card instead of having them render alternate frames?
    One more thing I forgot to add above about the objects: I have not determined whether they are GPU-bound or CPU-bound, or both.

    With that said, I will conclude my brief observations.
