What’s This Whole PCIe Thing About, Anyway?

I’ll try to summarize some of our hardware findings for X-Plane 10 over the next few posts. But in my previous post I mentioned that the new MacBook Pros have only an 8x PCIe connection to the discrete GPU (that is, the nice GPU that isn’t built into the CPU, the one you want to fly X-Plane with), and this got a bit of attention.

So that raises the question: what is this PCIe bus, and why do we need to care all of a sudden?

The PCIe bus is the connection between the CPU/main memory and your graphics card (with its memory and GPU). It is the bottleneck through which all communication between the two must flow – sometimes every frame, sometimes every now and then.

PCIe slots are named by the number of lanes (e.g. 16x means 16 lanes), and each lane has a fixed capacity (which doubled with PCIe 2.0). So a graphics card in a 16x slot drinks data from your computer at double the rate of one in an 8x slot – it’s an extra wide straw.

(Nerds: I realize this is about the worst description of the PCIe bus you will ever find.  Go read Wikipedia!)

What Do We Use the PCIe Bus For?

X-Plane needs the PCIe bus to:

  • Send the instructions to draw each frame to the GPU.
  • Transfer any textures, new OBJ meshes, and other data that will be held in VRAM.  The data is born on the CPU, goes over the PCIe bus once, and then lives in VRAM.
  • Send to the GPU anything that changes every frame. For example, smoke puffs and car headlights have to go over the PCIe bus every frame because they are constantly changing.
  • Send mountains, forests, and other non-repeating geometry to the GPU. This data gets sent every frame.

If the sum of all of the stuff on that list gets too big, your framerate drops as the CPU and GPU both wait for data to make it over the bus. In other words, the bus can at times be the bottleneck in terms of framerate.
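One way to see the "too big" threshold is to divide the per-frame traffic by the bus bandwidth. The byte counts below are made-up, illustrative numbers, not measurements from X-Plane. Textures and OBJ meshes cross the bus once and then live in VRAM, so they are left out of this steady-state sum.

```python
def bus_time_ms(bytes_per_frame: float, bus_gb_per_s: float) -> float:
    """Milliseconds needed to move one frame's worth of data across the bus."""
    return bytes_per_frame / (bus_gb_per_s * 1e9) * 1000.0


def bus_fps_ceiling(bytes_per_frame: float, bus_gb_per_s: float) -> float:
    """Highest framerate the bus alone could sustain for this per-frame load."""
    return 1000.0 / bus_time_ms(bytes_per_frame, bus_gb_per_s)


# Hypothetical per-frame traffic in bytes (illustrative numbers only).
per_frame = {
    "draw instructions": 2e6,
    "dynamic data (smoke, headlights)": 8e6,
    "mountains and forests": 90e6,
}
total = sum(per_frame.values())

for gbps in (8.0, 4.0):  # PCIe 2.0 x16 and x8 theoretical peaks
    print(f"{gbps:.0f} GB/s: {bus_time_ms(total, gbps):.1f} ms of bus time, "
          f"at most {bus_fps_ceiling(total, gbps):.0f} fps")
```

With these invented numbers, the same 100 MB per frame caps out around 80 fps on a 16x PCIe 2.0 link and around 40 fps on an 8x link, ignoring every other bottleneck.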

If you set your rendering settings near the maximum that your computer can handle and get the occasional stutter, that may be X-Plane running out of PCIe bus bandwidth. As you fly to a region with new textures that haven’t been used before, the OpenGL driver will transfer our textures over the PCIe bus from system RAM to VRAM. If the PCIe bus is already nearly maxed out, the extra traffic of those textures is going to temporarily hurt framerate – sometimes in the form of a stutter or pause.
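Here is a back-of-the-envelope version of that stutter. This sketch assumes an uncompressed RGBA texture with 4 bytes per texel and a full mipmap chain, which is larger than a compressed texture would be. It also assumes a hypothetical 0.5 GB/s of bus bandwidth left over after the per-frame traffic. Both numbers are illustrative.

```python
def texture_bytes(width: int, height: int, bytes_per_texel: int = 4,
                  mipmaps: bool = True) -> int:
    """Size of an uncompressed texture, optionally including its full mipmap chain."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_texel
        if not mipmaps or (w == 1 and h == 1):
            return total
        w, h = max(1, w // 2), max(1, h // 2)  # next, smaller mip level


def upload_ms(nbytes: float, spare_gb_per_s: float) -> float:
    """Time to push nbytes through whatever bus bandwidth is left over."""
    return nbytes / (spare_gb_per_s * 1e9) * 1000.0


tex = texture_bytes(2048, 2048)  # 22,369,620 bytes with mipmaps
frame_ms = 1000.0 / 60           # frame budget at 60 fps
stall = upload_ms(tex, 0.5)      # hypothetical spare bandwidth: 0.5 GB/s
print(f"{tex / 1e6:.1f} MB upload takes {stall:.1f} ms, "
      f"about {stall / frame_ms:.1f} frames at 60 fps")
```

With those assumptions, one new 2048×2048 texture takes roughly 45 ms to trickle across, which is a few frames' worth of time and easy to feel as a hitch.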

Are You Sure You Know What You’re Doing?

At this point those of you who know some things about 3-D graphics are shouting at your monitors: why are you guys transferring the mountains and forests over the PCIe bus every frame? Why not just put them in VRAM, since they don’t change?

That’s a good question and if you have a better solution than the one we use, I’d love to hear it.

The problem is this: OpenGL doesn’t give us a good way to prioritize which meshes (VBOs) stay in VRAM and which ones are purged out when we run out of VRAM.  If we put every mesh in the sim into VRAM, framerate gets better (because we aren’t using the PCIe bus) right up until we run out of VRAM.  At that point the OpenGL driver freaks out and starts throwing out textures to make room for meshes, and then the textures have to be sent back over the PCIe bus, and we end up in a world of hurt.  We end up in a state of texture thrash as we have too much “stuff” for VRAM and framerate falls off of a cliff.
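The "falls off a cliff" behavior can be reproduced with a toy cache model. The real eviction policy inside an OpenGL driver isn't visible to us, which is the whole problem. So this sketch *assumes* a simple least-recently-used (LRU) policy and a scene that touches the same resources in the same order every frame. The function name and sizes are hypothetical.

```python
from collections import OrderedDict


def bytes_transferred(resource_sizes, vram_capacity, frames):
    """Simulate an LRU-evicting VRAM cache; return total bytes sent over the bus."""
    cache = OrderedDict()  # resource id -> size, least recently used first
    used = 0
    sent = 0
    for _ in range(frames):
        for rid, size in enumerate(resource_sizes):
            if rid in cache:
                cache.move_to_end(rid)  # hit: mark as most recently used
                continue
            # Miss: evict least recently used resources until the new one fits.
            while used + size > vram_capacity and cache:
                _, evicted_size = cache.popitem(last=False)
                used -= evicted_size
            cache[rid] = size
            used += size
            sent += size  # the resource had to cross the bus
    return sent


scene = [10] * 10  # ten resources of 10 units each: 100 units total
print(bytes_transferred(scene, vram_capacity=100, frames=100))
print(bytes_transferred(scene, vram_capacity=90, frames=100))
```

When everything fits, only the first frame pays for the uploads (100 units total). With just 10% less VRAM, the LRU model evicts each resource right before it is needed again, so every resource crosses the bus every frame (10,000 units). That is a 100× jump in traffic for a small shortfall.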

The real problem is this: X-Plane has no idea how much VRAM is available for its own use. Sure, the card might have 256 MB, but how much is being used by the OS window manager for those translucent window effects, or by other applications? We can’t even add up how much VRAM we use with ultimate precision because we don’t know the granularity of allocation on the video card (there’s real overhead for VBOs being rounded up to the VM page size, for example) or whether side buffers like a hierarchical Z buffer have been allocated.
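To show why allocation granularity matters, here is a sketch that *assumes* every VBO is rounded up to a 4 KB page. The real granularity on a given card and driver is unknown to the application, and the 4096 default is purely an illustrative assumption.

```python
def rounded_size(nbytes: int, page: int = 4096) -> int:
    """Bytes actually consumed if allocations are rounded up to a page (assumed size)."""
    return -(-nbytes // page) * page  # ceiling division, then back to bytes


def vram_estimate(buffer_sizes, page: int = 4096) -> int:
    """Sum of rounded allocations: what VRAM usage might really be."""
    return sum(rounded_size(n, page) for n in buffer_sizes)


small_vbos = [1000] * 1000  # hypothetical: 1,000 tiny meshes of 1,000 bytes each
print(sum(small_vbos), "bytes requested")
print(vram_estimate(small_vbos), "bytes if each VBO is rounded up to 4 KB")
```

Under that assumption, a megabyte of small meshes actually eats about four megabytes. If the page size were different, the estimate would be different too, which is exactly why adding up our own usage is unreliable.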

X-Plane works around this with a simple rule: all OBJs go to VRAM, because their geometry is likely to be repeated, and non-repeating geometry, like forests and mountains, stays in system RAM and goes over the bus.
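The rule itself is simple enough to write down. This is a minimal sketch of the placement rule as described in the text, not X-Plane's actual code. The `Mesh` type, its fields, and the scene contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    name: str
    size_bytes: int
    is_obj: bool  # OBJ geometry is likely to be drawn many times


def placement(mesh: Mesh) -> str:
    """The rule described above: OBJs live in VRAM, everything else is streamed."""
    return "vram" if mesh.is_obj else "system_ram"


def streamed_bytes_per_frame(meshes) -> int:
    """Bytes that cross the bus every frame because they are not kept in VRAM."""
    return sum(m.size_bytes for m in meshes if placement(m) == "system_ram")


# Hypothetical scene contents and sizes.
scene = [
    Mesh("hangar OBJ", 2_000_000, True),
    Mesh("terrain tile", 40_000_000, False),
    Mesh("forest", 20_000_000, False),
]
print(streamed_bytes_per_frame(scene), "bytes streamed per frame")
```

The OBJ costs bus bandwidth once, when it is loaded. The terrain and forest cost their full size on every frame.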

This heuristic actually works pretty well in X-Plane 9 – we have enough bandwidth to transfer all of that “stuff” once per frame, and we tend not to run out of VRAM and thrash.

Why Does X-Plane 10 Want More PCIe Bus Bandwidth?

X-Plane 10 is hungrier for bus bandwidth for three reasons:

  1. The OBJ engine’s performance has been improved a lot. In the past, we’d run out of CPU capacity (to draw OBJs) long before we ran out of bus bandwidth. This isn’t always the case with X-Plane 10. The graphics are always held back by their weakest link: if you have a strong GPU (and low effects settings) and the OBJ engine is efficient, the PCIe bus is the weakest link.
  2. The art assets are more detailed and thus contain more vertices.
  3. Shadows.

When shadowing is on we have to draw the entire world multiple times, once to build shadow maps and once to draw the real world.  So shadowing can double (or even triple or worse) our bus bandwidth usage.  We didn’t have that kind of free capacity on the bus in the first place.
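The multiplication is easy to sketch. This assumes that every shadow pass streams the same non-VRAM geometry as the main pass. In practice a shadow pass might draw more or less than the main view, so treat the factor as a rough model. The 60 MB figure and the 4 GB/s link are illustrative.

```python
def bus_bytes_per_frame(streamed_bytes: int, shadow_passes: int) -> int:
    """Streamed geometry is sent once per pass: the shadow passes plus the main pass."""
    return streamed_bytes * (1 + shadow_passes)


def bus_time_ms(bytes_per_frame: float, bus_gb_per_s: float) -> float:
    """Milliseconds needed to move one frame's worth of data across the bus."""
    return bytes_per_frame / (bus_gb_per_s * 1e9) * 1000.0


streamed = 60_000_000  # hypothetical bytes of mountains and forests per pass
for passes in (0, 1, 2):
    b = bus_bytes_per_frame(streamed, passes)
    print(f"{passes} shadow pass(es): {b / 1e6:.0f} MB/frame, "
          f"{bus_time_ms(b, 4.0):.0f} ms on a 4 GB/s link")
```

One shadow pass doubles the streamed traffic and two passes triple it, matching the "double (or even triple or worse)" above.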

We’re still working on the engine, art assets, and performance, so my hope is that we’ll find ways to cut down bus use (especially with shadows).  And there has to be one slowest part of the system – as of this writing, the PCIe bus is often it.
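The "one slowest part" idea can be written as a one-liner. This sketch *assumes* that CPU work, GPU work, and bus transfers overlap perfectly, so a frame takes as long as its slowest stage. Real pipelines only partly overlap, and the millisecond figures are invented for illustration.

```python
def frame_time_ms(cpu_ms: float, gpu_ms: float, bus_ms: float):
    """Frame time under perfect overlap: the slowest stage sets the pace."""
    stages = {"CPU": cpu_ms, "GPU": gpu_ms, "bus": bus_ms}
    bottleneck = max(stages, key=stages.get)
    return stages[bottleneck], bottleneck


# Hypothetical numbers: a slow OBJ engine hides the bus...
print(frame_time_ms(cpu_ms=30, gpu_ms=15, bus_ms=12))
# ...but with a faster OBJ engine and low effects settings, the bus is exposed.
print(frame_time_ms(cpu_ms=10, gpu_ms=8, bus_ms=12))
```

Speeding up the OBJ engine doesn't create bus traffic. It just removes the CPU as the stage that was hiding it.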

This entry was posted in Development, Hardware.

25 Responses to What’s This Whole PCIe Thing About, Anyway?

  1. chris says:

    will owners of the current high end mbp have a crappy experience with v-10 due to 8lane vs 16 or not?

  2. chris says:

    This current mbp i have is running xp great, despite the “best of the best” pci, gpus etc. Randy emailed me back saying I should be fine, and that the experience should still be great, if not better than 9 due to coding optimization etc, weather will be better as in it won’t sap the frame rate and comp as much anymore too. Do you concur? v-10 can’t be limited to pcie 16x slots- however more lanes would of course be ideal :)

  3. Emile says:

    Will 16x be enough, Ben?

  4. chris says:

    Thanks for the help Ben, can’t wait to use xp 10

  5. James says:

    Glad to see the PCIe bus becoming the bottleneck. Really, it means XP10 is finally taking full advantage of the CPU cores and GPUs at our disposal.

    Ben – what’s the impact of the new engine on the memory bandwidth? Does it suffer the same type of problem as the PCI bus?

  6. Flightime56 says:

    It’s usually the weather that drops me through the floor, add a big custom scenery and i’m washing in the grey matter, but even flying i notice that huge weather drop, even if that has been improved (it has) will make XP10 worthwhile…

  7. luke says:

    Hi Ben
    Take on Helicopters, a game built on top of the arma 2 game engine has to be the best looking flight simulator I’ve ever seen or played, and I think it’s largely because it manages to efficiently store the objects in vram, without the problem of a pci or cpu bottleneck. I’ve noticed that several games which field large scale environments, including arma 2, allow the user to select the vram usage size in the options (small, medium, large, very large). Admittedly this isn’t a perfectly engineered solution, it seems to be the common way to deal with the problem as far as I can tell as a user. Then it’s up to the user to experiment with which option will yield the best performance, and avoid texture thrashing. We already do a lot of tweaking settings in x-plane 9, so it’s not like having an extra option like that will be too daunting.
    Also, I haven’t done very much opengl programming, but I found this interesting page on VBO usage flags and how they affect vram allocation in opengl which might be interesting (well you probably already know, but hey, I thought it was a good read).

  8. luke says:

    oh, the link didn’t get posted for some reason, second try
    http://www.songho.ca/opengl/gl_vbo.html

  9. Mutley10G says:

    Wow – all very busy! Any views on how a 2009 Macbook Pro will handle XP10? Currently running 20-30 fps with various aircraft (eg Carenado’s 33 at 21 fps, R22 at around 30), texture high res, near-world view, occasional jitters but generally fantastic.

    I ask as you’ve indicated coding improvements!

  10. Luke says:

    Ah, sorry about that. It wasn’t a competing product in my eyes (I own xp 9 and am very excited about 10. Xp allows you to create your own aircraft, and has a totally different feel/approach towards flight simulation being serious or a game, and I don’t think I’ll end up buying it). But now that you point it out, perhaps it is competing ;) . Anyway, it is exciting stuff, and I’m looking forward to xp 10 being able to make better use of the gpu, because on xp 9 it never reaches above 40% usage on msi afterburner, and it feels like a bit of a waste, especially seeing as I have a quad core feeding it.

    • Luke says:

      ooops, I mean I don’t think I’ll end up buying that competing product :D ,
      wow, it’s hard getting used to not being able to edit your own comments again. Sorry for the double post, and the triple one before.

  11. Eric Liskay says:

    Good thing I have 4 PCIe 2.0 X16 slots then!

    • chris says:

      what kind of comp do you have? that puts you in the 5% of people who can even run v10 quite nicely; the rest of us are screwed

      • Chris Serio says:

        Have you actually understood Ben’s post? Ben said…

        First: don’t panic. Wait for version 10 to come out, then try it. You might be pleasantly surprised

        Then referring to computers that are just barely running XP 9 currently he said…

        if you can run 9, you may be able to run 10 acceptably

        If you have an i7, you’re certainly not in the category of computers that are just BARELY running XP9 (unless you have an onboard GPU) so what do you have to be so nervous about? You’re panicking like the sky is falling and you’re misconstruing the entire point of Ben’s post which is merely trying to inform the community what pieces of hardware are going to be important in the context of “I have money…what should I spend it on if i want to upgrade for v10?”. He didn’t say nor imply in any way that “5% of the population” are going to be fine while the “rest…are screwed”.

        I _just_ upgraded my developer computer to an i7 a few months ago. I was running a Core 2 Duo with 2GB of RAM before that and the primary reason for the upgrade was so my compile times would be shorter!

        Take a deep breath, wait for December and try it for yourself…and please in the meantime, if you don’t have something positive and on-topic to say, please kindly find another forum in which to share your opinions. This is not the right forum for gripes.

        • chris says:

          December it is, counting down. banking on the” you may be pleasantly surprised”

          I’m sure in the end, it will be fine, due to all of your guys’s hard work on this.

    • James says:

      Eric, I think only up to 2 out of your 4 PCIe slots can be electrically 16x enabled simultaneously, at least based on the technology that I know. For instance, LGA1366/X58 supports 40 PCIe lanes, so you could do 16/16/8, or 8/8/8/8, or 16/8/8/8 (but not 16/16/16/16). If not, I’d be curious to know what system you are using ;)

      • Eric Liskay says:

        It’s an EVGA SR-2 motherboard with the Intel 5520 chipset and two Nvidia NF200 chips to allow for 4 x16 PCIe slots. I don’t plan to do 4-way SLI mind you. I am quite happy with two-way SLI currently. It supports 16/16/16/16 or 16/8/8/8/8/8/8. It will be interesting comparing the performance of X-Plane 10 in Mac OS versus Windows.

  12. Dark Photon says:

    For the non-repeating vertex data, consider using Streaming VBOs with re-use. Just ensure you allocate a streaming VBO plenty large enough to cache a frame of data. Then, most of the time, you don’t have to reupload, you just reuse.

    * http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=282868#Post282868
    * http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=282902
    * http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=273141#Post273141
    * http://www.opengl.org/wiki/Buffer_Object_Streaming

    • Ben Supnik says:

      This is exactly what X-Plane already does. The reused but not repeated data are stream-draw VBOs allocated once. The result is a pile of bus bandwidth usage!
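Dark Photon's suggestion can be sketched as a toy model that tracks only offsets and bytes uploaded, with no OpenGL calls. The class name `StreamRing`, the helper `run_frames`, and the generation-based invalidation are assumptions for illustration, not X-Plane's code. The model shows why the buffer has to be "plenty large enough to cache a frame of data". If a frame's data fits, later frames upload nothing. If it doesn't fit, every chunk is re-uploaded. It models only what the application sends. Where the driver keeps stream-draw storage, and how much the GPU then reads across the bus, is outside what this sketch can show.

```python
class StreamRing:
    """Toy model of one large stream buffer used as a ring (offsets only, no GL calls)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.head = 0        # next free offset
        self.generation = 0  # bumped on each wrap; real GL code might orphan the buffer here
        self.cache = {}      # key -> (offset, generation) of its last upload
        self.uploaded = 0    # total bytes the application sent

    def get(self, key, nbytes: int) -> int:
        """Return key's offset, uploading only if its data is no longer resident."""
        hit = self.cache.get(key)
        if hit is not None and hit[1] == self.generation:
            return hit[0]  # still valid since the last wrap: reuse, no upload
        if nbytes > self.capacity:
            raise ValueError("chunk larger than the ring")
        if self.head + nbytes > self.capacity:
            self.head = 0          # out of room: start over at the beginning
            self.generation += 1   # everything cached before the wrap is now invalid
        offset = self.head
        self.head += nbytes
        self.cache[key] = (offset, self.generation)
        self.uploaded += nbytes
        return offset


def run_frames(ring: StreamRing, chunks, frames: int) -> int:
    """Request the same chunks every frame; return total bytes uploaded."""
    for _ in range(frames):
        for key, nbytes in chunks:
            ring.get(key, nbytes)
    return ring.uploaded


frame = [("terrain", 30), ("forest", 30), ("roads", 30)]  # hypothetical chunk sizes
print(run_frames(StreamRing(100), frame, 2))  # ring holds a whole frame
print(run_frames(StreamRing(80), frame, 2))   # ring too small for a frame
```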