Hint: it might not be what you think!  Vertex count isn’t usually the limiting factor on frame rate.  Usually the problem is fill rate (that is, how many pixels on screen get fiddled with) or CPU time spent talking to the GPU about changing attributes and shaders.  But because vertex count isn’t usually the problem, it’s an area where an author might be tempted to “go a little nuts”.  It’s fairly easy to add more vertices in a high-powered 3-D modeling program, and they seem free at first.  But eventually, they do have a cost.

Vertex costs are divided into two broad categories based on where your mesh lives.  Your mesh might live in VRAM (in which case the GPU draws the mesh by reading it from VRAM), or it might live in main memory (in which case the GPU draws the mesh by fetching it from main memory over the PCIe bus).  Fortunately it’s easy to know which case you have in X-Plane:

  • For OBJs, meshes live in VRAM!  (Who knew?)
  • For everything else, they live in main memory.  This includes the terrain, forests, roads, facades, you name it.

Meshes In VRAM

If a mesh is in VRAM, the cost of drawing it is relatively unimportant.  My 4870 can draw just under 400 million triangles per second – and it’s probably limited by communication to the GPU.  And ATI has created two new generations of cards since the 4870.

Furthermore, drawing costs are only paid when a mesh is actually drawn, so with some careful LOD you can get away with the occasional “huge mesh”; the GPU has the capacity as long as not everyone tries to push a million vertices at once.  (Obviously a million vertices in an autogen house that is repeated 500 times is going to cause problems.)
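To put the “occasional huge mesh” versus “500 autogen copies” contrast into numbers, here is a back-of-the-envelope sketch. It assumes the roughly 400 million triangles per second quoted above is the whole budget, and it counts triangles rather than vertices because that is the unit of the quoted figure. The one-million-triangle mesh and the function name `triangle_load` are hypothetical.

```python
# Approximate triangle throughput quoted in the text for a Radeon 4870.
GPU_TRIANGLES_PER_SECOND = 400_000_000

def triangle_load(triangles_per_copy: int, copies: int, fps: int) -> float:
    """Fraction of the GPU's triangle throughput needed to draw `copies`
    instances of a mesh once per frame at `fps` frames per second."""
    demanded = triangles_per_copy * copies * fps
    return demanded / GPU_TRIANGLES_PER_SECOND

# One hypothetical one-million-triangle mesh at 60 fps:
print(f"{triangle_load(1_000_000, 1, 60):.0%}")     # 15%
# The same mesh repeated 500 times as autogen:
print(f"{triangle_load(1_000_000, 500, 60):.0f}x")  # 75x
```

A single huge mesh fits comfortably; 500 copies of it ask for about 75 times what the card can deliver, which is why instance counts and LOD matter more than any one mesh.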

But there is a cost here, and it is the VRAM itself!  A mesh costs 32 bytes per vertex (plus 4 bytes per index), so our million-vertex mesh is going to eat at least 32 MB of VRAM.  That’s not inconsequential; for a user with a 256 MB card, we just used up 1/8 of all VRAM on a single mesh.
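The VRAM arithmetic can be written out as a small helper. This is a sketch, not X-Plane code: the 32-byte and 4-byte sizes come from the text, while splitting the 32 bytes into eight 32-bit floats (position, normal, texture coordinate) is an assumption about the layout, and `mesh_vram_bytes` is a hypothetical name. Megabytes are decimal here to keep the numbers round.

```python
# Sizes quoted in the text; the float layout is an assumption.
BYTES_PER_VERTEX = 8 * 4  # x, y, z, nx, ny, nz, s, t as 32-bit floats (assumed)
BYTES_PER_INDEX = 4       # one 32-bit index

def mesh_vram_bytes(vertex_count: int, index_count: int = 0) -> int:
    """VRAM taken by a mesh: its vertex data plus its index data."""
    return vertex_count * BYTES_PER_VERTEX + index_count * BYTES_PER_INDEX

MB = 1_000_000  # decimal megabytes

mesh = mesh_vram_bytes(1_000_000)        # vertices only: the "at least" figure
print(mesh / MB)                         # 32.0
print(mesh / (256 * MB))                 # 0.125, i.e. 1/8 of a 256 MB card

# Hypothetical: one million triangles at three indices each.
print(mesh_vram_bytes(1_000_000, 3_000_000) / MB)  # 44.0
```

The index data is why 32 MB is only a lower bound: with a plausible index count, the same mesh grows well past it.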

One note about LOD here: the vertex cost of drawing is a function of what is actually drawn, so if we have a million-vertex high LOD mesh and a thousand-vertex low LOD mesh, we only burn a (small) chunk of our vertex budget when the high LOD is drawn.

But the entire mesh must be in VRAM to draw either LOD!  Only things that are actually drawn have to be in VRAM, but textures and meshes go into VRAM as a whole, with all of their LODs.  So we only save our 32 MB of VRAM by not drawing the object at all (e.g. because it is farther away than the farthest LOD).
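The rule above (draw cost follows the active LOD, VRAM cost follows the whole object) can be modeled in a few lines. This is a simplified sketch under stated assumptions: the `Lod` type, its distance ranges, and the function names are hypothetical, and it treats the object as resident exactly when some LOD is in range, which glosses over when a driver actually pages data in or out.

```python
from dataclasses import dataclass
from typing import Sequence

BYTES_PER_VERTEX = 32  # per-vertex cost quoted in the text

@dataclass
class Lod:
    near: float      # hypothetical range start (meters)
    far: float       # hypothetical range end (meters)
    vertices: int

def is_drawn(lods: Sequence[Lod], distance: float) -> bool:
    # The object is drawn if any LOD covers the viewing distance.
    return any(lod.near <= distance < lod.far for lod in lods)

def drawn_vertices(lods: Sequence[Lod], distance: float) -> int:
    # Vertex work per frame: only the LOD in range is drawn.
    for lod in lods:
        if lod.near <= distance < lod.far:
            return lod.vertices
    return 0

def resident_vram_bytes(lods: Sequence[Lod], distance: float) -> int:
    # VRAM: every LOD must be present whenever the object is drawn at all.
    if not is_drawn(lods, distance):
        return 0
    return sum(lod.vertices for lod in lods) * BYTES_PER_VERTEX

house = [Lod(0, 1_000, 1_000_000), Lod(1_000, 5_000, 1_000)]
for d in (500, 3_000, 8_000):
    print(d, drawn_vertices(house, d), resident_vram_bytes(house, d))
# 500 1000000 32032000
# 3000 1000 32032000
# 8000 0 0
```

At 3,000 m the object draws only 1,000 vertices yet still occupies the full 32 MB; only beyond the farthest LOD does the VRAM cost drop to zero.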

Meshes in Main Memory

For anything that isn’t an object, the mesh lives in main system memory and is transferred over the PCIe bus when it needs to be drawn.  (This is sometimes called “AGP memory” because this first became possible when the AGP slot was introduced.)  Here we have a new limitation: we can run out of capacity to transfer data over the PCIe bus.

Let’s go back to our mesh: our million-vertex mesh probably takes around 32 MB.  It will have to be transferred over the bus each time we draw it.  At 60 fps that’s over 1.8 GB of data per second.  A PCIe 2.0 x16 slot only has 8 GB/second of total bandwidth from the computer to the graphics card.  So we just ate roughly 25% of the bus with our one mesh!  (In fact, the real situation is quite a bit worse: on my Mac Pro, even with simple performance test apps, I can’t push much more than 2.5 GB/second to the card, so we’ve really used about 75% of our budget.)
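Here is the same bus arithmetic as a small sketch. The 8 GB/s and 2.5 GB/s figures are the theoretical and measured numbers from the paragraph above; the function name is hypothetical, and 32 MB is treated as 32,000,000 bytes.

```python
def bus_gb_per_second(mesh_bytes: int, fps: int) -> float:
    """Data sent over the bus if the whole mesh is transferred every frame."""
    return mesh_bytes * fps / 1e9

MESH_BYTES = 32_000_000   # ~1 million vertices at 32 bytes each
PCIE2_X16_GBPS = 8.0      # theoretical host-to-card bandwidth (from the text)
MEASURED_GBPS = 2.5       # practical limit measured on the author's Mac Pro

traffic = bus_gb_per_second(MESH_BYTES, 60)
print(f"{traffic:.2f} GB/s")                              # 1.92 GB/s
print(f"{traffic / PCIE2_X16_GBPS:.0%} of theoretical")   # 24% of theoretical
print(f"{traffic / MEASURED_GBPS:.0%} of measured")       # 77% of measured
```

Unrounded, the mesh uses about 24% of the theoretical bandwidth and about 77% of the measured bandwidth, which is where the rounded 25% and 75% figures come from.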

On the bright side, storage in main memory is relatively plentiful, so if we don’t draw our mesh, there’s not a huge penalty.  Careful LOD can keep the total number of vertices emitted low.
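For main-memory meshes, a reasonable first-order model is that bus traffic depends only on what is drawn each frame, so LOD choice, not stored size, sets the cost. The sketch below assumes a hypothetical scene of ten million-vertex meshes and a hypothetical 1,000-vertex far LOD; only the 32 bytes per vertex comes from the text.

```python
from typing import Sequence

BYTES_PER_VERTEX = 32  # per-vertex size quoted in the text

def scene_bus_gb_per_second(drawn_vertex_counts: Sequence[int], fps: int) -> float:
    """Bus traffic when each mesh sends only the vertices of the LOD it draws.
    Meshes that are not drawn contribute 0 and cost only main memory."""
    return sum(drawn_vertex_counts) * BYTES_PER_VERTEX * fps / 1e9

no_lod = [1_000_000] * 10                # every mesh drawn at full detail
with_lod = [1_000_000] + [1_000] * 9     # one nearby mesh, nine far away
print(f"{scene_bus_gb_per_second(no_lod, 60):.2f}")    # 19.20
print(f"{scene_bus_gb_per_second(with_lod, 60):.2f}")  # 1.94
```

Without LOD the scene would need more than twice the 8 GB/s a PCIe 2.0 x16 slot offers; with LOD it is back to roughly the single-mesh figure, even though all ten meshes still sit in main memory.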

Take-Away Points

  • Non-OBJ vertex count is significantly more expensive than OBJ vertex count.
  • OBJ meshes take up VRAM; the high LOD takes up VRAM even when the low LOD is in use.
  • To reduce the VRAM cost of OBJ meshes, limit the total vertex count of all of the object’s LODs combined.

About Ben Supnik

Ben is a software engineer who works on X-Plane; he spends most of his days drinking coffee and swearing at the computer -- sometimes at the same time.

7 comments on “What Is the Cost of 1 Million Vertices?”

  1. Man, I was having a coronary every time Rhino meshed my relatively simple fuselage at over 20,000 polygons, which is about 16,000 vertices, depending, of course, on how I configure the mesher. Hmm. It’s quite smooth at around a density of 13,000 polygons, 11,500 vertices or so. But if I’m playing with a gig of VRAM, this is itty bitty, and I predict more fun to come. It’s important to be conservative, but conservatism can be taken too far. My life just got easier. Thanks, Ben!

  2. What app did you use to test your PCIe performance? I want to test whether the limited 8x PCIe on the new MacBook Pro (due to Thunderbolt) is affecting throughput at all.

    1. I used three things to look at perf – sadly two of them I can’t share.
      1. I have an app from another company that is covered by NDA. (But it’s really just a very short program that makes GL draw calls, so there’s no magic there; it was useful to validate that X-Plane’s “real world” overhead was within reason.)
      2. X-Plane itself…much easier done in XP10 with a few source modifications to check certain cases. (A build flag can be turned on that provides more hooks into rendering for this purpose.)
      3. When X-Plane is set to ‘thrash’ memory, the transfers are visible via “driver monitor”, a tool that comes with Apple’s developer tools. You see “texture page-ons”, which really means “meshes or textures that need to be in VRAM.”

  3. That finally explains why I have so many “Xmapped run out of memory” troubles with better meshes and orthoscenery.
    If I’m right, the meshes are stored in main memory, so it is not really a “memory problem” but a “bus problem”, for which X-Plane is not at fault.
    Is there any way to improve this in XP10??

    (I really can’t understand how FS (sorry, don’t bite… lol) is able to have large orthoscenery with a very accurate mesh… I think I have seen 1.2m meshes somewhere!!!!)

    1. Because FSX separates the mesh (raw data) from the vector data, and does real-time tessellation, with LOD etc. determined by the user in the graphics options (the circle of LOD… distance). Everything outside the circle is not loaded, or is loaded with less detail.

      X-Plane has baked TILES, and ALL 6 tiles must be loaded with FULL detail.

      I am playing with Condor scenery at the moment, and it uses the same system as FSX… no vector data, but a raw “mesh” with little tiles draped over it.

      Otherwise, do you want a pizza, Mr Madine? 😀

      1. Right. In X-Plane, the textures are scalable using LOAD_CENTER, at least up to a point, but the mesh is pre-baked. So … you can throw a lot of texture res at X-Plane but you can’t throw a ton of mesh detail. We’ll grow the architecture to scale the mesh too, at least up to a point. (I think there are limits to the usefulness of high-res meshes, but there’s no question that the current limits are too low.)

        1. Great, so if I understand correctly, XP10 will be able to load “a little bit” more than XP9…

          Thanks for the pizza offer, but I like them hot, so you’ll have to serve it pronto!!
