
More Aircraft RFCs – Landing Lights

It looks to me like we could afford a few landing light halos on most (but not all) hardware.  This gets a bit tricky in terms of how we make this available to authors…

  • We have to allow access without breaking old planes.
  • There will be two distinct cases due to very different hardware.

So…I have posted an RFC on the X-Plane Wiki.  Please post your thoughts on the discussion page!

One option (not really discussed in the RFC) is to do nothing at all.  Basically I hit upon this during some routine refactoring of the shaders.  The whole issue can be deferred indefinitely.
Why wait?  Well, I don’t believe that an incremental increase in the number of landing light halos is the future.  Our end goal must be some kind of truly global illumination, hopefully without a fixed lighting budget.  It may not make sense to add a bunch of complexity to the aircraft SDK only to have all of those limits become unnecessary cruft a short time later.
(I think I can hear the airport designers typing “why do the airplane designers get four lights and we get none?  Give us a light or two!”  My answer is: because of the fixed budget problem. We can allocate a fixed budget of lights to the user’s aircraft because it is first in line – we know we either have the lights or we don’t.  As soon as we start putting global lights in the scenery, we have to deal with the case where we run out of global lights.  For scenery I definitely want to wait on a scheme that isn’t insanely resource limited!)
Programmers: yes – DX10 hardware can do a hell of a lot more than 4 global lights.  Heck – it can do a hell of a lot, period!  For example, it can do deferred rendering, or light pre-rendering.  A true global lighting solution might not have anything to do with “let’s add more global lights a few at a time.”
Posted in Aircraft, Development, File Formats

Shader Optimization Fallout

Every time I work on a new X-Plane feature, I do a combination of:

  • Reorganizing and cleaning up old code.
  • Adding new features.
  • Tuning performance for this new environment.

My experience has been that the investment in cleaning up old code is more than paid for by faster, easier development of new code – it’s easier to code in a “clean” work area.

As part of my work on 930 I am refactoring and optimizing how we set up pixel shaders.  I’m not sure if there will be any framerate benefits in the short term, but in the long term there is definitely an advantage to being able to set up the optimal shader configuration for any situation.
(Since most of what we draw – OBJs, airplanes, DSFs – can be created by users, we never really know what we’ll be drawing…the set of art content X-Plane can handle is almost unlimited.  So it is up to the shader optimization code to “find” the optimal setup for a particular stew of OBJ attributes, textures, etc.)
The short-term fallout during beta is unfortunately a certain amount of pain.  It’s likely that these changes will introduce graphics quirks with certain planes or scenery packs.  These are fixable!  The important thing is: if you hit a graphics bug with a particular plane or scenery pack in 930 (whenever we get to beta – we are not in beta yet!) and that bug is not in 921, report it!  It may be that the optimizer is being too aggressive with a particular combination of settings and turning off some critical feature.
I will run the new shader optimizer code through just about every scenery pack and airplane I can find, but invariably there is some magic trick in a third-party plane on X-Plane.org that I won’t have.
One thought for creating fast content: alpha is expensive!  Or rather, let me rephrase that: if you are not using the alpha channel of your texture, you should not have an alpha channel in your texture.
(For PNG this means stripping the alpha channel off, rather than having a solid 100% opaque alpha channel.  For DDS this means using DXT1 with no transparent pixels.)
The new shader optimizer detects the case where alpha is not being used and sets up a more optimal code path.  (The old shader optimizer did that too, but only some of the time – in the new code, we will always take this optimization.)
Having alpha blending enabled can inhibit “early-Z” optimizations on modern GPUs, and also requires a more expensive blending operation in the framebuffer.*  So if your model doesn’t use alpha, strip the channel.
* Some newer graphics cards recognize 100% opaque alpha and provide fast write to the framebuffer.  But even if early-Z-type optimizations become alpha friendly, there will still be optimizations we can make in the sim if we hit the no-alpha case.
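As a rough illustration of the “is alpha actually used?” check, here is a minimal Python sketch. It assumes the texture has already been decoded into a tightly packed RGBA8 byte buffer; the names `alpha_is_unused` and `strip_alpha` are hypothetical, and this is not X-Plane’s shader optimizer, just the same idea applied to raw pixels.

```python
def alpha_is_unused(rgba: bytes) -> bool:
    """Return True if every pixel in a packed RGBA8 buffer is 100% opaque."""
    if len(rgba) % 4 != 0:
        raise ValueError("expected tightly packed RGBA8 data")
    # The alpha byte is every 4th byte, starting at offset 3.
    return all(a == 255 for a in rgba[3::4])


def strip_alpha(rgba: bytes) -> bytes:
    """Drop the alpha byte from each pixel, turning RGBA8 data into RGB8 data."""
    if len(rgba) % 4 != 0:
        raise ValueError("expected tightly packed RGBA8 data")
    return bytes(b for i, b in enumerate(rgba) if i % 4 != 3)


# Two opaque pixels: the alpha channel carries no information.
opaque = bytes([10, 20, 30, 255, 40, 50, 60, 255])
if alpha_is_unused(opaque):
    rgb = strip_alpha(opaque)  # save as an RGB PNG (or DXT1) instead of RGBA
```

A texture where even one pixel has alpha below 255 has to keep its alpha channel; only a channel that is solid 255 everywhere is dead weight.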

The Impenetrable Object Barrier

Some coding problems are stubborn – I find myself looking back at a week of work, realizing that all I really did was prove that a bunch of theoretical improvements don’t work in practice.

Improving OBJ throughput is one of those problems.  On a high-end machine, even drastic changes to the OBJ engine make only the slightest difference in throughput – 2 or 3% at best. Every improvement counts, but a 3% improvement doesn’t change the game for how we draw scenery.
There is at least one route I haven’t had time to go down yet: object instancing.  The theory is that by drawing many copies of the same object with a single draw command, we get a multiplier, e.g. a 2x or 4x or larger amplification of the number of objects we can have.
In practice it won’t be that simple:
  • To get such an amplification we have to recognize groups of the exact same object.  Grouped objects will have to be culled together.  So we might get a hit in performance as we draw more objects that are off-screen, just to use instancing.
  • It may be that the grouping requirement is so severe that it is not practical to find and group objects arbitrarily (instead we would have to group objects that are built together, like clusters of runway lights).  That might limit the scope of where we can instance.
  • The objects have to look more or less the same, so some categories of very complex objects won’t be subject to instancing.  (E.g. objects with animation where each object might look different.)
  • I have already coded some experiments with geometry shaders, and the results are just dreadful – geometry shaders simply don’t output a huge number of vertices efficiently, so they don’t help us increase our total vertex throughput.  The experience has left me with a “prove it” attitude toward GL extensions that are supposed to make things faster.
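To make the grouping trade-off concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: placements are grouped only when they use the same OBJ and fall in the same grid cell (`cell_size` is a made-up tuning knob), and culling is approximated as “draw the whole group if any member is visible.” It counts draw calls and the off-screen objects that get drawn anyway.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    obj: str   # which OBJ this placement uses
    x: float   # position in meters
    z: float


def group_for_instancing(placements, cell_size):
    """Group placements of the same OBJ that land in the same grid cell."""
    groups = defaultdict(list)
    for p in placements:
        key = (p.obj, math.floor(p.x / cell_size), math.floor(p.z / cell_size))
        groups[key].append(p)
    return dict(groups)


def draw_calls(groups):
    """Return (instanced draw calls, one-draw-per-object draw calls)."""
    return len(groups), sum(len(members) for members in groups.values())


def offscreen_but_drawn(groups, visible):
    """Count objects drawn only because another member of their group is visible."""
    wasted = 0
    for members in groups.values():
        if any(visible(p) for p in members):
            wasted += sum(1 for p in members if not visible(p))
    return wasted


def on_screen(p):
    return p.x < 30.0  # pretend only x < 30 m is in view


# Ten runway lights 10 m apart, plus one unrelated tree.
scene = [Placement("runway_light.obj", float(x), 0.0) for x in range(0, 100, 10)]
scene.append(Placement("tree.obj", 5.0, 0.0))
groups = group_for_instancing(scene, cell_size=50.0)
```

Here eleven objects become three instanced draws, but two lights in the first cell are drawn while off-screen. Shrinking `cell_size` wastes fewer off-screen draws but produces more groups, which is the tension described in the first bullet.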

When will we know whether instancing can help?  I don’t know; I suspect that I won’t be able to find time for code experiments for a while, due to other work, particularly on scenery creation and tools.

Posted in Scenery

Moving Features to the GPU

A hidden detail of my previous post on variation and terrain textures: variation for flat textures was implemented using more triangles in the DSF in X-Plane 8, but is implemented in a shader in X-Plane 9.  This means that you don’t get this feature in X-Plane 9 if shaders are off.

My guess is that this is perfectly acceptable to just about every user.
  • If you don’t have shaders, you have something like a GeForce 4 or Radeon 8500, and are fighting for frame-rate.  In this case, not paying the price of layer-based variation is a win.
  • If you have shaders, you’re getting better performance because the shader creates variation more efficiently than the old layering scheme did.
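The post doesn’t say how the X-Plane 9 shader computes variation, so the following is only a guess at the general idea, written in Python rather than a shading language: derive a variant index deterministically from position, so neighboring patches of the same terrain pick different texture variants without any extra triangles or layers in the DSF. The hash constants (a common spatial-hashing choice) and the names `variant_index` and `tile_of` are assumptions.

```python
def variant_index(tile_x: int, tile_z: int, num_variants: int) -> int:
    """Pick a texture variant for a terrain tile purely from its integer coordinates.

    A shader could do equivalent math per pixel; no mesh data is needed.
    """
    # XOR of coordinates times two large primes, masked to 32 bits like GPU uint math.
    h = ((tile_x * 73856093) ^ (tile_z * 19349663)) & 0xFFFFFFFF
    return h % num_variants


def tile_of(world_x: float, world_z: float, tile_size: float) -> tuple[int, int]:
    """Map a world position in meters to integer tile coordinates."""
    return int(world_x // tile_size), int(world_z // tile_size)


# Which of 4 variants should the pixel at (250 m, -10 m) use, with 100 m tiles?
variant = variant_index(*tile_of(250.0, -10.0, 100.0), 4)
```

Because the index depends only on position, it costs nothing in file size, and a renderer can skip the computation entirely on hardware that can’t afford it.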

This kind of move of a feature to the GPU can only happen at major versions when we recut the global scenery, because (to utilize the benefit) the DSFs are recut with fewer (now unneeded) layers.  So features aren’t going to mysteriously disappear mid-version.

I do have a goal to move more layering-type features to the GPU for future global scenery renders.  There are a number of good reasons:
  • DSF file size is limited – we have distribution requirements on the number of DVDs we ship.  So DSF file size is better spent on more detailed meshes than on layers.
  • GPU power is increasing faster than anything else, so it’s good to put these effects on the GPU – the GPU is still hungry for more!
  • If a feature is run on the GPU, we can scale it up or down or turn it on or off, for more flexible rendering settings on a wide variety of hardware.  A feature baked into the DSF is there for everyone, no way to turn it off.

My hope for the next render is to (somehow) move the cliff algorithm (which is currently done with 2-4 layers) to the GPU, which would shrink DSFs, improve performance, and probably create nicer looking output.

Posted in Development, File Formats

Two Video Cards, Two Vendors

The short answer is: this is not a very good idea.

With OS X, this configuration is supported: OS X will cleverly copy graphic output from one video card to another to make the system work well.  You will take an fps hit when this happens.

With Vista, this configuration isn’t supported. (Snarky comment: it is lame that Microsoft completely rewrote their video driver infrastructure and went backward in terms of configuration support.)

With Linux, I have no idea if this configuration can run. I do know that trying to change my configuration hosed Ubuntu thoroughly and I decided not to break my Linux boxes any more, having spent plenty of time doing that already in the last few days.

For X-Plane, we can’t handle this case very well (at best you get the framerate hit) because we need to share textures between the IOS screen and main screen. So if you are trying to set up an IOS screen, you really do need a dual-headed graphics card. For what it’s worth, every card I’ve gotten in the last few years has had two video outputs.

Posted in Development

Fun With Menubars

My Mac Pro has just gotten weirder – I put a Radeon HD 3870 into the second PCIe x16 slot.  (The machine comes with a GeForce 8800.)  I now have one monitor on each card.

So here’s where things get fun:
  • Start X-Plane.  60 fps.
  • Drag the window to the second monitor.  30 fps.
  • Quit, move the menu bar to the second monitor, restart.  (X-Plane is now on the right.)  160 fps.
  • Drag the window back to the primary monitor on the left.  100 fps.

What’s going on?  Two things:

  • On OS X, X-Plane’s graphics are rendered by one video card, and that video card (in 921) is the card that has the menu on one of its monitors.
  • When an OpenGL window is displayed on a monitor that is not attached to the video card that is doing the rendering, OS X will copy the image from one video card to another, at a cost of some framerate.

So what’s going on above?  Well, the 60 fps is my 8800.  When I drag the window, the OS starts copying the graphics, slowing fps.  When I move the menu bar, the 3870 does the rendering, and we get much higher fps.  Once again, when I put the window on the monitor that is not attached to the rendering card, fps drops.

Final note: fps tests of the 8800 vs 3870 with X-Plane 921:
Fps test 2, 8800: 46,49,51
Fps test 2, 3870: 70,75,80
Fps test 3, 8800: 24,25,25
Fps test 3, 3870: 40,41,43
In other words, the 3870 is significantly faster.  I believe that this is due to the OS X drivers, not the cards themselves.  Note that the 3870 is in a PCIe 1.0 slot and the 8800 is in a PCIe 2.0 slot.
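The averages behind “significantly faster” are easy to work out.  This short Python snippet just does the arithmetic on the fps numbers listed above; it says nothing about why the gap exists.

```python
from statistics import mean

# fps readings from the post (X-Plane 921, three runs per card per test)
results = {
    "test 2": {"8800": [46, 49, 51], "3870": [70, 75, 80]},
    "test 3": {"8800": [24, 25, 25], "3870": [40, 41, 43]},
}


def speedup(runs):
    """Ratio of the 3870's average fps to the 8800's average fps."""
    return mean(runs["3870"]) / mean(runs["8800"])


for name, runs in results.items():
    print(f"{name}: 8800 avg {mean(runs['8800']):.1f}, "
          f"3870 avg {mean(runs['3870']):.1f}, speedup {speedup(runs):.2f}x")
```

On these numbers the 3870 comes out roughly 1.5x to 1.7x faster.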
Posted in Development

Hardware Guidance: Four Cores and DX10

I think we’ve reached the point where, if you are putting together a new computer and have X-Plane in mind:

  • Get a quad-core machine if the pricing is favorable (and I think it should be now).
  • Get a “Direct X 10” compatible graphics card.  That would be an nVidia 8- or 9-series card (or, I guess, that crazy new 280 card) or a Radeon HD 2000/3000/4000.  DX10-type cards can be had for $100 to $150.

Quad core is easy: X-Plane 921 will use as many cores as you have for texture loading (especially in paged scenery), uses two cores all the time, and uses three during DSF load.  The infrastructure for this additional scalability (previous builds used two cores, more or less) will let us put 3-d generation on 4 cores or more.  More on this in another post, but basically X-Plane’s utilization of cores is good and getting better, so four cores is good, particularly if it’s not a lot more expensive.
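Scaling texture loading to “as many cores as you have” boils down to a worker pool sized to the core count.  This Python sketch shows only that pattern: `load_texture` and `load_all` are hypothetical placeholders, not X-Plane’s loader, and in Python a CPU-heavy decoder would need `ProcessPoolExecutor` (or a native library that releases the GIL) to actually spread work across cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def load_texture(path: str) -> tuple[str, int]:
    """Hypothetical stand-in for reading and decoding one texture file."""
    # A real loader would read and decode the image; here we fake a "size".
    return path, len(path)


def load_all(paths, workers=None):
    """Load textures in parallel, defaulting to one worker per CPU core."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with paths.
        return dict(pool.map(load_texture, paths))


textures = load_all(["a.png", "bb.dds"])
```

The design point is that the pool size comes from the machine rather than a hard-coded two threads, so the same code uses four cores on a quad-core box and more where available.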

Now for DX10, first I have to say two things:
  1. We don’t use DirectX.  We have no intention of switching to DirectX, dropping OpenGL support, or dropping OS X/Linux support.  I just say “DX10” to indicate a level of hardware functionality (specified by Microsoft).  The DX10 cards have to have certain hardware tricks, and those tricks can be accessed both in OpenGL and Direct3D.  We will access them by OpenGL.
  2. We are not going to drop support for non-DX10 cards!  (We’re not that crazy.)

X-Plane does not yet utilize those new DX10 features, but the DX10-compatible cards are better cards than the past generations, and are now affordable*.  By making sure you get one of these, you’ll be able to use new graphic features when they come out.

* The roll-out of DX10 cards has been similar to DX9.  With the first-generation cards there was one expensive but fast card and one cheap but slow card.  With DX10 nVidia got there first; with DX9, ATI did.  Like a few years ago, now that we’re a few revs into the new spec, both vendors are making high-quality cards that aren’t too expensive.
Posted in Development

(More) Triangle Optimizations

Yesterday I described how triangles and meshes can be optimized and hypothesized that building OBJs carefully could improve vertex throughput.  Having looked at some numbers today, I think the potential for framerate improvement isn’t that great…an improvement would come from cache utilization (post vertex shader), and our cache usage seems to be pretty good already.

Simulating a FIFO vertex cache with 16 vertices (an average number – very old hardware might have 8 or 12, and newer hardware has at least 24 slots), I found that we miss the cache preventably around 15% of the time (using a random set of OBJs from LOWI to test).  Sometimes the miss rate was much worse (20-25%), but a lot of the time it was as low as 5%.
What these numbers mean is that at the very best, index optimizations in OBJs to improve vertex throughput might only improve vertex processing by about 15% (with the FPS improvement being less, since vertex throughput isn’t the only thing that slows us down).
In other words, if I solve the cache problem perfectly (which may be impossible) we get at best 15%.
So this could be a nice optimization (every 5% win counts, and they matter if you can improve fps by 5% over and over) but cache utilization isn’t going to change the nature of what you can model with an OBJ, because our cache utilization is already pretty good.
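A FIFO vertex cache simulation like the one described above fits in a few lines.  This Python sketch assumes a pure FIFO policy (a hit does not refresh a vertex’s position) and counts as “preventable” every miss beyond the one unavoidable miss per unique vertex; that definition is an assumption here, not necessarily the exact metric behind the LOWI numbers.

```python
from collections import deque


def count_misses(indices, cache_size=16):
    """Count index fetches that miss a FIFO post-transform vertex cache."""
    fifo = deque()
    cached = set()
    misses = 0
    for v in indices:
        if v in cached:
            continue  # hit: a FIFO cache does not reorder entries on a hit
        misses += 1
        fifo.append(v)
        cached.add(v)
        if len(fifo) > cache_size:
            cached.discard(fifo.popleft())  # evict the oldest vertex
    return misses


def miss_rates(indices, cache_size=16):
    """Return (total miss rate, preventable miss rate) for an index list.

    Each unique vertex must miss at least once; only misses beyond that
    are counted as preventable (an assumed definition).
    """
    if not indices:
        return 0.0, 0.0
    misses = count_misses(indices, cache_size)
    unique = len(set(indices))
    n = len(indices)
    return misses / n, (misses - unique) / n
```

The hardware cache sizes mentioned above (8, 12, 16, 24 entries) can be passed as `cache_size` to see how sensitive a given mesh is to the cache it runs on.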
Have a Happy Thanksgiving!
Posted in Development

Triangle Optimizations

I’ve been looking a bit at triangle optimization – first some terminology:

  • Indexed triangles means that the vertices in a mesh are referred to by index numbers.  This is the scheme OBJ8 uses.  The advantage of indexing is that if a single vertex is used by many triangles (that share a corner) you only have to include the vertex data once, and then use that data many times by index.  (The savings from indexing depend on how often vertices are shared.)
  • Triangle strips are strips of triangles sharing common edges.  Because triangles in strips share so many common vertices, they can be stored in a compact form, for a savings of almost 3x.
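Indexing, as described in the first bullet, amounts to deduplicating vertices and referring to each one by number.  A minimal Python sketch, assuming vertices are hashable tuples (e.g. position plus texture coordinates) that must match exactly to be shared; `build_indexed` is a hypothetical name, not part of any X-Plane tool.

```python
def build_indexed(triangle_soup):
    """Turn a flat vertex list (3 per triangle) into (unique_vertices, indices)."""
    vertices = []   # each distinct vertex stored once
    index_of = {}   # vertex tuple -> its position in `vertices`
    indices = []    # 3 indices per triangle
    for v in triangle_soup:
        if v not in index_of:
            index_of[v] = len(vertices)
            vertices.append(v)
        indices.append(index_of[v])
    return vertices, indices


# A quad as two triangles sharing the B-C edge: 6 vertices in, 4 stored.
A, B, C, D = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)
vertices, indices = build_indexed([A, B, C, C, B, D])
```

The savings grow with sharing: in a large grid mesh each interior vertex is used by six triangles but stored once.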
Back in the old days, triangle strips were critical for performance (hence the presence of strips in the OBJ2 and OBJ7 formats).  However with modern hardware, indexing is more efficient – the slight increase in data size (due to the index) isn’t as expensive as the cost of specifying “we’re done with one strip, start the next one”.  (Consider that if we use indexed triangles, we can submit all triangles in one batch – with strips, we need one batch per strip.)  Thus OBJ8 uses indexing and doesn’t provide any strip primitives.
There is one other concept to be aware of: cache utilization.  Graphics cards remember the last few vertices they processed, so if a mesh repeats a vertex shortly after using it, the graphics card can save work.  Triangle strips naturally use a cache somewhat well because vertices occur in close succession.
Strips and DSF
DSF allows for triangle strips (and triangle fans) as a space-saving measure.  Even with indexing, the indices can be compressed if strips and fans are used, and with DSF, file size was a very high priority.
When the DSF file is loaded, the data is rebuilt into indexed triangles (and reindexed – the DSF internal structures don’t provide as good indexing as the DSF loader can create) – in version 803 we first started using indexed triangles and found it to be a big win.
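Rebuilding strips and fans into plain indexed triangles on load looks roughly like this Python sketch.  It is an illustration rather than the DSF loader: it assumes OpenGL-style strip winding (every other strip triangle has its first two vertices swapped to keep a consistent facing) and drops zero-area triangles, which stitched strips often contain.  The function names are hypothetical.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip (list of vertex indices) into a triangle list."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if i % 2 == 1:
            a, b = b, a  # odd triangles flip winding; swap to keep facing consistent
        if a == b or b == c or a == c:
            continue  # degenerate (zero-area) triangle
        tris.extend((a, b, c))
    return tris


def fan_to_triangles(fan):
    """Expand a triangle fan: every triangle shares the fan's first vertex."""
    tris = []
    for i in range(1, len(fan) - 1):
        tris.extend((fan[0], fan[i], fan[i + 1]))
    return tris
```

Once everything is a triangle list, all of the terrain’s triangles can be submitted in one batch instead of one batch per strip or fan.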
MeshTool will generate triangle fans (as a space saving measure) – if you build a DSF by hand (using DSF2Text), use strips/fans to compress file size.
Because DSF focuses on file size, the quality of mesh output is a function of the DSF loader, which has to run while flying.  So while I can imagine some improvements in future performance, I don’t expect to be able to get huge wins because the very best mesh optimizing algorithms are much too slow for real-time use.
The DSF loader already produces full indexing and preserves cache utilization from strips and fans – the next logical optimization would be to reorder non-strip, non-fan triangles for better cache use on load; the order in the DSF file may be optimized for file size and not cache utilization.
Optimizing OBJs
Where I believe there could be real improvement is in OBJ8 generation.  The OBJ loader currently loads the indexed OBJ triangles exactly as specified in the file – build a smarter file and we can get faster framerate.  There are two possible ways to win:
  • Cache utilization – by ordering vertices for cache use, we can get better throughput.
  • Hidden surface removal – by putting the exterior triangles earlier in the OBJ, we can draw them first, occluding the interior of an object, which cuts down fill rate.  (In an airplane, you would want the exterior fuselage first in the OBJ, before the seats inside, so that only the pixels visible through the window are drawn.)
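As an illustration of the first bullet, here is a deliberately simple greedy reordering in Python: repeatedly emit the remaining triangle with the most vertices already in a simulated FIFO cache.  This is a generic heuristic chosen for clarity, not necessarily what X-Plane’s tools would use; it is O(n²), keeps each triangle’s winding, and ignores attributes and translucency, so a real OBJ optimizer would only reorder within runs of triangles that share the same state.

```python
from collections import deque


def fifo_misses(indices, cache_size):
    """Count misses in a simulated FIFO vertex cache."""
    fifo, cached, misses = deque(), set(), 0
    for v in indices:
        if v not in cached:
            misses += 1
            fifo.append(v)
            cached.add(v)
            if len(fifo) > cache_size:
                cached.discard(fifo.popleft())
    return misses


def greedy_reorder(indices, cache_size=16):
    """Reorder whole triangles so each next one reuses as many cached vertices as possible."""
    if len(indices) % 3 != 0:
        raise ValueError("expected a triangle list (3 indices per triangle)")
    tris = [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
    remaining = list(range(len(tris)))
    fifo, cached = deque(), set()
    out = []
    while remaining:
        # Most cached vertices wins; ties go to the triangle that came first in the file.
        best = max(remaining, key=lambda t: (sum(v in cached for v in tris[t]), -t))
        remaining.remove(best)
        for v in tris[best]:  # vertex order within a triangle is kept, so winding is unchanged
            out.append(v)
            if v not in cached:
                fifo.append(v)
                cached.add(v)
                if len(fifo) > cache_size:
                    cached.discard(fifo.popleft())
    return out


original = [0, 1, 2, 3, 4, 5, 0, 1, 6]
reordered = greedy_reorder(original, cache_size=3)
```

With a tiny 3-entry cache, moving the triangle that reuses vertices 0 and 1 up next to its neighbor cuts misses from 9 to 7.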

This second form of optimization may be of limited utility in that an OBJ8 optimizer has to respect authoring decisions about translucency, attributes, etc.

I am investigating OBJ optimization now – my hope would be to put optimization into a new version of the AC3D exporter and ObjConverter.
Strips and the iPhone
There is one place where triangle strips do matter: the iPhone.  It turns out that the iPhone will process triangles a lot faster if they are presented in a strip-like order.  So the iPhone DSFs are the first to use triangle strips (instead of fans), and the OBJ exporter for the iPhone optimizes the OBJ mesh into triangle strip order.
My tests indicate that strip order makes no difference on modern ATI and nVidia GPUs, so there is no point in releasing these optimizations in the main X-Plane tools.  In the long term, I expect our OBJ tools will have two optimization paths – a strip-based path for the iPhone and a cache utilization-based path for the desktop.
Posted in Development

Why Animating Cars Doesn’t Always Work Right

I saw a post about this on X-Plane.org…authors sometimes try to make a vehicle (a car, truck, etc) modeled via an OBJ “drive around” using animation translate commands.  The problem is that sometimes the objects disappear.  Here’s what is going on:

X-Plane uses a bounding sphere to decide whether to draw an object.  The bounding sphere is the smallest sphere X-Plane can fit around the entire object; if the sphere is on screen, the object is drawn (even if the object itself isn’t on screen).  We do this because we can test whether the sphere is on screen very quickly.
But what if the object has animation?  X-Plane attempts to guess how animation might affect the sphere by looking at animation commands and making the sphere a bit bigger where animation might move the object outside the sphere.  This process works, well, rather poorly. In particular, X-Plane doesn’t know exactly how your datarefs will change.  This results in two error cases:
  • If X-Plane assumes the animation is more drastic than it really is, we make the sphere too big.  The object will then be drawn even when it is not on screen (because the sphere is on screen because it is too big).  This case hurts fps but does not cause objects to disappear.
  • If X-Plane assumes the animation is less drastic than it really is, we do not make the sphere big enough, and sometimes the object “disappears” because the object is on screen but the (too small) sphere is not.
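The bounding-sphere test itself is cheap: compare the sphere against each plane of the view frustum.  This Python sketch assumes the frustum is given as inward-facing planes with unit normals (a common convention, not necessarily X-Plane’s internal representation) and uses a one-plane toy frustum to show how a sphere that is too small culls an object that has actually moved on screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    """Plane n·p + d = 0 with a unit normal pointing into the visible region."""
    nx: float
    ny: float
    nz: float
    d: float

    def distance(self, x: float, y: float, z: float) -> float:
        return self.nx * x + self.ny * y + self.nz * z + self.d


def sphere_visible(center, radius, frustum):
    """Conservative test: False only if the sphere is entirely outside some plane."""
    x, y, z = center
    return all(plane.distance(x, y, z) >= -radius for plane in frustum)


# Toy "frustum": everything with x >= 0 is on screen.
frustum = [Plane(1.0, 0.0, 0.0, 0.0)]
static_center, static_radius = (-5.0, 0.0, 0.0), 2.0  # sphere fitted to the un-animated object
animated_position = (3.0, 0.0, 0.0)                   # where the animation actually moved it
```

The un-grown sphere is culled even though the animated object sits on screen (the “disappearing object” case); growing the radius by the 8 m the object moved makes it visible again, at the cost of drawing it more often.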

Now let’s apply this to objects that are driving around.  Usually this is done via a translate animation command where datarefs feed the object’s position.

X-Plane estimates the effects of a translate animation using the largest and smallest key frame values.  But the animation engine will extrapolate beyond these key frames.  So consider these three cases:
  • As your dataref goes from -1 to 1, you translate by +/- 1 meter.  In this case, the bounding sphere will be increased in radius by one meter.
  • As your dataref goes from -25 to 25, you translate by +/- 25 meters.  In this case, the bounding sphere is increased in radius by twenty five meters.
  • As your dataref goes from -1000 to 1000, you translate +/- 1 kilometer.  In this case, the bounding sphere is increased in radius by 1000 meters.

Note that in all three of these cases, the animation works exactly the same!  But by using different dataref and value extremes, X-Plane’s estimate of the effects of the animation (and its change to the bounding sphere) can be quite different.
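The three cases can be checked with a little arithmetic.  This Python sketch simplifies to one axis and two key frames, and models the estimate as “grow the sphere by the largest key-frame offset magnitude”; both simplifications and the function names are assumptions, not X-Plane’s actual code.

```python
def translate_offset(dataref, key_values, key_offsets):
    """Offset in meters for a two-key-frame translate, extrapolated linearly past the keys."""
    (v0, v1), (t0, t1) = key_values, key_offsets
    return t0 + (dataref - v0) * (t1 - t0) / (v1 - v0)


def estimated_growth(key_offsets):
    """Sphere radius growth estimated from the key frames alone (largest |offset|)."""
    return max(abs(t) for t in key_offsets)


# The three cases from the text: same 1 m per unit of dataref, different key extremes.
cases = [((-1, 1), (-1, 1)), ((-25, 25), (-25, 25)), ((-1000, 1000), (-1000, 1000))]
for keys, offsets in cases:
    print(keys, "moves", translate_offset(10, keys, offsets), "m; sphere grows",
          estimated_growth(offsets), "m")
```

With a dataref value of 10, all three animations move the object 10 m, but only the ±25 and ±1000 key frames grow the sphere enough to cover that; the ±1 version extrapolates 9 m past its estimate, which is exactly when objects vanish.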

So…if you animate an object and it disappears, it is probably because the bounding sphere has not been increased enough, perhaps because a translation animation is being sent dataref values outside its minimum and maximum key frame values.
The problem is of course that to have an object “roam” over a large area, it must have a very large bounding sphere, which means it is being drawn a lot more than necessary.
Posted in Development, File Formats