hardware Archives - Page 4 of 7

Moving Features to the GPU

A hidden detail of my previous post on variation and terrain textures: variation for flat textures was implemented using more triangles in the DSF in X-Plane 8, but is implemented in a shader in X-Plane 9. This means that you don’t get this feature in X-Plane 9 if shaders are off.

My guess is that this is perfectly acceptable to just about every user.

If you don’t have shaders, you have something like a GeForce 4 or Radeon 8500, and are fighting for frame-rate. In this case, not paying the price of layer-based variation is a win.
If you have shaders, you’re getting better performance because the shader creates variation more efficiently than the old layering scheme did.

This kind of move of a feature to the GPU can only happen at major versions when we recut the global scenery, because (to utilize the benefit) the DSFs are recut with fewer (now unneeded) layers. So features aren’t going to mysteriously disappear mid-version.

I do have a goal to move more layering-type features to the GPU for future global scenery renders. There are a number of good reasons:

DSF file size is limited – we have distribution requirements on the number of DVDs we ship. So DSF file size is better spent on more detailed meshes than on layers.
GPU power is increasing faster than anything else, so it’s good to put these effects on the GPU – the GPU is still hungry for more!
If a feature is run on the GPU, we can scale it up or down or turn it on or off, for more flexible rendering settings on a wide variety of hardware. A feature baked into the DSF is there for everyone, no way to turn it off.

My hope for the next render is to (somehow) move the cliff algorithm (which is currently done with 2-4 layers) to the GPU, which would shrink DSFs, improve performance, and probably create nicer looking output.

Posted in Development, File Formats by Ben Supnik | Comments Off

Two Video Cards, Two Vendors

The short answer is: this is not a very good idea.

Now with OS X, this configuration is supported, and OS X will cleverly copy graphic output from one video card to another to make the system work well. You will get a fps hit when this happens.

With Vista, this configuration isn’t supported. (Snarky comment: it is lame that Microsoft completely rewrote their video driver infrastructure and went backward in terms of configuration support.)

With Linux, I have no idea if this configuration can run. I do know that trying to change my configuration hosed Ubuntu thoroughly and I decided not to break my Linux boxes any more, having spent plenty of time doing that already in the last few days.

For X-Plane, we can’t handle this case very well (at best you get the framerate hit) because we need to share textures between the IOS screen and main screen. So if you are trying to set up an IOS screen, you really do need a dual-headed graphics card. For what it’s worth, every card I’ve gotten in the last few years has had two video outputs.

Posted in Development by Ben Supnik | 3 Comments

Fun With Menubars

My Mac Pro has just gotten weirder – I put a Radeon HD 3870 into the second PCIe x16 slot. (The machine comes with a GeForce 8800.) I now have one monitor in each.

So here’s where things get fun:

Start X-Plane. 60 fps.
Drag the window to the second monitor. 30 fps.
Quit, move the menu bar to the second monitor, restart. (X-Plane is now on the right.) 160 fps.
Drag the window back to the primary monitor on the left. 100 fps.

What’s going on? Two things:

On OS X, X-Plane’s graphics are rendered by one video card, and that video card (in 921) is the card that has the menu on one of its monitors.
When an OpenGL window is displayed on a monitor that is not attached to the video card that is doing the rendering, OS X will copy the image from one video card to another, at a cost of some framerate.

So what’s going on above? Well, the 60 fps is my 8800. When I drag the window, the OS starts copying the graphics, slowing fps. When I move the menu bar, the 3870 does the rendering, and we get much higher fps. Once again, put the window on the monitor that is not attached to the rendering video card, and fps takes a hit.

Final note: fps tests of the 8800 vs 3870 with X-Plane 921:

Fps test 2, 8800: 46,49,51

Fps test 2, 3870: 70,75,80

Fps test 3, 8800: 24,25,25

Fps test 3, 3870: 40,41,43

In other words, the 3870 is significantly faster. I believe that this is due to the OS X drivers, not the cards themselves. Note that the 3870 is in a PCIe 1.0 slot and the 8800 is in a PCIe 2.0 slot.
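To put a number on "significantly faster", here is a quick sketch that averages the figures above and takes the ratio. The post doesn't say what the three numbers in each test represent, so treating them as three comparable samples per card is an assumption on my part.

```python
# Fps figures copied from the tests above.
results = {
    "test 2": {"8800": [46, 49, 51], "3870": [70, 75, 80]},
    "test 3": {"8800": [24, 25, 25], "3870": [40, 41, 43]},
}

def mean(values):
    return sum(values) / len(values)

def speedup(test):
    """Ratio of mean 3870 fps to mean 8800 fps for one test."""
    return mean(results[test]["3870"]) / mean(results[test]["8800"])

for test in results:
    print(test, round(speedup(test), 2))
```

Under that assumption the 3870 comes out roughly 1.5x faster in test 2 and roughly 1.7x faster in test 3.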

Posted in Development by Ben Supnik | 9 Comments

Hardware Guidance: Four Cores and DX10

I think we’ve reached the point where, if you are putting together a new computer and have X-Plane in mind:

Get a quad-core machine if the pricing is favorable (and I think it should be now).
Get a “DirectX 10” compatible graphics card. That would be an nVidia 8 or 9 series (or I guess that crazy new 280 card) or a Radeon HD 2000/3000/4000. DX10-type cards can be had for $100 to $150.

Quad core is easy: X-Plane 921 will use as many cores as you have for texture loading (especially in paged scenery), uses two cores all the time, and uses three during DSF load. The infrastructure for this additional scalability (previous builds used two cores, more or less) will let us put 3-d generation on four cores or more. More on this in another post, but basically X-Plane’s utilization of cores is good and getting better, so four cores is good, particularly if it’s not a lot more expensive.

Now for DX10, first I have to say two things:

We don’t use DirectX. We have no intention of switching to DirectX, dropping OpenGL support, or dropping OS X/Linux support. I just say “DX10” to indicate a level of hardware functionality (specified by Microsoft). The DX10 cards have to have certain hardware tricks, and those tricks can be accessed both in OpenGL and Direct3D. We will access them by OpenGL.
We are not going to drop support for non-DX10 cards! (We’re not that crazy.)

X-Plane does not yet utilize those new DX10 features, but the DX10-compatible cards are better cards than the past generations, and are now affordable*. By making sure you get one of these, you’ll be able to use new graphic features when they come out.

* The roll-out of DX10 cards has been similar to DX9. With the first generation cards there was one expensive but fast card and one cheap but slow card. With DX10, NVidia got there first, with DX9 ATI did. Like a few years ago, now that we’re a few revs into the new spec, both vendors are making high quality cards that aren’t too expensive.

Posted in Development by Ben Supnik | 4 Comments

(More) Triangle Optimizations

Yesterday I described how triangles and meshes can be optimized and hypothesized that building OBJs carefully could improve vertex throughput. Having looked at some numbers today, I think the potential for framerate improvement isn’t that great…an improvement would come from cache utilization (post vertex shader), and our cache usage seems to be pretty good already.

Simulating a FIFO vertex cache with 16 vertices (an average number – very old hardware might have 8 or 12, and newer hardware has at least 24 slots), I found that we miss the cache preventably around 15% of the time (using a random set of OBJs from LOWI to test). Sometimes we missed really badly (20-25%), but a lot of the time the miss rate was as low as 5%.

What these numbers mean is that at the very best, index optimizations in OBJs to improve vertex throughput might only improve vertex processing by about 15% (with the FPS improvement being less, since vertex throughput isn’t the only thing that slows us down).

In other words, if I solve the cache problem perfectly (which may be impossible) we get at best 15%.
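For the curious, a FIFO cache simulation of this kind can be sketched in a few lines. This is not the code I actually ran; in particular, defining a "preventable" miss as any miss beyond the first use of each vertex is an assumption made for illustration, and the function names are made up.

```python
from collections import deque

def fifo_cache_misses(indices, cache_size=16):
    """Count cache misses for an index list under FIFO replacement."""
    cache = deque(maxlen=cache_size)  # oldest entry falls out when full
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)  # a hit does not refresh the entry (FIFO, not LRU)
    return misses

def preventable_miss_rate(indices, cache_size=16):
    """Fraction of index references that miss beyond each vertex's first use."""
    misses = fifo_cache_misses(indices, cache_size)
    compulsory = len(set(indices))  # every vertex must be processed once
    return (misses - compulsory) / len(indices)

# Three triangles; the last one reuses the first one's vertices.
example = [0, 1, 2, 3, 4, 5, 0, 1, 2]
print(preventable_miss_rate(example, cache_size=3))   # tiny cache: reuse is lost
print(preventable_miss_rate(example, cache_size=16))  # big enough: nothing wasted
```

Feeding the index list of a real OBJ through something like `preventable_miss_rate` would give a per-object number comparable in spirit to the percentages above.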

So this could be a nice optimization (every 5% win counts, and they matter if you can improve fps by 5% over and over) but cache utilization isn’t going to change the nature of what you can model with an OBJ, because our cache utilization is already pretty good.

Have a Happy Thanksgiving!

Posted in Development by Ben Supnik | Comments Off

Triangle Optimizations

I’ve been looking a bit at triangle optimization – first some terminology:

Indexed triangles means that the vertices in a mesh are referred to by index numbers. This is the scheme OBJ8 uses. The advantage of indexing is that if a single vertex is used by many triangles (that share a corner) you only have to include the vertex data once, and then use that data many times by index. (The savings from indexing depend on how often vertices are shared.)
Triangle strips are strips of triangles sharing common edges. Because triangles in strips share so many common vertices, they can be stored in a compact form, for a savings of almost 3x.
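Here is a small sketch of both ideas. It is generic illustration code, not the OBJ8 reader; `index_mesh` and `strip_vertex_count` are names invented for the example.

```python
def index_mesh(triangle_corners):
    """Turn a flat list of per-corner vertices into (unique_vertices, indices)."""
    unique = []
    lookup = {}
    indices = []
    for v in triangle_corners:
        if v not in lookup:
            lookup[v] = len(unique)  # first time we see this vertex
            unique.append(v)
        indices.append(lookup[v])
    return unique, indices

def strip_vertex_count(triangle_count):
    """A strip adds one vertex per triangle after the first three."""
    return triangle_count + 2

# A quad made of two triangles sharing an edge: 6 corners, 4 distinct vertices.
quad = [(0, 0, 0), (1, 0, 0), (1, 1, 0),
        (0, 0, 0), (1, 1, 0), (0, 1, 0)]
vertices, indices = index_mesh(quad)
print(len(vertices), indices)
```

The "almost 3x" for strips falls out of the same arithmetic: 100 separate triangles need 300 vertices, while a 100-triangle strip needs `strip_vertex_count(100)`, or 102.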

Back in the old days, triangle strips were critical for performance (hence the presence of strips in the OBJ2 and OBJ7 formats). However with modern hardware, indexing is more efficient – the slight increase in data size (due to the index) isn’t as expensive as the cost of specifying “we’re done with one strip, start the next one”. (Consider that if we use indexed triangles, we can submit all triangles in one batch – with strips, we need one batch per strip.) Thus OBJ8 uses indexing and doesn’t provide any strip primitives.

There is one other concept to be aware of: cache utilization. Graphics cards remember the last few vertices they processed, so if a mesh repeats a vertex shortly after using it, the graphics card can save work. Triangle strips naturally use a cache somewhat well because vertices occur in close succession.

Strips and DSF

DSF allows for triangle strips (and triangle fans) as a space-saving measure. Even with indexing, the indices can be compressed if strips and fans are used, and with DSF, file size was a very high priority.

When the DSF file is loaded, the data is rebuilt into indexed triangles (and reindexed – the DSF internal structures don’t provide as good indexing as the DSF loader can create) – in version 803 we first started using indexed triangles and found it to be a big win.
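The rebuild step amounts to expanding each strip or fan into plain triangles. The sketch below uses the usual OpenGL conventions for strips and fans; it is not the DSF loader, and it ignores the DSF encoding itself and the reindexing pass.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip (list of indices) into separate triangles."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if i % 2 == 1:
            a, b = b, a  # flip every other triangle to keep the winding consistent
        tris.append((a, b, c))
    return tris

def fan_to_triangles(fan):
    """Expand a triangle fan: every triangle shares the first vertex."""
    center = fan[0]
    return [(center, fan[i], fan[i + 1]) for i in range(1, len(fan) - 1)]

print(strip_to_triangles([0, 1, 2, 3, 4]))
print(fan_to_triangles([0, 1, 2, 3, 4]))
```

Once everything is a flat triangle list, all of it can go to the card in one indexed batch.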

MeshTool will generate triangle fans (as a space saving measure) – if you build a DSF by hand (using DSF2Text), use strips/fans to compress file size.

Because DSF focuses on file size, the quality of mesh output is a function of the DSF loader, which has to run while flying. So while I can imagine some improvements in future performance, I don’t expect to be able to get huge wins because the very best mesh optimizing algorithms are much too slow for real-time use.

The DSF loader already produces full indexing and preserves cache utilization from strips and fans – the next logical optimization would be to reorder non-strip, non-fan triangles for better cache use on load; the order in the DSF file may be optimized for file size and not cache utilization.

Optimizing OBJs

Where I believe there could be real improvement is in OBJ8 generation. The OBJ loader currently loads the indexed OBJ triangles exactly as specified in the file – build a smarter file and we can get faster framerate. There are two possible ways to win:

Cache utilization – by ordering vertices for cache use, we can get better throughput.
Hidden surface removal – by putting the exterior triangles earlier in the OBJ, we can draw them first, occluding the interior of an object, which cuts down fill rate. (In an airplane, you would want the exterior fuselage first in the OBJ, before the seats inside, so that only the pixels visible through the window are drawn.)

This second form of optimization may be of limited utility in that an OBJ8 optimizer has to respect authoring decisions about translucency, attributes, etc.

I am investigating OBJ optimization now – my hope would be to put optimization into a new version of the ac3d exporter and ObjConverter.

Strips and the iPhone

There is one place that triangle strips do matter: the iPhone. It turns out that the iPhone will process triangles a lot faster if they are presented in a strip-like order. So the iPhone DSFs are the first to use triangle strips (instead of fans), and the OBJ exporter for the iPhone optimizes the OBJ mesh into triangle strip order.

My tests indicate that strip order makes no difference on modern ATI and nVidia GPUs, so there is no point in releasing these optimizations in the main X-Plane tools. In the long term, I expect our OBJ tools will have two optimization paths – a strip-based path for the iPhone and a cache utilization-based path for the desktop.

Posted in Development by Ben Supnik | 2 Comments

Threaded FM – Probably Not

I always have to hesitate before posting a possible future direction to my blog – our future plans are a road map, a direction we intend to follow, but if circumstances change, our plans change. (This is one of the great powers of software: the ability to be flexible!) Unfortunately in the past, I’ve posted ideas, and then when we didn’t productize them, gotten back “but you promised X” from users. So now I’m a little bit gun-shy.

But let’s try the reverse: what about a feature that I am now pretty sure won’t go into the sim?

We were looking at running the flight model on a separate core from the rendering engine. The idea is that the less work must be done in series with that main rendering thread, the higher the total frame-rate. But now it looks like it’s not worth it. Here’s my logic:

The rendering engine now runs best on at least two cores, because all loading is done on a second core. So unless you have a 4+ core machine, X-Plane is utilizing close to all of your hardware already.
The flight model isn’t very expensive – and the faster the machine, the smaller the share of frame time the flight model takes (because it does not become more expensive with higher rendering settings).
Therefore I must conclude: threading the flight model would only help framerate on hardware that doesn’t need the help – modern 4+ core machines.
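A back-of-the-envelope model of this argument, with loud caveats: the millisecond figures are made-up illustrative numbers, not measurements from X-Plane, and the threaded case is an idealized upper bound that assumes the flight model overlaps perfectly with rendering at zero synchronization cost.

```python
def fm_share(render_ms, fm_ms):
    """Fraction of a serial frame spent in the flight model."""
    return fm_ms / (render_ms + fm_ms)

def fps_serial(render_ms, fm_ms):
    """Flight model runs in series with rendering on one thread."""
    return 1000.0 / (render_ms + fm_ms)

def fps_threaded(render_ms, fm_ms):
    """Idealized: flight model fully hidden behind rendering on a spare core."""
    return 1000.0 / max(render_ms, fm_ms)

# Hypothetical frame: 24 ms of rendering, 1 ms of flight model.
print(fm_share(24, 1))      # share of the frame the flight model takes
print(fps_serial(24, 1))    # what we get today
print(fps_threaded(24, 1))  # the best a threaded flight model could do
```

With those hypothetical numbers the flight model is 4% of the frame, and even the perfect case moves 40 fps to under 42 fps.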

So why not code it? (Even if the improvement in framerate would be pretty low, it would be more than zero.) Well, besides the opportunity cost of not coding something more useful, there’s one thing that makes a threaded flight model very expensive: plugins.

Plugins can run during various parts of the rendering engine, and they can write data into the flight model. I bounced a number of ways of coping with this off of Sandy, Andy, and others, and I don’t see a good way to do it. Basically every scheme includes some combination of a huge performance hit if a plugin writes data from render time, a lot of complexity, or both.

So the simplest thing to do is to not try to thread the FM against the rendering engine, and instead continue to use more cores to improve the rendering engine.

This doesn’t apply to running more than one FM at the same time (e.g. AI planes and the main plane at the same time). It’s the question of the FM vs. the rendering engine that I think now is not worth the benefit.

Posted in Development by Ben Supnik | 1 Comment

Releases, Bugs, and the iPhone

Sometimes I have one of those weeks where all I do is look at crashes and weird behavior. This is turning into one of those weeks. So here’s some status on the various bugs floating around.

I should say: you’ll find a lot of developers blaming the technology providers for bugs (just look at how many OpenGL developers have blamed ATI for their apps crashing). Sometimes it’s the app, sometimes it’s the driver. More importantly:

You don’t know whose fault it is until you fully understand the bug.
The fix for a bug might not be in the broken code. That is, one piece of code can work around a bug in another.

So…you can’t necessarily tell whose fault it was from new drivers coming out, us changing the sim, etc. But…when it’s my stupid code, I’ll admit it openly – no one should think that a bunch of other smart programmers are screwing up on my behalf. (This is also useful to other apps developers, who can know that my bug isn’t the same as their bug, since my bug was in my code.)

X-Plane: we’re happy with 921r2 finally…the final bug (crash on startup on the Mac) was due to an incompatibility between Apple’s OpenAL implementation and our special memory allocator (which is really just a wrapper around NEDMalloc). I still don’t know exactly what the rules should be (you try reading the C++ spec!) but for now we turned the allocator off on Mac.

(This brings up another issue about bugs – you can’t tell whose bug it is by whose code crashes, since one piece of code can sabotage another.)

So the next X-Plane release will probably be 930, with new features. We may have a few more language patches if needed.

iPhone: the 9.01 iPhone patch is out, and it improves framerate a bit. We are still seeing crashes on startup for users who have just downloaded the app. Rebooting the phone will fix this, but please see this post for more info! We need your help to collect more data.

Radeon 9800 on Windows: for the longest time we’ve had users reporting “framebuffer incomplete” errors when using Catalyst 8.x drivers, an R300 chipset, and Windows with X-Plane 9.x. I have been trying to reproduce this problem “in the lab” off and on for months, but finally saw it this week. From what I can tell, we’re getting into some low-memory condition and the driver is freaking out in various ways. The command line options people sometimes use to get past this are probably rearranging memory, not saving it. I don’t know why the Catalyst 7.x drivers don’t have this problem. But…at least I am making more progress than I was before. Please see this post for more info.

Installers: I am working on the 2.05 installer. I have seen a number of users report problems running a full install from DVDs, so I am just starting to investigate that. I will post more when I have something to test. Unfortunately the problems reported are not something we see here.

Posted in Development, News by Ben Supnik | 1 Comment

The Future of Triangles Part 5: The Technology of the Future

I’ve rewritten this post about four times now…let me try the brief version.

Basically, X-Plane is not an early adopter of graphics technology. Because of the nature of the rendering we do, we can directly benefit from “more of the same”, e.g. if you simply gave me twice as many objects per second or twice as many polygons, we could make the sim look a lot nicer. So we don’t need to adopt new graphics technologies until they’re proven in games that need them more, like first person shooters. We’re a small company with no influence on the industry, so we write the tightest message we can and use new features when the dust settles.

(From a utilization standpoint, we also provide the best graphics to the most people by using card features that are going to become wide spread, so it doesn’t make sense for us to gamble on vendor-specific extensions that might not become available to everyone.)

With that in mind, there is some cool stuff that people are talking about that maybe someday we’ll get to play with:

Irregular Shadow Mapping – given a super-programmable card, you can create a rendering scheme that optimizes shadow map creation to remove artifacts.
Out-of-order blending – the graphics card resorts incoming geometry so that all translucent geometry is drawn back to front. Doing this on the CPU is expensive (and in X-Plane’s case, we often just don’t get it right at all).
Multiple dispatch to multiple targets. Even on a big multi-chip GPU (a lot of modern cards are two cards stuck together) they only render to one screen or texture at a time, even if there are a lot of parallel elements. This is good for a few big complex scenes but not good for lots of small scenes. I’d like to see all vendors support dispatch to multiple targets – this will make things like dynamic reflection via environment cube maps potentially a lot faster.
Voxel Octrees. This is the one I hear a lot about – basically it’s a change from 2-d to 3-d data structures on the graphics card to manage fast access to large chunks of graphics data. (Shadow maps, z-buffers, and environment maps are all more or less 2-d data structures.)
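To make the out-of-order blending item concrete, this is the kind of CPU-side work the card would be taking over: sorting translucent objects so the farthest is drawn first. It is only a sketch, not X-Plane code; the dictionary layout is invented for the example, and sorting whole objects by the distance of a single point is itself an approximation that breaks down for large or interpenetrating geometry.

```python
def sort_back_to_front(objects, camera):
    """Order translucent objects farthest-first relative to the camera."""
    def distance_squared(obj):
        return sum((p - c) ** 2 for p, c in zip(obj["position"], camera))
    return sorted(objects, key=distance_squared, reverse=True)

translucent = [
    {"name": "near", "position": (0.0, 0.0, 1.0)},
    {"name": "far", "position": (0.0, 0.0, 10.0)},
    {"name": "mid", "position": (0.0, 0.0, 5.0)},
]
draw_order = [o["name"] for o in sort_back_to_front(translucent, (0.0, 0.0, 0.0))]
print(draw_order)
```

The sort has to be redone whenever the camera or the objects move, which is where the CPU cost comes from.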

Will we see this? I don’t know. Will Larrabee change everything? Who knows…Intel has to build a high-end graphics card to fight ATI and NV’s attempt to get into supercomputing, but if they happen to also build a really nice video card, I can live with that. But I won’t hold my breath – the titans need to duke it out without me!

Posted in Development, Scenery by Ben Supnik | Comments Off

The Future of Triangles Part 4: Pie in the Sky

Per-pixel lighting is something I hope to have in X-Plane soon. A number of other features will take longer, and quite possibly might never happen. This is the “pie in the sky” list – with this list, we’re looking at higher hardware requirements, a lot of development time, and potential fundamental problems in the rendering algorithm!

High Dynamic Range (HDR) Lighting

HDR is a process whereby a program renders its scene with super bright and super dark regions, using a more detailed frame-buffer to draw. When it comes time to show the image, some kind of “mapping” algorithm then represents that image using the limited contrast available on a computer monitor. Typical approaches include:

Scaling the brightness of the scene to mimic what our eyes do in dark or bright scenes.
Creating “bloom”, or blown out white regions, around very bright areas.

Besides creating more plausible lighting, the mathematics behind an HDR render would also potentially improve the look of lit textures when they are far away. (Right now, a lit and dark pixel are blended to make semi-lit pixels when far away as the texture scales down. If a lit pixel can be “super-bright” it will still look bright even after such blending.)
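The arithmetic behind that last point, as a toy example. The value 8.0 for a "super-bright" pixel is an arbitrary illustrative number, and a plain clamp stands in for whatever mapping a real HDR pipeline would use.

```python
def blend(a, b):
    """Average two neighboring texels, as happens when a texture scales down."""
    return (a + b) / 2.0

def to_display(value, exposure=1.0):
    """Simplest possible mapping to the monitor's range: scale, then clamp."""
    return min(value * exposure, 1.0)

# Today: a lit texel can be at most 1.0, so blending with a dark one halves it.
ldr_result = to_display(blend(1.0, 0.0))

# HDR: the lit texel is stored as "super-bright" and survives the blend.
hdr_result = to_display(blend(8.0, 0.0))

print(ldr_result, hdr_result)
```

The first case ends up at half brightness; the second still displays at full brightness.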

Besides development time, HDR requires serious hardware; drawing to a framebuffer with enough range for HDR chews up a lot of GPU power, so HDR would be appropriate for a card like the GeForce 8800.

While there aren’t any technical hurdles to stop us from implementing HDR, I must point out that, given a number of the “art” features of X-Plane like the sun glare, HDR might not be as noticeable as you’d think. For example, our sun “glares” when you look at it (similar to an HDR trick), but this is done simply by us detecting the view angle and drawing the glare in.

Reflection Mapped Airplanes

Reflection maps are textures of the environment that are mapped onto the airplane to create the appearance of a shiny reflective surface. We already have one reflection map: the sky and possibly scenery are mapped onto the water to create water reflections.

Reflection maps are very much possible, but they are also very expensive; we have to go through a drawing pass to prepare each one. And reflection maps for 3-d objects like airplanes usually have to be done via cube maps, which means six environment maps!

There’s a lot of room for cheating when it comes to environment maps. For example: rendering environment maps with pre-made images or with simplified worlds.

Shadows

Shadows are the biggest missing feature in the sim’s rendering path, and they are also by far the hardest to code. I always hesitate to announce any in-progress code because there is a risk it won’t work. But in this case I can do so safely:

I have already coded global shadow maps, and we are not going to enable it in X-Plane. The technique just doesn’t work. The code has been ripped out and I am going to have to try again with a different approach.

The problem with shadows is the combination of two unfortunate facts:

The X-Plane world is very, very big and
The human eye is very, very picky when it comes to shadows.

For reflections, we can cheat a lot — if we don’t get something quite right, the water waves hide a lot of sins. (To work on the water, I have to turn the waves completely off to see what I’m doing!) By comparison, anything less than perfect shadows really sticks out.

Shadow maps fail for X-Plane because it’s a technology with limited resolution in a very large world. At best I could apply shadows to the nearest 500 – 1000 meters, which is nice for an airport, but still pretty useless for most situations.

(Lest someone send the paper to me, I already tried “TSM” – X-Plane is off by about a factor of 10 in shadow map res; TSM gives us about 50% better texture use, which isn’t even close.)
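The resolution problem is simple division. In the sketch below the 2048-texel map size is a hypothetical figure chosen for illustration, and reading "50% better texture use" as a 1.5x gain in effective resolution is my simplifying assumption.

```python
def texel_size_m(covered_m, map_res):
    """Ground distance covered by one shadow-map texel along one axis."""
    return covered_m / map_res

def remaining_shortfall(needed_factor, gained_factor):
    """How far short we still are after an improvement."""
    return needed_factor / gained_factor

# Hypothetical 2048-texel map stretched over 1 km vs. 10 km of world.
print(texel_size_m(1000, 2048))
print(texel_size_m(10000, 2048))

# Needing 10x the resolution and getting 1.5x from TSM.
print(round(remaining_shortfall(10, 1.5), 2))
```

Under those assumptions one texel covers about half a meter at 1 km of coverage and about five meters at 10 km, and TSM still leaves us short by a factor of more than six.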

A user mentioned stencil shadow volumes, which would be an alternative to shadow maps. I don’t think they’re viable for X-Plane; stencil shadow volumes require regenerating the shadow volumes any time the relative orientation of the shadow caster and the light source changes; for a plane in flight this is every single frame. Given the complexity of planes that are being created, I believe that they would perform even worse than shadow maps; where shadow maps run out of resolution, stencil shadow volumes would bury the CPU and PCIe bus with per-frame geometry. Stencil shadow volumes also have the problem of not shadowing correctly for alpha-based transparent geometry.

(Theoretically geometry shaders could be used to generate stencil shadow volumes; in practice, geometry shaders have their own performance/throughput limitations – see below for more.)

Shadows matter a lot, and I am sure I will burn a lot more of my developer time working on them. But I can also say that they’re about the hardest rendering problem I’m looking at.

Dynamic Tessellation

Finally, I’ve spent some time looking at graphics-card based tessellation. This is a process whereby the graphics card splits triangles into more triangles to make curved surfaces look more round. The advantage of this would be lower triangle counts – the graphics card can split only the triangles that are close to the foreground for super-round surfaces.

The problem with dynamic tessellation is that the performance of the hardware is not yet that good. I tried implementing tessellation using geometry shaders, and the performance is poor enough that you’d be better off simply using more triangles (which is what everyone does now).
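As a rough picture of what "splitting triangles" means, here is plain midpoint subdivision on the CPU. It is not the geometry shader I tried and says nothing about hardware tessellators. Note that this sketch only splits a flat triangle into smaller flat triangles; a real tessellator would also push the new vertices onto the curved surface, which is omitted here.

```python
def midpoint(p, q):
    return tuple((a + b) / 2.0 for a, b in zip(p, q))

def subdivide(tri):
    """Split one triangle into four by connecting its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def tessellate(tris, levels):
    """Apply midpoint subdivision the given number of times."""
    for _ in range(levels):
        tris = [small for t in tris for small in subdivide(t)]
    return tris

triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(len(tessellate([triangle], 3)))
```

Each level multiplies the triangle count by four, which is why you only want to do it for the triangles close to the viewer.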

I still have hopes for this; ATI’s Radeon HD cards have a hardware tessellator and from what I’ve heard its performance is very good. If this kind of functionality ends up in the DirectX 11 specification, we’ll see comparable hardware on nVidia’s side and an OpenGL extension.

(I will comment more on this later, but: X-Plane does not use DirectX – we use OpenGL. We have no plans to switch from OpenGL to DirectX, or to drop support for Linux or the Mac. Do not panic! I mention DirectX 11 only because ATI and nVidia pay attention to the DirectX specification and thus functionality in DirectX tends to be functionality that is available on all modern cards. We will use new features when they are available via OpenGL drivers, which usually happens within a few months of the cards being released, if not sooner.)

Posted in Development, File Formats by Ben Supnik | 2 Comments
