A user asked me to write a little bit about Vulkan. My first reaction was to not post anything for a simple reason: Vulkan is a feature for me, not for you. That is, Vulkan does not directly make your life (as a user of X-Plane) better; instead it makes my development job easier and it makes it possible for me to create a better X-Plane.
Some day if we end up running on multiple drivers (e.g. Vulkan and OpenGL and Metal), you may not be able to tell the difference between the Vulkan and OpenGL version; the sky won’t be more blue, the clouds won’t be more puffy. I can’t think of any features exposed by Vulkan that aren’t in OpenGL 4.x with extensions now — we already have sparse memory, tessellation and compute in OpenGL. (But then, you might actually be able to tell because framerate might be higher.)
Anyway, having finished reading the (700+ page – that’s what down time in the airport is for) Vulkan spec last week, the rest of this post is my view on Vulkan as a 3-d graphics developer. This is definitely an “inside baseball” kind of post, so if you want to go surf cat videos, I won’t be offended!
The Problems with OpenGL
For an application like X-Plane, Vulkan is an improvement over OpenGL; to understand why, we have to look at the problems with OpenGL as an API that allows applications to communicate with 3-d hardware drivers. There are a few:
- OpenGL’s approach to multi-core and threading is antithetical to performance. This is really the straw that broke the camel’s back for me and OpenGL. Developers who know me know that I’m not a “burn it down and start from scratch” kind of guy*. But the threading model in OpenGL requires drivers to hurt performance with safety checks and locks that an application can’t get away from. This isn’t something you can just fix with an extension.
- OpenGL’s object and binding model also makes multicore performance difficult. The use of a “current” object in a context and the ability to radically change objects after they are built mean lots of driver paths require internal locks.
- OpenGL’s compatibility is its greatest strength and greatest weakness. The original plan to rebuild the object model in OpenGL 3.0 died before it was released. Instead the ARB came up with a plan to optionally deprecate APIs. Every vendor of OpenGL except for Apple has chosen to keep backward compatibility with everything. That’s great for keeping old apps working, but it means every change to OpenGL is expected to work with everything else there ever was, ever.
- OpenGL requires that shaders be compiled by the driver. This means an application is exposed to idiosyncrasies in the compiler of each driver it runs on. This isn’t as bad as it used to be, but writing shaders is still a matter of write-once, check everywhere.**
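To make the binding-model problem concrete, here is a toy sketch in C++. It is not real OpenGL: `ToyContext`, `bind`, `set_param` and `get_param` are names invented for this illustration (the real-world analogue would be something like `glBindTexture` followed by `glTexParameteri`). Every edit goes through a hidden “current” object, and because that state is shared and mutable, the toy has to take a lock on every single call, which is roughly the position a GL driver finds itself in.

```cpp
#include <map>
#include <mutex>

// Toy stand-in for a GL-style context; NOT the real OpenGL API.
class ToyContext {
public:
    // Make `id` the "current" object; later edits implicitly target it.
    void bind(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = id;
    }
    // Edit whatever object happens to be bound right now.
    void set_param(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        params_[current_] = value;
    }
    // Read back the parameter stored for a specific object.
    int get_param(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        return params_[id];
    }
private:
    std::mutex mutex_;           // paid on every call, even single-threaded
    int current_ = 0;            // hidden state shared by all callers
    std::map<int, int> params_;  // per-object parameter storage
};

// Runs a harmless-looking call sequence and returns object 1's parameter.
inline int demo_hidden_state() {
    ToyContext ctx;
    ctx.bind(1);
    ctx.set_param(10);  // edits object 1
    ctx.bind(2);        // some other code rebinds...
    ctx.set_param(20);  // ...so this edits object 2, not object 1
    return ctx.get_param(1);
}
```

The second `set_param` call looks identical to the first but lands on a different object; what a call means depends on state set somewhere else, and the lock cannot be skipped because the context has no way of knowing who else is calling.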
How Vulkan Helps: The design principles behind the Vulkan API address all of these issues.
- The biggest single feature of Vulkan is its new multi-core friendly threading model. Vulkan is “externally” synchronized, which basically means applications can do whatever they want, but have to talk to different parts of the driver from different threads. To use an analogy: OpenGL is filled with traffic lights. Vulkan doesn’t even have stop signs, and it’s up to two drivers to not be on the same road at the same time. (For us application developers, this is a great setup, as we know what roads we are on and can plan to not have collisions.)
- The object model clearly separates expensive creation operations from inexpensive usage operations. Expensive object creation can be done on worker threads or at initialization time. Objects can’t be radically reconfigured once they are built, so the small changes that are allowed to existing objects aren’t slow.
- Shaders are compiled ahead of time into an intermediate representation; no more shader compile fails after a driver update.
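The “externally synchronized” idea from the first bullet can be sketched with a toy model in C++. None of this is the real Vulkan API: `Command`, `CommandRecorder` and `record_in_parallel` are hypothetical names, loosely standing in for per-thread command recording followed by one submission. The point is that the recorder contains no lock at all; the application promises that each thread stays on its own road.

```cpp
#include <thread>
#include <vector>

// Toy stand-ins; NOT the real Vulkan API.
struct Command { int draw_id; };

// One recorder per thread. There is no lock inside: the caller guarantees
// that only one thread touches a given recorder ("external" synchronization).
struct CommandRecorder {
    std::vector<Command> commands;
    void draw(int id) { commands.push_back(Command{id}); }
};

// Record `per_thread` draws on each of `thread_count` threads, then hand
// everything to a single "queue" on the calling thread.
inline std::vector<Command> record_in_parallel(int thread_count, int per_thread) {
    std::vector<CommandRecorder> recorders(thread_count);  // sized up front
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&recorders, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                recorders[t].draw(t * per_thread + i);  // thread-private slot
            }
        });
    }
    for (auto& w : workers) w.join();  // the only synchronization point

    std::vector<Command> submitted;    // stands in for one queue submission
    for (const auto& r : recorders) {
        submitted.insert(submitted.end(), r.commands.begin(), r.commands.end());
    }
    return submitted;
}
```

If two threads ever shared a recorder, this would be a data race and nothing in the toy would catch it. That is the trade: no traffic lights, so the application has to do the planning.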
Vulkan is a smaller, lower level API with clear performance guidelines and a focus on multicore from day one. This is a good fit for what we need with X-Plane.
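The object-model point (expensive creation, cheap use, no radical changes afterwards) can also be sketched as a toy in C++. This is an illustration only, not the Vulkan API: `PipelineDesc`, `Pipeline` and `state_key` are invented names, and the “expensive work” is just a validation check and a bit of arithmetic.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-ins; NOT the real Vulkan API.
struct PipelineDesc {
    std::string shader_name;
    bool depth_test;
    bool blending;
};

class Pipeline {
public:
    // All the expensive work (validation, "compilation") happens here, once.
    explicit Pipeline(PipelineDesc desc) : desc_(std::move(desc)), key_(0) {
        if (desc_.shader_name.empty()) {
            throw std::invalid_argument("pipeline needs a shader");
        }
        key_ = (desc_.depth_test ? 1 : 0) | (desc_.blending ? 2 : 0);
    }
    // Using the object is cheap and const: there is nothing left to check,
    // and no setter exists that could invalidate the precomputed state.
    int state_key() const { return key_; }
    const std::string& shader() const { return desc_.shader_name; }
private:
    PipelineDesc desc_;
    int key_;
};
```

Because a finished `Pipeline` exposes only const accessors, it can be built on a worker thread and then read from any number of threads without a lock.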
The Problems with the OpenGL Ecosystem
The OpenGL ecosystem is the collection of all of the companies and programmers working with OpenGL. This includes the vendors of graphics chips who create OpenGL drivers (NVidia, AMD, Intel, Imagination, Qualcomm, ARM), OS vendors who provide OpenGL interfaces (Apple and Google), the major game engine developers that have the ear of the hardware vendors (think Unreal Engine 4, Unity, etc.), the major CAD application developers, etc.
More serious for OpenGL than the problems of the API is the state of the ecosystem.
- OpenGL’s API is underspecified: there is no comprehensive conformance test for OpenGL, so we can’t really know if an OpenGL driver works or is buggy.
- OpenGL’s API is underspecified: no performance guarantees or even recommendations are made. If you ever look at the tech blog Chris and I maintain, it’s full of posts about the latest witchcraft I’ve found to make vertex throughput go faster. That stuff isn’t part of the GL spec, yet you have to know it to build a real-time graphics application.
Given the lack of specifics, application developers and driver writers end up locked in a sort of death-spiral:
- Driver writers observe the behavior of applications and change the driver behavior to work around applications. This can include trying to improve performance and trying to bandage around broken behavior. (Since games are the typical benchmark for new graphics cards, NVidia and AMD are hugely incentivized to make games run faster by any means possible.)
- App developers observe the behavior of the drivers and change application behavior to work around driver issues. In X-Plane’s case this often means intentionally not using the optimal code path (call it X) on a given driver stack because it is slow there. If the driver team ever fixes code path X, X-Plane still isn’t using it; when the driver team looks at performance, they then decide to improve code path Y (the backup plan that we are actually using) because that is what will make our app benchmark faster.
How Vulkan Helps: Vulkan helps by being much more highly specified in terms of both conformance and performance.
- Vulkan is a much smaller, simpler API – OpenGL has simply become too complicated to completely test. With Vulkan, we can hope to test the entire driver.
- Vulkan is being built with an open source test suite from day 1, with the goal being to build up a huge number of tests so we can know that a given driver is correct.
- The Vulkan API is very clear about what operations are fast and what operations are slow. An application that uses the fast API can expect fast performance on those code paths. Guessing is not required.
Downsides to Vulkan
For smaller OpenGL application developers like Laminar Research, I can think of just one downside; it’s one that I haven’t seen a lot of application developers talk about, probably because it requires admitting that we (the app developers) might not be as smart as those driver guys.
OpenGL is a higher level API; OpenGL applications leave some of the hard problems of 3-d graphics up to the driver. This means that some of these very hard, very important performance problems are being solved by a team of engineers from the company that built the hardware. They have resources, they know everything about the particular hardware they are coding for, and performance is job one.
Vulkan is a low level API; with Vulkan, some of those hard problems will be solved by Chris.
Ha ha…no, I’m totally kidding. We don’t let Chris play with pointers or any other sharp objects. That code will be written by me, and it’s a safe bet that I know less about the hardware than the driver team and have less time to do pure performance work than all of the engineers at NVidia or AMD who work on the OpenGL stack.
This is a calculated risk for Vulkan as an ecosystem. The hope is that (1) with specific information about performance as part of an API, we application developers won’t screw things up too badly and (2) because we (the application developers) know more about our specific applications, we can optimize performance in ways the driver guys can’t because they don’t have the bigger picture. One of the hardest problems in a graphics API is conveying intent; OpenGL drivers spend a ton of code trying to guess what the application is trying to do. Vulkan solves this problem by letting the application make performance decisions itself.
If you know OpenGL but don’t know a lot about how 3-d graphics hardware really works, reading the Vulkan spec is a little bit like believing that Santa Claus brings your presents down the chimney and then [spoiler alert] one day reading a 700+ page PDF that explains how your parents actually buy the presents a month or two in advance, wrap and hide them, and then sneak them under the tree while you are asleep.
For X-Plane, the Vulkan spec makes clear what types of operations future drivers will support, what will be done for us, and what we have to do ourselves. This gives us a good framework to then incrementally build next-generation rendering code in X-Plane that is “Vulkan-friendly” – even if it is still using OpenGL. My guess is that this new code will be faster than the code it replaces even before we change drivers.
Once we have the rendering code restructured and modernized, we can then set it up to run on Vulkan or OpenGL, taking advantage of the Vulkan pathways where they exist.
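As a rough picture of what “set it up to run on Vulkan or OpenGL” could look like structurally, here is a minimal C++ sketch. Every name in it (`RenderBackend`, `OpenGLBackend`, `VulkanBackend`, `make_backend`, `render_frame`) is hypothetical; this is not X-Plane’s actual code, and the backends only count draw calls instead of talking to a driver.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not X-Plane's actual code.
struct DrawCall { int mesh_id; };

class RenderBackend {
public:
    virtual ~RenderBackend() = default;
    virtual std::string name() const = 0;
    // Takes a whole batch at once so each backend can pick its fastest path.
    // Returns the number of draw calls it accepted.
    virtual int submit(const std::vector<DrawCall>& batch) = 0;
};

class OpenGLBackend : public RenderBackend {
public:
    std::string name() const override { return "OpenGL"; }
    int submit(const std::vector<DrawCall>& batch) override {
        // A real version would issue GL calls here.
        return static_cast<int>(batch.size());
    }
};

class VulkanBackend : public RenderBackend {
public:
    std::string name() const override { return "Vulkan"; }
    int submit(const std::vector<DrawCall>& batch) override {
        // A real version would record and submit command buffers here.
        return static_cast<int>(batch.size());
    }
};

// Pick the Vulkan path where it exists, otherwise fall back to OpenGL.
inline std::unique_ptr<RenderBackend> make_backend(bool vulkan_available) {
    if (vulkan_available) return std::make_unique<VulkanBackend>();
    return std::make_unique<OpenGLBackend>();
}

// The rendering code above this line never needs to know which API is in use.
inline int render_frame(RenderBackend& backend) {
    std::vector<DrawCall> batch = {{1}, {2}, {3}};
    return backend.submit(batch);
}
```

The batch-at-a-time `submit` is the “Vulkan-friendly” part of the sketch: the application decides up front what goes together, which should suit either backend.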
One final thought: I have no idea how the actual experience of coding for Vulkan will be. It may be that everything “just works”, or it may be an exercise in frustration. Once we’ve been through a full port I’ll post a post-mortem, but I think it’s too soon for anyone out there to have good realistic feedback on a full Vulkan porting experience from OpenGL.***
* And when people suggest ground up rewrites I usually link to this.
** For what it’s worth, the driver shader bugs I see (and the ones that really take up my time) are bugs in the back end of the compiler, where optimized machine code for the GPU is generated. Vulkan isn’t going to fix this; Vulkan removes the front end of the compiler from the driver but not the back end.
*** Yes, there are games running on Vulkan now. But if your game is already running on multiple APIs, that’s different from porting a straight OpenGL app. The final spec and production drivers haven’t been out long enough for anyone to be able to say “I ported half a million lines of OpenGL code to Vulkan and here’s the result.”