Normally I try to not mention specific add-ons when talking about problems; it’s not fair to the add-on maker, it’s really hard to know what the real problem is without knowing everything about the bug, the problems I blog about usually affect a wide array of add-ons, and I don’t want to throw add-on makers under the bus. It’s not fair to them, particularly in this case.
In this case, however, approximately everybody knows that the Zibo was missing its wings in 11.50 beta 9 – it’s a very widely used add-on, and it’s a perfect illustration of the problems with third party content validation, which is what this post is about.
So…gather round children, and I will tell you the tale of how the Zibo lost its wings.
With Vulkan and Metal, X-Plane is now firmly in the driver’s seat for VRAM management. This lets us eliminate stutters that were previously present with OpenGL and almost impossible to avoid. It has, however, one big and noticeable downside: when you run out of VRAM, you get blurry textures.
Of course the goal wasn’t to replace stuttering with blurry textures, and we believe that given the normal work load of X-Plane, you should not be seeing this. The fact that so many users are seeing blurry textures, especially on big cards with lots of VRAM, points to the VRAM code being buggy in all the ways beta code can be buggy.
How much VRAM do I need?
Just to get the obvious out of the way, our system requirements have not changed for 11.50. Our minimum VRAM requirement for X-Plane 11 is still 1Gb. We expect these cards to be able to run 1080p with lowered texture resolution, providing an equal experience to X-Plane 11.41. With 2Gb we expect users to be able to run HDR with medium texture resolution on 1080p systems. 4Gb and higher should be able to run HDR at with high resolution textures with 2k monitor resolutions. With 6Gb and higher, 4k monitor resolution should be possible.
A week or two ago we had a very dead beta, and posed the question of how to incrementally test betas in the future. We got a variety of responses, ranging from “private test it first” to “roll it out in a wave” to “full speed ahead, we know betas are bumpy.”
Since then, we’ve been doing one of the easiest and probably most useful things we can: posting the betas early to third-party developers who are in our developer Slack channel.
Beta 7/8 had a ton of changes, and our third-party developers found multiple problems, some of which we wouldn’t see in our internal tests. So we held off on releasing betas 7 and 8 to the public while we fixed those issues. Until today.
X-Plane 11.50 has been similar to X-Plane 11.20 (our VR) release and different from what we normally try to do, in that when we went beta (both private and public), the work for Vulkan wasn’t done yet. We had something that you could fly with, that was delightful for some users (and unstable for others), but we also had a big list of things we still needed to do.
X-Plane 11.50b7 has been recalled before it even made it fully out the door.
We had a ton of changes in this one–at one point I pulled over 100 Git commits on our release branch. Ben and Sidney also knocked nearly all items off their features-to-do list.
But thank goodness we asked our third party developers to kick the tires early on this one. They found a beta stopping bug in about 30 minutes! In our attempts to fix some performance issues, we caused the aircraft to be blurry in almost all cases, and we knew that was not an acceptable regression bug for a flight sim.
So sit tight for beta 8 to come soon, and don’t panic when your version numbers skip b7.
In 11.50b6 we added a command line argument to run Aftermath, a debugging utility, hoping it will give us more insight into device loss errors.
A “device loss” error is specifically the crash that accompanies the on-screen (or log.txt) error message “Encountered Vulkan device loss error!” Using Aftermath will not help us investigate VRAM issues–that is a different issue entirely.
If you are on Windows, have an NVidia GPU and you see a device loss error followed by a crash, you can help us track these bugs down by running X-Plane with Aftermath enabled. We know from 11.50b5 that many devices are not compatible with Aftermath, so if you crash and burn immediately, you can go back to using beta 6 without the extra command line option.
We will be using the command line via Command Prompt. (Here are instructions on getting started with this if needed.)
Launch X-Plane from the command line with the following flag:
You can then try to reproduce the steps that caused the initial device loss, or just fly as usual. If device loss happens again, the auto crash report form should come up again. Please fill out your email and submit the auto report to us for investigation.
Well, that was something. I had a very nice post written up last week on the state of beta. We had spent a week very carefully trying to improve stability and then…beta 5 exploded on the launch pad.
So…let’s try this again. But before we get into beta 6, a few graphs:
That’s a graph of auto-reported crashes over time – the big spike up is April 2nd when 11.50 came out. The gap in the timeline at the end is when our crash reporter temporarily was shut off for exceeding quota! From this I can take derive two take-away points:
A lot of people are really excited to try the 11.50 beta even though it’s early and unstable and
The 11.50 beta crashes a lot.
The silver lining is that the crashes we have been collecting are very very informative so it’s been a really great data stream.
Here’s one more graph:
That’s bug reports and they’re up something like 1000% – we have received close to 1800 reports since then. Of these reported bugs, over 500 are in the category of “it crashed” or some other similarly catastrophic, bad thing happened.
So with those graphs in mind, let’s talk about where we are at with the beta.
This post is just targeted at plugin developers who are modernizing their object drawing – if you don’t write plugin code, the Cincinnati Zoo has been showing their animals on Youtube – it’ll be a lot more entertaining than this post. (An XPLMInstance cannot tunnel down two feet in fifteen seconds – one point for the zoo animals.)
XPLMInstance makes a persistent object that lives inside X-Plane that is visible in the 3-d world. It changes how you draw from “run some drawing code every frame” to “tell X-Plane that there is a thing and update its data every now and then.”
Instancing is actually a lot easier than draw callbacks! But there are two tricky gotchas:
1. You must create the custom DataRefs for your OBJ’s animation before you load the object itself with the SDK. (If the DataRefs do not exist at load time, the animations are disabled as “unresolved to any DataRef”.)
2. When you create the instance, make sure your custom DataRefs are on the list of DataRefs for that instance.
Here’s the really baffling thing: if you create the custom DataRef and then add it to the instance’s list, your DataRef callbacks will not be called.
Here’s the trick: the DataRef you register is a global identifier, allowing the object to refer to what it wants to listen to. That’s why you have to create the DataRef – so that the identifier exists.
But when you create an instance, each instance has memory that holds a different copy of those DataRefs.
For example, let’s say you have a truck with four DataRefs, and you make five instances. X-Plane allocates 20 slots (four DataRefs times five instances) to store five copies of each DataRef’s values.
The instances never look at the DataRef itself. They only look at their local copies. That’s why when you push different data to the instance with XPLMSetInstancePosition, each instance animates with its own values – each instance looks at its own local data.
This is also why you won’t see your DataRef callbacks called (unless you use DataRefEditor or some other tool). The object rendering engine isn’t looking at the DataRefs themselves, it’s looking at the local copies.
In other words, XPLMInstance turns DataRefs from the pull model you are used to (X-Plane pulls on your read function to get the value) to a push model (you push set with XPLMSetInstancePosition into the instance’s memory).
This implies two things about your add-on:
It doesn’t really matter what your DataRef read functions do – they can just return zero, and
You can’t use tools like DataRefEditor or DataRefTool to debug your animations. (That didn’t work well in legacy code either, but it really won’t work now.)
If you try the obvious optimization of not creating your custom DataRefs (“hey, no one calls them”) before you create your instance, you will find that animation just stops working. This is because we need the DataRef to be that global identifier to match your instance data with the animations of the object itself.
One last note: if your old code used sim/graphics/animation/draw_object_x/y/z to determine which object was being animated (from inside a plugin “get” function) you do not need to do this anymore. Because each instance has its own local copies and your DataRef function isn’t called, this technique is obsolete.
You must register custom DataRefs.
Their callbacks can just return 0 – they’ll never be called.
Always list your custom DataRefs for animation when you create an instance.
Do not use draw_object_x/y/z; use XPLMSetInstancePosition to create per-specific-instance animation.
I was going to write a post about X-Plane 11.50 beta 5 – what’s new in it, the new ways we are debugging GPU crashes, the crash bugs we’ve fixed, etc. A lot of stuff that we thought was pretty good went into beta 5. Cool new technology! Big bug fixes! Lots of winning!
As it turns out, beta 5 is dead. I hit “go” on the release this afternoon, and half an hour ago, I hit “stop.” The auto crash reporter was showing way too many new crashes in memory management that we had not seen before, and this strongly implies a new and serious bug.
Laminar Installer Users: if you were auto-notified to update to beta five and did so, and you are not crashing, you can keep flying! If your beta five is just a smoldering wreckage of crumpled VRAM and GPU parts, you can re-run the installer with “get betas” option checked, and it will take you back to beta 4.
If you were not auto-notified to update to beta five, that’s probably for the best. Please stand by and keep flying beta four; we’ll post a new beta when we’ve gotten to the bottom of this. We have enough captured crash data to investigate.
Steam Users: we did not release the beta five build to Steam and this is probably a good thing; we’ll try again with a new release that isn’t made of plutonium and unicorn hallucinations.
And if you’re going “why didn’t y’all test it before you released it”…we did! None of our machines show these crashes. But we also have probably a dozen PCs total we can run on. Moving to a new driver stack has meant learning about the weird things that happen on your computers and not ours.
Do We Need a Two Tiered Beta System
This came up in our impromptu beta five post-mortem meeting: do we need to bring people into new betas in stages? With code for new drivers, beta five probably won’t be the last beta where we code something we think is helping and discover that it fails catastrophically, but not on our hardware. We need beta victims^H^H^H^H^Htesters to find these bugs, but once we get a dozen crashes, we don’t need anyone else to stub their toe for us to fix our problems.
So we thought about two possible ways to do this:
A two-tiered system. Early adopters could get an email and hand-update to the new beta before it is put out for auto-update notification.
Send out the beta update notifications over time, e.g. 10% of users get notified immediately, then another 40% if we don’t see crashes, then the last 50%. (This practice is actually industry standard on mobile apps.)
If you are reading this blog post, this far down, you are probably participating in the beta; I’d be curious what approach you’d find most useful.
X-Plane 11.50b4 is now available if you update via the Laminar Research installer. (Steam users: it’s on the servers and we’ll hit go in a few hours if we don’t hear reports of massive crashing and pain again.)
This update was focused on crash fixes and better triaging. We’ve been seeing a huge uptick in volume of bug reports and auto reported crashes since the initial 11.50 public beta release. We are trying to cut through the noise and provide better information in logs and in the remaining crash reports to fix issues faster, and let our support team (primarily me) get the inbox under control.
The best way to help us handle crashes on Windows and Linux is still to submit the auto report form. You can include your email if you want us to be able to find your specific crash, but we do not need the message field–the log and back trace will have pretty much all the info we need. If you send an auto report, please do not also send a bug report form email.
Mac users do not have the ability to auto report, so they should fill out the bug report form, and include the Apple crash report as well as the log.txt. This can be found under your username /Libraries/Logs/DiagnosticReports. The name will include the date & time of the crash and will end in .crash. You may need to show hidden folders to access it.
We were discussing a particularly exasperated sounding bug report on one of the internal Slack channels when I realized that this might not be obvious: a crash with the error message “pipeline must not be null” – it’s one error message that covers a whole category of bugs. We fixed one major case (skycolors were broken) in b1 and added one major case (custom billboard lights on aircraft) in b2 – conservation of pipeline bugs!
Null pipelines are a new category of crash in X-Plane 11.50, so here are a few notes on what this error is and what you can do to help us fix them (and what you don’t need to bother with).
What Is a Pipeline?
A pipeline is just the Vulkan and Metal term for a shader (plus some extra gak (1)) that we use to do our drawing.
X-Plane 11.41 would ask the OpenGL driver to build shaders as it needed them, and then the driver would turn those GL shaders into hardware pipelines on the fly as it got presented with different scenarios.
Not 11.50. We build everything up front. Vulkan has two rules:
Using a pipeline is fast.
Building a pipeline is not fast.
This is a great pair of rules for us – it means if we build our pipelines at load time, we are not going to have stutters mid-frame.
Why Are We Crashing?
There is one down-side to the 11.50 way of doing things: if we don’t build all of the pipelines we need up front during load, then when it comes time to draw, we’re toast. That’s what a “pipeline must not be null” error is – it just means the loading code did not create the pipeline the drawing code needs.
Why not just build every pipeline we could ever possibly need? Load time. X-Plane can build hundreds of thousands of pipelines depending on rendering settings, scenery packs, custom aircraft, etc. We actually did “just build everything” early in our development process and the sim could take half an hour to load.
So we try to build only the pipelines we need. If we build too many, we slow load, and if we build too low, you see this error.
What Do You Do When You See This Error?
On Windows and Linux, it’s really easy: close the alert box and when the auto crash report form comes up, please press “send”. Don’t bother with you email or a message; everything we need to kill this bug is already in the auto report! (Jennifer’s edit: please DO include your email address with any auto report if you want us to be able to confirm we have your specific report! This is the only way we have of identifying who it came from.)
The good news is: the auto crash reports for the pipeline crashes are insanely easy to find and fix.
Mac users: if you see one of these, we need the Apple crash report – please send it in a bug report.
(1) for the plugin developers that know some OpenGL: a pipeline is basically a GLprogram (shader) plus a bunch of the fixed function state that goes with it: blending, depth/stencil, vertex format, FBO format, and some rando stuff thrown in.
The idea is to have the pipeline contain so much information that there is no risk that the driver has to build two hardware shaders for one Vulkan shader (to cope with other fixed function state) no matter how weird the hardware is.
On lots of actual hardware, the pipeline has stuff that’s not actually in the shader, but some surprising things, like vertex format, actually often are.