A week or two ago we had a very dead beta, and posed the question of how to incrementally test betas in the future. We got a variety of responses, ranging from “private test it first” to “roll it out in a wave” to “full speed ahead, we know betas are bumpy.”
Since then, we’ve been doing one of the easiest and probably most useful things we can: posting the betas early to third-party developers who are in our developer Slack channel.
Beta 7/8 had a ton of changes, and our third-party developers found multiple problems, some of which we wouldn’t see in our internal tests. So we held off on releasing betas 7 and 8 to the public while we fixed those issues. Until today.
X-Plane 11.50 has been similar to X-Plane 11.20 (our VR) release and different from what we normally try to do, in that when we went beta (both private and public), the work for Vulkan wasn’t done yet. We had something that you could fly with, that was delightful for some users (and unstable for others), but we also had a big list of things we still needed to do.
X-Plane 11.50b7 has been recalled before it even made it fully out the door.
We had a ton of changes in this one–at one point I pulled over 100 Git commits on our release branch. Ben and Sidney also knocked nearly all items off their features-to-do list.
But thank goodness we asked our third-party developers to kick the tires early on this one. They found a beta-stopping bug in about 30 minutes! In our attempts to fix some performance issues, we caused the aircraft to be blurry in almost all cases, and we knew that was not an acceptable regression bug for a flight sim.
So sit tight for beta 8 to come soon, and don’t panic when your version numbers skip b7.
In 11.50b6 we added a command line argument to run Aftermath, a debugging utility, hoping it will give us more insight into device loss errors.
A “device loss” error is specifically the crash that accompanies the on-screen (or log.txt) error message “Encountered Vulkan device loss error!” Using Aftermath will not help us investigate VRAM issues–that is a different issue entirely.
If you are on Windows, have an NVidia GPU and you see a device loss error followed by a crash, you can help us track these bugs down by running X-Plane with Aftermath enabled. We know from 11.50b5 that many devices are not compatible with Aftermath, so if you crash and burn immediately, you can go back to using beta 6 without the extra command line option.
We will be using the command line via Command Prompt. (Here are instructions on getting started with this if needed.)
Launch X-Plane from the command line with the following flag:
You can then try to reproduce the steps that caused the initial device loss, or just fly as usual. If device loss happens again, the auto crash report form should come up again. Please fill out your email and submit the auto report to us for investigation.
Well, that was something. I had a very nice post written up last week on the state of beta. We had spent a week very carefully trying to improve stability and then…beta 5 exploded on the launch pad.
So…let’s try this again. But before we get into beta 6, a few graphs:
That’s a graph of auto-reported crashes over time – the big spike up is April 2nd when 11.50 came out. The gap in the timeline at the end is when our crash reporter was temporarily shut off for exceeding quota! From this I can derive two takeaway points:
1. A lot of people are really excited to try the 11.50 beta even though it’s early and unstable, and
2. The 11.50 beta crashes a lot.
The silver lining is that the crashes we have been collecting are very very informative so it’s been a really great data stream.
Here’s one more graph:
That’s bug reports and they’re up something like 1000% – we have received close to 1800 reports since then. Of these reported bugs, over 500 are in the category of “it crashed” or some other similarly catastrophic, bad thing happened.
So with those graphs in mind, let’s talk about where we are at with the beta.
This post is just targeted at plugin developers who are modernizing their object drawing – if you don’t write plugin code, the Cincinnati Zoo has been showing their animals on Youtube – it’ll be a lot more entertaining than this post. (An XPLMInstance cannot tunnel down two feet in fifteen seconds – one point for the zoo animals.)
XPLMInstance makes a persistent object that lives inside X-Plane that is visible in the 3-d world. It changes how you draw from “run some drawing code every frame” to “tell X-Plane that there is a thing and update its data every now and then.”
Instancing is actually a lot easier than draw callbacks! But there are two tricky gotchas:
1. You must create the custom DataRefs for your OBJ’s animation before you load the object itself with the SDK. (If the DataRefs do not exist at load time, the animations are disabled as “unresolved to any DataRef”.)
2. When you create the instance, make sure your custom DataRefs are on the list of DataRefs for that instance.
Here’s the really baffling thing: if you create the custom DataRef and then add it to the instance’s list, your DataRef callbacks will not be called.
Here’s the trick: the DataRef you register is a global identifier, allowing the object to refer to what it wants to listen to. That’s why you have to create the DataRef – so that the identifier exists.
But when you create an instance, each instance has memory that holds a different copy of those DataRefs.
For example, let’s say you have a truck with four DataRefs, and you make five instances. X-Plane allocates 20 slots (four DataRefs times five instances) to store five copies of each DataRef’s values.
The instances never look at the DataRef itself. They only look at their local copies. That’s why when you push different data to the instance with XPLMInstanceSetPosition, each instance animates with its own values – each instance looks at its own local data.
This is also why you won’t see your DataRef callbacks called (unless you use DataRefEditor or some other tool). The object rendering engine isn’t looking at the DataRefs themselves, it’s looking at the local copies.
In other words, XPLMInstance turns DataRefs from the pull model you are used to (X-Plane pulls on your read function to get the value) to a push model (you push values into the instance’s memory with XPLMInstanceSetPosition).
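A toy model of this push behavior may make the bookkeeping easier to picture. It is written in Python rather than against the real SDK, and everything in it is a hypothetical stand-in: `DataRefRegistry`, `Instance`, `set_position` and the `truck/...` names are not X-Plane APIs. The sketch only assumes what is described above: a registered name acts as a global identifier, and every instance keeps its own copy of the values.

```python
class DataRefRegistry:
    """Global table of DataRef names (hypothetical stand-in for the sim's registry)."""

    def __init__(self):
        self._read_callbacks = {}

    def register(self, name, read_callback):
        # Registering only makes the name resolvable; nothing reads it here.
        self._read_callbacks[name] = read_callback

    def is_registered(self, name):
        return name in self._read_callbacks


class Instance:
    """One drawn object that owns a local slot for each listed DataRef."""

    def __init__(self, registry, dataref_names):
        missing = [n for n in dataref_names if not registry.is_registered(n)]
        if missing:
            # Mirrors the "unresolved" case: the identifier must exist first.
            raise KeyError(f"unresolved DataRefs: {missing}")
        self.names = list(dataref_names)
        self.slots = [0.0] * len(self.names)  # per-instance local copies

    def set_position(self, values):
        # Push model: the caller writes values straight into the local slots.
        if len(values) != len(self.names):
            raise ValueError("one value is required per listed DataRef")
        self.slots = list(values)

    def animation_value(self, name):
        # Rendering reads the local copy and never calls the read callback.
        return self.slots[self.names.index(name)]


calls = []


def read_zero():
    calls.append(1)  # would record any pull on the callback
    return 0.0


registry = DataRefRegistry()
truck_datarefs = ["truck/wheel_angle", "truck/steering", "truck/bed_tilt", "truck/beacon"]
for name in truck_datarefs:
    registry.register(name, read_zero)

trucks = [Instance(registry, truck_datarefs) for _ in range(5)]
for i, truck in enumerate(trucks):
    truck.set_position([10.0 * i, 0.0, 0.0, 1.0])

total_slots = sum(len(t.slots) for t in trucks)  # four DataRefs times five instances
```

In this model each truck ends up with its own wheel angle, and `calls` stays empty because nothing ever pulls on `read_zero`.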
This implies two things about your add-on:
1. It doesn’t really matter what your DataRef read functions do – they can just return zero, and
2. You can’t use tools like DataRefEditor or DataRefTool to debug your animations. (That didn’t work well in legacy code either, but it really won’t work now.)
If you try the obvious optimization of not creating your custom DataRefs (“hey, no one calls them”) before you create your instance, you will find that animation just stops working. This is because we need the DataRef to be that global identifier to match your instance data with the animations of the object itself.
One last note: if your old code used sim/graphics/animation/draw_object_x/y/z to determine which object was being animated (from inside a plugin “get” function) you do not need to do this anymore. Because each instance has its own local copies and your DataRef function isn’t called, this technique is obsolete.
You must register custom DataRefs.
Their callbacks can just return 0 – they’ll never be called.
Always list your custom DataRefs for animation when you create an instance.
Do not use draw_object_x/y/z; use XPLMInstanceSetPosition to create per-instance animation.
I was going to write a post about X-Plane 11.50 beta 5 – what’s new in it, the new ways we are debugging GPU crashes, the crash bugs we’ve fixed, etc. A lot of stuff that we thought was pretty good went into beta 5. Cool new technology! Big bug fixes! Lots of winning!
As it turns out, beta 5 is dead. I hit “go” on the release this afternoon, and half an hour ago, I hit “stop.” The auto crash reporter was showing way too many new crashes in memory management that we had not seen before, and this strongly implies a new and serious bug.
Laminar Installer Users: if you were auto-notified to update to beta five and did so, and you are not crashing, you can keep flying! If your beta five is just a smoldering wreckage of crumpled VRAM and GPU parts, you can re-run the installer with “get betas” option checked, and it will take you back to beta 4.
If you were not auto-notified to update to beta five, that’s probably for the best. Please stand by and keep flying beta four; we’ll post a new beta when we’ve gotten to the bottom of this. We have enough captured crash data to investigate.
Steam Users: we did not release the beta five build to Steam and this is probably a good thing; we’ll try again with a new release that isn’t made of plutonium and unicorn hallucinations.
And if you’re going “why didn’t y’all test it before you released it”…we did! None of our machines show these crashes. But we also have probably a dozen PCs total we can run on. Moving to a new driver stack has meant learning about the weird things that happen on your computers and not ours.
Do We Need a Two-Tiered Beta System?
This came up in our impromptu beta five post-mortem meeting: do we need to bring people into new betas in stages? With code for new drivers, beta five probably won’t be the last beta where we code something we think is helping and discover that it fails catastrophically, but not on our hardware. We need beta victims^H^H^H^H^Htesters to find these bugs, but once we get a dozen crashes, we don’t need anyone else to stub their toe for us to fix our problems.
So we thought about two possible ways to do this:
A two-tiered system. Early adopters could get an email and hand-update to the new beta before it is put out for auto-update notification.
Send out the beta update notifications over time, e.g. 10% of users get notified immediately, then another 40% if we don’t see crashes, then the last 50%. (This practice is actually industry standard on mobile apps.)
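The second option could be sketched roughly like this. It is only an illustration of percentage-based waves, not how the Laminar installer works: the function names, the idea of hashing a user ID string into a bucket, and the crash limit are all assumptions made for the example.

```python
import hashlib

# Cumulative percentage of users notified after each wave:
# 10% first, then another 40%, then the last 50%.
WAVE_THRESHOLDS = [10, 50, 100]


def user_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_notify(user_id: str, current_wave: int) -> bool:
    """A user is notified once their bucket falls under the wave's threshold."""
    return user_bucket(user_id) < WAVE_THRESHOLDS[current_wave]


def next_wave(current_wave: int, crash_reports: int, crash_limit: int):
    """Advance to the next wave, or return None to halt the rollout."""
    if crash_reports >= crash_limit:
        return None  # enough data to debug; nobody else needs to crash
    return min(current_wave + 1, len(WAVE_THRESHOLDS) - 1)
```

Because the bucket is derived from a hash of the ID, a user who was notified in an early wave stays notified in every later one, and stopping the rollout is just a matter of not advancing the wave.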
If you are reading this blog post, this far down, you are probably participating in the beta; I’d be curious what approach you’d find most useful.
X-Plane 11.50b4 is now available if you update via the Laminar Research installer. (Steam users: it’s on the servers and we’ll hit go in a few hours if we don’t hear reports of massive crashing and pain again.)
This update was focused on crash fixes and better triaging. We’ve been seeing a huge uptick in volume of bug reports and auto reported crashes since the initial 11.50 public beta release. We are trying to cut through the noise and provide better information in logs and in the remaining crash reports to fix issues faster, and let our support team (primarily me) get the inbox under control.
The best way to help us handle crashes on Windows and Linux is still to submit the auto report form. You can include your email if you want us to be able to find your specific crash, but we do not need the message field–the log and back trace will have pretty much all the info we need. If you send an auto report, please do not also send a bug report form email.
Mac users do not have the ability to auto report, so they should fill out the bug report form and include the Apple crash report as well as the log.txt. The crash report can be found in your user folder under ~/Library/Logs/DiagnosticReports. The name will include the date & time of the crash and will end in .crash. You may need to show hidden folders to access it.
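If you would rather not dig through Finder, a few lines of Python can list the candidate files. This is a minimal sketch, not an official tool: it assumes the reports live in the standard per-user DiagnosticReports folder and that the file names start with "X-Plane", which is a guess about the naming rather than something stated here.

```python
from pathlib import Path


def find_crash_reports(reports_dir=None, app_prefix="X-Plane"):
    """Return matching .crash files, newest first."""
    if reports_dir is None:
        # Standard per-user location on macOS (the Library folder is hidden by default).
        reports_dir = Path.home() / "Library" / "Logs" / "DiagnosticReports"
    reports_dir = Path(reports_dir)
    if not reports_dir.is_dir():
        return []
    # app_prefix is an assumed naming convention; adjust it if nothing matches.
    matches = [p for p in reports_dir.glob("*.crash") if p.name.startswith(app_prefix)]
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for report in find_crash_reports():
        print(report)
```

The first path printed is the most recently modified report, which is usually the one to attach.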
We were discussing a particularly exasperated sounding bug report on one of the internal Slack channels when I realized that this might not be obvious: a crash with the error message “pipeline must not be null” – it’s one error message that covers a whole category of bugs. We fixed one major case (skycolors were broken) in b1 and added one major case (custom billboard lights on aircraft) in b2 – conservation of pipeline bugs!
Null pipelines are a new category of crash in X-Plane 11.50, so here are a few notes on what this error is and what you can do to help us fix them (and what you don’t need to bother with).
What Is a Pipeline?
A pipeline is just the Vulkan and Metal term for a shader (plus some extra gak (1)) that we use to do our drawing.
X-Plane 11.41 would ask the OpenGL driver to build shaders as it needed them, and then the driver would turn those GL shaders into hardware pipelines on the fly as it got presented with different scenarios.
Not 11.50. We build everything up front. Vulkan has two rules:
1. Using a pipeline is fast.
2. Building a pipeline is not fast.
This is a great pair of rules for us – it means if we build our pipelines at load time, we are not going to have stutters mid-frame.
Why Are We Crashing?
There is one down-side to the 11.50 way of doing things: if we don’t build all of the pipelines we need up front during load, then when it comes time to draw, we’re toast. That’s what a “pipeline must not be null” error is – it just means the loading code did not create the pipeline the drawing code needs.
Why not just build every pipeline we could ever possibly need? Load time. X-Plane can build hundreds of thousands of pipelines depending on rendering settings, scenery packs, custom aircraft, etc. We actually did “just build everything” early in our development process and the sim could take half an hour to load.
So we try to build only the pipelines we need. If we build too many, we slow down loading, and if we build too few, you see this error.
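The trade-off can be shown with a toy cache. This is a sketch of the idea only, not X-Plane's code: `PipelineCache`, its methods and the tuple keys are hypothetical, and a string stands in for a real Vulkan or Metal pipeline object.

```python
class PipelineCache:
    """Build pipelines up front at load time; only look them up while drawing."""

    def __init__(self):
        self._pipelines = {}
        self.build_count = 0

    def build_at_load(self, needed_keys):
        # Building is the slow part, so it happens once, before any frame is drawn.
        for key in needed_keys:
            if key not in self._pipelines:
                self._pipelines[key] = f"pipeline<{key}>"  # stand-in for a slow build
                self.build_count += 1

    def get_for_draw(self, key):
        # Drawing never builds; a missing entry is the fatal case.
        pipeline = self._pipelines.get(key)
        if pipeline is None:
            raise RuntimeError("pipeline must not be null")
        return pipeline


cache = PipelineCache()
# The loader predicts what the drawing code will need (hypothetical keys).
cache.build_at_load([("terrain", "opaque"), ("aircraft", "opaque")])

terrain_pipeline = cache.get_for_draw(("terrain", "opaque"))  # predicted, so it is there

try:
    cache.get_for_draw(("billboard_light", "additive"))  # never predicted at load
    error_message = None
except RuntimeError as err:
    error_message = str(err)
```

Every key passed to `build_at_load` costs load time, and every key the loader fails to predict ends in the error path, which is the balance described above.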
What Do You Do When You See This Error?
On Windows and Linux, it’s really easy: close the alert box and when the auto crash report form comes up, please press “send”. Don’t bother with your email or a message; everything we need to kill this bug is already in the auto report! (Jennifer’s edit: please DO include your email address with any auto report if you want us to be able to confirm we have your specific report! This is the only way we have of identifying who it came from.)
The good news is: the auto crash reports for the pipeline crashes are insanely easy to find and fix.
Mac users: if you see one of these, we need the Apple crash report – please send it in a bug report.
(1) For the plugin developers who know some OpenGL: a pipeline is basically a GL program (shader) plus a bunch of the fixed function state that goes with it: blending, depth/stencil, vertex format, FBO format, and some rando stuff thrown in.
The idea is to have the pipeline contain so much information that there is no risk that the driver has to build two hardware shaders for one Vulkan shader (to cope with other fixed function state) no matter how weird the hardware is.
On lots of actual hardware, the pipeline includes state that is not part of the shader itself, but some surprising things, like vertex format, often are.
X-Plane 11.50b3 is now available if you update via the Laminar Research installer. (Steam users: it’s on the servers and we’ll hit go in a few hours if we don’t hear reports of massive crashing and pain like we did last night.)
We waited on releasing beta 2 on Steam after we started hearing reports of new, unintended crashes, and we spent the last 24 hours coding and testing the fixes. The only new fixes in beta 3 are for crashing with Linux + Vulkan, and null pipeline crashes with third party aircraft.
Hopefully this update will be more stable and we can get back to our regularly scheduled programming of working on a wider range of fixes for beta 4 next week.
Updated 4/8/2020 8:25 PM: Beta 2 is…not our best work. It crashes on start on Linux and crashes on load for a wide variety of third party aircraft (but not LR ones). We are cutting a beta 3 with these two issues fixed; it should be live in the next twenty four hours. We are holding with Beta 1 on Steam until Beta 3 is available.
X-Plane 11.50 Beta 2 is now available. (Steam users: it’s on the servers and we’ll hit go in a few hours if we don’t hear reports of massive crashing and pain.)
We received a lot of bug reports from X-Plane 11.50 beta 1. This is good! I’d much rather have multiple reports of a bug than no reports. Every now and then someone tells us about something and we go “how long has this been going on” and they say “oh, for a year now” and we’re like “why didn’t you file a bug???” Don’t assume someone else will file it!
So with beta two, here’s what we need:
Read the release notes – Jennifer puts real effort into documenting everything that is fixed to save you time.
If your bug is listed as fixed and you still see it, please file a new bug. If you mention the bug number that we listed as fixed in your “it’s still broken” report, this is really helpful for us.
If your bug is not listed as fixed, please do not re-file it. If we didn’t say it was fixed in the release notes, we already know it is still broken, and a re-file of the bug just takes time away from other bug reports.
Beta 2 does not fix all bugs – it doesn’t even come close, so most bugs do not need to be refiled.
With that in mind, there are a few high profile bug fixes in beta 2:
The sky colors dialog box does not crash! We are actually astonished at how many people reported this – we didn’t think it was a heavily used feature, but … who knew.
VR – the right eye is fixed! It turns out this was broken twice; we have fixed both bugs.
Plugins: object drawing in OpenGL for legacy plugins turns out to have been massively borked; this could cause wrong drawing and crashes in all of the pilot clients, ground traffic, push back add-ons, etc. So a large swathe of popular add-ons should work better in OpenGL mode.
Older NVidia cards should now work and not have a black screen. This covers the 600, 700, 800, and some 900 NVidia cards.
Mac users who were getting “out of memory” – this should be a lot better now.
Users with multiple GPUs and SLI should be able to launch without disabling things.
Probably the most common and annoying bug report we get that is not fixed here is blurry textures. Basically if X-Plane thinks it is running out of VRAM, it will lower the resolution on textures where it is allowed to lower the resolution. We have seen cases of this code behaving very poorly and turning texture resolution all the way down.
First, just to state the obvious, this is a bug. You do not need more VRAM to run with Vulkan than OpenGL, we just need to fix the pager. If you have less than 8 GB of VRAM, do not panic.
I am not surprised that we have seen this bug – texture paging is very much about tuning our decisions to match real-world use, and we have shipped with something that works decently in our test cases and sometimes quite badly in real-world use cases that are very different from our test cases. So we will adapt the algorithm over time based on data we collect, and it will take a few betas to get better.
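To make "lower the resolution where it is allowed" concrete, here is a toy pager. It is an assumed algorithm for illustration only and not X-Plane's actual pager: the `Texture` class, the halve-the-largest-first policy, the 256-pixel floor and the uncompressed RGBA size estimate are all made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Texture:
    name: str
    size: int        # edge length in pixels (square texture assumed)
    reducible: bool  # whether the pager is allowed to lower its resolution

    def vram_bytes(self) -> int:
        return self.size * self.size * 4  # uncompressed RGBA, no mipmaps


def fit_to_budget(textures, budget_bytes, min_size=256):
    """Halve the largest reducible texture until the total fits the budget."""
    while sum(t.vram_bytes() for t in textures) > budget_bytes:
        candidates = [t for t in textures if t.reducible and t.size > min_size]
        if not candidates:
            break  # nothing left that may be reduced
        largest = max(candidates, key=lambda t: t.size)
        largest.size //= 2  # half the edge length, a quarter of the memory
    return textures


MIB = 1024 * 1024
textures = [
    Texture("panel", 4096, reducible=False),
    Texture("terrain", 4096, reducible=True),
    Texture("trees", 2048, reducible=True),
]
fit_to_budget(textures, budget_bytes=100 * MIB)
```

With a 100 MiB budget only the terrain texture drops one step. In this toy model, a budget that is estimated far too low pushes every reducible texture down to the floor, which is why getting the budget and the policy right matters so much.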