A week or two ago we had a very dead beta, and posed the question of how to incrementally test betas in the future. We got a variety of responses, ranging from “private test it first” to “roll it out in a wave” to “full speed ahead, we know betas are bumpy.”
Since then, we’ve been doing one of the easiest and probably most useful things we can: posting the betas early to third-party developers who are in our developer Slack channel.
Beta 7/8 had a ton of changes, and our third-party developers found multiple problems, some of which we wouldn’t see in our internal tests. So we held off on releasing betas 7 and 8 to the public while we fixed those issues. Until today.
X-Plane 11.50 has been similar to X-Plane 11.20 (our VR release) and different from what we normally try to do, in that when we went beta (both private and public), the work for Vulkan wasn’t done yet. We had something that you could fly with, that was delightful for some users (and unstable for others), but we also had a big list of things we still needed to do.
X-Plane 11.50b7 has been recalled before it even made it fully out the door.
We had a ton of changes in this one–at one point I pulled over 100 Git commits on our release branch. Ben and Sidney also knocked nearly all items off their features-to-do list.
But thank goodness we asked our third-party developers to kick the tires early on this one. They found a beta-stopping bug in about 30 minutes! In our attempts to fix some performance issues, we caused the aircraft to be blurry in almost all cases, and we knew that was not an acceptable regression for a flight sim.
So sit tight for beta 8 to come soon, and don’t panic when your version numbers skip b7.
In 11.50b6 we added a command line argument to run Aftermath, a debugging utility, hoping it will give us more insight into device loss errors.
A “device loss” error is specifically the crash that accompanies the on-screen (or log.txt) error message “Encountered Vulkan device loss error!” Using Aftermath will not help us investigate VRAM issues–that is a different issue entirely.
If you are on Windows, have an NVidia GPU and you see a device loss error followed by a crash, you can help us track these bugs down by running X-Plane with Aftermath enabled. We know from 11.50b5 that many devices are not compatible with Aftermath, so if you crash and burn immediately, you can go back to using beta 6 without the extra command line option.
We will be using the command line via Command Prompt. (Here are instructions on getting started with this if needed.)
Launch X-Plane from the command line with the following flag:
You can then try to reproduce the steps that caused the initial device loss, or just fly as usual. If device loss happens again, the auto crash report form should come up again. Please fill out your email and submit the auto report to us for investigation.
Well, that was something. I had a very nice post written up last week on the state of beta. We had spent a week very carefully trying to improve stability and then…beta 5 exploded on the launch pad.
So…let’s try this again. But before we get into beta 6, a few graphs:
That’s a graph of auto-reported crashes over time – the big spike up is April 2nd when 11.50 came out. The gap in the timeline at the end is when our crash reporter was temporarily shut off for exceeding quota! From this I can derive two take-away points:
A lot of people are really excited to try the 11.50 beta even though it’s early and unstable and
The 11.50 beta crashes a lot.
The silver lining is that the crashes we have been collecting are very very informative so it’s been a really great data stream.
Here’s one more graph:
That’s bug reports and they’re up something like 1000% – we have received close to 1800 reports since then. Of these reported bugs, over 500 are in the category of “it crashed” or some other similarly catastrophic, bad thing happened.
So with those graphs in mind, let’s talk about where we are at with the beta.
I was going to write a post about X-Plane 11.50 beta 5 – what’s new in it, the new ways we are debugging GPU crashes, the crash bugs we’ve fixed, etc. A lot of stuff that we thought was pretty good went into beta 5. Cool new technology! Big bug fixes! Lots of winning!
As it turns out, beta 5 is dead. I hit “go” on the release this afternoon, and half an hour ago, I hit “stop.” The auto crash reporter was showing way too many new crashes in memory management that we had not seen before, and this strongly implies a new and serious bug.
Laminar Installer Users: if you were auto-notified to update to beta five and did so, and you are not crashing, you can keep flying! If your beta five is just a smoldering wreckage of crumpled VRAM and GPU parts, you can re-run the installer with the “get betas” option checked, and it will take you back to beta 4.
If you were not auto-notified to update to beta five, that’s probably for the best. Please stand by and keep flying beta four; we’ll post a new beta when we’ve gotten to the bottom of this. We have enough captured crash data to investigate.
Steam Users: we did not release the beta five build to Steam and this is probably a good thing; we’ll try again with a new release that isn’t made of plutonium and unicorn hallucinations.
And if you’re going “why didn’t y’all test it before you released it”…we did! None of our machines show these crashes. But we also have probably a dozen PCs total we can run on. Moving to a new driver stack has meant learning about the weird things that happen on your computers and not ours.
Do We Need a Two-Tiered Beta System?
This came up in our impromptu beta five post-mortem meeting: do we need to bring people into new betas in stages? With code for new drivers, beta five probably won’t be the last beta where we code something we think is helping and discover that it fails catastrophically, but not on our hardware. We need beta victims^H^H^H^H^Htesters to find these bugs, but once we get a dozen crashes, we don’t need anyone else to stub their toe for us to fix our problems.
So we thought about two possible ways to do this:
A two-tiered system. Early adopters could get an email and hand-update to the new beta before it is put out for auto-update notification.
Send out the beta update notifications over time, e.g. 10% of users get notified immediately, then another 40% if we don’t see crashes, then the last 50%. (This practice is actually industry standard on mobile apps.)
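To make the second option concrete, here is a minimal sketch of a staged notification scheme. It is an illustration only, not how the Laminar installer works: the function names, the hashing of a user ID into a bucket, and the crash limit of a dozen are all assumptions made for the example.

```python
import hashlib

# Hypothetical cumulative rollout stages, in percent of users notified:
# 10% first, then another 40%, then the last 50%.
STAGES = [10, 50, 100]


def user_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_notify(user_id: str, stage: int) -> bool:
    """True if this user is included in the given rollout stage (0-based)."""
    return user_bucket(user_id) < STAGES[stage]


def next_stage(stage: int, new_crash_count: int, crash_limit: int = 12) -> int:
    """Advance to the next stage only while new crashes stay under the limit."""
    if new_crash_count >= crash_limit:
        return stage  # hold the rollout where it is
    return min(stage + 1, len(STAGES) - 1)
```

Because the bucket is derived from the user ID, a user who was notified in an early stage stays notified in every later stage, and nobody needs to store a list of who is in which wave.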
If you are reading this blog post, this far down, you are probably participating in the beta; I’d be curious what approach you’d find most useful.
X-Plane 11.50b4 is now available if you update via the Laminar Research installer. (Steam users: it’s on the servers and we’ll hit go in a few hours if we don’t hear reports of massive crashing and pain again.)
This update was focused on crash fixes and better triaging. We’ve been seeing a huge uptick in volume of bug reports and auto reported crashes since the initial 11.50 public beta release. We are trying to cut through the noise and provide better information in logs and in the remaining crash reports to fix issues faster, and let our support team (primarily me) get the inbox under control.
The best way to help us handle crashes on Windows and Linux is still to submit the auto report form. You can include your email if you want us to be able to find your specific crash, but we do not need the message field–the log and back trace will have pretty much all the info we need. If you send an auto report, please do not also send a bug report form email.
Mac users do not have the ability to auto report, so they should fill out the bug report form, and include the Apple crash report as well as the log.txt. The crash report can be found in your user folder under ~/Library/Logs/DiagnosticReports. The name will include the date & time of the crash and will end in .crash. You may need to show hidden folders to access it.
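If you would rather not dig through Finder, a small script can locate the most recent report. This is a rough sketch that assumes the reports live in the standard macOS location, Library/Logs/DiagnosticReports inside your home folder, and end in .crash as described above; the function name is made up for the example.

```python
from pathlib import Path


def newest_crash_report(reports_dir=None):
    """Return the most recently modified .crash file, or None if there is none."""
    if reports_dir is None:
        # Assumed default: the per-user diagnostic reports folder on macOS.
        reports_dir = Path.home() / "Library" / "Logs" / "DiagnosticReports"
    reports_dir = Path(reports_dir)
    if not reports_dir.is_dir():
        return None
    reports = sorted(reports_dir.glob("*.crash"), key=lambda p: p.stat().st_mtime)
    return reports[-1] if reports else None


if __name__ == "__main__":
    print(newest_crash_report())
```

The script picks up the newest .crash file from any application, so check that the file name matches the date and time of your X-Plane crash before attaching it.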
We were discussing a particularly exasperated sounding bug report on one of the internal Slack channels when I realized that this might not be obvious: “pipeline must not be null” is one error message that covers a whole category of bugs. We fixed one major case (sky colors were broken) in b1 and added one major case (custom billboard lights on aircraft) in b2 – conservation of pipeline bugs!
Null pipelines are a new category of crash in X-Plane 11.50, so here are a few notes on what this error is and what you can do to help us fix them (and what you don’t need to bother with).
What Is a Pipeline?
A pipeline is just the Vulkan and Metal term for a shader (plus some extra gak (1)) that we use to do our drawing.
X-Plane 11.41 would ask the OpenGL driver to build shaders as it needed them, and then the driver would turn those GL shaders into hardware pipelines on the fly as it got presented with different scenarios.
Not 11.50. We build everything up front. Vulkan has two rules:
Using a pipeline is fast.
Building a pipeline is not fast.
This is a great pair of rules for us – it means if we build our pipelines at load time, we are not going to have stutters mid-frame.
Why Are We Crashing?
There is one down-side to the 11.50 way of doing things: if we don’t build all of the pipelines we need up front during load, then when it comes time to draw, we’re toast. That’s what a “pipeline must not be null” error is – it just means the loading code did not create the pipeline the drawing code needs.
Why not just build every pipeline we could ever possibly need? Load time. X-Plane can build hundreds of thousands of pipelines depending on rendering settings, scenery packs, custom aircraft, etc. We actually did “just build everything” early in our development process and the sim could take half an hour to load.
So we try to build only the pipelines we need. If we build too many, we slow load, and if we build too few, you see this error.
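The trade-off can be sketched in a few lines. This is a toy model, not X-Plane's code: the class, the tuple keys and the string stand-in for a compiled pipeline are all hypothetical, and the only thing borrowed from the sim is the error message.

```python
class PipelineCache:
    """Toy model: pipelines are built at load time and only looked up at draw time."""

    def __init__(self):
        self._pipelines = {}

    def build(self, key):
        # Stand-in for the slow step (compiling shader plus fixed-function state).
        self._pipelines[key] = f"pipeline{key}"

    def prebuild(self, predicted_keys):
        # Load time: build only the pipelines we predict the drawing code will need.
        for key in predicted_keys:
            self.build(key)

    def get(self, key):
        # Draw time: a lookup is fast, but there is no fallback if the key is missing.
        pipeline = self._pipelines.get(key)
        if pipeline is None:
            raise RuntimeError("pipeline must not be null")
        return pipeline


if __name__ == "__main__":
    cache = PipelineCache()
    cache.prebuild([("terrain", "opaque"), ("aircraft", "alpha_blend")])
    cache.get(("terrain", "opaque"))  # fine, it was built during load
    try:
        cache.get(("billboard_light", "additive"))  # never predicted, never built
    except RuntimeError as err:
        print(err)
```

Predicting more keys makes the prebuild loop longer (slower load); predicting fewer leaves holes that only show up when something tries to draw with them.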
What Do You Do When You See This Error?
On Windows and Linux, it’s really easy: close the alert box and when the auto crash report form comes up, please press “send”. Don’t bother with your email or a message; everything we need to kill this bug is already in the auto report! (Jennifer’s edit: please DO include your email address with any auto report if you want us to be able to confirm we have your specific report! This is the only way we have of identifying who it came from.)
The good news is: the auto crash reports for the pipeline crashes are insanely easy to find and fix.
Mac users: if you see one of these, we need the Apple crash report – please send it in a bug report.
(1) for the plugin developers that know some OpenGL: a pipeline is basically a GL program (shader) plus a bunch of the fixed function state that goes with it: blending, depth/stencil, vertex format, FBO format, and some rando stuff thrown in.
The idea is to have the pipeline contain so much information that there is no risk that the driver has to build two hardware shaders for one Vulkan shader (to cope with other fixed function state) no matter how weird the hardware is.
On lots of actual hardware, the pipeline contains state that isn’t actually part of the shader, but some surprising things, like vertex format, often are.
X-Plane 11.50b3 is now available if you update via the Laminar Research installer. (Steam users: it’s on the servers and we’ll hit go in a few hours if we don’t hear reports of massive crashing and pain like we did last night.)
We waited on releasing beta 2 on Steam after we started hearing reports of new, unintended crashes, and we spent the last 24 hours coding and testing the fixes. The only new fixes in beta 3 are for crashing with Linux + Vulkan, and null pipeline crashes with third party aircraft.
Hopefully this update will be more stable and we can get back to our regularly scheduled programming of working on a wider range of fixes for beta 4 next week.
Updated 4/8/2020 8:25 PM: Beta 2 is…not our best work. It crashes on start on Linux and crashes on load for a wide variety of third party aircraft (but not LR ones). We are cutting a beta 3 with these two issues fixed; it should be live in the next twenty four hours. We are holding with Beta 1 on Steam until Beta 3 is available.
X-Plane 11.50 Beta 2 is now available. (Steam users: it’s on the servers and we’ll hit go in a few hours if we don’t hear reports of massive crashing and pain.)
We received a lot of bug reports from X-Plane 11.50 beta 1. This is good! I’d much rather have multiple reports of a bug than no reports. Every now and then someone tells us about something and we go “how long has this been going on” and they say “oh for a year now” and we’re like, “why didn’t you file a bug???” Don’t assume someone else will file it!
So with beta two, here’s what we need:
Read the release notes – Jennifer puts real effort into documenting everything that is fixed to save you time.
If your bug is listed as fixed and you still see it, please file a new bug. If you mention the bug number that we listed as fixed in your “it’s still broken” report, this is really helpful for us.
If your bug is not listed as fixed, please do not re-file it. If we didn’t say it was fixed in the release notes, we already know it is still broken, and a re-file of the bug just takes time away from other bug reports.
Beta 2 does not fix all bugs – it doesn’t even come close, so most bugs do not need to be refiled.
With that in mind, there are a few high profile bug fixes in beta 2:
The sky colors dialog box does not crash! We are actually astonished at how many people reported this – we didn’t think it was a heavily used feature, but … who knew.
VR – the right eye is fixed! It turns out this was broken twice; we have fixed both bugs.
Plugins: object drawing in OpenGL for legacy plugins turns out to have been massively borked; this could cause wrong drawing and crashes in all of the pilot clients, ground traffic, push back add-ons, etc. So a large swathe of popular add-ons should work better in OpenGL mode.
Older NVidia cards should now work and not have a black screen. This covers the 600, 700, 800, and some 900 NVidia cards.
Mac users who were getting “out of memory” – this should be a lot better now.
Users with multiple GPUs and SLI should be able to launch without disabling things.
Probably the most common and annoying bug report we get that is not fixed here is blurry textures. Basically if X-Plane thinks it is running out of VRAM, it will lower the resolution on textures where it is allowed to lower the resolution. We have seen cases of this code behaving very poorly and turning texture resolution all the way down.
First, just to state the obvious, this is a bug. You do not need more VRAM to run with Vulkan than OpenGL, we just need to fix the pager. If you have less than 8 GB of VRAM, do not panic.
I am not surprised that we have seen this bug – texture paging is very much about tuning our decisions to match real-world use, and we have shipped with something that works decently in our test cases and sometimes quite badly in real-world use cases that are very different from our test cases. So we will adapt the algorithm over time based on data we collect, and it will take a few betas to get better.
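For the curious, the general idea of a texture pager looks something like the sketch below. To be clear, this is not the actual X-Plane pager: the names, the "shrink the largest reducible texture first" policy and the cap on how far a texture can be reduced are assumptions made for illustration, and it ignores mipmap overhead.

```python
from dataclasses import dataclass


@dataclass
class Texture:
    name: str
    full_bytes: int      # memory used at full resolution
    can_reduce: bool     # whether the pager is allowed to lower the resolution
    reduction: int = 0   # how many times width and height have been halved

    @property
    def resident_bytes(self) -> int:
        # Halving both dimensions cuts memory use to a quarter per step.
        return self.full_bytes // (4 ** self.reduction)


def fit_to_budget(textures, budget_bytes, max_reduction=4):
    """Reduce the largest reducible texture one step at a time until we fit."""
    def total():
        return sum(t.resident_bytes for t in textures)

    while total() > budget_bytes:
        candidates = [t for t in textures
                      if t.can_reduce and t.reduction < max_reduction]
        if not candidates:
            break  # nothing left that we are allowed to shrink
        max(candidates, key=lambda t: t.resident_bytes).reduction += 1
    return total()
```

Even in a toy like this, the result depends heavily on the budget estimate: if the pager believes it has far less VRAM than it really does, it will happily push every reducible texture to its lowest resolution, which is the kind of tuning problem described above.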
X-Plane 11.50 has been out for a little bit more than 24 hours, and things have been a little bit nuts. Here are a few quick notes, in no particular order.
Bug Fixes and Work Arounds
While I don’t have work-arounds for the missing right eye in VR or older NVidia cards that won’t run in Vulkan, the good news is that we have fixes for these already. We are going to start testing beta two on Monday and try to get the fixes we have out as soon as possible. While we don’t have every major reported bug fixed, beta two should make a real difference.
Users who can’t start and have SLI setups: disable SLI in the Nvidia control panel and you will be able to run Vulkan. We are still investigating this – our goal is a bug fix so you don’t need to turn SLI off. (We do not expect X-Plane to leverage both cards – we expect it to run without failing.)
Finally, one thing I should have mentioned in the announcement: if you have scripts that modify art controls, please remove them, and don’t put them back.
The art controls are undocumented and subject to change, and they have changed a lot since X-Plane 11.41. Realistically the authors of these tweak scripts need to go back and re-evaluate every one of their tweaks in 11.50 to see if they actually help or are actually making things worse.
This is not like plugins or scenery; we tell you to get a second installation for the beta not because we want you to run add-on free but rather so that if the beta fails you are not locked out of X-Plane. We expect add-ons to work and we are taking plugin, scenery and aircraft bugs seriously.
By comparison, the art controls are “do what you want, but you void the warranty if you mess with this.” If you are running scripts that hack the art controls, we cannot tell the difference between real bugs in the early betas and art controls screwing things up.
The Road Map For Betas
Looking over the bug reports we have received, I think we are going to take on the 11.50 bugs in three phases:
Stability and compatibility. We’ll start by making sure that we run Vulkan and Metal on every platform that should be able to run them, with add-ons just working in the cases where we expect them to. We’ll start by focusing on fixing crashes, black screens, device lost, unstable plugins, etc.
VRAM use. We’ve received a number of reports that make it sound like VRAM management is not working properly. Once we can run, we’ll dig into blurry textures, running out of VRAM, etc. Sidney has built some great tools to get a good picture of how VRAM is being managed. VRAM management is one of the newest and most complex parts of 11.50 so it isn’t surprising that we’ve seen things that look buggy.
Performance. Once we are running where we should and using VRAM that we should, we can look at the cases where users are not seeing performance benefits from Metal and Vulkan, as well as remaining stutters. Once again, Sidney has built some fantastic tools that should help us dig into this quite efficiently.
This is the only order in which we can reasonably approach the bugs. If the app won’t run on all qualifying hardware, we can’t test our VRAM use everywhere, and if our VRAM use isn’t correct, it can bias performance testing.