Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
Moderator: GZDoom Developers
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
Yes that really is the crux of it, it's really unfortunate Nvidia decided to go down whatever path it is they chose. After 2 years of it being this way it seems unlikely to change at this point, and I felt like I might as well put this out there and see if it interested anyone else. There's nothing to apologize for, I can't expect anyone to put in serious effort for something that doesn't affect them, that they aren't bothered by, or that they don't even find interesting.
Maybe next time I upgrade my GPU I'll go AMD and see if it fares any better. I often wonder exactly what Nvidia did to make Vulkan shader compilation become so much slower, and why they would choose to do it... The other thing I want to try whenever I find the time is to see how the Nvidia Linux driver performs in this regard.
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
AMD has their issues, too, and this is actually particularly problematic on the driver end - at least for Windows.
I can't tell you one way or another if changing to an AMD is going to eliminate your micro-stutters but I can tell you for sure that AMD drivers definitely have issues when presenting both OpenGL and Vulkan.
However, if you plan to migrate to Linux, AMD is actually an excellent choice; the chipsets are well supported with open source drivers and this is where AMD truly thrives.
- Graf Zahl
- Lead GZDoom+Raze Developer
- Posts: 49230
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
My main problem with this entire thing is that the drivers you claim to be affected are up to two years old - and nobody else ever reported these issues.
If this was a commonly experienced problem reported by many people, yes there might be some pressure to act - but as it stands, you seem to be alone here - and that's hardly enough to do more than adding an internal pipeline cache as dpJudas did.
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
True, it could be a bug that only manifests on certain configs, like perhaps it only affects 20 series cards or a particular hardware combination. It's easy enough for anyone to test though:
1) Make sure you are on Nvidia driver 456.38 or newer
2) Clean your shader cache with the script I posted earlier
3) Run a frame time graph while playing
4) Make sure to run at a capped frame rate either via vid_maxfps or vsync enforced
5) Do a run through E1M1 with the Vulkan renderer, preferably with some mods loaded to increase shader count, something like Nashgore should be enough (even without mods some spikes will occur, this just increases the frequency to make it more apparent)
Now if you measure any frame time spikes during the run at all (graph doesn't stay flat during gameplay), quit and restart and do another run through the same level and those spikes should no longer happen because the shaders have been cached. Clean the shader cache again and you can force them to return at will. Reference the videos in my first post and watch the frame time graph to see what a flawless run looks like compared to a flawed run; the flawed run has a couple spikes right at the first encounter, and at least one more in the slime pit when an Imp throws a fireball off screen.
Now if you go back to a driver older than 456.38 (I favor 446.14), you should be able to do as many runs as you like, cleaning the cache before every run, and never see a single spike.
Unfortunately I think you need a 20 series or older to use drivers that old... But even without being able to prove to yourself that older drivers work flawlessly, you should be able to prove current drivers have an issue by following this method, if the issue isn't specific to my particular config.
Anyone on 30 series or newer that wants to perform this test for me would be most appreciated, because if you can perform this test on 30 series and up with an empty shader cache and not get spikes, I will lean towards this being a case of Nvidia changing something to optimize for newer cards and not caring about how it affects older cards, aka forced obsolescence. If no one can measure spikes with this method even on older hardware, I will assume my system is just cursed and look into building a new one ASAP.
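For anyone who exports frame times to a log rather than eyeballing the graph, the comparison described above can be sketched roughly as follows. This is only a minimal illustration: the function name `find_spikes`, the 1.5x tolerance, and the sample numbers are all made up for the example and are not measurements from this thread.

```python
from statistics import median

def find_spikes(frame_times_ms, tolerance=1.5):
    """Return indices of frames that took longer than tolerance x the median.

    With a capped frame rate the median sits at the cap (about 16.7 ms at
    60 fps), so anything well above it is a visible hitch.
    """
    if not frame_times_ms:
        return []
    baseline = median(frame_times_ms)
    return [i for i, t in enumerate(frame_times_ms) if t > baseline * tolerance]

# Invented sample data: a run with an empty shader cache and a repeat run.
cold_run = [16.7, 16.6, 16.7, 48.2, 16.7, 16.8, 33.5, 16.7]
warm_run = [16.7, 16.6, 16.7, 16.7, 16.7, 16.8, 16.6, 16.7]

print(find_spikes(cold_run))  # -> [3, 6]
print(find_spikes(warm_run))  # -> []
```

If the first list is non-empty after cleaning the cache and the second is empty on the repeat run, that matches the behavior described in the steps above.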
Last edited by 0mnicydle on Sat Jan 07, 2023 8:14 pm, edited 1 time in total.
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
So I realize no one really cares about this right now but me (and that’s ok), but I just wanted to share this because I found it potentially relevant and very interesting:
Ubershaders: A Ridiculous Solution to an Impossible Problem
Wayback mirror
For all I know, there’s no way something like this is applicable here, but it got me thinking… Would it be possible (in theory) to create a giant dynamic/self-configuring generalized pipeline shader at launch (or several that can run in parallel if need be), that can interpret and execute raw shader code, until such frame as a compiled specialized shader for the relevant effect can be located? The idea is while horribly inefficient compared to a specialized shader, it’s still faster than the compilation process itself. This “ubershader” only has to do the heavy lifting until a specialized pipeline has been compiled, with the initial performance hit not being enough to prevent the user’s target frame time from being achieved (within reason).
I’m very curious if there’s a path here that could be more widely adopted to address the larger issues of shader compilation stutter in modern PC gaming in general, or if there’s something specific to the emulator use case that makes the concept a non-starter elsewhere. I know Steam has their whole crowd sourced shader pre-caching initiative going on where they feed you shaders from other users based on your hw config, but idk, not the biggest fan of that as a “proper” solution really… I guess because it’s kind of like if it helps you that’s great and everything, but it came at the expense of someone else’s experience, and if you want to be an early adopter well sucks to be you.
EDIT/ And to be clear, this isn't really a request for such a feature in GZDOOM, as it sounds like a monumental task even if it's applicable. I was posting this more as an intellectual curiosity that happens to be on topic. The amount of shader stutter in GZDOOM is absolutely minuscule compared to most modern games, but the larger issue is something that bothers me a great deal. I spent decades playing PC games before the advent of shader compilation stutter, and over the years I have always been able to achieve great success in eliminating all forms of stutter, but this new "unavoidable" fact of modern PC gaming is very distressing to me and I really hope there's a future where PC games can go back to being smooth as long as you put together a powerful enough system.
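The "generic shader until the specialized one is ready" idea can be sketched in a few lines. This is a conceptual model only, not how Dolphin or GZDoom actually do it: the class `AsyncPipelineCache`, the `compile_fn` callback, and the string stand-ins for pipelines are all hypothetical, and a real renderer would hand back pipeline objects rather than strings.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Hashable

class AsyncPipelineCache:
    """Hands out a generic fallback while specialized pipelines compile."""

    def __init__(self, compile_fn: Callable[[Hashable], object],
                 fallback: object, workers: int = 2):
        self._compile_fn = compile_fn
        self._fallback = fallback
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: Dict[Hashable, Future] = {}

    def get(self, key: Hashable) -> object:
        """Never blocks: start a background compile on first use and return
        the fallback until that compile has finished."""
        future = self._pending.get(key)
        if future is None:
            future = self._pool.submit(self._compile_fn, key)
            self._pending[key] = future
        return future.result() if future.done() else self._fallback

    def wait_for(self, key: Hashable, timeout: float = 5.0) -> None:
        """Block until the compile for key is done (used only for the demo)."""
        self._pending[key].result(timeout=timeout)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)

if __name__ == "__main__":
    gate = threading.Event()

    def slow_compile(key):
        gate.wait(timeout=5)  # stands in for a slow driver compile
        return f"specialized:{key}"

    cache = AsyncPipelineCache(slow_compile, fallback="ubershader")
    print(cache.get("imp_fireball"))  # compile still running, fallback is used
    gate.set()
    cache.wait_for("imp_fireball")
    print(cache.get("imp_fireball"))  # specialized pipeline is now available
    cache.shutdown()
```

The point of the design is that `get` never waits on the compiler, so the frame that first needs an effect pays the cost of the slower generic path instead of a compile stall.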
- Graf Zahl
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
We already use such a shader - but in Vulkan there's a lot of state tied to the pipeline so you still have to create a ridiculous amount of them - and I am fairly certain that NVidia's hardware does not need this - but it apparently still compiles multiple iterations of the same shader.
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
Very interesting, thank you for the insight. So I guess GZDOOM is already doing everything it can in this regard, short of implementing workarounds (the first post requests) for a poor driver implementation on Nvidia's part. Some of the notes about the Nvidia driver difficulties in that ubershader article were very upsetting... I'm guessing the stutter as it exists in GZDOOM right now is something akin to the remaining stutter Dolphin was unable to eliminate on the Nvidia side. It must be frustrating to try and develop software that has to interact with what seems to be in many cases a black box like that (one that appears in many cases to be performing less than ideally on top of it).
At this point migrating to Linux and using open source drivers is becoming more and more attractive. I'd much rather sacrifice whatever performance benefits may exist in the closed source drivers, in exchange for sensible, solid and dependable implementations that are auditable.
Re: Asynchronous Vulkan shader/pipeline generation & local shader cache
So I was playing around with one of the heaviest mods I can think of, BoA 3.1, and while testing on driver ver 446.14 I discovered that some of the shaders generated in this mod (though not all, or even most), are heavy enough to cause a stutter in Vulkan even on my golden driver, though things still seem nearly flawless in OpenGL. I somehow strangely feel a little better knowing Vulkan wasn't ever performing as well as OpenGL, even if it was performing significantly better than it is now.
I just want to share a couple of examples though back on 527.56 in BoA. I have all assets in this mod precached so all you are seeing here is stutter related to an empty shader cache (empty caches in both cases). Once again, pay attention to the frame time graph, as my recorded video is never going to sync perfectly to your particular display:
OpenGL
Vulkan
These stutters really affect me viscerally, it's basically like a jump scare--I hate it so much. It would be one thing if it was like, 'oh, just run through that opening level once per driver install and everything will be fine until you update drivers again,' no... at any moment in any level there could be another one of these "jump scares" when I least expect it. I really do hope at some point someone will look into adding async shader compile and/or a transferable shader cache. As it stands now it's basically like you have to play through the whole mod/game once with stutters so you can play it a second time without, and be prepared to start the whole process over every time you update drivers. And in this particular mod Vulkan is basically a requirement to have a decent frame rate too because the level complexity is just insane at times, which really stinks because I can't do what I normally do and just continue to use OpenGL. Interestingly you will see one tiny little blip towards the beginning even in the OpenGL run (just goes to show you how hard this mod goes), but that level of spike is nowhere near enough to upset me...
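On why a cache tends not to survive a driver update: data returned by Vulkan's `vkGetPipelineCacheData` starts with a header (`VkPipelineCacheHeaderVersionOne`) holding a vendor ID, a device ID, and a `pipelineCacheUUID`, and a saved blob is only meant to be reused when those match the current device and driver. Below is a rough sketch of that check on an application-saved blob. The helper names `parse_header` and `cache_is_usable` are made up for illustration, the IDs in the usage example are arbitrary, and the assumption that a driver update changes the UUID is typical behavior rather than something confirmed in this thread.

```python
import struct
from typing import NamedTuple, Optional

# Header layout per the Vulkan spec, little-endian:
# headerSize, headerVersion, vendorID, deviceID (uint32 each), 16-byte UUID.
HEADER_FORMAT = "<IIII16s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes
HEADER_VERSION_ONE = 1  # VK_PIPELINE_CACHE_HEADER_VERSION_ONE

class CacheHeader(NamedTuple):
    header_size: int
    header_version: int
    vendor_id: int
    device_id: int
    cache_uuid: bytes

def parse_header(blob: bytes) -> Optional[CacheHeader]:
    """Decode the leading header, or return None if the blob is too short."""
    if len(blob) < HEADER_SIZE:
        return None
    return CacheHeader(*struct.unpack_from(HEADER_FORMAT, blob))

def cache_is_usable(blob: bytes, vendor_id: int, device_id: int,
                    cache_uuid: bytes) -> bool:
    """True only if the blob was produced for this exact device and driver."""
    header = parse_header(blob)
    return (header is not None
            and header.header_version == HEADER_VERSION_ONE
            and header.vendor_id == vendor_id
            and header.device_id == device_id
            and header.cache_uuid == cache_uuid)

if __name__ == "__main__":
    old_uuid, new_uuid = b"A" * 16, b"B" * 16  # arbitrary example values
    blob = struct.pack(HEADER_FORMAT, HEADER_SIZE, 1, 0x10DE, 0x1234, old_uuid)
    print(cache_is_usable(blob, 0x10DE, 0x1234, old_uuid))  # same driver
    print(cache_is_usable(blob, 0x10DE, 0x1234, new_uuid))  # after an update
```

This is also why a cache that is portable between machines is hard to do at the Vulkan level: the blob is tied to one device and driver build by design.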
Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
It would appear there is actually one more Vulkan extension available that can help reduce this problem: VK_EXT_graphics_pipeline_library. The issue complained about in this thread matches this blog post perfectly.
Note that even if this was implemented it would never make the stutters go completely away - it would only reduce it to DX11/OpenGL levels.
Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
Um, awesome?? Yes please?? That would be amazing, that's all I wanted honestly!
Like you can see in that OpenGL BoA vid I posted, there were a couple spikes about one pixel high in the graph... first of all there were only like 2, compared to several in Vulkan, and those minuscule spikes in OpenGL would never actually bother me, that is literally within the margin of error for the built-in GZDOOM frame limiter in my experience (in my vids RivaTuner is acting as the frame limiter during the video capture process).
Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
Upon further investigation I'm not sure if it is worth the effort to support this right now, as only Nvidia has implemented VK_EXT_graphics_pipeline_library in their drivers. Sorry to get your hopes up, but maybe in the future if the other vendors start supporting it.
- Graf Zahl
Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
dpJudas wrote: ↑Mon Dec 19, 2022 1:34 am Upon further investigation I'm not sure if it is worth the effort to support this right now, as only Nvidia has implemented VK_EXT_graphics_pipeline_library in their drivers. Sorry to get your hopes up, but maybe in the future if the other vendors start supporting it.
I think long term we will have no choice but to implement support for some extensions on an optional basis, especially when they provide significant benefits.
I also had a look at VK_EXT_extended_dynamic_state, VK_EXT_extended_dynamic_state2, and VK_EXT_extended_dynamic_state3. The first one in particular seems to have a few settings that are quite frequently changed in GZDoom so maybe implementing these is more of a viable option. Considering what they offer they should be a lot easier to implement because their settings often map directly to single features in FRenderState.
If we cater the entire Vulkan backend to the lowest common denominator to have one single one-size-fits-all render path and in turn cannot get the best performance out of supporting hardware it won't really do much good and only force us to prolong OpenGL support. Long term all these features will become core anyway, because it looks like they realized their mistake by now with their overly rigid pipeline design. Keep in mind: Right now only NVidia may support this, but sufficiently modern NVidia hardware makes up more than half of our user base!
In the past I had jumped onto more recent features quite quickly - for example I implemented proper shader support the moment I got a graphics card capable of handling it, but of course left the old render path active. Why can't we do the same here?
Also, if all this doesn't help, can't we do a dry run of the backend rendering a single scene with the most frequent permutations so that in game there's less hitching?
Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
I agree that we should not restrict ourselves to Vulkan 1.0, but I think the challenge is to still keep the code manageable. For VK_EXT_graphics_pipeline_library my worry is that this could be one of those extensions nobody else chose to adopt.
We can create the most frequent permutations up front - the catch is knowing which those might be. Essentially the total possible permutations we can have are described by the VkPipelineKey struct. The thing about this struct is that we actually only ever use a small subset of the total possibilities (aside from RenderStyle), but the backend doesn't know which those might be.
For example, if we have about 5-10 variants used by portal code, those could be pre-created. Of the remaining pipelines I think we only really have 3 states: RenderStyle (blend mode), SpecialEffect/EffectState/AlphaTest (which shader is active), depth bias (decal or not). For each shader type we could pre-create the standard ERenderStyle types, so that's a total of something like 100 pipelines.
In short: it isn't so much that it couldn't be done. The issue is doing it in a way that is good for the code - forcing a bunch of assumptions in the backend doesn't seem like a good way to do this.
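To make the "something like 100 pipelines" estimate concrete, here is a rough sketch of enumerating the three varying states and creating each combination up front. Everything in it is a placeholder: the shader names, the render style list, and the counts were picked only so the product lands near that estimate, and `common_pipeline_keys` and `prewarm` are hypothetical helpers, not GZDoom code.

```python
from itertools import product

# Placeholder values, NOT the real GZDoom lists.
SHADER_TYPES = [f"shader_{i}" for i in range(6)]
RENDER_STYLES = ["normal", "translucent", "add", "subtract",
                 "reverse_subtract", "stencil", "shaded", "fuzzy"]
DEPTH_BIAS = [False, True]  # decal or not

def common_pipeline_keys(shaders=SHADER_TYPES, styles=RENDER_STYLES,
                         depth_bias=DEPTH_BIAS):
    """Every (shader, render style, depth bias) combination as a key tuple."""
    return list(product(shaders, styles, depth_bias))

def prewarm(keys, create_pipeline):
    """Create each pipeline once at startup and return them keyed by state."""
    return {key: create_pipeline(key) for key in keys}

if __name__ == "__main__":
    keys = common_pipeline_keys()
    print(len(keys))  # 6 * 8 * 2 = 96 with these placeholder lists
    pipelines = prewarm(keys, lambda key: f"pipeline{key}")
    print(len(pipelines))
```

The sketch also shows the concern raised above: the lists of "likely" states have to live somewhere, and putting them in the backend bakes assumptions about the frontend into it.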
- Graf Zahl
Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
I mostly agree, that's why I pointed to the 3 dynamic_state extensions. At least the first one seems to be core in 1.3.
What confuses me is that blend modes still seem to be hard tied to the pipeline object, these are often what causes the most permutations, but even so what they offer can reduce pipeline creation quite a bit.
As for VK_EXT_graphics_pipeline_library, look at the list of contributors. It's not just NVidia, but also AMD, ARM and many game developers. It is clear this is a high-in-demand feature that will get more widespread support because the relevant people need it. Of course the docs cannot give an estimate how hard it will be to implement.
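The effect of dynamic state on pipeline count can be illustrated with a toy model: every state field that is baked into the pipeline multiplies the number of distinct pipelines, and every field that can be set on the command buffer instead drops out of the key. Cull mode and depth test enable are among the states VK_EXT_extended_dynamic_state makes dynamic, while blend stays baked, as noted above. The draw list and the `distinct_pipelines` helper below are invented for the example.

```python
def distinct_pipelines(draws, baked_fields):
    """Return the set of pipeline keys needed when only baked_fields
    are part of the pipeline object."""
    return {tuple(draw[field] for field in baked_fields) for draw in draws}

# Invented sample of render states seen during a frame.
draws = [
    {"shader": "default", "blend": "normal", "cull": "back", "depth_test": True},
    {"shader": "default", "blend": "normal", "cull": "none", "depth_test": True},
    {"shader": "default", "blend": "normal", "cull": "back", "depth_test": False},
    {"shader": "default", "blend": "add", "cull": "back", "depth_test": True},
    {"shader": "fog", "blend": "normal", "cull": "none", "depth_test": False},
]

everything_baked = ("shader", "blend", "cull", "depth_test")
cull_and_depth_dynamic = ("shader", "blend")

print(len(distinct_pipelines(draws, everything_baked)))        # -> 5
print(len(distinct_pipelines(draws, cull_and_depth_dynamic)))  # -> 3
```

In this toy case the blend mode still forces a second pipeline for the same shader, which is the limitation pointed out above.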
Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache
Graf Zahl wrote: I mostly agree, that's why I pointed to the 3 dynamic_state extensions. At least the first one seems to be core in 1.3
Sorry, forgot to reply to this one specifically (why not use VK_EXT_extended_dynamic_state). In this particular case that would mean VKRenderState would have to issue different commands depending on what render path is active. Doable, sure, but also what exactly are the performance implications of using the extended dynamic state?
Part of the rationale behind creating these huge pipeline state objects in Vulkan was that the old model for DX11/OpenGL left too much performance on the floor. It got so bad the drivers began to create worker threads building pipelines behind the back of the application, along with a bunch of heuristics deciding when it was worth it and when it wasn't. The more the pipeline is set into "DX11 mode", the further you go back to that world. One really wonderful thing about Vulkan is how changing the pipeline is super fast, easy and predictable compared to OpenGL. I'd personally therefore much prefer an alternative approach, if possible. Especially since most of the state there isn't something GZDoom really changes a lot - most of it is used for the portal stuff in a limited set of fixed configurations.
Graf Zahl wrote: As for VK_EXT_graphics_pipeline_library, look at the list of contributors. It's not just NVidia, but also AMD, ARM and many game developers. It is clear this is a high-in-demand feature that will get more widespread support because the relevant people need it. Of course the docs cannot give an estimate how hard it will be to implement.
That's also one of the reasons I was surprised that Nvidia was the only vendor with a driver supporting the extensions (added back in the spring). Maybe the others are just being slow, I don't know.
