Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by Chris » Mon Dec 19, 2022 1:41 pm

dpJudas wrote: Mon Dec 19, 2022 3:10 am That's also one of the reasons I was surprised that Nvidia was the only vendor with a driver supporting the extensions (added back in the spring). Maybe the others are just being slow, I don't know. :)
FWIW, it is available on my RX 580 on Linux, using the open source drivers and MESA 22.3:

Code:

$ vulkaninfo | grep library
        VK_KHR_pipeline_library                     : extension revision 1
        VK_EXT_graphics_pipeline_library             : extension revision 1
        VK_KHR_pipeline_library                      : extension revision 1

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by 0mnicydle » Mon Dec 19, 2022 8:57 am

Whatever path ends up being taken in the future, I just want to say thank you for turning your attention to this issue. I'm just happy it's "on the radar" now.

Also, I just want to say that if there's a mitigation but it would be a ton of work and still won't address permutations, it might not be worth the effort. If I knew that all I had to do was get through that initial flurry of stutters on the first level of something, I might never have complained about this. But in BoA, for instance, it seems every level has at least one more shader-related stutter at some point, and this is the issue I find most concerning. I verify this by immediately quitting, clearing standby memory, and relaunching the game to see if the stutter reappears. If it doesn't, I feel I can safely say it's shader-related, because an asset- or entity-related stutter would have come back, especially after clearing standby memory. Sometimes I even go the extra mile, clear the shader cache, and check whether that particular stutter returns in the exact same spot, and every time I have done this, it has. I don't know for sure, but I strongly suspect these ongoing stutters are permutation-related, just from paying attention to what's in the scene when it happens. I think it's often related to transparent things, as if every transparent sprite/texture needs its own permutation or something (or maybe one for every new alpha value)? Just wild speculation on my part here, sorry...

Also, last night I came across something interesting... apparently there is a contingent of people using DXVK (on Windows) in those DX games that didn't go through "heroics" to eliminate shader-related stutter, because it uses async shader compile. So I'm wondering: could async shader compile be applied selectively, or is it all or nothing? Could you use async on all the low-hanging fruit, avoiding the significant time investment that was among the earliest objections to the idea, and would that cover these permutations? I feel like of all things, a transparency effect not being present for a frame or two would be among the least noticeable, but then again maybe what would actually happen is that the object just wouldn't be visible at all? I don't know... All I know is that I would much prefer "pop-in" to stutter. When Rage first released, capped at 60fps and with texture pop-in as a result of strictly budgeted streaming operations, I remember much weeping and gnashing of teeth about it, but I thought it was absolutely glorious compared to the non-stop stutter fest of the typical Unreal Engine game at the time (and sadly mostly to this day).
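A minimal C++ sketch of what "selective" async compilation could look like, assuming a hypothetical `AsyncPipelineCache` where each lookup states whether it may be deferred. `compilePipeline` is a placeholder for the expensive driver-side compile and `Pipeline` stands in for a real `VkPipeline`; none of these names come from GZDoom. Deferrable lookups return nothing while the compile runs in the background, so the caller would skip that draw (pop-in), while all other lookups compile synchronously and stall just as they do today.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <map>
#include <optional>
#include <string>

// Placeholder for a compiled pipeline (a VkPipeline in a real backend).
struct Pipeline {
    std::string name;
};

// Placeholder for the expensive driver compile; hypothetical.
Pipeline compilePipeline(const std::string& key) {
    return Pipeline{"pipeline:" + key};
}

class AsyncPipelineCache {
public:
    // Returns the pipeline if it is ready. When allowAsync is true and the pipeline
    // is not ready yet, starts (or polls) a background compile and returns nullopt,
    // so the caller skips the draw instead of stalling the frame.
    std::optional<Pipeline> get(const std::string& key, bool allowAsync) {
        auto ready = readyPipelines.find(key);
        if (ready != readyPipelines.end())
            return ready->second;

        if (!allowAsync) {
            // Synchronous path: stalls this frame, same as the current behavior.
            Pipeline p = compilePipeline(key);
            readyPipelines.emplace(key, p);
            return p;
        }

        auto pending = pendingPipelines.find(key);
        if (pending == pendingPipelines.end()) {
            pendingPipelines.emplace(key, std::async(std::launch::async, compilePipeline, key));
            return std::nullopt; // compile just started
        }
        if (pending->second.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            Pipeline p = pending->second.get();
            readyPipelines.emplace(key, p);
            pendingPipelines.erase(pending);
            return p;
        }
        return std::nullopt; // still compiling
    }

    // Blocks until all background compiles finish (e.g. before shutdown).
    void waitAll() {
        for (auto& entry : pendingPipelines)
            entry.second.wait();
    }

private:
    std::map<std::string, Pipeline> readyPipelines;
    std::map<std::string, std::future<Pipeline>> pendingPipelines;
};
```

In this sketch the object really is not drawn at all until its pipeline is ready; showing something cheaper in the meantime would require a second, already-compiled pipeline to draw with.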

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by Graf Zahl » Mon Dec 19, 2022 4:42 am

Yeah, I noticed the same. Which makes it all the more annoying that this has never been made a shader feature, because custom blend modes would be cool.
This most likely means that at least one manufacturer still has some dependencies in there - and my guess would be AMD again.

I still think we haven't seen the last of it here. The idea behind PSOs was certainly valid but the way it was implemented turned out not to be what game designers need.
I actually wouldn't be surprised if the end result is the return of much of the API flexibility that OpenGL used to have, but with far better user-side control. From what I've read, shader permutations are currently one of the biggest challenges 3D programmers face, so chances are high that work is being done here to improve things.

One thing we may do ourselves is adding some pipeline diagnostics, i.e. a CCMD that prints out all currently active pipelines, just to get an idea which ones are being created and maybe making some targeted optimizations to get those pre-initialized.
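Such a diagnostic could be as simple as counting lookups per pipeline key. A sketch under stated assumptions: `PipelineKey` below is a hypothetical three-field stand-in (the real VkPipelineKey carries more state), `recordUse()` would be called wherever the backend looks up a pipeline, and `dump()` is what the console command would print. Hooking it into GZDoom's command system is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Simplified, hypothetical pipeline key; the real VkPipelineKey has more fields.
struct PipelineKey {
    int renderStyle = 0;     // blend mode
    int shaderIndex = 0;     // which shader/effect is active
    bool depthBias = false;  // decal or not
    bool operator<(const PipelineKey& o) const {
        return std::tie(renderStyle, shaderIndex, depthBias) <
               std::tie(o.renderStyle, o.shaderIndex, o.depthBias);
    }
};

class PipelineStats {
public:
    // Call on every pipeline lookup; counts how often each key is used.
    void recordUse(const PipelineKey& key) { ++uses[key]; }

    std::size_t count() const { return uses.size(); }

    // Output of a hypothetical "list active pipelines" console command.
    void dump(std::ostream& out) const {
        out << uses.size() << " pipelines created\n";
        for (const auto& [key, n] : uses)
            out << "style=" << key.renderStyle << " shader=" << key.shaderIndex
                << " depthbias=" << key.depthBias << " uses=" << n << "\n";
    }

private:
    std::map<PipelineKey, std::uint64_t> uses;
};
```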

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by dpJudas » Mon Dec 19, 2022 4:30 am

Graf Zahl wrote: Mon Dec 19, 2022 3:54 amLong story short: Since this was made core I think we can assume that the cost of switching (and maintaining) pipelines is far higher than any perceived boost in shader performance.
I guess that makes sense. One thing is interestingly absent from the dynamic state: blend modes. Seems this particular thing is something they really don't want to be dynamic. Too bad, since that is the part that creates the most permutations for us.

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by Graf Zahl » Mon Dec 19, 2022 3:54 am

dpJudas wrote: Mon Dec 19, 2022 3:10 am
Graf Zahl wrote: I mostly agree, that's why I pointed to the 3 dynamic_state extensions. At least the first one seems to be core in 1.3.
Sorry, forgot to reply on this one specifically (why not use VK_EXT_extended_dynamic_state). In this particular case that would mean VKRenderState would have to issue different commands depending on what render path is active. Doable, sure, but what exactly are the performance implications of using the extended dynamic state?

Part of the rationale behind creating these huge pipeline state objects in Vulkan was that the old model for DX11/OpenGL left too much performance on the floor. It got so bad that drivers began creating worker threads that built pipelines behind the application's back, with a bunch of heuristics deciding when it was worth it and when it wasn't. The more the pipeline is put into "DX11 mode", the further you go back to that world. One really wonderful thing about Vulkan is how changing the pipeline is super fast, easy and predictable compared to OpenGL. I'd therefore personally much prefer an alternative approach, if possible. Especially since most of that state isn't something GZDoom really changes a lot - most of it is used for the portal stuff in a limited set of fixed configurations.
I know what the rationale was, and I also know that in particular for NVidia it crippled their hardware, because these items are all hardware flags. I think AMD's first GCN generation was the main culprit for this design choice, because it implemented too much state as shader instructions, but from what I've read AMD has since backpedaled from this design, precisely because the resulting explosion in shader code that needs to be created made it the wrong way of doing things. If you ask me, it's a typical case of micro-optimization gone wrong and missing the forest for the trees, i.e. chasing a minuscule performance improvement while totally neglecting the overhead this induced elsewhere.

Long story short: Since this was made core I think we can assume that the cost of switching (and maintaining) pipelines is far higher than any perceived boost in shader performance.
dpJudas wrote: Mon Dec 19, 2022 3:10 am
Graf Zahl wrote: As for VK_EXT_graphics_pipeline_library, look at the list of contributors. It's not just NVidia, but also AMD, ARM and many game developers. It is clear this is a feature in high demand that will get more widespread support because the relevant people need it. Of course the docs cannot give an estimate of how hard it will be to implement.
That's also one of the reasons I was surprised that Nvidia was the only vendor with a driver supporting the extensions (added back in the spring). Maybe the others are just being slow, I don't know. :)
NVidia has always been the first with releasing drivers with new extensions, so nothing new here. AMD has always taken their time with these things, even in OpenGL land.

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by dpJudas » Mon Dec 19, 2022 3:10 am

Graf Zahl wrote: I mostly agree, that's why I pointed to the 3 dynamic_state extensions. At least the first one seems to be core in 1.3.
Sorry, forgot to reply on this one specifically (why not use VK_EXT_extended_dynamic_state). In this particular case that would mean VKRenderState would have to issue different commands depending on what render path is active. Doable, sure, but what exactly are the performance implications of using the extended dynamic state?

Part of the rationale behind creating these huge pipeline state objects in Vulkan was that the old model for DX11/OpenGL left too much performance on the floor. It got so bad that drivers began creating worker threads that built pipelines behind the application's back, with a bunch of heuristics deciding when it was worth it and when it wasn't. The more the pipeline is put into "DX11 mode", the further you go back to that world. One really wonderful thing about Vulkan is how changing the pipeline is super fast, easy and predictable compared to OpenGL. I'd therefore personally much prefer an alternative approach, if possible. Especially since most of that state isn't something GZDoom really changes a lot - most of it is used for the portal stuff in a limited set of fixed configurations.
Graf Zahl wrote: As for VK_EXT_graphics_pipeline_library, look at the list of contributors. It's not just NVidia, but also AMD, ARM and many game developers. It is clear this is a feature in high demand that will get more widespread support because the relevant people need it. Of course the docs cannot give an estimate of how hard it will be to implement.
That's also one of the reasons I was surprised that Nvidia was the only vendor with a driver supporting the extensions (added back in the spring). Maybe the others are just being slow, I don't know. :)

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by Graf Zahl » Mon Dec 19, 2022 2:56 am

I mostly agree, that's why I pointed to the 3 dynamic_state extensions. At least the first one seems to be core in 1.3.

What confuses me is that blend modes still seem to be hard-tied to the pipeline object, and these are often what cause the most permutations. Even so, what these extensions offer can reduce pipeline creation quite a bit.

As for VK_EXT_graphics_pipeline_library, look at the list of contributors. It's not just NVidia, but also AMD, ARM and many game developers. It is clear this is a feature in high demand that will get more widespread support because the relevant people need it. Of course the docs cannot give an estimate of how hard it will be to implement.

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by dpJudas » Mon Dec 19, 2022 2:39 am

I agree that we should not restrict ourselves to Vulkan 1.0, but I think the challenge is still keeping the code manageable. For VK_EXT_graphics_pipeline_library my worry is that this could be one of those extensions nobody else chooses to adopt.

We can create the most frequent permutations up front - the catch is knowing which ones those are. Essentially, the total possible permutations we can have are described by the VkPipelineKey struct. The thing about this struct is that we actually only ever use a small subset of the total possibilities (aside from RenderStyle), but the backend doesn't know which ones.

For example, if we have about 5-10 variants used by portal code, those could be pre-created. Of the remaining pipelines I think we only really have 3 states: RenderStyle (blend mode), SpecialEffect/EffectState/AlphaTest (which shader is active), and depth bias (decal or not). For each shader type we could pre-create the standard ERenderStyle types, so that's a total of something like 100 pipelines.
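The estimate above, written as a hypothetical pre-creation list in C++. The counts passed in are illustrative parameters, not GZDoom's actual number of shader types or ERenderStyle values, and `PrecreateKey` is an assumed simplification of VkPipelineKey. The point it shows is that each extra independent state (such as decal depth bias) multiplies the list rather than adding to it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified key used only for this illustration.
struct PrecreateKey {
    int shaderType;     // which shader/effect is active
    int renderStyle;    // blend mode
    bool depthBias;     // decal or not
    int portalVariant;  // -1 for ordinary geometry
};

// Builds the list of pipelines to create up front: a fixed set of portal
// variants plus every render style for every shader type.
std::vector<PrecreateKey> buildPrecreateList(int portalVariants, int shaderTypes,
                                             int renderStyles, bool includeDecalVariants) {
    std::vector<PrecreateKey> keys;
    for (int v = 0; v < portalVariants; ++v)
        keys.push_back({0, 0, false, v}); // shader/style are placeholders for portal passes
    for (int s = 0; s < shaderTypes; ++s) {
        for (int r = 0; r < renderStyles; ++r) {
            keys.push_back({s, r, false, -1});
            if (includeDecalVariants)
                keys.push_back({s, r, true, -1}); // every extra state doubles this part
        }
    }
    return keys;
}
```

With, say, 8 portal variants, 6 shader types and 15 render styles this yields 98 keys, and 188 once decal variants are included.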

In short: it isn't so much that it couldn't be done. The issue is doing it in a way that is good for the code - forcing a bunch of assumptions in the backend doesn't seem like a good way to do this.

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by Graf Zahl » Mon Dec 19, 2022 2:09 am

dpJudas wrote: Mon Dec 19, 2022 1:34 am Upon further investigation I'm not sure if it's worth the effort to support this right now, as only Nvidia has implemented VK_EXT_graphics_pipeline_library in their drivers. Sorry to get your hopes up, but maybe in the future if the other vendors start supporting it.

I think long term we will have no choice but to implement support for some extensions on an optional basis, especially when they provide significant benefits.

I also had a look at VK_EXT_extended_dynamic_state, VK_EXT_extended_dynamic_state2 and VK_EXT_extended_dynamic_state3. The first one in particular seems to cover a few settings that GZDoom changes quite frequently, so maybe implementing these is a more viable option. Considering what they offer, they should be a lot easier to implement, because their settings often map directly to single features in FRenderState.
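To make the trade-off concrete, here is a sketch that counts the distinct pipelines a set of draws needs with and without dynamic state. It assumes only VK_EXT_extended_dynamic_state (core in Vulkan 1.3) is used, so cull mode and depth test/write would be set per draw via `vkCmdSetCullMode`, `vkCmdSetDepthTestEnable` and `vkCmdSetDepthWriteEnable`, while the blend mode stays baked into the pipeline. `DrawState` and its fields are hypothetical, not FRenderState's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical per-draw state.
struct DrawState {
    int blendMode;    // baked into the pipeline either way
    int cullMode;     // dynamic with VK_EXT_extended_dynamic_state
    bool depthTest;   // dynamic with VK_EXT_extended_dynamic_state
    bool depthWrite;  // dynamic with VK_EXT_extended_dynamic_state
};

// Every field is part of the pipeline key: each new combination is a new pipeline.
std::size_t pipelinesAllStatic(const std::vector<DrawState>& draws) {
    std::set<std::tuple<int, int, bool, bool>> keys;
    for (const auto& d : draws)
        keys.insert(std::make_tuple(d.blendMode, d.cullMode, d.depthTest, d.depthWrite));
    return keys.size();
}

// Dynamic fields are set with command-buffer calls, so only blend mode stays in the key.
std::size_t pipelinesWithDynamicState(const std::vector<DrawState>& draws) {
    std::set<int> keys;
    for (const auto& d : draws)
        keys.insert(d.blendMode);
    return keys.size();
}
```

Because blend mode remains in the key, the count can never drop below the number of blend modes in use, which is why the blend-mode limitation matters so much for the permutation problem.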

If we cater the entire Vulkan backend to the lowest common denominator to have one single one-size-fits-all render path and in turn cannot get the best performance out of supporting hardware it won't really do much good and only force us to prolong OpenGL support. Long term all these features will become core anyway, because it looks like they realized their mistake by now with their overly rigid pipeline design. Keep in mind: Right now only NVidia may support this, but sufficiently modern NVidia hardware makes up more than half of our user base!

In the past I had jumped onto more recent features quite quickly - for example I implemented proper shader support the moment I got a graphics card capable of handling it, but of course left the old render path active. Why can't we do the same here?
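Keeping the old path as the fallback could look like this sketch: pick the most capable pipeline-creation path the device reports and fall back to plain monolithic pipelines otherwise. The enum values are hypothetical. In a real backend the extension list would come from `vkEnumerateDeviceExtensionProperties`, and the matching feature structs (for example `VkPhysicalDeviceGraphicsPipelineLibraryFeaturesEXT`) and the API version would also need checking.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical render paths, from least to most capable.
enum class PipelinePath { Monolithic, ExtendedDynamicState, GraphicsPipelineLibrary };

// Chooses a path from the device's extension names; Monolithic is the
// Vulkan 1.0 fallback that stays active on hardware without the extensions.
PipelinePath choosePipelinePath(const std::set<std::string>& deviceExtensions) {
    auto has = [&](const char* name) { return deviceExtensions.count(name) != 0; };
    // VK_EXT_graphics_pipeline_library builds on VK_KHR_pipeline_library.
    if (has("VK_EXT_graphics_pipeline_library") && has("VK_KHR_pipeline_library"))
        return PipelinePath::GraphicsPipelineLibrary;
    if (has("VK_EXT_extended_dynamic_state"))
        return PipelinePath::ExtendedDynamicState;
    return PipelinePath::Monolithic;
}
```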

Also, if all this doesn't help, can't we do a dry run of the backend rendering a single scene with the most frequent permutations so that in game there's less hitching?
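A dry run amounts to filling the cache before gameplay starts. In this hypothetical sketch, `get()` only counts a compile as an in-game hitch when the pipeline was not created beforehand; the list of frequent keys is assumed to be collected ahead of time, for example by recording which pipelines a representative scene creates. Real pipeline creation is replaced by assigning an integer id.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class PipelineCache {
public:
    // Returns a pipeline id; "compiles" (expensive in a real driver) only on a miss.
    int get(const std::string& key, bool inGame) {
        auto it = pipelines.find(key);
        if (it != pipelines.end())
            return it->second;
        if (inGame)
            ++inGameCompiles; // each of these would be a visible hitch
        int id = static_cast<int>(pipelines.size());
        pipelines.emplace(key, id);
        return id;
    }

    // Dry run: create the pipelines a representative scene is known to use.
    void prewarm(const std::vector<std::string>& frequentKeys) {
        for (const auto& key : frequentKeys)
            get(key, false);
    }

    int inGameCompiles = 0;

private:
    std::map<std::string, int> pipelines;
};
```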

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by dpJudas » Mon Dec 19, 2022 1:34 am

Upon further investigation I'm not sure if it's worth the effort to support this right now, as only Nvidia has implemented VK_EXT_graphics_pipeline_library in their drivers. Sorry to get your hopes up, but maybe in the future if the other vendors start supporting it.

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by 0mnicydle » Mon Dec 19, 2022 1:15 am

dpJudas wrote: Mon Dec 19, 2022 12:42 am Note that even if this was implemented it would never make the stutters go completely away - it would only reduce it to DX11/OpenGL levels.
Um, awesome?? Yes please?? That would be amazing, that's all I wanted honestly!

Like you can see in that OpenGL BoA vid I posted, there were a couple of spikes about one pixel high in the graph... first of all there were only about 2, compared to several in Vulkan, and those minuscule spikes in OpenGL would never actually bother me; in my experience that is literally within the margin of error of the built-in GZDOOM frame limiter (in my vids RivaTuner is acting as the frame limiter during the video capture process).

Re: Asynchronous Vulkan Shader/Pipeline Generation & Portable Cache

by dpJudas » Mon Dec 19, 2022 12:42 am

It would appear there is actually one more Vulkan extension available that can help reduce this problem: VK_EXT_graphics_pipeline_library. The issue complained about in this thread matches this blog post perfectly.

Note that even if this was implemented it would never make the stutters go completely away - it would only reduce it to DX11/OpenGL levels.
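A rough conceptual model of why the extension helps, and why it cannot remove stutter entirely. It only counts compiles and makes no Vulkan calls; `Draw` and `CompileCounts` are hypothetical. With monolithic pipelines every new (shader, blend mode) pair is a full compile. With graphics pipeline libraries the shader stages are compiled once per shader, and blend state lives in the separately created fragment output interface part, so a new blend mode mostly needs a comparatively cheap link. The model collapses the real four library parts (vertex input interface, pre-rasterization shaders, fragment shader, fragment output interface) into "shader" and "link", and the first use of a new shader still costs a full compile.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical draw: which shader and which blend mode it needs.
struct Draw {
    int shader;
    int blendMode;
};

struct CompileCounts {
    int expensiveCompiles; // full shader compiles (the stutter source)
    int fastLinks;         // library links (cheap by comparison)
};

// Monolithic pipelines: every new (shader, blend) pair is a full compile.
CompileCounts monolithic(const std::vector<Draw>& draws) {
    std::set<std::pair<int, int>> pipelines;
    for (const auto& d : draws)
        pipelines.insert(std::make_pair(d.shader, d.blendMode));
    return {static_cast<int>(pipelines.size()), 0};
}

// Pipeline libraries: shader parts compiled once per shader, each
// (shader, blend) combination then only needs a link.
CompileCounts pipelineLibrary(const std::vector<Draw>& draws) {
    std::set<int> shaderParts;
    std::set<std::pair<int, int>> linked;
    for (const auto& d : draws) {
        shaderParts.insert(d.shader);
        linked.insert(std::make_pair(d.shader, d.blendMode));
    }
    return {static_cast<int>(shaderParts.size()), static_cast<int>(linked.size())};
}
```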

Re: Asynchronous Vulkan shader/pipeline generation & local shader cache

by 0mnicydle » Sat Dec 17, 2022 6:40 pm

So I was playing around with one of the heaviest mods I can think of, BoA 3.1, and while testing on driver version 446.14 I discovered that some of the shaders generated in this mod (not all, or even most) are heavy enough to cause a stutter in Vulkan even on my golden driver, though things still seem nearly flawless in OpenGL. Strangely, I feel a little better knowing Vulkan never performed as well as OpenGL, even if it was performing significantly better than it is now.

I just want to share a couple of examples, though, back on driver 527.56 in BoA. I have all assets in this mod precached, so all you are seeing here is stutter related to an empty shader cache (empty caches in both cases). Once again, pay attention to the frame time graph, as my recorded video is never going to sync perfectly with your particular display:

OpenGL
Vulkan

These stutters really affect me viscerally; it's basically like a jump scare, and I hate it so much. It would be one thing if it were like, 'oh, just run through that opening level once per driver install and everything will be fine until you update drivers again,' but no... at any moment in any level there could be another one of these "jump scares" when I least expect it. I really do hope at some point someone will look into adding async shader compile and/or a transferable shader cache. As it stands now, you basically have to play through the whole mod/game once with stutters so you can play it a second time without them, and be prepared to start the whole process over every time you update drivers. And in this particular mod Vulkan is basically a requirement for a decent frame rate, because the level complexity is just insane at times, which really stinks because I can't do what I normally do and just keep using OpenGL. Interestingly, you will see one tiny little blip towards the beginning even in the OpenGL run (which just goes to show how hard this mod goes), but that level of spike is nowhere near enough to upset me...
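On why today's cache does not transfer: per the Vulkan spec, the data returned by `vkGetPipelineCacheData` starts with a 32-byte `VkPipelineCacheHeaderVersionOne` header (header size, header version, vendor ID, device ID and a 16-byte `pipelineCacheUUID`, all stored least-significant byte first). Below is a sketch of the check an application might do before handing a saved blob back to the driver, with the struct re-declared so it compiles without vulkan.h; the function names are hypothetical. Drivers report their own `pipelineCacheUUID` in `VkPhysicalDeviceProperties`, and it commonly changes with driver updates, which is one reason a saved cache gets rejected after updating.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Layout of VkPipelineCacheHeaderVersionOne (re-declared for illustration).
struct PipelineCacheHeader {
    std::uint32_t headerSize;
    std::uint32_t headerVersion;
    std::uint32_t vendorID;
    std::uint32_t deviceID;
    std::array<std::uint8_t, 16> pipelineCacheUUID;
};

// Header fields are little-endian regardless of host byte order.
static std::uint32_t readLE32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return std::uint32_t(b[off]) | (std::uint32_t(b[off + 1]) << 8) |
           (std::uint32_t(b[off + 2]) << 16) | (std::uint32_t(b[off + 3]) << 24);
}

std::optional<PipelineCacheHeader> parseHeader(const std::vector<std::uint8_t>& blob) {
    if (blob.size() < 32)
        return std::nullopt;
    PipelineCacheHeader h{};
    h.headerSize = readLE32(blob, 0);
    h.headerVersion = readLE32(blob, 4);
    h.vendorID = readLE32(blob, 8);
    h.deviceID = readLE32(blob, 12);
    for (std::size_t i = 0; i < 16; ++i)
        h.pipelineCacheUUID[i] = blob[16 + i];
    if (h.headerSize < 32 || h.headerVersion != 1) // 1 = header version one
        return std::nullopt;
    return h;
}

// A saved blob is only reusable on the same vendor, device and cache UUID.
bool cacheMatchesDevice(const std::vector<std::uint8_t>& blob, std::uint32_t vendorID,
                        std::uint32_t deviceID, const std::array<std::uint8_t, 16>& uuid) {
    auto h = parseHeader(blob);
    return h && h->vendorID == vendorID && h->deviceID == deviceID &&
           h->pipelineCacheUUID == uuid;
}
```

A cache that should survive driver updates would therefore have to store something driver-independent, such as the list of pipeline keys to recreate at startup, rather than the driver's blob.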

Re: Asynchronous Vulkan shader/pipeline generation & local shader cache

by 0mnicydle » Sat Dec 17, 2022 1:51 pm

Very interesting, thank you for the insight. So I guess GZDOOM is already doing everything it can in this regard, short of implementing workarounds (as the first post requests) for a poor driver implementation on Nvidia's part. Some of the notes about the Nvidia driver difficulties in that ubershader article were very upsetting... I'm guessing the stutter as it exists in GZDOOM right now is akin to the remaining stutter Dolphin was unable to eliminate on the Nvidia side. It must be frustrating to develop software that has to interact with what seems in many cases to be a black box like that (one that often appears to be performing less than ideally on top of it).

At this point migrating to Linux and using open source drivers is becoming more and more attractive. I'd much rather sacrifice whatever performance benefits may exist in the closed source drivers, in exchange for sensible, solid and dependable implementations that are auditable.

Re: Asynchronous Vulkan shader/pipeline generation & local shader cache

by Graf Zahl » Sat Dec 17, 2022 1:29 pm

We already use such a shader - but in Vulkan there's a lot of state tied to the pipeline, so you still have to create a ridiculous number of them. I am fairly certain that NVidia's hardware does not need this, but the driver apparently still compiles multiple iterations of the same shader.
