The new OpenGL ES Renderer... WOW!
-
-
- Posts: 3134
- Joined: Sat May 28, 2016 1:01 pm
Re: The new OpenGL ES Renderer... WOW!
Never mind about the shadowContribution thing. I managed to confuse myself while launching GZDoom. It is still slow with it commented out.
-
- Lead GZDoom+Raze Developer
- Posts: 49183
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: The new OpenGL ES Renderer... WOW!
Yeah, I noticed.
BTW, what tipped it over in 3.4 vs. 3.3.2 was that flat rendering was changed from per-subsector to per-sector. Normally this won't make much of a difference, but MAP18, with its large sector that contains over 100 lights and covers a large part of the screen, is a real edge case for this.
I think the GLES renderer is cheating and not rendering all lights. I was running another test with Waterlabs GZD, and in that one it looks like a lot of lights are just missing. In MAP18 that's barely noticeable.
-
- Posts: 160
- Joined: Fri Nov 15, 2019 4:28 am
- Graphics Processor: Intel with Vulkan/Metal Support
- Location: Australia
Re: The new OpenGL ES Renderer... WOW!
"Oh, so it's the shadowmap"
I have gl_light_shadowmap=false in my ini.
"So I guess for the dynamic light stuff a bit of code optimization is needed then"
Sounds good! I'm just glad I accidentally triggered this.
"Been testing some more and it seems to be related to the 'pipelining' part. It is the Finish that is slow"
Interesting you should mention that. I've always noticed that the 'finish' part takes a significant chunk of the render time.
I have these pseudo-benchmark results from the starting view of Demonastery. Will provide that file below just for funzies, but please note that this is from years ago, probably from v4.2-v4.4 era of GZDoom, with older drivers on Windows 10. Same hardware though. I had annotated the file with driver versions etc. at the beginning of each block.
Graf, dpJudas, emile_b, Rachael... Let me know if you'd like me to test specific things. I'm at your service.
Last edited by bLUEbYTE on Fri Feb 17, 2023 1:33 am, edited 1 time in total.
-
- Posts: 255
- Joined: Mon Jan 09, 2023 2:02 am
- Graphics Processor: nVidia (Modern GZDoom)
Re: The new OpenGL ES Renderer... WOW!
I played around with it a bit, too, and did some test edits of the light definitions. Removing the dynamic lights from the health and armor bonus items gave a significant speed boost to MAP18.
So what's going to be the solution for this? I see two things that could be done: first, add a light limiter the user can set, and second, offer an optimized lights.pk3 for low end systems that omits a few faint lights that don't add much but may cost a lot of performance. These bonuses in particular often come in large numbers, so removing the light definitions for those may help a lot already.
Another thing I noticed is that when attenuated lights were added the radii of all affected items were increased by a factor of 1.5, so would it make sense to un-attenuate the lights in such a low end setup to reduce the radius again and let them affect fewer elements?
-
- Posts: 255
- Joined: Mon Jan 09, 2023 2:02 am
- Graphics Processor: nVidia (Modern GZDoom)
Re: The new OpenGL ES Renderer... WOW!
Once you understand how a GPU works it is quite obvious why so much time often ends up in there.
Depending on your GPU's performance the renderer may still be able to submit its draw calls faster than the GPU is able to process. So normally, when the CPU is finished the GPU still needs to do some work, so before presenting the current frame the CPU needs to wait for the GPU. This wait time is what will show up in the 'finish' part.
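To put rough numbers on that wait, here is a minimal sketch in Python. It assumes a deliberately simplified one-frame model in which the GPU starts working as soon as the CPU begins submitting and nothing is queued from earlier frames; the function names and the millisecond figures are made up for illustration and are not taken from GZDoom.

```python
def finish_wait_ms(cpu_submit_ms: float, gpu_render_ms: float) -> float:
    """Time the CPU idles before presenting, in a simplified one-frame model.

    Assumption: CPU and GPU start the frame together, so the CPU only
    waits for whatever GPU work is left once it has finished submitting.
    """
    return max(0.0, gpu_render_ms - cpu_submit_ms)


def frame_duration_ms(cpu_submit_ms: float, gpu_render_ms: float) -> float:
    """The frame takes as long as the slower of the two sides."""
    return max(cpu_submit_ms, gpu_render_ms)


if __name__ == "__main__":
    # GPU-bound frame: the CPU is done after 3 ms, the GPU needs 10 ms.
    print(finish_wait_ms(3.0, 10.0), frame_duration_ms(3.0, 10.0))
    # CPU-bound frame: the GPU is already done, so there is nothing to wait for.
    print(finish_wait_ms(10.0, 3.0), frame_duration_ms(10.0, 3.0))
```

In this model a large 'finish' value just means the frame is GPU-bound; the time is not being spent in the finish call itself.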
-
- Lead GZDoom+Raze Developer
- Posts: 49183
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: The new OpenGL ES Renderer... WOW!
I wouldn't do a different lights.pk3. A better option would be to give lights a priority and then in performance mode disable low priority lights like the bonuses or other pickups that can appear in larger numbers.
Disabling attenuation and shrinking the radius would also be an option.
The third one, of course, is what GLES does, i.e. limit the number of lights per surface.
That way it can be better tuned to the actual hardware.
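A rough sketch of how the priority idea and a per-surface limit could combine, written in Python for brevity. Everything here is hypothetical: the `Light` fields, the priority scale (higher means more important) and the `select_lights` helper are illustrations only, not GZDoom's actual data structures or the GLES renderer's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Light:
    name: str
    priority: int   # hypothetical scale: higher = more important
    radius: float


def select_lights(lights: List[Light],
                  min_priority: int = 0,
                  max_per_surface: Optional[int] = None) -> List[Light]:
    """Drop low-priority lights, then cap the list length for one surface."""
    kept = [l for l in lights if l.priority >= min_priority]
    # Most important first; among equal priorities prefer the larger radius.
    kept.sort(key=lambda l: (-l.priority, -l.radius))
    if max_per_surface is not None:
        kept = kept[:max_per_surface]
    return kept


if __name__ == "__main__":
    lights = [
        Light("torch", 2, 128.0),
        Light("armor bonus", 0, 16.0),
        Light("health bonus", 0, 16.0),
        Light("lamp", 1, 64.0),
    ]
    # "Performance mode": pickups with priority 0 are skipped entirely.
    print([l.name for l in select_lights(lights, min_priority=1)])
    # Hard cap of three lights per surface, keeping the most important ones.
    print([l.name for l in select_lights(lights, max_per_surface=3)])
```

Both knobs could be exposed to the user independently, which is what would allow tuning to the actual hardware.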
-
-
- Posts: 3134
- Joined: Sat May 28, 2016 1:01 pm
Re: The new OpenGL ES Renderer... WOW!
Shrinking the radius will have limited effect on MAP18. The reason is that the entire floor in the courtyard is one big surface that shares the same light list. Even though the shader's light code attempts to early out based on distance, it still has to examine the full list of lights for every pixel it draws.
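As an illustration of that point, here is a small Python stand-in for the per-pixel light loop. It is not the actual shader: the linear falloff, the 2D coordinates and the tuple layout are assumptions made to keep the sketch short. What it shows is that the early out only skips the lighting math, while the number of list entries visited per pixel stays the same regardless of radius.

```python
import math
from typing import List, Tuple

LightDef = Tuple[float, float, float]  # (x, y, radius), illustrative layout


def shade_pixel(px: float, py: float, lights: List[LightDef]) -> Tuple[float, int]:
    """Return (brightness, lights_examined) for one pixel."""
    brightness = 0.0
    examined = 0
    for lx, ly, radius in lights:
        examined += 1                       # every entry in the list is visited
        dist = math.hypot(px - lx, py - ly)
        if dist >= radius:
            continue                        # early out: skips only the math below
        brightness += 1.0 - dist / radius   # simple linear falloff (assumption)
    return brightness, examined


if __name__ == "__main__":
    # One nearby light plus 99 far-away ones sharing the same surface list.
    lights = [(0.0, 0.0, 10.0)] + [(1000.0, 1000.0, 10.0)] * 99
    print(shade_pixel(0.0, 0.0, lights))

    # Shrinking every radius by 1.5 does not shorten the loop.
    shrunk = [(x, y, r / 1.5) for x, y, r in lights]
    print(shade_pixel(0.0, 0.0, shrunk))
```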
-
- Lead GZDoom+Raze Developer
- Posts: 49183
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: The new OpenGL ES Renderer... WOW!
That's correct for this place, but on lower end hardware the size reduction may actually help in other places. The goal here should be to give users of low end hardware some different means to reduce GPU load for different kinds of maps. In a map like Waterlabs GZD which is overstuffed with dynamic lights the effect will be very different already.
-
- Lead GZDoom+Raze Developer
- Posts: 49183
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: The new OpenGL ES Renderer... WOW!
I did some more experimentation and I think I now know the cause of the performance problem - at least on NVidia.
Doing branching depending on SSBO contents does not work. The shader will always execute all parts, i.e. run the entire lighting logic on every pixel for every light - including the very costly spotlight code, even though there are no spot lights present.
This also explains why most of my tests yesterday did not provide any results - I mainly removed code depending on global uniforms, not the buffer content.
Just switching back to uniform buffers makes things go 2-3x faster on MAP18.
I wonder what's up on Intel then. At least on OpenGL it should have defaulted to uniform buffers.
-
- Lead GZDoom+Raze Developer
- Posts: 49183
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: The new OpenGL ES Renderer... WOW!
Pushed a change to master, OpenGL only so far. I need feedback on how this fares on other hardware before continuing here. Anyone interested should compare the upcoming devbuild with the 4.10 release.
EDIT: Tried to add it to Vulkan, too, but it looks like the entire infrastructure for using uniform buffers here has never been implemented. So I'm going to wait with this until there's some feedback.
-
- Posts: 2119
- Joined: Thu May 02, 2013 1:27 am
- Operating System Version (Optional): Windows 10
- Graphics Processor: nVidia with Vulkan support
- Location: Brazil
Re: The new OpenGL ES Renderer... WOW!
Graf Zahl wrote: Sat Feb 18, 2023 3:49 am
"I did some more experimentation and I think I now know the cause of the performance problem - at least on NVidia.
Doing branching depending on SSBO contents does not work. The shader will always execute all parts, i.e. run the entire lighting logic on every pixel for every light - including the very costly spotlight code, even though there are no spot lights present.
This also explains why most of my tests yesterday did not provide any results - I mainly removed code depending on global uniforms, not the buffer content.
Just switching back to uniform buffers makes things go 2-3x faster on MAP18."
What was it using SSBOs for? SSBOs are generally slower, and I believe the way they work makes it harder for the GPU to determine if threads will diverge, so it's more likely to go with the "run both branches and mask the results" way of branching, with all the register pressure that ensues. I also don't believe SSBO contents can ever be treated as "constant" values, even if indexed with a constant value, so that's another way to end up with that kind of branching. UBOs, on the other hand, can be treated as "constant" by the GPU (AFAIK even array access in a UBO can, as long as the index is constant or "constant"), and IIRC Nvidia can even specialize the shader on the uniforms.
-
- Lead GZDoom+Raze Developer
- Posts: 49183
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: The new OpenGL ES Renderer... WOW!
The buffer for the light data was an SSBO because UBOs have a size limit. In most situations the performance difference was not noticeable because the overall processing time of lights is low. I guess MAP18 is the only place where this could be tested effectively, but since this is still at 140 fps on a Geforce 1060, there was no way to notice it without explicitly disabling vsync and switching on the FPS display.
and IIRC Nvidia can even specialize the shader on the uniforms.
I sincerely hope they don't. Recompiling shaders for these things leads to those infamous microstutters that are impossible to control on the application side.
-
- Posts: 2119
- Joined: Thu May 02, 2013 1:27 am
- Operating System Version (Optional): Windows 10
- Graphics Processor: nVidia with Vulkan support
- Location: Brazil
Re: The new OpenGL ES Renderer... WOW!
I imagine they're specialized at a fairly low level. If it knows it's a constant value, it can have it already set up to specialize it without recompiling the shaders. After all, if it knows the value is gonna be constant over a whole warp at execution time, it can assign registers and such based on the knowledge that only one of the branches can ever be taken per warp. It might have specialized instructions on the ISA for that, or rely on how a warp can elide an unused branch if it can determine all threads will go with the same branch, or it might even just remove the branch from the shader assembly before execution.
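For anyone following along, the cost difference between the two ways of branching can be sketched with a toy model in Python. This only models lockstep execution within one warp in the abstract; the cost units are arbitrary, the function name is made up, and it says nothing about what a particular driver actually does with SSBO or UBO reads.

```python
from typing import Sequence


def warp_branch_cost(conditions: Sequence[bool],
                     cost_if_true: int,
                     cost_if_false: int) -> int:
    """Cost for one warp running `if cond: A else: B` in lockstep.

    A side of the branch is paid for if at least one thread needs it,
    or if the hardware cannot tell that no thread needs it.
    """
    cost = 0
    if any(conditions):
        cost += cost_if_true      # some thread takes the 'true' side
    if not all(conditions):
        cost += cost_if_false     # some thread takes the 'false' side
    return cost


if __name__ == "__main__":
    SPOTLIGHT = 50   # arbitrary cost of the expensive spotlight path
    POINT = 1        # arbitrary cost of the cheap path

    # All 32 threads agree there is no spotlight: only the cheap side runs.
    print(warp_branch_cost([False] * 32, SPOTLIGHT, POINT))
    # A single divergent thread makes the whole warp pay for both sides.
    print(warp_branch_cost([True] + [False] * 31, SPOTLIGHT, POINT))
```

Running both sides unconditionally and masking the results would correspond to always paying `cost_if_true + cost_if_false`, which matches the second case even when no spotlight is present.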
-
- Posts: 160
- Joined: Fri Nov 15, 2019 4:28 am
- Graphics Processor: Intel with Vulkan/Metal Support
- Location: Australia
Re: The new OpenGL ES Renderer... WOW!
Just did some performance testing on MAP18.
First of all, compared to my initial benchmark in the original post, I have changed some settings in the hope of getting more accurate results:
FPS limit: changed to unlimited from 120 FPS
Vsync: turned off
4.10.0, OpenGL ES: ~200 FPS
4.10.0, OpenGL: ~70 FPS
devbuild 087050c20, OpenGL: ~105 FPS
So I can confirm a significant 50% boost in performance with the devbuild. Furthermore, I can also confirm the 'skipped' lights with the GLES backend: 12 of the armor bonuses did not show dynamic lights. These were the ones closer to the closet in the center on two of the four lines; 6 on each 'line' were missing their lights. I am not sure if it's particular to my setup, but none of the health bonuses had lights around them regardless of backend.
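For reference, the arithmetic behind the 50% figure, as a short Python sketch. The FPS values are the approximate readings above; the helper names are just for illustration.

```python
def speedup_percent(old_fps: float, new_fps: float) -> float:
    """Relative FPS gain of new over old, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    results = {
        "4.10.0, OpenGL ES": 200.0,
        "4.10.0, OpenGL": 70.0,
        "devbuild 087050c20, OpenGL": 105.0,
    }
    for name, fps in results.items():
        print(f"{name}: {fps:.0f} FPS = {frame_time_ms(fps):.2f} ms/frame")
    gain = speedup_percent(results["4.10.0, OpenGL"],
                           results["devbuild 087050c20, OpenGL"])
    print(f"OpenGL gain, devbuild vs 4.10.0: {gain:.0f}%")
```

In frame time terms that is roughly 14.29 ms down to 9.52 ms per frame for OpenGL, against 5 ms for OpenGL ES.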
Last edited by bLUEbYTE on Sat Feb 18, 2023 7:08 pm, edited 2 times in total.
-
-
- Posts: 3134
- Joined: Sat May 28, 2016 1:01 pm