[g3.5pre-44-g1455111dd] 50% performance regression since 3.3

drfrag
Vintage GZDoom Developer
Posts: 3141
Joined: Fri Apr 23, 2004 3:51 am
Location: Spain

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Post by drfrag »

That one is much slower for me on GL 3.3.
dpJudas
Posts: 3040
Joined: Sat May 28, 2016 1:01 pm

Post by dpJudas »

Okay, I think the conclusion here is then that on older cards (pre OpenGL 4.4) we will do a memcpy.

As for a speed regression in the software renderer itself, that is of course always a possibility. The only major change I can think of lately is the added model support, which can be turned off using r_models 0. It shouldn't affect anything unless there are models present in the map, though.

There is one other minor OpenGL difference since the older release: glFinish. The 3.5pre calls it while the old version did not. On Nvidia this has a price (*), but it's not very big, as I know the GL renderer can reach 1200 fps on my computer. But maybe it is different for AMD? What is the maximum fps you (Blzut3) get from the GL renderer? Going from 600 fps down to 400 fps costs just over 0.8 ms per frame, so it doesn't take much.

*) I traced the "pixel-path synchronized to 3D" warning mentioned earlier to the glFinish call made at the end of the frame.
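
To make the frame-time arithmetic above concrete, here is a minimal C++ sketch. The helper names FrameTimeMs and ExtraCostMs are made up for illustration and are not part of GZDoom. It converts frame rates to per-frame times, which shows that the gap between 600 fps and 400 fps is in the ballpark of the figure quoted above, so even a small fixed cost such as a synchronizing call could account for it.

```cpp
#include <cassert>

// Time spent on one frame, in milliseconds, at a given frame rate.
double FrameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra per-frame cost (in ms) that would pull the frame rate
// down from fastFps to slowFps.
double ExtraCostMs(double fastFps, double slowFps)
{
    return FrameTimeMs(slowFps) - FrameTimeMs(fastFps);
}
```

For example, ExtraCostMs(600.0, 400.0) is 1000/400 - 1000/600, which is about 0.83 ms. At 1200 fps the entire frame takes about 0.83 ms.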
Blzut3
Posts: 3144
Joined: Wed Nov 24, 2004 12:59 pm
Graphics Processor: ATI/AMD with Vulkan/Metal Support

Post by Blzut3 »

The maximum I can get from the OpenGL renderer with my face against a wall is 1180 fps. As I believe I mentioned earlier in this thread, if I point the software renderer to a C++ array, the frame rate is the same as when it draws to OpenGL, while no renderer at all gives 3000+ fps. So unless there's something other than DCanvas that indicates where the software renderer draws to, I'm pretty sure that the frame rate I'm getting now is the CPU limit. r_models 0 doesn't make a difference. When I tried to find the original issue with a profiler, it looked like wall drawing was taking more time, but there were too many variables to draw any conclusion from that profiler run.

By the way, the regression is also fixed on the Intel laptop under Linux (although now that I think about it, it is weird that there was a regression there at all, given shared memory and all, but who knows). Waiting for tomorrow's build to check Windows.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49066
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Post by Graf Zahl »

dpJudas wrote:Okay, I think the conclusion here is then that on older cards (pre OpenGL 4.4) we will do a memcpy.
Not that I care much about older hardware, but my suspicion here is that it's an AMD problem, not a GL 3 one. If there has been one universal constant with OpenGL over the last 15 years, it is that if given a choice by the spec, AMD always uses the bad option.

Post by drfrag »

That could well be the case. Which Nvidia cards support GL 3.x? The 8000 and 9000 series?
We could ask for help testing two devbuilds. Anyone with one of those cards out there?

Post by dpJudas »

Graf Zahl wrote:Not that I care much about older hardware, but my suspicion here is that it's an AMD problem, not a GL 3 one.
Pretty sure of that too, but I don't feel like spending more time on old hardware than I already have. :)

Post by drfrag »

And that's fine, I've stuck with the memcpy version in the legacy build. Also, performance should be good enough (albeit not optimal) on modern cards.

@Blzut3: have you tried the memcpy version on your Windows machine? (http://devbuilds.drdteam.org/gzdoom/gzd ... 3d1e3eb.7z)
How about performance there?

I'm no expert in OpenGL (I admit I pretty much have no idea :) ), but out of curiosity I've 'solved' the problem in the old ill-fated legacy build (now LZDoom) this way, and performance here has gone from 30 to 90 fps: https://github.com/drfrag666/gzdoom/com ... 5037024258

Post by Blzut3 »

No change on Windows using today's DRD Team build, although since I retested, I can confirm that the drawers being through the roof earlier was not a transcription error. There's definitely something going on there. I thought it might have to do with my CPU having 32 threads, but taking it down to 8 cores/8 threads makes no difference.

http://maniacsvault.net/loosefiles/gzdo ... 32_d3d.png
http://maniacsvault.net/loosefiles/gzdo ... 35_ogl.png
http://maniacsvault.net/loosefiles/gzdo ... eiling.png

The "no change" goes for both Intel and AMD. Intel still shows a 50% regression OGL to OGL and 66% D3D to OGL. On the AMD side, the regression seems to be the drawers taking forever, and this is effectively a D3D->OGL regression since 3.3.2's OGL mode shows the same issue. gl_debug doesn't tell me anything on AMD, but Intel gives "[API] low severity, performance: API_ID_REDUNDANT_FBO performance warning has been generated. Redundant state change in GlBindFramebuffer API call, FBO 1, "PipelineFB", already bound."

So in summary:

Threadripper+FirePro on Linux = Better than 3.3.2 but only because DYNAMIC_DRAW is capping 3.3.2. Small regression in performance elsewhere once that is lifted.
Threadripper+FirePro on Windows = Same bad performance as 3.3.2 modulo small additional regression (likely same as above?). 3.3.2 is 3x faster in D3D. See edit below
Intel Linux = More or less same as 3.3.2.
Intel Windows = Heavily regressed still.

Edit:
drfrag wrote: @Blzut3: have you tried the memcpy version on your Windows machine? (http://devbuilds.drdteam.org/gzdoom/gzd ... 3d1e3eb.7z)
How about performance there?
This version fixes the issue on AMD. Roughly same performance as Linux with 4d35b128089da52efc69c62dc0993f1aa47778bd. (Side-note: D3D 3.3.2 performs similarly to 3.3.2 OGL on Linux with STREAM_DRAW.) Linux is slightly slower with that commit than the latest.

Doesn't help Intel. Still ~120fps vs ~220fps OGL/~300fps D3D.

Post by dpJudas »

Those are some rather depressing results. Seems the driver behavior is all over the map, depending on both vendor and platform.

Based on all of this I've reached the conclusion that we can't use a pixel buffer object reliably in any way beyond copying the finished frame to it. My latest commit to master changes it so that it now uses DSimpleCanvas, like the D3D9 target did, which includes calculating the pitch to align with cache lines. Then, to at least try to accelerate the memcpy, I made it so that the worker threads execute the copy.
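
A rough sketch of the two ideas in the last paragraph, assuming a 64-byte cache line. AlignedPitch and CopyBand are hypothetical names, not the actual DSimpleCanvas or drawer-thread code. The first function pads each row to a multiple of the cache line size. The second copies only the band of rows that belongs to one worker, so several threads can share the memcpy without touching each other's rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: 64 bytes, a common cache line size on current x86 CPUs.
constexpr std::size_t kCacheLine = 64;

// Round a row's size in bytes up to the next multiple of the cache line.
std::size_t AlignedPitch(std::size_t width, std::size_t bytesPerPixel)
{
    const std::size_t rowBytes = width * bytesPerPixel;
    return (rowBytes + kCacheLine - 1) / kCacheLine * kCacheLine;
}

// Copy the rows owned by one worker from a padded canvas (srcPitch)
// into a destination buffer (dstPitch). Worker i of n owns the
// contiguous band [height*i/n, height*(i+1)/n).
void CopyBand(const std::uint8_t* src, std::size_t srcPitch,
              std::uint8_t* dst, std::size_t dstPitch,
              std::size_t rowBytes, std::size_t height,
              std::size_t worker, std::size_t numWorkers)
{
    assert(numWorkers > 0 && worker < numWorkers);
    assert(rowBytes <= srcPitch && rowBytes <= dstPitch);
    const std::size_t first = height * worker / numWorkers;
    const std::size_t last = height * (worker + 1) / numWorkers;
    for (std::size_t y = first; y < last; ++y)
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
}
```

In this sketch each worker thread would call CopyBand with its own index. A 1023-pixel-wide, 4-bytes-per-pixel row (4092 bytes) gets a pitch of 4096 bytes, so every row starts on a cache line boundary provided the buffer itself is aligned.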

Post by Graf Zahl »

dpJudas wrote:Seems the driver behavior is all over the map, depending on both vendor and platform.

Welcome to the wonderful world of OpenGL... :?
This is precisely why Vulkan requires explicit specification of nearly everything (what you called "micromanaging"...)

Post by drfrag »

The new version is faster again here (legacy branch on AMD GL 3.3): 60 vs 50 fps @1024.
As a side note, on the g3.3mgw branch I'm back to drawing directly to the PBO for Intel (according to Blzut3's report it is faster). I'll try to port the new DSimpleCanvas thing there. Edit: that thing is not portable; I had some junk unversioned files after trying to cherry-pick something.

Post by Blzut3 »

dpJudas wrote:There is one other minor OpenGL difference since the older release: glFinish. The 3.5pre calls it while the old version did not. On Nvidia this has a price (*), but it's not very big, as I know the GL renderer can reach 1200 fps on my computer. But maybe it is different for AMD? What is the maximum fps you (Blzut3) get from the GL renderer? Going from 600 fps down to 400 fps costs just over 0.8 ms per frame, so it doesn't take much.
Tried removing the call to glFinish and that is indeed the last piece of the regression from 3.3.2 at least on Linux. Without them I get the same perf as 3.3.2.

Also, a fun side note: without glFinish and with r_scene_multithreaded enabled, my Threadripper obtains 75% of OpenGL performance. I had to force the number of threads down to 12 though, since 32 gives what I assume is false sharing and reduces performance. (And honestly the scaling mostly stops at 8 threads.) It even outperforms OpenGL in some of the more complex scenes: 25% faster in Frozen Time, for example. :P

Post by dpJudas »

Blzut3 wrote:Also, a fun side note: without glFinish and with r_scene_multithreaded enabled, my Threadripper obtains 75% of OpenGL performance. I had to force the number of threads down to 12 though, since 32 gives what I assume is false sharing and reduces performance. (And honestly the scaling mostly stops at 8 threads.) It even outperforms OpenGL in some of the more complex scenes: 25% faster in Frozen Time, for example. :P
That is pretty cool! I always wanted to know how a threadripper performed on this. :)

r_scene_multithreaded works by splitting the scene into field-of-view slices, one for each thread. Each slice of the full view traverses the BSP differently, and that allows it to spread the load across cores. The catch is that some subsectors and lines are seen by several slices. When that happens, the cores do the same work. Sounds like 8-12 slices is where the duplicated work on shared subsectors begins to outweigh the speed gained.
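
As a toy model of that slicing (not the actual renderer code; SplitView and SlicesTouched are invented for this sketch, and slices are modelled simply as screen column ranges), the following C++ splits the view into one contiguous range per thread and counts how many slices a wall spanning a given column range falls into. Every slice beyond the first repeats the work for that wall, which is the overlap cost described above.

```cpp
#include <cassert>
#include <vector>

// One thread's share of the view: screen columns [x0, x1).
struct ViewSlice
{
    int x0;
    int x1;
};

// Split viewWidth columns into numThreads contiguous slices.
std::vector<ViewSlice> SplitView(int viewWidth, int numThreads)
{
    assert(viewWidth > 0 && numThreads > 0);
    std::vector<ViewSlice> slices;
    for (int t = 0; t < numThreads; ++t)
    {
        const int x0 = viewWidth * t / numThreads;
        const int x1 = viewWidth * (t + 1) / numThreads;
        slices.push_back({x0, x1});
    }
    return slices;
}

// Number of slices that see a wall covering columns [wallX0, wallX1).
// Each of those slices has to process the wall itself.
int SlicesTouched(const std::vector<ViewSlice>& slices, int wallX0, int wallX1)
{
    int count = 0;
    for (const ViewSlice& s : slices)
    {
        if (wallX0 < s.x1 && wallX1 > s.x0)
            ++count;
    }
    return count;
}
```

With a 1920-column view, a wall covering columns 200 to 499 touches one slice with 2 threads but three slices with 8 threads, so the more slices there are, the more often the same wall is handled by several cores.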

It is a shame there are some rare cases where r_scene_multithreaded causes render glitches, which prevents me from turning it on by default.

Post by Graf Zahl »

Blzut3 wrote:Tried removing the call to glFinish and that is indeed the last piece of the regression from 3.3.2 at least on Linux. Without them I get the same perf as 3.3.2.
What glFinish call? Keep in mind that this cannot be removed unconditionally, because it may cause some glitches on other hardware. For example, on my system I need this call to get tearing-free display with a frame rate less than 60 fps but without making the engine drop to 30 fps right away.

dpJudas wrote: It is a shame there are some rare cases where r_scene_multithreaded causes render glitches, which prevents me from turning it on by default.

Being able to do this with hardware rendering is actually the main reason for me to do a Vulkan backend. Are those glitches related to having to render those subsectors multiple times and this causing problems with how the software renderer orders things?

Post by Blzut3 »

Graf Zahl wrote:What glFinish call? Keep in mind that this cannot be removed unconditionally, because it may cause some glitches on other hardware. For example, on my system I need this call to get tearing-free display with a frame rate less than 60 fps but without making the engine drop to 30 fps right away.
The ones in gl_framebuffer.cpp. The hardware renderer does need them, but as far as I can tell the software renderer does not. At least I can say I extensively used 3.3.2 which didn't call glFinish and haven't experienced any issues there.