[g3.5pre-44-g1455111dd] 50% performance regression since 3.3

Post by dpJudas

JJP wrote: The CPU is an AMD Phenom II X4 945 (yeah, it's kinda old), and the GPU is an Nvidia GeForce GT 710. The graphics driver being used is Nvidia's proprietary blob, version 390.48.
Seems it isn't an AMD driver thing after all, then. Your card and driver are close enough to mine to put them roughly in the same family.
JJP wrote: Unfortunately, I'm unable to get decent data from either the 'stat rendertimes' or the 'bench' console command, because the numbers that are reported (besides the FPS, and the map coordinates in the case of 'bench') are all 0, 0.000 or similar, which is certainly wrong. I have no idea why that is the case. This occurs with both GZDoom versions tested. I do get proper data if the OpenGL renderer is active, but this issue is about the software renderer, and I don't get sane data with the software renderer in all cases.
Try typing 'stat fps_accumulated' in both while standing completely still at the spawn point, and wait for the Frame and Drawers numbers to stabilize. If I can get the numbers for both builds on your computer, that would be useful. The Drawers number tells us how much time was spent drawing to the mapped pixel buffer object.

I only need the numbers for the software renderer in paletted mode.

Post by JJP

dpJudas wrote: Try typing 'stat fps_accumulated' in both while standing completely still at the spawn point, and wait for the Frame and Drawers numbers to stabilize. If I can get the numbers for both builds on your computer, that would be useful. The Drawers number tells us how much time was spent drawing to the mapped pixel buffer object.

I only need the numbers for the software renderer in paletted mode.
Here they are:
  • 3.3.2: Frame=04.2ms, Walls=00.7ms, Planes=00.4ms, Masked=00.0ms, Drawers=02.4ms
  • 3.4.1: Frame=08.2ms, Walls=00.7ms, Planes=00.4ms, Masked=00.0ms, Drawers=03.0ms
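
For scale, the frame times can be converted to frame rates (fps = 1000 / frame time in ms). The second part subtracts the renderer's own timers from the frame time; this assumes the Walls, Planes, Masked and Drawers timers measure disjoint parts of the frame, which is an assumption and may not hold if they overlap:

```latex
\begin{aligned}
\text{fps} &= \frac{1000}{t_{\text{frame}}\,[\text{ms}]} \\
\text{3.3.2: } \tfrac{1000}{4.2} &\approx 238\ \text{fps} \\
\text{3.4.1: } \tfrac{1000}{8.2} &\approx 122\ \text{fps} \quad (\approx 51\%\ \text{of 3.3.2}) \\
t_{\text{other}} &= t_{\text{frame}} - (t_{\text{walls}} + t_{\text{planes}} + t_{\text{masked}} + t_{\text{drawers}}) \\
\text{3.3.2: } t_{\text{other}} &= 4.2 - (0.7 + 0.4 + 0.0 + 2.4) = 0.7\ \text{ms} \\
\text{3.4.1: } t_{\text{other}} &= 8.2 - (0.7 + 0.4 + 0.0 + 3.0) = 4.1\ \text{ms}
\end{aligned}
```

Under that assumption the frame rate roughly halves, matching the thread title, while Drawers grows by only 0.6 ms and most of the extra ~3.4 ms falls outside the renderer's own timers.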

Post by Blzut3

Graf Zahl wrote: I wouldn't be surprised if the buffer mapping is in some way responsible. That part seems to be so incredibly spotty on some systems that even looking at it strangely might make it throw up.
Do you think you can pinpoint the problem?
I'll try poking at the code to narrow it down soon.

Until then, a few more data points to round off the set: I can reproduce the issue on my aforementioned Broadwell CPU+GPU laptop on Windows 10 (driver 15.40.38.4963). Making sure to use the OpenGL canvas at 720p, it goes from ~229fps (~265fps with D3D) to ~129fps in 3.4. On my main AMD+AMD system (Radeon Pro Software 18.5.2 Beta) at 1080p, a much smaller hit is observed, going from ~125fps (~315fps with D3D) to ~103fps.

stat fps_accumulated:
Intel (720p) 3.3: frame=04.4ms walls=00.8ms planes=00.4ms masked=00.1ms drawers=01.8ms
Intel (720p) 3.4: frame=07.6ms walls=00.5ms planes=00.3ms masked=00.0ms drawers=01.1ms
AMD (1080p) 3.3: frame=08.6ms walls=01.1ms planes=00.8ms masked=00.1ms drawers=06.1ms
AMD (1080p) 3.4: frame=09.9ms walls=01.1ms planes=00.7ms masked=00.1ms drawers=06.1ms

So basically every OS and system combination I have that can still run GZDoom exhibits a regression to some extent or another.
dpJudas wrote: Seems it isn't an AMD driver thing anymore then.
Well, I did reproduce it on the Linux Intel driver, so we knew that already. :P

Post by dpJudas

I'm not really sure why the drawers number is so high on the AMD at 1080p. Is that the Threadripper?

As for the other numbers, they more or less put the software renderer itself in the clear. It almost has to be a buffer-mapping delay. There is one change between master and the old swglfb code: the old code used two buffers and alternated between them, so it would never be stalled by the GPU still using the buffer. The code on master only uses one buffer (*), so it could be that on some cards the driver thinks the GPU is still using it when it gets mapped again.

*) Master actually has an array made for two buffers, but from what I could tell it only ever uses the first index.
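
A minimal sketch of the flip-flop idea described above, with hypothetical names (`PboRing`, `BufferHandle`, `Acquire` and `AcquireFirstOnly` are not GZDoom identifiers). It models only the index bookkeeping; the real code maps and unmaps OpenGL buffers and may also need fences, all of which are left out here. `AcquireFirstOnly` mimics the behavior described in the footnote, where only the first array slot is ever used.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an OpenGL buffer name (GLuint in real code); no GL calls here.
using BufferHandle = unsigned int;

// Flip-flop scheme: frame N writes buffer N % 2, so the CPU does not map the
// buffer the GPU may still be reading from the previous frame.
class PboRing {
public:
    static constexpr std::size_t kCount = 2;

    explicit PboRing(std::array<BufferHandle, kCount> handles) : handles_(handles) {}

    // Returns the buffer to fill this frame and advances to the other one.
    BufferHandle Acquire() {
        BufferHandle h = handles_[current_];
        current_ = (current_ + 1) % kCount;
        return h;
    }

    // Single-buffer behavior: always hands out slot 0, so consecutive frames
    // map the same buffer and may wait on the GPU.
    BufferHandle AcquireFirstOnly() const { return handles_[0]; }

private:
    std::array<BufferHandle, kCount> handles_;
    std::size_t current_ = 0;
};
```

With two buffers the CPU can fill one while the GPU finishes with the other; with one, every map may have to wait for the previous frame's upload. Whether a given driver actually stalls in that case is exactly what is in question in this thread.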

Post by dpJudas

On the 'modern' branch I tried adding the few differences I could spot between what the old code did and what the new code does:

1) It now creates two buffers, so that two consecutive frames don't use the same pixel buffer object (avoids possible GPU lock contention on the buffer)
2) It uses glTexSubImage2D for uploads rather than glTexImage2D (avoids possibly recreating the texture on the GPU every frame)

The only remaining difference I can spot is that the new code makes a glFinish call (required by the persistent buffer used by the 2D drawer) while the old code did not.

None of this makes any difference on my computer, but perhaps it does on those affected by the 50% performance regression.

Post by Graf Zahl

dpJudas wrote: On the 'modern' branch...
Wouldn't this have been better on master for testing?

Post by dpJudas

It would have been, but I accidentally coded it on the modern branch, and then 'git stash pop' gave me enough merge conflicts to convince me I would have to code it all over again if I wanted it on the master branch.

Post by Graf Zahl

The commit worked as-is when cherry-picked. I don't think stash is the best way of transferring stuff between branches: as soon as you have other changes in the affected files it throws up its hands in despair, whereas a regular merge will do just fine.

Post by Blzut3

dpJudas wrote: I'm not really sure why the drawers number is so high on the AMD at 1080p. Is that the Threadripper?
Might have been a transcription error. On Linux, at least, drawers is 00.8ms and frame is 08.6ms. But yes, that is the Threadripper.
dpJudas wrote: None of this makes any difference on my computer, but perhaps it does on those affected by the 50% performance regression.
Graf's cherry-picked commit 01bda6348ed346223a296a77bd4292f13c571693 does not fix the issue.

Post by drfrag

Same here on the legacy branch, with exactly the same results at 1024 (CRT). This is a MinGW debug build on GL 3.3 AMD hardware.

frame=30.2 ms walls=13.2 ms planes=06.4 ms masked=00.3 ms drawers=00.0 ms 552 counts

Post by Blzut3

Found it!

Compare: https://github.com/coelckers/gzdoom/blo ... e.cpp#L299
To: https://github.com/coelckers/gzdoom/blo ... r.cpp#L252

Changing GL_DYNAMIC_DRAW to GL_STREAM_DRAW there makes master 20% faster than 3.3.2 on my Threadripper.

Post by dpJudas

Actually, it is this line that decides the buffer type for the frame buffer: https://github.com/coelckers/gzdoom/blo ... .cpp#L1352

Maybe UseMappedMemBuffer is false for you in 3.3.2? The old code basically had two modes: one where GZDoom created a system memory buffer itself and then memcpy'd it into a mapped pixel buffer object (GL_STREAM_DRAW), and one where it mapped the pixel buffer object and the software renderer drew directly into it (GL_DYNAMIC_DRAW).

The OpenGL buffer allocation semantics are rather unclear, but it is my understanding that GL_STREAM_DRAW will generally create a GPU memory buffer, while GL_DYNAMIC_DRAW will create a system memory buffer. Reading from GPU memory is extremely expensive, and translucency makes the renderer read back the pixels it blends with, so if I'm correct about this you should now be getting very poor performance in maps with translucency (no vanilla map has any).
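
The two modes can be sketched as plain data flow. In this hedged C++ sketch, ordinary memory stands in for the mapped PBO, so it shows only where the renderer's reads and writes land, not the performance difference; `RenderFn`, `PresentViaStaging` and `PresentDirect` are hypothetical names, and the claim about where drivers place the memory is dpJudas' understanding above, not something the sketch demonstrates.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical callback standing in for the software renderer: draws 8-bit
// palette indices into a canvas with the given pitch (bytes per row).
using RenderFn = std::function<void(std::uint8_t* pixels, int width, int height, int pitch)>;

// Mode 1 (GL_STREAM_DRAW path as described): render into a program-owned
// system-memory buffer, then copy the finished frame into the mapped PBO in
// one sequential write. Blending reads only ever touch system memory.
void PresentViaStaging(const RenderFn& render, std::vector<std::uint8_t>& staging,
                       std::uint8_t* mapped, int width, int height) {
    staging.resize(static_cast<std::size_t>(width) * height);
    render(staging.data(), width, height, width);
    std::memcpy(mapped, staging.data(), staging.size());
}

// Mode 2 (GL_DYNAMIC_DRAW path as described): the renderer draws straight
// into the mapped PBO, so every blending read goes to wherever the driver
// placed that memory; this is the suspected slow case if it is GPU memory.
void PresentDirect(const RenderFn& render, std::uint8_t* mapped, int width, int height) {
    render(mapped, width, height, width);
}
```

Both paths produce the same pixels; the difference is that mode 1 touches the mapped memory with a single write-only copy, while mode 2 performs every read-modify-write of a translucent blend directly on it.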

Post by Blzut3

OK, you're correct that the similar/better frame rate is just a coincidence. Forcing 3.3.2 to GL_STREAM_DRAW where you indicated makes 3.3.2 perform better again.

I'm not sure what constitutes a lot of translucency, but BFG balls covering the screen still perform better with STREAM than DYNAMIC.

Post by dpJudas

Translucency here is anything with actual alpha blending, i.e. if you can partially see through the BFG balls. In classic vanilla that wasn't the case, so it depends a bit on how you configured GZDoom.

It sounds like GL_DYNAMIC_DRAW doesn't work that reliably on some GPUs; there's no knowing what the driver might decide to do. To be honest, I'm not too surprised, as I doubt many applications use that mode. Generally those flags were just plain shit, and they replaced the entire thing with glBufferStorage in OpenGL 4.4.

Maybe the best approach here is to use a system memory buffer if the OpenGL version is below 4.4 and then memcpy it to a PBO created with GL_STREAM_DRAW. If it is 4.4 or later, use glBufferStorage(GL_DYNAMIC_STORAGE_BIT|GL_MAP_READ_BIT|GL_MAP_WRITE_BIT|GL_CLIENT_STORAGE_BIT) + glInvalidateBufferData() + glMapBuffer(GL_READ_WRITE). That way we're at least sure what we will get.
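
The proposed split boils down to a version check. A small C++ sketch of that decision, kept as a pure function so it can be tested without a GL context; `UploadPath`, `GLVersion`, `AtLeast` and `ChooseUploadPath` are hypothetical names, and the GL calls for each path appear only in comments, copied from the proposal above.

```cpp
// Hypothetical names; no OpenGL calls are made here.
enum class UploadPath {
    // GL < 4.4: render to a system-memory buffer, then memcpy it into a PBO
    // created with GL_STREAM_DRAW.
    SystemMemoryPlusStreamDrawPbo,
    // GL >= 4.4: glBufferStorage(GL_DYNAMIC_STORAGE_BIT | GL_MAP_READ_BIT |
    // GL_MAP_WRITE_BIT | GL_CLIENT_STORAGE_BIT) + glInvalidateBufferData()
    // + glMapBuffer(GL_READ_WRITE), as proposed in the post.
    BufferStorageMapped
};

struct GLVersion {
    int major;
    int minor;
};

// True if version v is at least major.minor.
constexpr bool AtLeast(GLVersion v, int major, int minor) {
    return v.major > major || (v.major == major && v.minor >= minor);
}

constexpr UploadPath ChooseUploadPath(GLVersion v) {
    return AtLeast(v, 4, 4) ? UploadPath::BufferStorageMapped
                            : UploadPath::SystemMemoryPlusStreamDrawPbo;
}
```

For example, a GL 3.3 context (as on drfrag's legacy-branch hardware) would take the system-memory path, while a 4.5 context would take the glBufferStorage path.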

Post by Blzut3

Some notes: with DYNAMIC_DRAW, if I stop the software renderer from doing any drawing (commenting out SWRenderer->RenderView in SWSceneDrawer::RenderView along with the stuff after CreateTexture), things cap out at about 200fps; with STREAM_DRAW they cap at about 1700fps. Removing Renderer->RenderView from D_Display on 3.3.2 results in basically the same frame rates between the two versions for both modes.

After changing SWSceneDrawer::RenderView to give DCanvas a C++-allocated static array and disabling the call to MapBuffer, the frame rate sits at 240fps, which is what STREAM_DRAW gives. Given that 3.3.2 can hit 300fps with STREAM_DRAW, I'm guessing this indicates a performance loss across versions in the software renderer itself as well (which was masked by hitting the OpenGL limit in stock 3.3.2), but it still doesn't tell me where the original regression is coming from. I'll need to dig more into that and see what the actual rendering limit is.
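
Converting these frame-rate caps to per-frame times makes the gaps easier to compare (plain arithmetic on the figures quoted, rounded):

```latex
\begin{aligned}
\text{DYNAMIC\_DRAW, no drawing: } \tfrac{1000}{200} &= 5.0\ \text{ms} \\
\text{STREAM\_DRAW, no drawing: } \tfrac{1000}{1700} &\approx 0.59\ \text{ms} \\
\text{buffer-handling gap: } 5.0 - 0.59 &\approx 4.4\ \text{ms per frame} \\
\text{renderer gap: } \tfrac{1000}{240} - \tfrac{1000}{300} &\approx 4.17 - 3.33 = 0.83\ \text{ms per frame}
\end{aligned}
```

So with nothing drawn at all, the DYNAMIC_DRAW path alone costs roughly 4.4 ms per frame on this machine, while the 240 vs. 300 fps comparison puts the suspected loss in the software renderer itself at under a millisecond.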

At this point, though, I'm not sure where to poke: the way the buffer is handled differs too much between versions for someone with zero OpenGL knowledge like myself to narrow it down to one specific thing.
dpJudas wrote: so it depends a bit on how you configured GZDoom.
GZDoom makes most projectiles translucent out of the box, so yes, the BFG balls were translucent. I just wasn't sure whether that was an adequate test case.