[g3.5pre-44-g1455111dd] 50% performance regression since 3.3

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby dpJudas » Sun Jun 17, 2018 5:56 pm

JJP wrote:The CPU is an AMD Phenom II X4 945 (yeah, it's kinda old), and the GPU is an Nvidia GeForce GT 710. The graphics driver being used is Nvidia's proprietary blob, version 390.48.

Seems it isn't an AMD driver thing anymore then. Your card and driver are close enough to what I'm using to roughly put them in the same family.

JJP wrote:Unfortunately, I'm unable to get decent data from either 'stat rendertimes' or the 'bench' console commands because the numbers that are reported, besides the FPS and map coordinates in the case of the 'bench' command, are all 0 or 0.000 or similar, which is certainly wrong. I have no idea why that is the case. This occurs with both GZDoom versions tested. I do get proper data if the OpenGL renderer is active, but this issue is about the software renderer, and I don't get sane data with the software renderer in all cases.

Try typing 'stat fps_accumulated' on both while standing completely still at the spawn point and wait for the Frame and Drawers numbers to stabilize. If I can get the numbers for both builds on your computer that would be useful. The Drawers number tells us how much time it spent drawing to the mapped pixel buffer object.

I only need the numbers for the software renderer in paletted mode.
dpJudas
 
 
 
Joined: 28 May 2016

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby JJP » Sun Jun 17, 2018 6:32 pm

dpJudas wrote:Try typing 'stat fps_accumulated' on both while standing completely still at the spawn point and wait for the Frame and Drawers numbers to stabilize. If I can get the numbers for both builds on your computer that would be useful. The Drawers number tells us how much time it spent drawing to the mapped pixel buffer object.

I only need the numbers for the software renderer in paletted mode.

Here they are:
  • 3.3.2: Frame=04.2ms, Walls=00.7ms, Planes=00.4ms, Masked=00.0ms, Drawers=02.4ms
  • 3.4.1: Frame=08.2ms, Walls=00.7ms, Planes=00.4ms, Masked=00.0ms, Drawers=03.0ms
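
To read these rows, it can help to convert the frame time to frames per second and to see how much of the frame is not covered by the listed stages. The following is a minimal sketch, not GZDoom code: the `FrameStats` struct and both helper names are made up for illustration, and it assumes the stage timings are additive and do not overlap, which may not hold if drawers run on worker threads.

```cpp
#include <cassert>
#include <cmath>

// One row of 'stat fps_accumulated' output, all values in milliseconds.
// (Hypothetical struct for illustration; not a GZDoom type.)
struct FrameStats {
    double frame, walls, planes, masked, drawers;
};

// Frames per second implied by the total frame time.
double fpsFromFrameMs(const FrameStats& s) {
    assert(std::isfinite(s.frame) && s.frame > 0.0);
    return 1000.0 / s.frame;
}

// Frame time not attributed to any listed stage. Assumes the stages are
// additive and not overlapped, which is a simplification.
double unaccountedMs(const FrameStats& s) {
    return s.frame - (s.walls + s.planes + s.masked + s.drawers);
}
```

With the numbers above this gives roughly 238 fps with 0.7 ms unaccounted for on 3.3.2, against roughly 122 fps with 4.1 ms unaccounted for on 3.4.1.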
JJP
 
Joined: 11 Aug 2005

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby Blzut3 » Sun Jun 17, 2018 11:19 pm

Graf Zahl wrote:I wouldn't be surprised if the buffer mapping is in some way responsible. That part seems to be so incredibly spotty on some systems that even looking at it strangely might make it throw up.
Do you think you can pinpoint the problem?

I'll try poking at the code to narrow it down soon.

Until then, though, a few more data points to round off the set: I can reproduce the issue on my aforementioned Broadwell CPU+GPU laptop on Windows 10 (driver 15.40.38.4963). Making sure to use the OpenGL canvas at 720p, it goes from ~229fps (~265fps with D3D) to ~129fps in 3.4. On my main AMD+AMD system (Radeon Pro Software 18.5.2 Beta) at 1080p a much smaller hit is observed, going from ~125fps (~315fps with D3D) to ~103fps.

stat fps_accumulated:
Intel (720p) 3.3: frame=04.4ms walls=00.8ms planes=00.4ms masked=00.1ms drawers=01.8ms
Intel (720p) 3.4: frame=07.6ms walls=00.5ms planes=00.3ms masked=00.0ms drawers=01.1ms
AMD (1080p) 3.3: frame=08.6ms walls=01.1ms planes=00.8ms masked=00.1ms drawers=06.1ms
AMD (1080p) 3.4: frame=09.9ms walls=01.1ms planes=00.7ms masked=00.1ms drawers=06.1ms

So basically every OS and system combination I have that can still run GZDoom exhibits a regression to some extent or another.

dpJudas wrote:Seems it isn't an AMD driver thing anymore then.

Well I did reproduce it on the Linux Intel driver so we knew that already. :P
Blzut3
Pronounced: B-l-zut
 
Joined: 24 Nov 2004

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby dpJudas » Mon Jun 18, 2018 4:59 am

I'm not really sure why the drawers are so high on the AMD 1080p - is that the Threadripper?

For the other numbers it more or less puts the software renderer itself in the clear. It almost has to be a mapping delay issue. There is one change between master and the old swglfb thing: it used to use two buffers it would flip-flop between to make sure it wouldn't be stalled by the GPU still using the buffer. The code on master only uses one buffer (*), so it could be that on some cards it thinks the GPU is still using the buffer.

*) Master actually has an array made for two buffers, but from what I could tell it only ever uses the first index.
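
The flip-flop idea described above can be sketched without any OpenGL calls. This is only an illustration under stated assumptions: `PboFlipFlop` is a hypothetical class, not the swglfb or master code, and the buffer ids stand in for names that would come from glGenBuffers in a real renderer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pair of pixel buffer objects. In real code the
// ids would come from glGenBuffers; here they are plain integers.
class PboFlipFlop {
public:
    explicit PboFlipFlop(std::array<unsigned, 2> ids) : ids_(ids) {
        assert(ids_[0] != ids_[1]);  // two distinct buffers are required
    }

    // Buffer the CPU should map and draw into for the current frame.
    unsigned current() const { return ids_[index_]; }

    // Call once per frame after the upload has been issued, so the next
    // frame writes into the other buffer while the GPU may still be reading
    // from this one.
    void advance() { index_ = (index_ + 1) % ids_.size(); }

private:
    std::array<unsigned, 2> ids_;
    std::size_t index_ = 0;
};
```

If `advance()` is never called, every frame maps the first buffer, which matches the behaviour described for master, where only the first index is used.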
dpJudas
 
 
 
Joined: 28 May 2016

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby dpJudas » Mon Jun 18, 2018 2:22 pm

On the 'modern' branch I tried adding the few differences I could spot between what the old code did and what the new code does:

1) It now creates two buffers so that two frames in a row don't use the same pixel buffer object (avoids possible GPU locking contention on the buffer)
2) It uses glTexSubImage2D for uploads rather than glTexImage2D (avoids possibly recreating the texture on the GPU each frame)

The only difference left that I can spot would be that the new code does a glFinish call (required by the persistent buffer used by the 2d drawer) while the old code did not.

None of this makes any difference on my computer, but perhaps it does on those affected by the 50% performance regression.
dpJudas
 
 
 
Joined: 28 May 2016

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby Graf Zahl » Mon Jun 18, 2018 2:28 pm

dpJudas wrote:On the 'modern' branch...


Wouldn't this have been better on master for testing?
Graf Zahl
Lead GZDoom Developer
 
Joined: 19 Jul 2003
Location: Germany

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby dpJudas » Mon Jun 18, 2018 2:38 pm

It would have been, but I accidentally coded it on the modern branch and then 'git stash pop' gave me enough merge conflicts to convince me I had to code it all over if I wanted it on the master branch.
dpJudas
 
 
 
Joined: 28 May 2016

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby Graf Zahl » Mon Jun 18, 2018 5:11 pm

The commit worked as-is when cherry-picked. I don't think that stash is the best way of transferring stuff between branches. As soon as you've got other changes in the affected files it will throw up its hands in despair, but a regular merge will do just fine.
Graf Zahl
Lead GZDoom Developer
 
Joined: 19 Jul 2003
Location: Germany

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby Blzut3 » Mon Jun 18, 2018 8:24 pm

dpJudas wrote:I'm not really sure why the drawers are so high on the AMD 1080p - is that the Threadripper?

Might have been a transcription error. On Linux at least the drawers are 00.8ms and frame is 08.6ms. But yes that is the Threadripper.

dpJudas wrote:None of this makes any difference on my computer, but perhaps it does on those affected by the 50% performance regression.

Graf's cherry-pick commit 01bda6348ed346223a296a77bd4292f13c571693 does not fix the issue.
Blzut3
Pronounced: B-l-zut
 
Joined: 24 Nov 2004

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby drfrag » Tue Jun 19, 2018 4:10 am

Same here in the legacy branch, exactly the same results @1024 (CRT). This is a MinGW debug build on GL 3.3 AMD hardware.

frame=30.2 ms walls=13.2 ms planes=06.4 ms masked=00.3 ms drawers=00.0 ms 552 counts
drfrag
I.R developer, I.R smart
 
Joined: 23 Apr 2004
Location: Spain

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby Blzut3 » Sat Jul 14, 2018 6:50 pm

Found it!

Compare: https://github.com/coelckers/gzdoom/blo ... e.cpp#L299
To: https://github.com/coelckers/gzdoom/blo ... r.cpp#L252

Changing GL_DYNAMIC_DRAW to GL_STREAM_DRAW there makes master 20% faster than 3.3.2 on my Threadripper.
Blzut3
Pronounced: B-l-zut
 
Joined: 24 Nov 2004

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby dpJudas » Sat Jul 14, 2018 7:43 pm

Actually, it is this line that decides the buffer type for the frame buffer: https://github.com/coelckers/gzdoom/blob/dd5a673f9d93deeda9b67c3e5142ef7ba8cf8ada/src/gl/system/gl_swframebuffer.cpp#L1352

Maybe UseMappedMemBuffer is false for you in 3.3.2? The old code basically had two modes: one where GZDoom would create a system memory buffer itself and then memcpy it to a mapped pixel buffer object (GL_STREAM_DRAW), and then a mode where it maps the pixel buffer object and the software renderer would draw directly to it (GL_DYNAMIC_DRAW).

The OpenGL buffer allocation semantics are rather unclear, but it is my understanding that GL_STREAM_DRAW will generally create a GPU memory buffer, while GL_DYNAMIC_DRAW will create a system memory buffer. Reading from GPU memory is extremely expensive, so if I'm correct about this then you should be getting very poor performance now from maps with translucency (no vanilla map does this).
dpJudas
 
 
 
Joined: 28 May 2016

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby Blzut3 » Sat Jul 14, 2018 7:55 pm

OK, you're correct that the similar/better frame rate is just a coincidence. Forcing 3.3.2 to GL_STREAM_DRAW where you indicated results in 3.3.2 performing better again.

I'm not sure what constitutes a lot of translucency, but BFG balls covering the screen still perform better with STREAM than DYNAMIC.
Blzut3
Pronounced: B-l-zut
 
Joined: 24 Nov 2004

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby dpJudas » Sat Jul 14, 2018 8:23 pm

Translucency here is anything with actual alpha blending - i.e. if you can see partially through the BFG balls. In classic vanilla that wasn't the case, so it depends a bit on how you configured GZDoom.

It sounds like GL_DYNAMIC_DRAW doesn't work that reliably on some GPUs. That is, there's no knowing what the driver might decide to do. To be honest I'm not too surprised as I doubt that many applications use that mode. Generally those flags were just plain shit and they replaced the entire thing with glBufferStorage in OpenGL 4.4.

Maybe the best approach here is to use a system memory buffer if OpenGL is less than 4.4 and then memcpy it to a PBO created with GL_STREAM_DRAW. If it is 4.4 or later it will use glBufferStorage(GL_DYNAMIC_STORAGE_BIT|GL_MAP_READ_BIT|GL_MAP_WRITE_BIT|GL_CLIENT_STORAGE_BIT) + glInvalidateBufferData() + glMapBuffer(GL_READ_WRITE). That way we're at least sure what we will get.
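
The version split proposed above could be expressed roughly as follows. This is a sketch of the decision only, with no OpenGL calls: `UploadPath` and `chooseUploadPath` are hypothetical names, and the major/minor values are assumed to come from querying GL_MAJOR_VERSION and GL_MINOR_VERSION on the current context.

```cpp
#include <cassert>

// The two upload strategies described in the post (names are hypothetical).
enum class UploadPath {
    // System-memory canvas, memcpy'd into a PBO created with GL_STREAM_DRAW.
    SystemMemoryThenStreamPbo,
    // Buffer created with glBufferStorage and mapped for direct drawing.
    BufferStorageMapped,
};

// Picks the path from the context version. glBufferStorage is core since
// OpenGL 4.4, so anything older falls back to the memcpy path.
UploadPath chooseUploadPath(int major, int minor) {
    assert(major >= 1 && minor >= 0);
    const bool hasBufferStorage = major > 4 || (major == 4 && minor >= 4);
    return hasBufferStorage ? UploadPath::BufferStorageMapped
                            : UploadPath::SystemMemoryThenStreamPbo;
}
```

Keeping the decision in one small function would make it easy to force either path for testing on affected hardware.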
dpJudas
 
 
 
Joined: 28 May 2016

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

Postby Blzut3 » Sat Jul 14, 2018 10:06 pm

Some notes: With DYNAMIC_DRAW, if I disable the software renderer from doing any drawing (comment out SWRenderer->RenderView in SWSceneDrawer::RenderView along with the stuff after CreateTexture), things cap out at about 200fps; with STREAM_DRAW it caps at about 1700fps. Removing Renderer->RenderView from D_Display on 3.3.2 results in basically the same frame rates between the two for both modes.

After changing SWSceneDrawer::RenderView to give DCanvas a C++ allocated static array and disabling the call to MapBuffer the frame rate sits at 240fps which is what STREAM_DRAW gives. Given that 3.3.2 can hit 300fps with STREAM_DRAW I'm guessing this indicates a performance loss across versions in the software renderer itself as well (which was masked by hitting the OpenGL limit in stock 3.3.2), but still doesn't tell me where the original regression is coming from. Will need to dig more into that and see what the actual rendering limit is.
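
Frame rate caps like these are easier to compare as time per frame. A minimal sketch, with both helper names invented for illustration:

```cpp
#include <cassert>
#include <cmath>

// Milliseconds per frame implied by a frame rate.
double frameMsFromFps(double fps) {
    assert(std::isfinite(fps) && fps > 0.0);
    return 1000.0 / fps;
}

// Extra per-frame cost of the slower configuration relative to the faster one.
double extraMsPerFrame(double slowFps, double fastFps) {
    return frameMsFromFps(slowFps) - frameMsFromFps(fastFps);
}
```

On the figures above, an empty frame costs about 5.0 ms at 200fps and about 0.6 ms at 1700fps, a difference of roughly 4.4 ms per frame. Likewise, 240fps versus 300fps is about 4.2 ms versus 3.3 ms per frame.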

At this point, though, I'm not sure where to poke: the way the buffer is handled between versions is too different for someone with zero OpenGL knowledge like myself to really narrow it down to one specific thing.

dpJudas wrote:so it depends a bit on how you configured GZDoom.

GZDoom adds translucency to most projectiles stock so yes the BFG balls were translucent. I just wasn't sure if that was an adequate test case.
Blzut3
Pronounced: B-l-zut
 
Joined: 24 Nov 2004
