[g3.5pre-44-g1455111dd] 50% performance regression since 3.3


Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Graf Zahl » Sat Jul 28, 2018 12:41 am

Blzut3 wrote:Confirmed that the Windows Intel regression is glFinish. So I think we can consider this closed?
I think so, yes.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Blzut3 » Sat Jul 21, 2018 2:28 pm

Confirmed that the Windows Intel regression is glFinish. So I think we can consider this closed?

Current summary:

Users on Linux with the open source Intel or AMD drivers should see double the frame rate from before. Windows users with AMD/ATi graphics should see a similar improvement, although performance is still regressed compared to D3D*. AMD performance is currently mostly the same between Windows and Linux, with Linux slightly in the lead**. Intel on Windows and my 2009 Mac Mini with nvidia are still regressed, but likely nothing can be done about them.

* The gap can be closed by removing glFinish, but since that can't be done safely, it will likely have to wait for Vulkan.
** This should surprise no one considering the open source AMD driver has been exceeding the closed source performance for some time now even without per game optimizations.
Chris wrote:What if you use glFlush instead of glFinish?
If you're still curious for whatever reason glFlush doesn't drop my performance.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Graf Zahl » Sat Jul 21, 2018 10:05 am

Actually, no. glFlush does queue all pending operations but it doesn't do any synchronization. And the main vertex buffer needs to be synchronized with the rendered frame because its contents can change at any time and cause render glitches if the commands referencing the data haven't been processed.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Chris » Sat Jul 21, 2018 7:01 am

Blzut3 wrote:Tried removing the call to glFinish and that is indeed the last piece of the regression from 3.3.2 at least on Linux. Without them I get the same perf as 3.3.2.
What if you use glFlush instead of glFinish? IIRC, the main difference between them is that glFinish pushes out whatever commands have been buffered internally and waits for the card to complete them before returning, whereas glFlush just pushes out whatever commands have been buffered internally and does not wait any further. If glFinish was added for synchronization purposes, to ensure subsequent GL calls don't stomp all over previous calls where the GL allows command reordering, glFlush may serve the same purpose without having to wait as long.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Graf Zahl » Sat Jul 21, 2018 2:05 am

The 2D drawer does not use the persistent buffer. It collects all its data in hardware independent code and then does a single glBufferData upload before dispatching any draw calls.
The wipe code is a problem, though, because it actually does use the buffer - that one needs to be rewritten anyway sooner or later. It cannot stay the way it is.

The main issue with the persistent buffer is not that it's being reset (that could be changed if needed) but that there's only one copy of the flat data, and that gets updated live.

dpJudas wrote:
Graf Zahl wrote:Being able to do this with hardware rendering is actually the main reason for me to do a Vulkan backend. Are those glitches related to having to render those subsectors multiple times and this causing problems with how the software renderer orders things?
The order shouldn't change but each slice might skip some subsectors and segs (those out of view for a single slice). Each slice is literally like if you had rendered the scene with a very narrow FOV. It works 99% of the time, but there are a few places in kdizd I noticed that it did not. I never figured out exactly what is triggering it.
With the hardware renderer it'd have to work a bit differently anyway because of how view pitch is applied. Regarding KDiZD, was it something with the stacked sector portals? This was one of the first map sets using these and its use was quite haphazard. I actually don't think there's a single place that doesn't glitch if being looked at from the wrong angle in the software renderer, single sector portals excluded. It wouldn't surprise me in the least if those won't work if being split up.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by dpJudas » Sat Jul 21, 2018 1:57 am

Graf Zahl wrote:Being able to do this with hardware rendering is actually the main reason for me to do a Vulkan backend. Are those glitches related to having to render those subsectors multiple times and this causing problems with how the software renderer orders things?
The order shouldn't change but each slice might skip some subsectors and segs (those out of view for a single slice). Each slice is literally like if you had rendered the scene with a very narrow FOV. It works 99% of the time, but there are a few places in kdizd I noticed that it did not. I never figured out exactly what is triggering it.

About glFinish, it is required by the persistently mapped vertex buffer. The offset into it is reset each frame and if the GPU hasn't finished drawing its contents it will cause corruption. The old SWFB did its own 2D drawing, so it wasn't affected and didn't need glFinish. Personally I wouldn't bother to do anything about this - but it is good to know this was the source of the last speed difference.
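
To illustrate the hazard described above, here is a minimal sketch of a toy model, not the engine's code. It assumes a "GPU" that executes queued draws later than they are issued and reads the shared buffer only at execution time. ToyGpu, renderTwoFrames and everything else in it are hypothetical names; no real OpenGL calls are made, and flush()/finish() only mimic the waiting behaviour of glFlush/glFinish.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-in for a GPU that executes draw commands later than they are issued.
class ToyGpu {
public:
    explicit ToyGpu(std::size_t size) : buffer_(size, 0) {}

    // CPU writes straight into the shared ("persistently mapped") buffer.
    void write(std::size_t offset, int value) { buffer_.at(offset) = value; }

    // Queue a draw that reads buffer_[offset] only when it actually executes.
    void draw(std::size_t offset) { pending_.push_back(offset); }

    // Mimics glFlush: the queue is submitted, but nothing waits for it.
    // This model takes the worst case where no queued draw has run yet.
    void flush() {}

    // Mimics glFinish: returns only after every queued draw has executed.
    void finish() {
        while (!pending_.empty()) {
            drawn_.push_back(buffer_.at(pending_.front()));
            pending_.pop_front();
        }
    }

    const std::vector<int>& drawn() const { return drawn_; }

private:
    std::vector<int> buffer_;          // shared vertex data
    std::deque<std::size_t> pending_;  // draws issued but not yet executed
    std::vector<int> drawn_;           // values the "GPU" actually consumed
};

// Render two frames that both start at offset 0, as happens when the
// write offset is reset every frame. Returns what the "GPU" drew.
std::vector<int> renderTwoFrames(bool finishBetweenFrames) {
    ToyGpu gpu(4);
    gpu.write(0, 1);  // frame 1 vertex data
    gpu.draw(0);
    if (finishBetweenFrames) {
        gpu.finish();
    } else {
        gpu.flush();
    }
    gpu.write(0, 2);  // frame 2 reuses offset 0
    gpu.draw(0);
    gpu.finish();
    return gpu.drawn();
}
```

In this model renderTwoFrames(true) draws the values 1 and 2, while renderTwoFrames(false) draws 2 twice: the first frame's data was overwritten before its draw ran, which is the corruption being described.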

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Graf Zahl » Sat Jul 21, 2018 1:23 am

Blzut3 wrote: I mean a cvar would be nice for exploratory purposes, but I can understand why you might not want to do that. Compared to an unmodified 3.3.2 it looks like the only people that are losing any performance with the current state are Intel Windows users, and unfortunately Vulkan won't help most of them since even though Ivy Bridge-Broadwell are capable of it there appear to be no plans to bring it to Windows.
Sadly the state of Intel Windows drivers is utterly pathetic and this has a long history.

Intel GMA would theoretically have been compatible with OpenGL 2, yet Intel stopped updating their drivers at version 1.5, totally obsoleting this hardware long before its time
Intel HD3000 would be compatible with OpenGL 3.3, yet Intel stopped updating their drivers at the totally useless OpenGL 3.1, essentially bumping this hardware down to GL 2.1 because GL 3.1 is missing crucial features and the driver is too broken to properly use extensions.
And now the same again because they are just too lazy to give their hardware a few years of proper support?

Shame on you, Intel! If they actually had to compete in that market segment they'd already have lost.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Blzut3 » Sat Jul 21, 2018 1:02 am

I never implied that you added the glFinishes for no reason (I mean I can clearly see why they exist in the hardware renderer), but rather that it's a side effect of going from the separate GL software framebuffer to the unified HW/SW one (since these calls exist in the same place in 3.3.2). I can definitely say that 3.3.2 had no issues without glFinish because they're not there and with many hours of play time I've had no issues. I cannot confirm that there are no issues with the new 2D features, and you're probably right that there would be. You certainly know more than I do about the differences between the feature sets of the old gl_swframebuffer and gl_framebuffer.

Now what you or dpJudas want to do with this information is up to you. I'm happy either way since the other changes have brought performance to an acceptable level on the configuration I care most about. Presumably vulkan can more safely take it the rest of the way.

I mean a cvar would be nice for exploratory purposes, but I can understand why you might not want to do that. Compared to an unmodified 3.3.2 it looks like the only people that are losing any performance with the current state are Intel Windows users, and unfortunately Vulkan won't help most of them since even though Ivy Bridge-Broadwell are capable of it there appear to be no plans to bring it to Windows (I should try doing a custom build on my laptop to see if glFinish is indeed the culprit there).

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Graf Zahl » Sat Jul 21, 2018 12:30 am

Are you sure? Keep in mind that the software renderer uses the hardware renderer's 2D drawer as its backing implementation. In any case it must be toggleable by the user if it gets disabled. Like I said, on my system completely removing those calls or placing them in the wrong place has very adverse effects - and that applies to both renderers!

Don't ever think I added all those weird things to that code for no reason. It all was necessary for my older Geforce 550 card. (On my current Geforce 1060 I could possibly strip it back down to what it was 5 or so years ago - the particular issue I tried to fix does not happen on that anymore.)

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Blzut3 » Fri Jul 20, 2018 11:54 pm

Graf Zahl wrote:What glFinish call? Keep in mind that this cannot be removed unconditionally, because it may cause some glitches on other hardware. For example, on my system I need this call to get tearing-free display with a frame rate less than 60 fps but without making the engine drop to 30 fps right away.
The ones in gl_framebuffer.cpp. The hardware renderer does need them, but as far as I can tell the software renderer does not. At least I can say I extensively used 3.3.2 which didn't call glFinish and haven't experienced any issues there.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Graf Zahl » Fri Jul 20, 2018 11:47 pm

Blzut3 wrote:Tried removing the call to glFinish and that is indeed the last piece of the regression from 3.3.2 at least on Linux. Without them I get the same perf as 3.3.2.
What glFinish call? Keep in mind that this cannot be removed unconditionally, because it may cause some glitches on other hardware. For example, on my system I need this call to get tearing-free display with a frame rate less than 60 fps but without making the engine drop to 30 fps right away.

dpJudas wrote: It is a shame there are some rare cases where r_scene_multithreaded causes render glitches which prevents me from turning it on per default.

Being able to do this with hardware rendering is actually the main reason for me to do a Vulkan backend. Are those glitches related to having to render those subsectors multiple times and this causing problems with how the software renderer orders things?

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by dpJudas » Fri Jul 20, 2018 9:42 pm

Blzut3 wrote:Also fun side note, without glFinish and with r_scene_multithreaded enabled, my Threadripper obtains 75% of OpenGL performance. Had to force the number of threads down to 12 though, since 32 gives what I assume is false sharing and reduces performance. (And honestly the scaling mostly stops at 8 threads.) Even outperforms in some of the more complex scenes. 25% faster than OpenGL in Frozen Time for example. :P
That is pretty cool! I always wanted to know how a threadripper performed on this. :)

r_scene_multithreaded works by splitting the scene into field-of-view slices, one for each thread. Each slice of the full view traverses differently through the BSP and that allows it to spread the load across cores. The catch is that some subsectors and lines are seen by several slices. When that happens the cores do the same work. Sounds like 8-12 slices is where the shared subsectors begin to outnumber the gained speed.
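
The duplicated work can be pictured with a rough sketch. This is only an illustration under simplifying assumptions: the view is cut into equal horizontal angle ranges and each visible object is reduced to the angular span it covers. The real renderer walks the BSP per slice; AngleRange, splitFov and slicesTouching are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <vector>

// An angular range in degrees, measured from the view centre, start < end.
struct AngleRange {
    double start;
    double end;
};

// Split a horizontal field of view into equal slices, one per thread.
std::vector<AngleRange> splitFov(double fovDegrees, int threads) {
    assert(fovDegrees > 0.0 && threads > 0);
    std::vector<AngleRange> slices;
    const double width = fovDegrees / threads;
    const double left = -fovDegrees / 2.0;
    for (int i = 0; i < threads; ++i) {
        slices.push_back({left + i * width, left + (i + 1) * width});
    }
    return slices;
}

// Number of slices whose range overlaps the given span. Every such slice
// has to process the object, so any count above 1 is repeated work.
int slicesTouching(const AngleRange& span, const std::vector<AngleRange>& slices) {
    int count = 0;
    for (const AngleRange& slice : slices) {
        if (span.start < slice.end && span.end > slice.start) {
            ++count;
        }
    }
    return count;
}
```

For a 90 degree view, an object spanning -20 to 20 degrees is handled once with a single slice, twice with 4 slices and four times with 8 slices, so the share of repeated work grows as slices are added.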

It is a shame there are some rare cases where r_scene_multithreaded causes render glitches which prevents me from turning it on per default.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Blzut3 » Fri Jul 20, 2018 8:55 pm

dpJudas wrote:There is one other minor OpenGL difference since the older release: glFinish. The 3.5pre calls that while the old version did not. On Nvidia this has a price (*), but it's not very big as I know the GL renderer can reach 1200 fps on my computer. But maybe it is different for AMD? What is the maximum fps you (blzut3) get from the GL renderer? 400 fps to 600 fps is just 0.9 ms, so it doesn't take much.
Tried removing the call to glFinish and that is indeed the last piece of the regression from 3.3.2 at least on Linux. Without them I get the same perf as 3.3.2.
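
The frame-time arithmetic in the quoted remark can be checked with a small sketch. The helper names frameTimeMs and frameTimeCostMs are made up for illustration and are not part of the engine.

```cpp
#include <cassert>

// Time spent on one frame, in milliseconds, at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame when the frame rate falls from fastFps to slowFps.
double frameTimeCostMs(double fastFps, double slowFps) {
    return frameTimeMs(slowFps) - frameTimeMs(fastFps);
}
```

At 600 fps a frame takes about 1.67 ms and at 400 fps it takes 2.5 ms, so frameTimeCostMs(600.0, 400.0) comes out at roughly 0.83 ms, the same ballpark as the figure quoted above. A fixed per-frame cost of under a millisecond is enough to account for a gap of that size.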

Also fun side note, without glFinish and with r_scene_multithreaded enabled, my Threadripper obtains 75% of OpenGL performance. Had to force the number of threads down to 12 though, since 32 gives what I assume is false sharing and reduces performance. (And honestly the scaling mostly stops at 8 threads.) Even outperforms in some of the more complex scenes. 25% faster than OpenGL in Frozen Time for example. :P

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by drfrag » Fri Jul 20, 2018 6:37 am

The new version is faster again here (legacy branch on AMD GL 3.3), 60 vs 50 fps @1024.
As a side note, the g3.3mgw branch is back to drawing directly to the PBO for Intel (which is faster according to Blzut's report). I'll try to port the new DSimpleCanvas thing there. Edit: that thing is not portable; I had some junk unversioned files after trying to cherry-pick something.

Re: [g3.5pre-44-g1455111dd] 50% performance regression since

by Graf Zahl » Fri Jul 20, 2018 12:47 am

dpJudas wrote:Seems the driver behavior is all over the map, depending on both vendor and platform.

Welcome to the wonderful world of OpenGL... :?
This is precisely why Vulkan requires explicit specification of nearly everything (what you called "micromanaging"...)
