I added a CVAR to switch between different buffer-based render methods, to see which one is fastest on different hardware.
The WADs are the same as last time - the voxel mod has been changed so that the savegame can load.
I need the following tests:
Frozen Time
PAR without lights
PAR with lights
Voxel test map
with both the latest official version (or dev build) and this test build. The test build has a new CVAR called gl_rendermethod, which can take values from 0 to 3. I need this test run with all 4 values, which correspond to the following render methods (a rough sketch of the last two is at the end of this post):
- immediate mode (obviously not on Mac OS X, due to the lack of a compatibility profile)
- uniform arrays (that's what the last test build had)
- buffer uploads for each draw call
- map/unmap buffer for each draw call
Last but not least: if your graphics driver reports OpenGL version 4.4, don't bother to run the test. I know it's fine on modern hardware; I just need some info on what to optimize in the fallback code for older GPUs.
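For those wondering what the last two options actually mean on the GL side, this is roughly the pattern - a simplified sketch, not the actual engine code; the buffer, vertex and size names are placeholders:

    #include <GL/glew.h>
    #include <cstring>

    // Sketch only - assumes a current GL context and an already created 'vbo'.
    // 'verts'/'bytes'/'numverts' stand in for one draw call's vertex data.

    // gl_rendermethod 2: re-upload the vertex data before every draw call
    void DrawWithBufferUpload(GLuint vbo, const void *verts, GLsizeiptr bytes, GLsizei numverts)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);
        glDrawArrays(GL_TRIANGLE_FAN, 0, numverts);
    }

    // gl_rendermethod 3: map the buffer, copy the data in, unmap, then draw
    void DrawWithMapUnmap(GLuint vbo, const void *verts, GLsizeiptr bytes, GLsizei numverts)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        void *p = glMapBufferRange(GL_ARRAY_BUFFER, 0, bytes,
                                   GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
        memcpy(p, verts, bytes);
        glUnmapBuffer(GL_ARRAY_BUFFER);
        glDrawArrays(GL_TRIANGLE_FAN, 0, numverts);
    }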
Spoiler: Original message
I need some info on how my recent rewrite to use buffers for rendering the level in GZDoom performs on different hardware.
I have prepared a package (attached to this post) that contains:
- a recent GZDoom build with all buffer based features
- the map 'Frozen Time' by Alexander "Eternal" S., which is currently the most demanding map for testing renderer performance.
- a savegame.
What I need:
- Start GZDoom with the map
- bind the 'bench' CCMD to a key
- type 'gl_usevbo 1' in the console
- load the savegame
- press the 'bench' key
- type 'gl_usevbo 0' in the console
- press the 'bench' key
- type 'gl_usevbo 1' in the console
- press the 'bench' key
This will run 3 rounds of benchmarks: one with vertex buffers on, one with vertex buffers off, and, for verification, one more with vertex buffers on again, so I can see whether switching between buffer and immediate mode rendering somehow affects performance.
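In case the bind step is unclear, the console syntax is simply (F12 is just an example; any free key works):

    bind f12 "bench"
    gl_usevbo 1

After that it's only a matter of loading the save, pressing the key, and toggling gl_usevbo between the benchmark runs as listed above.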
Post this info, along with your graphics hardware, CPU and the reported OpenGL version.
I cannot test with the Intel GMA X4500HD on the same machine because of a crash after OpenGL initialization, despite it reporting OpenGL 2.1. A missing extension, maybe... But their drivers are crappy, I know.
P.S. glew32.dll is missing from the package, so I took it from the latest build on DRD.
Definitely not great figures, and setting gl_usevbo to 1 seems to cause quite a hit. I guess my system is falling behind the curve quite a bit these days.
So far everything is as expected. GL 4.x capable NVidia cards are quite a bit better with buffers, and older ones take a huge hit. For the curious: GL 4 implements a much more efficient method to update buffers, and it clearly shows in these numbers.
Using the AMD HD8870m, AMD Catalyst 14.4 beta drivers (04/25/2014, 14.100.0.0000).
OpenGL: 3.3.12874 according to GZDoom; OpenGL 4.3 support (according to this).
One interesting thing:
At the titlepic, on the AMD GPU the titlepic image, menus and console are not rendered, but the mouse cursor is; when I start the game I can see the menu and the console :S
The same problem didn't happen on the Intel GPU >.>
Should I load the game and then start the bench, or start the first bench and then load the game?
EDIT: People are benchmarking wrong; everyone should use the same resolution...
Like I did - I used the default resolution GZDoom starts with the first time (640x480).
EDIT2: When gl_usevbo is 0, the drawcalls figure on both AMD and Intel goes to 0, and when it's 1, AMD shows 9000 for draw calls while Intel shows 70000.
OS: Windows 8.1 Update 1
CPU: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
GPU: AMD Radeon HD 7670M
GL Version: 3.3.12420
Resolution: 1366 x 768 (fullscreen) -- this is my max resolution
I ran into a couple of issues while running the build, like a black screen on load, with only the cursor being visible and some weirdness with the console.
ibm5155 wrote:
EDIT2: When gl_usevbo is 0, the drawcalls figure on both AMD and Intel goes to 0, and when it's 1, AMD shows 9000 for draw calls while Intel shows 70000.
'Drawcalls' doesn't measure the immediate mode code; I just wanted to know how much time the driver needs for the glDrawArrays calls.
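To be clear, that figure is essentially just accumulated CPU time around the draw calls, conceptually something like this (a simplified sketch, not the real benchmark code; 'DrawcallTime' and 'TimedDrawArrays' are made-up names):

    #include <GL/glew.h>
    #include <chrono>

    // Simplified sketch of the idea behind the 'drawcalls' stat. The accumulator
    // is reset each frame and printed by the bench output.
    static std::chrono::nanoseconds DrawcallTime{0};

    static void TimedDrawArrays(GLenum mode, GLint first, GLsizei count)
    {
        auto start = std::chrono::steady_clock::now();
        glDrawArrays(mode, first, count);   // only the time spent issuing the call is counted
        DrawcallTime += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);
    }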
It's interesting, though. AMD wastes tons of time on issuing draw calls - on NVidia it's completely irrelevant.
This means for speeding up things, AMD needs entirely different optimizations. What speeds up NVidia doesn't do a thing on AMD and what helps on AMD only increases CPU load on NVidia.
And holy crap - that Intel HD4000 is truly a piece of garbage...
Isn't it related to the "16 execution units" and four pixel pipelines?
And why is it crap? :s Like,
the NVidia GTX 295 x2 got 9 fps with gl_usevbo 0 while the HD4000 got 15.
ibm5155 wrote:
the NVidia GTX 295 x2 got 9 fps with gl_usevbo 0 while the HD4000 got 15.
I see 9 fps with buffers and 33 fps without - that's quite a bit faster. The buffer code is not meant to be used on GL 3.x hardware, I just wanted to know how badly they actually fare.
On GL 4 I can use a persistent buffer mapping, allowing quick and easy updates. For GL 3.x I have to constantly map and unmap the buffer, which is quite the performance killer; this certainly won't make it into production code.
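Roughly, the GL 4 path works like this (a simplified sketch, not the exact code in the repo):

    #include <GL/glew.h>

    // With GL 4.4 buffer storage, the buffer is created immutable and mapped
    // exactly once; afterwards vertex data is simply written through the
    // returned pointer every frame, with no further map/unmap calls.
    void *CreatePersistentlyMappedBuffer(GLuint &vbo, GLsizeiptr size)
    {
        const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferStorage(GL_ARRAY_BUFFER, size, nullptr, flags);
        return glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);
    }

On GL 3.x hardware that glBufferStorage call simply doesn't exist, hence the constant map/unmap fallback.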
I have to admit, I'm a bit disappointed with AMD's numbers. Since everyone has been bragging about reducing driver overhead recently - at AMD it even led to a new API - I would have expected them to actually work on this, but still a third of the entire time is spent issuing draw calls. With NVidia it's barely 2% on GL 4.x hardware.
But I'll be blunt: this optimization will have to wait until I can afford to release a GL 4.x-only version and remove all the backwards compatibility cruft for good. Maybe AMD will surprise us with a better driver in the meantime.
_mental_ wrote:
I cannot test with the Intel GMA X4500HD on the same machine because of a crash after OpenGL initialization, despite it reporting OpenGL 2.1. A missing extension, maybe... But their drivers are crappy, I know.
I suspect it runs into a function meant for newer GL versions that isn't checked for in the code. GLEW isn't particularly helpful at finding such occurrences, and unfortunately I can't test this myself because I have no idea how to restrict GLEW to only retrieve functions for specific GL versions and extensions.
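What's probably missing is a guard like this around the newer entry points (a hypothetical example; the wrapper name is made up):

    #include <GL/glew.h>

    // Hypothetical guard. GLEW loads every entry point it can find; anything the
    // driver doesn't export stays NULL, and calling it crashes - which is likely
    // what happens on the GMA X4500HD right after initialization.
    void BindVertexArraySafe(GLuint vao)
    {
        if (GLEW_VERSION_3_0 || GLEW_ARB_vertex_array_object)
        {
            glBindVertexArray(vao);
        }
        // else: fall back to plain glVertexPointer-style setup on old hardware
    }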
I'm afraid I'm going to need some help here from somebody who can run this in a debugger on such old hardware. You'll have to check out the Glew_Version_For_Real branch for that.