Just an odd mention of GZDoom speed

Nash
Posts: 17501
Joined: Mon Oct 27, 2003 12:07 am
Location: Kuala Lumpur, Malaysia

Re: Just an odd mention of GZDoom speed

Post by Nash »

mhmh wrote:Cross-community tribalism does nobody any favours
Oh don't worry about it, there's plenty of bickering even between Doom communities.
Gez
Posts: 17946
Joined: Fri Jul 06, 2007 3:22 pm

Re: Just an odd mention of GZDoom speed

Post by Gez »

Heh. Welcome. I remember when Graf worked on a rewrite of the GL renderer to use VBOs and such, and the gain was negligible: the new system carried too much overhead from the constant updates it needed for the optimizations to really pay off. Besides, the real bottleneck, as shown by the latest round of benchmarks, is at the CPU level rather than in the graphics hardware.

When you say you have code running ten times faster than the original, what original are you talking about? The 35 FPS-capped Doom engine? Something entirely different from Doom? GZDoom itself? I don't think you mean the latter, since you implied you didn't want to bother setting up its dependencies.

At the moment, the fastest OpenGL port for Doom is GLBoom+ (which uses immediate mode and works generally along the same principles as GZDoom; GZDoom is a bit slower because it has more editing features to take into account), and the slowest is Doomsday, which uses VBOs and such. Now, I know Doomsday is still in the middle of a massive rewrite and its development team is keeping things unoptimized for the moment while they make sure it works at all, so maybe once they finalize and optimize everything it'll surprise everyone with unparalleled speed.

If you want a Doom port with minimal dependencies, then you can look at [wiki]Mocha Doom[/wiki]. Ah but it's Java rather than C/C++.

Really, ZDoom does not have a lot of external dependencies. Most of the libraries it uses are internal; that's why bzip2, lzmalib, game-music-emu etc. are projects that are part of the solution. The only one that is and remains external is [wiki]FMOD Ex[/wiki]: due to its closed-source nature, there's pretty much no way around that.
Enjay
Posts: 27132
Joined: Tue Jul 15, 2003 4:58 pm
Location: Scotland

Re: Just an odd mention of GZDoom speed

Post by Enjay »

mhmh wrote:My objective was to find a Doom engine that I could then use as a baseline for further development work. Because I like a lightweight development environment, where I can just download a bunch of source code, compile, and run, additional external dependencies (and here I mean over and above standard OS and 3D API dependencies) are a total pain. Because I do all of my development work in a debugger, they can be even worse - I have downloaded engine sources in the past that do not compile and/or that cannot be run in a debugger on account of that. Additional features over and above those of the base game are a total pain. Every such additional feature is more porting work, more coding, and more incompatibilities while building stuff.
My comments were made without that perspective being apparent to me. I apologise for anything I said based on my assumptions from what I read here. Basing comments on a poorly informed opinion and assumptions is a bad thing to do and I did it.

There have, of course, been a number of projects based off of the ZDoom source, and some have been quite successful in fulfilling their goals. However, as an outsider as far as coding an engine goes, I'm not in a position to be able to say how closely those other projects match your development needs or environment.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49235
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: Just an odd mention of GZDoom speed

Post by Graf Zahl »

mhmh wrote: Regarding VBOs, I know what I'm talking about here and have the code to prove it. Running 10 times faster than the original code in heavy stress situations is not to be sniffed at.

You also need the data in a way that allows it.

The big problem with Doom is that

1) geometry is not static (a moving platform affects all connected walls), so using static vertex buffers won't get you far
2) batches are extremely small (rarely more than 10 vertices)
3) the lighting model is not easily baked into a vertex, so it has to be done by other means, and you still have the state changes that slow things down.
4) dynamic lights completely nullify any hypothetical advantage of VBOs, because they are not a natural part of the world that's being rendered.

I once made a renderer using vertex buffers only, trying to merge batches to a certain degree. The improvement was precisely zero! In fact, the first attempt was significantly slower; I only got it back to parity after taking some stuff out and setting state the old-fashioned way (with glColor, for example). After that I saw no point in experimenting further with it.

The only thing in GZDoom where VBOs provide a significant performance boost is voxel models - because there they allow rendering the entire thing with one single command instead of hundreds of small polygons.
But for the rest, with batches that average 10 vertices, VBOs will never be able to show their strengths.

You still need some very large levels to run into problems, though.
mhmh
Posts: 3
Joined: Tue Apr 17, 2012 12:34 pm

Re: Just an odd mention of GZDoom speed

Post by mhmh »

It's interesting, because John Carmack doubled the speed of PRBoom with his iPhone port. I've had a look at his code, and all he's doing is client-side arrays, using GL_TRIANGLES with indexes. (This could be an iPhone-specific gain, though.) I've also almost doubled the speed of Quake 2 in the general case with an almost-pure GL 3.x-ish port, although in fairness that does place more stress on the GPU, so it's more likely to benefit.
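
For the curious, client-side arrays look roughly like this - a minimal sketch, not Carmack's actual code; verts, texcoords, indices and num_indices are hypothetical application-side names:

Code: Select all

/* Client-side arrays: vertex data stays in ordinary CPU memory and is
   re-read by the driver on each draw call - no buffer objects at all. */
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);

glVertexPointer(3, GL_FLOAT, 0, verts);        /* xyz per vertex */
glTexCoordPointer(2, GL_FLOAT, 0, texcoords);  /* st per vertex  */

/* One indexed call draws the whole batch; shared vertices are reused
   through the index list, which is the win over glBegin/glEnd. */
glDrawElements(GL_TRIANGLES, num_indices, GL_UNSIGNED_SHORT, indices);

glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);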

The lighting model, I'm thinking, could potentially be handled by a shader; that's technically interesting if nothing else. In general, to make VBOs really work with any kind of non-static vertex data you do need shaders; otherwise you'll lose the performance gain to having to re-upload your data each frame.
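
As a sketch of what I mean - the falloff constant here is invented purely for illustration, and the real Doom light curve is table-driven and more involved than this:

Code: Select all

/* Hypothetical fragment shader evaluating Doom-ish diminished lighting
   per pixel instead of baking it into vertex colors. */
static const char *frag_src =
    "uniform sampler2D u_texture;\n"
    "uniform float u_lightlevel;   /* sector light, 0..1 */\n"
    "varying vec2  v_texcoord;\n"
    "varying float v_eyedist;      /* eye-space distance, from VS */\n"
    "void main() {\n"
    "    float atten = clamp(u_lightlevel - v_eyedist * 0.0005, 0.0, 1.0);\n"
    "    vec4 texel = texture2D(u_texture, v_texcoord);\n"
    "    gl_FragColor = vec4(texel.rgb * atten, texel.a);\n"
    "}\n";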

Batch sizes could be greatly increased by using texture arrays for all common-sized textures. Most of them are 64x64, so this is definitely viable. The model I use is a static VBO plus dynamic indexes, using GL_ARB_map_buffer_range on the index buffer to avoid CPU/GPU sync overhead, and primitive restart to concatenate multiple surfaces into a single glDraw(Range)Elements call. That has worked very well with everything I've used it on so far, and I don't really see it not working with Doom.
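
In rough outline (a sketch only; surf, index_vbo and the counts are hypothetical names):

Code: Select all

/* Per frame: rebuild only the index buffer; the vertex VBO is static.
   GL_MAP_INVALIDATE_BUFFER_BIT orphans the old storage, so the CPU
   never waits for the GPU to finish reading last frame's indices. */
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_vbo);
GLushort *idx = (GLushort *) glMapBufferRange(
    GL_ELEMENT_ARRAY_BUFFER, 0, max_indices * sizeof(GLushort),
    GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);

GLsizei n = 0;
for (int s = 0; s < num_visible_surfs; s++) {
    /* Append each visible surface's precomputed strip indices,
       separated by the restart marker, so one call draws them all. */
    for (int i = 0; i < surf[s].numindices; i++)
        idx[n++] = surf[s].indices[i];
    idx[n++] = 0xFFFF;                      /* primitive restart marker */
}
glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);

glEnable(GL_PRIMITIVE_RESTART);
glPrimitiveRestartIndex(0xFFFF);
glDrawElements(GL_TRIANGLE_STRIP, n, GL_UNSIGNED_SHORT, (void *) 0);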

Anyway, I've found Bruce Lewis' GLDoom code (or, more accurately, Bruce found it and posted it on his site), so that's one potential base to work from. I'm aware that it's not perfect, but it is closer to the id original, which suits my requirements, and it may give me a good batch of code that I could hopefully gift back to the community (for other engines to pick up if their authors so wish).
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49235
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: Just an odd mention of GZDoom speed

Post by Graf Zahl »

mhmh wrote:It's interesting because John Carmack doubled the speed of PRBoom with his iPhone port. I've had a look at his code and all he's doing is client-side arrays, using GL_TRIANGLES with indexes. (This could be an iPhone specific gain though.) I've also almost doubled the speed of Quake 2 in the general case with an almost-pure GL 3.x-ish port, although in fairness that does place more stress on the GPU so it's more likely to benefit from it.
Can't say much about the iPhone, in particular because it's using OpenGL ES. Of course, being tied to specific hardware, the port can be aggressively optimized for that particular hardware.
Quake 2 doesn't really surprise me. A Quake level, unlike a Doom one, is a static mesh that can be transformed perfectly into vertex buffers and won't require any changes afterward. It also uses a lighting model that allows much simpler vertex representations. Doom's strange lighting, which requires changes to light values, fog values and distance values, needs a lot more information.
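
That's why a Quake-style world can be uploaded once at level load and never touched again - in sketch form (world_vbo, world_verts, world_numverts and worldvert_t are hypothetical names):

Code: Select all

/* Upload the whole static world mesh once at level load.
   GL_STATIC_DRAW hints: written once, drawn many times. */
GLuint world_vbo;
glGenBuffers(1, &world_vbo);
glBindBuffer(GL_ARRAY_BUFFER, world_vbo);
glBufferData(GL_ARRAY_BUFFER,
             world_numverts * sizeof(worldvert_t),
             world_verts, GL_STATIC_DRAW);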
mhmh wrote: The lighting model I'm thinking could potentially be handled by a shader; that's technically interesting if nothing else. In general to make VBOs really work with any kind of non-static vertex data you really do need shaders, otherwise you'll lose the perf gain due to having to re-up your data each frame.
I am using shaders on modern cards. But of course the all-important question is what hardware to target. GZDoom is still supposed to work with extension-less GL 1.5.
mhmh wrote: Batch sizes could be greatly increased by using texture arrays for all common-sized textures. Most of them are 64x64 so this is definitely viable. The model I use is static VBO plus dynamic indexes, using GL_ARB_map_buffer_range on the index buffer to avoid CPU/GPU sync overhead, and primitive restart to concat multiple surfaces into a single glDraw(Range)Elements call. That works very well with everything I've used it on so far, and I don't really see it not working with Doom.
You are wrong about texture sizes. 64x64 is only common for flats. For walls there are 128x64, 128x128, 128x256 and many other sizes that occur less frequently. I have thought of this as well, but in the end I had to ask one important question: when do these optimizations actually start to become useful?

Let's face it: I have a 5-year-old computer with a GeForce 8600, which was already a bit underpowered when I bought it, and on this system I have to use either levels that make extensive use of dynamic lights or portals, or some insanely large and complex views, to experience any kind of slowdown below my monitor's refresh rate of 60 Hz. And any optimization I can think of will only speed up 'normal' geometry without effects - which is precisely the area where I don't experience performance issues.

Here's a benchmark of the most demanding effect-less level I know, Sunder's MAP10:

Code: Select all

Map map10: "The Hags Finger",
x = -7085.0000, y = -902.0000, z = -659.0000, angle = 0.0000, pitch = 0.0000
Walls: 8235 (0 splits, 0 t-splits, 27272 vertices)
Flats: 390 (3081 primitives, 15919 vertices)
Sprites: 3394, Decals=0, Portals: 1
W: Render=4.670, Split = 0.000, Setup=2.371, Clip=3.852
F: Render=0.523, Setup=0.135
S: Render=2.027, Setup=1.490
All=21.277, Render=10.532, Setup=9.666, BSP = 1.813, Portal=0.197, Finish=1.013
DLight - Walls: 0 processed, 0 rendered - Flats: 0 processed, 0 rendered
Missing textures: 0 upper, 2 lower, 0.003 ms
43 fps
Let's break this down a bit.
Rendering one frame takes 21.277 ms.
9.666 ms of that is labeled 'setup'. This is mostly visibility clipping, but also stuff like creating the render lists - all pure CPU work. The same applies to the 1.813 ms labeled BSP, which is pure BSP traversal time. So none of this can be sped up by changing the renderer anyway.
Portal and Finish are times where the engine is stalled on a GPU task and has to wait.
Render is 10.532 ms, which breaks down into 4.670 ms for walls, 0.523 ms for flats and 2.027 ms for sprites, with the rest spent on purely CPU-bound tasks. Speeding up the sprite handling is completely impossible because it's fully dynamic data that needs to be recalculated each frame. So what does that leave us to optimize? Realistically, only the 4.670 ms used for rendering the walls. And even here, half the time is spent processing the render lists, preparing textures and the like. The actual rendering of the walls is maybe 2 ms - and that's effectively the only time you can improve with a different approach to rendering. The rest is all CPU.
Please keep in mind that this is on a 5-year-old, relatively underpowered system. On more modern systems the ratios may differ - but they'd also be faster, which makes the need for optimization even smaller.

With dynamic lights, which are processed per surface, all this batching stuff won't do anything at all; portals are limited by GPU stalls on the stencil buffer; and large, complex maps are mostly limited by the CPU, not the GPU. So if I add more processing to reduce GPU load, it only increases the CPU time - and that's the main bottleneck on these levels.

It may well be improvable - but I think that would require a completely new renderer, designed from the ground up for these optimizations. Definitely not something I want to do.

BTW, flats can be rendered with vertex buffers - but the performance increase is minimal on NVidia, and on AMD it actually slows things down. Apparently AMD can't switch between immediate mode and vertex buffers without significant overhead.
mhmh
Posts: 3
Joined: Tue Apr 17, 2012 12:34 pm

Re: Just an odd mention of GZDoom speed

Post by mhmh »

Interesting figures. I'm going to look into the GLDoom source, try to compile it, and take things from there. I'll probably post a few thoughts on the code depending on how things work out.

The kind of code I write is 100% vertex buffers, by the way. There's no immediate mode so there's no switching.

It's worth noting that with Quake you actually don't have a 100% static mesh for everything. Water and sky textures need animation; every modern engine has frame interpolation for models, meaning that their vertexes can change every frame; brush models move (and can be replicated across multiple instances); there's heavy BSP traversal; frustum culling and PVS chop out a lot of surfaces; and the end result needs to be built up into draw lists (which, again, can change every frame). There are also particles, sprites and lightmap uploads. So you have geometry data that is constantly changing and moving, yet it can be done - this is a solved problem.
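
For that kind of constantly-changing data the usual answer is buffer orphaning - a sketch, with particle_vbo, particle_bytes and particle_data as hypothetical names:

Code: Select all

/* Orphaning: hand the driver fresh storage each frame, so filling the
   buffer never stalls on the GPU still reading the previous frame's. */
glBindBuffer(GL_ARRAY_BUFFER, particle_vbo);
glBufferData(GL_ARRAY_BUFFER, particle_bytes, NULL, GL_STREAM_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, particle_bytes, particle_data);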
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49235
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: Just an odd mention of GZDoom speed

Post by Graf Zahl »

Of course - but the stuff that poses the biggest problem in Doom - rendering the walls - doesn't suffer from the issues that cost most of the time. If all walls were static, things would be a lot easier, and the time spent on them would be closer to what GZDoom spends on processing flats, which - as you can see in my numbers - is significantly less, mostly due to the use of vertex buffers. The same technique doesn't work well for Doom walls, though, because they can move. I also have to make sure that at each 2D vertex position the edges of the adjoining walls match up, by inserting intermediate vertices where other planes meet at that vertex. This can be switched off by setting 'Rendering quality' from 'Quality' to 'Speed', but in return you get the infamous flashing white dots where there are gaps between the polygons.
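
The idea, in outline (a simplified sketch with hypothetical types, not the actual GZDoom code; the neighbouring plane heights are assumed sorted ascending):

Code: Select all

typedef struct { float x, y, z; } vert_t;

/* Where another sector's floor/ceiling plane crosses this wall edge,
   insert an intermediate vertex so adjacent polygons share edge
   vertices exactly and no gaps ("white dots") can open up. */
static int split_wall_edge(float x, float y, float zbottom, float ztop,
                           const float *neighbor_heights, int numheights,
                           vert_t *out)
{
    int n = 0;
    out[n++] = (vert_t){ x, y, zbottom };
    for (int i = 0; i < numheights; i++) {
        float h = neighbor_heights[i];
        if (h > zbottom && h < ztop)
            out[n++] = (vert_t){ x, y, h };  /* intermediate vertex */
    }
    out[n++] = (vert_t){ x, y, ztop };
    return n;  /* vertices along this edge, bottom to top */
}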

Support for ZDoom's features also doesn't really help here, because it means a lot of special cases that need to be handled, which also slow things down. I guess when you do a plain vanilla renderer, as needed for GLDoom, things will be a lot simpler due to the reduced feature set.

The moving parts in Quake don't affect the level geometry in the same way they do in Doom.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49235
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: Just an odd mention of GZDoom speed

Post by Graf Zahl »

Just have to add this:

I just got my new computer (Core i7, 3.4 GHz, GeForce GTX 550 Ti) and did some speed tests.
Of all the maps I tested, there were exactly two where the frame rate dropped below 60 fps at some point:

- Sunder MAP10 after all hell broke loose - there are just too many sprites flying around eventually
- Super Sonic Doom MAP10 when flying high above the fortress (bad portal use, resulting in 120 portals simultaneously in view)

Aside from that, it was really hard to find anything that made it drop below 100. Most of the problem maps from my old system ran almost twice as fast on this one.

(So much for 'Immediate mode is inherently slow'.) :P