QZDoom - ZDoom with True-Color (Version 1.3.0 released!)

Rachael
Posts: 13836
Joined: Tue Jan 13, 2004 1:31 pm
Preferred Pronouns: She/Her

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Rachael »

Well - dpJudas has struck again.

In QZDoom's latest dev build, the bridge scene in Frozen Time is now playable in the software renderer (provided you have a decent enough processor).
dpJudas
 
 
Posts: 3145
Joined: Sat May 28, 2016 1:01 pm

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by dpJudas »

I'm only getting about 15-20 fps at the bridge (35 fps with r_scene_multithreaded on). Technically playable, yes, but far from the rendering deadline. Sniff! Stupid map! :)
Rachael
Posts: 13836
Joined: Tue Jan 13, 2004 1:31 pm
Preferred Pronouns: She/Her

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Rachael »

It's still better than the 5-10 it was getting before. :)
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49188
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Graf Zahl »

I'll have to try that. I was getting 12 fps there, which is virtually unplayable.

EDIT: The latest build runs at 20 fps, that's a nice speedup.

And just for fun: ZDoomGL renders that scene at 3 fps. Ouch! I'd really like to find out one day what makes that renderer tank this badly but that's not going to be easy to find.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49188
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Graf Zahl »

One more thing: Have you ever compared performance of STL classes vs. ZDoom's own TArray and TMap? I generally found that ZDoom's are a bit better because they are not so convolutedly programmed as the STL.
dpJudas
 
 
Posts: 3145
Joined: Sat May 28, 2016 1:01 pm

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by dpJudas »

I haven't compared the performance, no. I'm using the STL classes out of laziness - I already know their performance characteristics, while I'm not 100% sure about those of TArray and TMap. In this case in particular I'm relying heavily on the containers building up a reserve, so that allocations stop once they have reached a natural large size. TArray and TMap probably do the same, but I would have had to stop and check.
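
To illustrate the "building up a reserve" point, here is a minimal sketch, assuming a per-frame scratch list backed by std::vector. VisibleWallList and CountCapacityChanges are hypothetical names made up for this example and are not taken from QZDoom. The idea is that clear() drops the elements but keeps the storage, so after the first large frame no further growth should be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-frame scratch list; the name is illustrative, not from QZDoom.
struct VisibleWallList
{
    std::vector<int> walls;

    // Start a new frame: drops the elements but keeps the allocated storage.
    void BeginFrame() { walls.clear(); }

    void Add(int wallIndex) { walls.push_back(wallIndex); }
};

// Fills the list for `frames` frames with `wallsPerFrame` entries each and
// returns how often the capacity changed (a proxy for reallocations).
inline std::size_t CountCapacityChanges(VisibleWallList &list, int frames, int wallsPerFrame)
{
    assert(frames >= 0 && wallsPerFrame >= 0);
    std::size_t changes = 0;
    std::size_t lastCapacity = list.walls.capacity();
    for (int f = 0; f < frames; ++f)
    {
        list.BeginFrame();
        for (int w = 0; w < wallsPerFrame; ++w)
        {
            list.Add(w);
            if (list.walls.capacity() != lastCapacity)
            {
                lastCapacity = list.walls.capacity();
                ++changes;
            }
        }
    }
    return changes;
}
```

Running one frame first and then several more with the same list should show capacity changes only during the first frame. Whether TArray behaves the same way is exactly the thing that would need checking.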

By the way, if you enable r_scene_multithreaded it splits the screen into N segments and then runs the entire BSP walk N times on worker threads. I'm able to do this now because there are no globals left in the renderer except for camera light and viewport setup. The showstoppers for r_scene_multithreaded are that loading textures is not thread-safe and that portals need to change the viewport variables. I'm still trying to figure out a good way to make texture manager access thread-safe, and for the viewport the problem is that ViewPos and friends in r_utility are globals that are also used by the GL renderer.
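
A rough sketch of the "split the screen into N segments" idea, assuming vertical column slices and one callback per slice. ScreenSlice, SplitScreen and RenderSlices are hypothetical names for illustration only; the real renderer is not shown in this thread. For brevity this version spawns a thread per slice on every call, which is not how the persistent workers described later in the thread operate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Half-open column range [x1, x2) owned by one worker. Hypothetical type.
struct ScreenSlice { int x1; int x2; };

// Divide `width` columns into `count` contiguous slices of near-equal size.
inline std::vector<ScreenSlice> SplitScreen(int width, int count)
{
    assert(width > 0 && count > 0);
    count = std::min(count, width);
    std::vector<ScreenSlice> slices;
    for (int i = 0; i < count; ++i)
    {
        int x1 = static_cast<int>(static_cast<long long>(width) * i / count);
        int x2 = static_cast<int>(static_cast<long long>(width) * (i + 1) / count);
        slices.push_back({x1, x2});
    }
    return slices;
}

// Runs `renderSlice` once per slice, each on its own thread, and waits for all.
// In the scheme described above, each call would do a full BSP walk clipped
// to the columns of its slice.
inline void RenderSlices(const std::vector<ScreenSlice> &slices,
                         const std::function<void(const ScreenSlice &)> &renderSlice)
{
    std::vector<std::thread> workers;
    for (const ScreenSlice &slice : slices)
        workers.emplace_back([&renderSlice, &slice] { renderSlice(slice); });
    for (std::thread &t : workers)
        t.join();
}
```

Because every slice writes only to its own columns, the workers need no locking for the frame buffer itself. Shared state such as texture loading and the viewport variables is where the trouble starts.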

About ZDoomGL, that speed is the same as 3dge is getting on my computer. This is one evil map. :)
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49188
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Graf Zahl »

dpJudas wrote:About ZDoomGL, that speed is the same as 3dge is getting on my computer. This is one evil map. :)
I'd like to know what kills performance so much on other ports, because to this day I haven't found out which part of GZDoom and PrBoom is so different that it avoids these stalls. I don't even sort textures anymore, but even that hardly had any impact on any system I ever tested this on. On the other hand, ZDoomGL stalls for large amounts of time when it tries to access the render state at the beginning of each frame; the more complex the map, the worse. The pure C++ performance is ok - it's not great, but only twice as slow as GZDoom.

Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.

About the texture manager, which part is a concern? I'd guess it's GetPixels and related things that can get concurrently accessed by different threads, isn't it? Wouldn't adding a mutex be the solution then, or do you need to protect some more things?

The viewpoint variables should probably be put into some variable that gets returned by R_SetupFrame instead of storing it globally. So far I didn't bother because of the software renderer and its overdependence on global variables, but once everything has been neatly put away this should be done as well. I guess it may be better if both ports actually get merged before that, so that it's easier to work on that stuff in the future. The current split doesn't make it easy for me, because I basically cannot do anything at all for the software renderer, e.g. implementing the Doom64 sector colors there, which should be easy to do except for the interpolation option.
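
As a sketch of what "returned instead of stored globally" could look like: the types and the SetupFrame signature below are invented for illustration and are not the real R_SetupFrame interface. The point is only that each renderer, thread or portal can then hold its own copy of the viewpoint.

```cpp
#include <cassert>

// Hypothetical plain-data viewpoint; field names are illustrative only.
struct Viewpoint
{
    double x = 0, y = 0, z = 0;
    double angleDegrees = 0;
};

// Hypothetical stand-in for the camera actor the frame is set up from.
struct CameraActor { double x, y, z, angleDegrees, viewHeight; };

// Builds the viewpoint for one frame and returns it by value
// instead of writing to global variables.
inline Viewpoint SetupFrame(const CameraActor &camera)
{
    Viewpoint vp;
    vp.x = camera.x;
    vp.y = camera.y;
    vp.z = camera.z + camera.viewHeight;
    vp.angleDegrees = camera.angleDegrees;
    return vp;
}

// A portal can derive its own viewpoint without disturbing the caller's copy.
inline Viewpoint OffsetForPortal(Viewpoint vp, double dx, double dy)
{
    vp.x += dx;
    vp.y += dy;
    return vp;
}
```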

One other thing I noticed while playing around with threads is that starting and ending threads while NVidia's GL driver is active can cause some bad performance degradation, easily nullifying all advantages. So if I ever add threads to the GL renderer they probably need to be kept for the lifetime of the program and not be created when needed and ended when rendering is done.
Rachael
Posts: 13836
Joined: Tue Jan 13, 2004 1:31 pm
Preferred Pronouns: She/Her

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Rachael »

@ Graf: Honestly, I really would like a merge at this point, because the divide is getting harder to maintain depending on the stuff that you ZScriptify. Would you like dpJudas and me to certify a certain commit point that we believe you can safely merge into GZDoom, sans version.h changes? I think we can continue on QZDoom for now until the refactoring is completely done (although that may be a while), but having a narrower base would be immensely helpful to both you and me at this point, I think, especially with more ZScript changes on the way.

Keep in mind, also, that you have access to our repo, so if you have time and you want to merge in GZDoom code and then make your changes to the software renderer and then merge back - that's fine.
dpJudas
 
 
Posts: 3145
Joined: Sat May 28, 2016 1:01 pm

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by dpJudas »

Graf Zahl wrote:Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.
When I ran the Very Sleepy profiler on 3dge, it blamed the Nvidia driver. When I then looked at the actual code, their way of batching draws basically involved queuing one 'unit' per wall drawn. It then sorted those units by texture and state setup, and finally drew them with a glBegin(GL_QUADS) for each unit. So 500 walls would mean 500 times glBegin + glVertexAttribute * 4 * 3 + glEnd (total 7,000 OpenGL calls), plus checks between each wall/unit to see if the state setup changed. Add to that that it does this on a subsector level, meaning more walls than GZDoom. Also, when I commented out the entire unit drawing code, it still drew the map, but it had just become completely black - maybe it draws the entire scene multiple times.
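
The 7,000 figure can be reproduced with a little arithmetic. This is only a sketch of the estimate in the paragraph above, assuming one quad per wall with 4 vertices and 3 attribute calls per vertex; CallsPerQuad and CallsForWalls are made-up helper names.

```cpp
// Immediate-mode calls needed for one quad under the assumptions above:
// one glBegin, `attribsPerVertex` calls for each of the 4 vertices, one glEnd.
constexpr int CallsPerQuad(int attribsPerVertex)
{
    return 1 + 4 * attribsPerVertex + 1;
}

// Total calls when every wall is drawn as its own glBegin/glEnd pair.
constexpr int CallsForWalls(int wallCount, int attribsPerVertex)
{
    return wallCount * CallsPerQuad(attribsPerVertex);
}

// 500 walls * (1 + 4 * 3 + 1) = 500 * 14 = 7,000 calls.
static_assert(CallsForWalls(500, 3) == 7000, "matches the estimate in the post");
```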

I think part of the explanation is that the overhead builds up. But you have much more experience with the fixed function pipeline than I do, so you know better than me how big the overhead of the glBegin family is. There's of course also always the possibility that their clipper is buggy somehow, making them draw much more than what is needed (a problem softpoly currently has). I noticed the 3dge node builder created some errors in the bridge - if it made errors like that in the castle itself, maybe it ended up drawing far more than GZDoom does.
Graf Zahl wrote:About the texture manager, which part is a concern? I'd guess it's GetPixels and related things that can get concurrently accessed by different threads, isn't it? Wouldn't adding a mutex be the solution then, or do you need to protect some more things?
It is the loading of textures that is the problem. Once the pixels have been loaded then the call to GetPixels is safe enough as all the threads only read from it. A mutex lock would do the trick, although ideally it would only attempt to make such a lock if it already concluded the texture is not loaded.
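
One common way to get "only lock if the texture is not loaded yet" is a double-checked flag, sketched below. LazyTexture is a hypothetical stand-in and not the real texture class from the engine; the loader just fills the buffer with a constant so the example stays self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a texture; not the engine's real texture interface.
class LazyTexture
{
public:
    explicit LazyTexture(int size) : mSize(size) { assert(size > 0); }

    // Safe to call from several threads at once.
    const std::vector<std::uint8_t> &GetPixels()
    {
        // Fast path: already loaded, so no lock is taken.
        if (!mLoaded.load(std::memory_order_acquire))
        {
            std::lock_guard<std::mutex> lock(mLoadMutex);
            // Re-check under the lock: another thread may have loaded it meanwhile.
            if (!mLoaded.load(std::memory_order_relaxed))
            {
                LoadPixels();
                mLoaded.store(true, std::memory_order_release);
            }
        }
        return mPixels;
    }

    // Number of times the pixels were actually loaded (for illustration).
    int LoadCount() const { return mLoadCount; }

private:
    // Placeholder loader: fills a square buffer with a constant value.
    void LoadPixels()
    {
        mPixels.assign(static_cast<std::size_t>(mSize) * mSize, 0xff);
        ++mLoadCount;
    }

    int mSize;
    std::vector<std::uint8_t> mPixels;
    int mLoadCount = 0;
    std::atomic<bool> mLoaded{false};
    std::mutex mLoadMutex;
};
```

After the first successful load, every later GetPixels call only performs one atomic read, which fits the observation that the threads only read from the pixels once they exist.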
Graf Zahl wrote:The viewpoint variables should probably be put into some variable that gets returned by R_SetupFrame instead of storing it globally. So far I didn't bother because of the software renderer and its overdependence on global variables, but once everything has been neatly put away this should be done as well. I guess it may be better if both ports actually get merged before that, so that it's easier to work on that stuff in the future. The current split doesn't make it easy for me, because I basically cannot do anything at all for the software renderer, e.g. implementing the Doom64 sector colors there, which should be easy to do except for the interpolation option.
Agree - it is better if I don't make changes to this part until after the merger. How/when do you suggest we do this? Main refactor work is more or less done, although there's of course always things that could be further improved. I think it is probably best we leave out the TC drawers for now until I find a better way to deal with the LLVM situation.
Graf Zahl wrote:One other thing I noticed while playing around with threads is that starting and ending threads while NVidia's GL driver is active can cause some bad performance degradation, easily nullifying all advantages. So if I ever add threads to the GL renderer they probably need to be kept for the lifetime of the program and not be created when needed and ended when rendering is done.
The threads I'm using only get launched once. They use a condition variable to then wait for the main thread to start the next batch of work handed to them. I only really stop them if the desired thread count changes (only happens if r_multithreaded is toggled on or off).
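
For readers who want to see the shape of such a setup, here is a minimal sketch of workers that are launched once and then sleep on a condition variable between batches. FrameWorkers and its members are hypothetical names; this is not the code from the renderer, just one way the described pattern can be written.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker pool: threads are created once and reused for every batch.
class FrameWorkers
{
public:
    explicit FrameWorkers(int count)
    {
        assert(count > 0);
        for (int i = 0; i < count; ++i)
            mThreads.emplace_back([this, i] { WorkerMain(i); });
    }

    ~FrameWorkers()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mShutdown = true;
        }
        mStart.notify_all();
        for (std::thread &t : mThreads)
            t.join();
    }

    // Hands one job to all workers and blocks until every worker has run it.
    void RunBatch(std::function<void(int)> job)
    {
        std::unique_lock<std::mutex> lock(mMutex);
        mJob = std::move(job);
        mPending = static_cast<int>(mThreads.size());
        ++mGeneration;
        mStart.notify_all();
        mDone.wait(lock, [this] { return mPending == 0; });
    }

private:
    void WorkerMain(int index)
    {
        unsigned seen = 0;
        std::unique_lock<std::mutex> lock(mMutex);
        for (;;)
        {
            // Sleep until a new batch is published or shutdown is requested.
            mStart.wait(lock, [&] { return mShutdown || mGeneration != seen; });
            if (mShutdown)
                return;
            seen = mGeneration;
            std::function<void(int)> job = mJob;
            lock.unlock();
            job(index); // run outside the lock so workers execute in parallel
            lock.lock();
            if (--mPending == 0)
                mDone.notify_one();
        }
    }

    std::mutex mMutex;
    std::condition_variable mStart, mDone;
    std::function<void(int)> mJob;
    std::vector<std::thread> mThreads;
    unsigned mGeneration = 0;
    int mPending = 0;
    bool mShutdown = false;
};
```

The threads are only joined in the destructor, so toggling the thread count would mean destroying one pool and constructing another, which matches the "only stop them if the desired thread count changes" description.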
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49188
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Graf Zahl »

dpJudas wrote:
Graf Zahl wrote:Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.
When I ran the Very Sleepy profiler on 3dge, it blamed the Nvidia driver. When I then looked at the actual code, their way of batching draws basically involved queuing one 'unit' per wall drawn. It then sorted those units by texture and state setup, and finally drew them with a glBegin(GL_QUADS) for each unit. So 500 walls would mean 500 times glBegin + glVertexAttribute * 4 * 3 + glEnd (total 7,000 OpenGL calls), plus checks between each wall/unit to see if the state setup changed. Add to that it does it on a subsector level, meaning more walls than GZDoom.
Get an older GZDoom and it works exactly like that. Don't fall for the urban myth that immediate mode is the root of all evil, I got that debunked years ago. ;)
The things you mentioned were all once present, too. I gradually removed them over time, but if you grab an older GZDoom, version 1.3 or lower, you'll have an engine that works mostly the same - glBegin/glEnd for each wall/subsector - and is still a lot faster.
dpJudas wrote:I think part of the explanation is that the overhead builds up. But you have much more experience with the fixed function pipeline than I do, so you know better than me how big the overhead of the glBegin family is. There's of course also always the possibility that their clipper is buggy somehow, making them draw much more than what is needed (a problem softpoly currently has). I noticed the 3dge node builder created some errors in the bridge - if it made errors like that in the castle itself, maybe it ended up drawing far more than GZDoom does.
The actual overhead is the function calls. It just takes a bit longer to get the vertex data into a buffer, but at no point does it cause the frame rate to tank. It's just some gradual degradation, let's say 50 fps instead of 60 fps, because more code needs to be executed.
One thing I haven't checked yet is that GZDoom draws flats grouped by sector, i.e. it only sets up the material properties once per sector and then renders all visible subsectors of that sector. Doing this differently can actually make a difference on maps with heavily split sectors because it induces quite a few more state changes - but with the maps I tested this would at most be a factor of 3-4, not 20.
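
To make the state-change argument concrete, here is a small sketch that counts material setups for a given draw order, assuming a setup is needed whenever the sector changes between consecutive subsectors. VisibleSubsector, CountMaterialSetups and GroupBySector are hypothetical names; GZDoom's actual grouping code is not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical visible-subsector record: which sector it belongs to.
struct VisibleSubsector { int sector; int subsector; };

// Counts how many times the material would have to be set up if the list is
// drawn in the given order and setup is repeated whenever the sector changes.
inline int CountMaterialSetups(const std::vector<VisibleSubsector> &drawOrder)
{
    int setups = 0;
    int current = -1;
    for (const VisibleSubsector &s : drawOrder)
    {
        assert(s.sector >= 0);
        if (s.sector != current)
        {
            current = s.sector;
            ++setups;
        }
    }
    return setups;
}

// Reorders the list so that subsectors of the same sector are adjacent,
// keeping the original relative order within each sector.
inline void GroupBySector(std::vector<VisibleSubsector> &drawOrder)
{
    std::stable_sort(drawOrder.begin(), drawOrder.end(),
        [](const VisibleSubsector &a, const VisibleSubsector &b) { return a.sector < b.sector; });
}
```

With two sectors whose subsectors alternate in BSP order, six subsectors cost six setups ungrouped and two setups grouped, which is the kind of small constant factor mentioned above rather than a 20x difference.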
dpJudas wrote:It is the loading of textures that is the problem. Once the pixels have been loaded then the call to GetPixels is safe enough as all the threads only read from it. A mutex lock would do the trick, although ideally it would only attempt to make such a lock if it already concluded the texture is not loaded.
Precisely what I thought. I don't know if it makes sense for the software renderer to abstract the texture manager like I did in the GL renderer where I only use it as a store for the raw resources, the actual texture data gets managed by other classes. Unfortunately, when Randi wrote this code it was singlemindedly geared toward the precise requirements of the software renderer so the GL additions may appear a bit awkward as a result.

dpJudas wrote:Agree - it is better if I don't make changes to this part until after the merger. How/when do you suggest we do this? Main refactor work is more or less done, although there's of course always things that could be further improved. I think it is probably best we leave out the TC drawers for now until I find a better way to deal with the LLVM situation.
I think that LLVM is the major blocker for a full merge right now, how does this come along?
dpJudas wrote:The threads I'm using only get launched once. They use a condition variable to then wait for the main thread to start the next batch of work handed to them. I only really stop them if the desired thread count changes (only happens if r_multithreaded is toggled on or off).
Ok. I'll definitely invest time here to see if this can be made useful in the GL renderer as well, once I have more time when ZScript is further along. A map like Frozen Time spends approx. 7 ms per frame in code that should be somewhat multithreadable. Maybe we can get it to run at more than 60fps on my system. If the renderer has more time to process this data it may also be easier to keep some of it preprocessed for quicker rendering.
dpJudas
 
 
Posts: 3145
Joined: Sat May 28, 2016 1:01 pm

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by dpJudas »

Graf Zahl wrote:Get an older GZDoom and it works exactly like that. Don't fall for the urban myth that immediate mode is the root of all evil, I got that debunked years ago. ;)
Okay, I'm not sure why those renderers perform as badly as they do then. :)

As for the myth, I think it comes from the fact that drawing static geometry lists from buffers already on the GPU really is much much faster. Back when the myth was born, people drew all their static meshes using those functions. The mistake people make (including myself) is to assume the speed difference is just as great between glBegin vs more modern dynamic mesh streaming methods.
Graf Zahl wrote:Precisely what I thought. I don't know if it makes sense for the software renderer to abstract the texture manager like I did in the GL renderer where I only use it as a store for the raw resources, the actual texture data gets managed by other classes. Unfortunately, when Randi wrote this code it was singlemindedly geared toward the precise requirements of the software renderer so the GL additions may appear a bit awkward as a result.
Hmm, yes. Maybe it is a good idea to move as much of the texture handling into the software renderer, leaving the shared part to only manage the loading. Maybe if done right it could allow for async texture loading and sharing the upscalers.
Graf Zahl wrote:I think that LLVM is the major blocker for a full merge right now, how does this come along?
I haven't really spent much time on it. I probably should dedicate my full attention to this problem. :)
Graf Zahl wrote:Ok. I'll definitely invest time here to see if this can be made useful in the GL renderer as well, once I have more time when ZScript is further along. A map like Frozen Time spends approx. 7 ms per frame in code that should be somewhat multithreadable. Maybe we can get it to run at more than 60fps on my system. If the renderer has more time to process this data it may also be easier to keep some of it preprocessed for quicker rendering.
Getting GZDoom to run it at always above 60fps would be really cool. That would mark mission accomplished for that map. :)
Marisa the Magician
Posts: 3886
Joined: Fri Feb 08, 2008 9:15 am
Preferred Pronouns: She/Her
Operating System Version (Optional): (btw I use) Arch
Graphics Processor: nVidia with Vulkan support
Location: Vigo, Galicia

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Marisa the Magician »

After building the latest QZDoom with Clang (because GCC throws an error *cough*), I decided to give that bridge a whirl myself. With my trusty i5-6400 and at 1080p, I get about 20 fps on truecolor, and 35 on paletted. r_scene_multithreaded is enabled, and also I built with -march=native -O3 because I wanted to squeeze this as hard as I could.

As a lil' bonus, in 320x200 paletted it gives me 110 FPS.

Considering how the bridge floors are made, it makes a lot of sense that it would spend a lot of time drawing there.
Gez
 
 
Posts: 17936
Joined: Fri Jul 06, 2007 3:22 pm

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Gez »

Frozen Time is where I'd optimize by using a compatibility hack to replace the ten gazillion midtextures with a simple 3D floor. :p
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49188
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by Graf Zahl »

I once modified the map like that. FPS will go up from 20 to 24 with that, for the latest version.
ibm5155
Posts: 1268
Joined: Wed Jul 20, 2011 4:24 pm

Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)

Post by ibm5155 »

Indeed, replacing the fake bridge with a proper 3D floor one will save a lot of fps...
Sadly, the last time I tried that I broke half the map with a lot of missing sectors D:

EDIT: if you guys want to test some multithreading with dynamic lights and ZScript, I have an almost finished map with, ehm, more than 10,000 objects that interact with other lights each tic :D
