QZDoom - ZDoom with True-Color (Version 1.3.0 released!)
Forum rules
The Projects forums are ONLY for YOUR PROJECTS! If you are asking questions about a project, either find that project's thread, or start a thread in the General section instead.
Got a cool project idea but nothing else? Put it in the project ideas thread instead!
Projects for any Doom-based engine are perfectly acceptable here too.
Please read the full rules for more details.
-
- Posts: 13836
- Joined: Tue Jan 13, 2004 1:31 pm
- Preferred Pronouns: She/Her
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
Well - dpJudas has struck again.
In QZDoom's latest dev build, the bridge scene in Frozen Time is now playable in the software renderer (provided you have a decent enough processor).
-
-
- Posts: 3145
- Joined: Sat May 28, 2016 1:01 pm
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
I'm only getting about 15-20 fps at the bridge (35 fps with r_scene_multithreaded on). Technically playable, yes, but far from the rendering deadline. Sniff! Stupid map!
-
- Posts: 13836
- Joined: Tue Jan 13, 2004 1:31 pm
- Preferred Pronouns: She/Her
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
It's still better than the 5-10 it was getting before.
-
- Lead GZDoom+Raze Developer
- Posts: 49188
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
I'll have to try that. I was getting 12 fps there, which is virtually unplayable.
EDIT: The latest build runs at 20 fps, that's a nice speedup.
And just for fun: ZDoomGL renders that scene at 3 fps. Ouch! I'd really like to find out one day what makes that renderer tank this badly but that's not going to be easy to find.
-
- Lead GZDoom+Raze Developer
- Posts: 49188
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
One more thing: Have you ever compared performance of STL classes vs. ZDoom's own TArray and TMap? I generally found that ZDoom's are a bit better because they are not so convolutedly programmed as the STL.
-
-
- Posts: 3145
- Joined: Sat May 28, 2016 1:01 pm
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
I haven't compared the performance, no. I'm using the STL classes out of laziness - I already know their performance characteristics, while I'm not 100% sure about TArray and TMap. In this case in particular I'm relying heavily on the containers building up a reserve, so that allocations stop once they have reached a natural large size. TArray and TMap probably do the same, but I would have had to stop and check.
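To illustrate the "building up a reserve" point, here is a minimal sketch using std::vector. The FrameScratch type and the CountReallocations helper are made up for this example and are not from the QZDoom source. It relies on clear() dropping the elements while keeping the capacity, so after the first frame the container stops allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-frame scratch list; name and element type are illustrative.
struct FrameScratch
{
    std::vector<int> visibleWalls;

    // Start of a frame: drop the elements but keep the allocated storage.
    void BeginFrame() { visibleWalls.clear(); }

    void AddWall(int wallIndex) { visibleWalls.push_back(wallIndex); }
};

// Simulates `frames` frames of `wallsPerFrame` walls each and returns how
// often the vector's capacity changed, i.e. how often it reallocated.
inline std::size_t CountReallocations(int frames, int wallsPerFrame)
{
    assert(frames >= 0 && wallsPerFrame >= 0);
    FrameScratch scratch;
    std::size_t reallocations = 0;
    for (int f = 0; f < frames; ++f)
    {
        scratch.BeginFrame();
        for (int i = 0; i < wallsPerFrame; ++i)
        {
            const std::size_t before = scratch.visibleWalls.capacity();
            scratch.AddWall(i);
            if (scratch.visibleWalls.capacity() != before)
                ++reallocations;
        }
    }
    return reallocations;
}
```

Running one frame or a hundred frames with the same wall count should report the same number of reallocations, since all growth happens during the first frame.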
By the way, if you enable r_scene_multithreaded it splits the screen into N segments and then runs the entire BSP walking N times on worker threads. I'm able to do this now because there are no globals left in the renderer except for camera light and viewport setup. What is show-stopping r_scene_multithreaded is that loading textures is not thread-safe and portals need to change the viewport variables. Still trying to figure out a good way to make the texture manager access thread safe, and for the viewport thing the problem is that ViewPos and friends in r_utility are globals also used by the GL renderer.
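A rough sketch of the "split the screen into N segments" idea, assuming the split is by vertical column ranges (the post does not say how the segments are cut). SegmentColumns, RenderSegment and RenderScene are hypothetical names. The threads are spawned per call here only to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Half-open column range [first, last) for `segment` when a viewport that is
// `width` columns wide is cut into `count` vertical slices.
inline std::pair<int, int> SegmentColumns(int width, int segment, int count)
{
    assert(width >= 0 && count > 0 && segment >= 0 && segment < count);
    const int first = static_cast<int>(static_cast<long long>(width) * segment / count);
    const int last = static_cast<int>(static_cast<long long>(width) * (segment + 1) / count);
    return {first, last};
}

// Hypothetical stand-in for "walk the BSP and draw what falls into these
// columns"; it only records which segment touched each column.
inline void RenderSegment(std::vector<int>& columnOwner, int segment, int first, int last)
{
    for (int x = first; x < last; ++x)
        columnOwner[x] = segment;
}

// Runs one RenderSegment per thread. Each thread writes to its own columns
// only, so no locking is needed for the output.
inline std::vector<int> RenderScene(int width, int threadCount)
{
    std::vector<int> columnOwner(width, -1);
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i)
    {
        const std::pair<int, int> range = SegmentColumns(width, i, threadCount);
        workers.emplace_back(RenderSegment, std::ref(columnOwner), i, range.first, range.second);
    }
    for (std::thread& worker : workers)
        worker.join();
    return columnOwner;
}
```

Because every segment repeats the whole scene walk, anything the walk touches outside its own columns (texture loading, viewport variables) is shared state, which is where the problems described above come from.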
About ZDoomGL, that speed is the same as 3dge is getting on my computer. This is one evil map.
-
- Lead GZDoom+Raze Developer
- Posts: 49188
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
dpJudas wrote: About ZDoomGL, that speed is the same as 3dge is getting on my computer. This is one evil map.
I'd like to know what kills performance so much on other ports, because to this day I haven't found out which part of GZDoom and PrBoom is so different that it avoids these stalls. I don't even sort textures anymore, but even that hardly had any impact on any system I ever tested this on. On the other hand, ZDoomGL stalls for large amounts of time when it tries to access the render state at the beginning of each frame; the more complex the map, the worse. The pure C++ performance is OK: it's not great, but only twice as slow as GZDoom.
Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.
About the texture manager, which part is a concern? I'd guess it's GetPixels and related things that can get concurrently accessed by different threads, isn't it? Wouldn't adding a mutex be the solution then, or do you need to protect some more things?
The viewpoint variables should probably be put into some variable that gets returned by R_SetupFrame instead of storing it globally. So far I didn't bother because of the software renderer and its overdependence on global variables, but once everything has been neatly put away this should be done as well, but I guess it may be better if both ports actually get merged before that so that it's easier to work on that stuff in the future. The current split doesn't make it easy for me, because I basically cannot do anything at all for the software renderer, e.g. implementing the Doom64 sector colors there which should be easy to do except for the interpolation option.
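As a sketch of what "returned by R_SetupFrame instead of storing it globally" could look like: the Viewpoint struct, SetupFrame and EnterPortal below are invented for illustration and are much smaller than the real thing. Each caller owns its copy, so a portal (or another thread) can change its view without touching anyone else's.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for the camera state.
struct Viewpoint
{
    double x = 0.0;
    double y = 0.0;
    double z = 0.0;
    double angleDegrees = 0.0;
};

// Stand-in for a frame setup function that hands the viewpoint back to the
// caller instead of writing it into globals.
inline Viewpoint SetupFrame(double x, double y, double z, double angleDegrees)
{
    assert(angleDegrees >= 0.0 && angleDegrees < 360.0);
    Viewpoint vp;
    vp.x = x;
    vp.y = y;
    vp.z = z;
    vp.angleDegrees = angleDegrees;
    return vp;
}

// A portal works on a shifted copy; the outer viewpoint stays untouched.
inline Viewpoint EnterPortal(const Viewpoint& outer, double dx, double dy)
{
    Viewpoint inner = outer;
    inner.x += dx;
    inner.y += dy;
    return inner;
}
```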
One other thing I noticed while playing around with threads is that starting and ending threads while NVidia's GL driver is active can cause some bad performance degradation, easily nullifying all advantages. So if I ever add threads to the GL renderer they probably need to be kept for the lifetime of the program and not be created when needed and ended when rendering is done.
-
- Posts: 13836
- Joined: Tue Jan 13, 2004 1:31 pm
- Preferred Pronouns: She/Her
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
@ Graf: Honestly, I really would like a merge at this point, because the divide is getting harder to maintain depending on the stuff that you ZScriptify. Would you like dpJudas and me to certify a certain commit point that we believe you can safely merge into GZDoom, sans version.h changes? I think we can continue on QZDoom for now until the refactoring is completely done (although that may be a while), but having a narrower base would be immensely helpful to both you and me at this point, I think, especially with more oncoming ZScript changes.
Keep in mind, also, that you have access to our repo, so if you have time and you want to merge in GZDoom code and then make your changes to the software renderer and then merge back - that's fine.
-
-
- Posts: 3145
- Joined: Sat May 28, 2016 1:01 pm
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
Graf Zahl wrote: Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.
When I ran the Very Sleepy profiler on 3dge, it blamed the Nvidia driver. When I then looked at the actual code, their way of batching draws basically involved queuing one 'unit' per wall drawn. It then sorted those units by texture and state setup, and finally drew them with a glBegin(GL_QUADS) for each unit. So 500 walls would mean 500 times glBegin + glVertexAttribute * 4 * 3 + glEnd (total 7,000 OpenGL calls), plus checks between each wall/unit to see if the state setup changed. On top of that, it does this on a subsector level, meaning more walls than GZDoom. Also, when I commented out the entire unit drawing code, it still drew the map, but it had just become completely black - maybe it draws the entire scene multiple times.
I think part of the explanation is that the overhead builds up. But you have much more experience with the fixed function pipeline than I do, so you know better than me how big the overhead of the glBegin family is. There's of course also always the possibility that their clipper is buggy somehow, making them draw much more than what is needed (a problem softpoly currently has). I noticed the 3dge node builder created some errors in the bridge - if it made errors like that in the castle itself, maybe it ended up drawing far more than GZDoom does.
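For reference, the 7,000 figure above is just one glBegin, 4 vertices times 3 attribute calls, and one glEnd per wall. A tiny sketch of that arithmetic (ImmediateModeCalls is a made-up helper, and the 4 x 3 breakdown is taken from the post, not from the 3dge source):

```cpp
#include <cassert>

// Number of immediate-mode calls needed to draw `walls` quads one at a time.
constexpr int ImmediateModeCalls(int walls, int verticesPerQuad = 4, int attributesPerVertex = 3)
{
    // One glBegin, then the per-vertex attribute calls, then one glEnd.
    return walls * (1 + verticesPerQuad * attributesPerVertex + 1);
}

static_assert(ImmediateModeCalls(1) == 14, "14 calls per wall");
static_assert(ImmediateModeCalls(500) == 7000, "matches the figure in the post");
```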
Graf Zahl wrote: About the texture manager, which part is a concern? I'd guess it's GetPixels and related things that can get concurrently accessed by different threads, isn't it? Wouldn't adding a mutex be the solution then, or do you need to protect some more things?
It is the loading of textures that is the problem. Once the pixels have been loaded then the call to GetPixels is safe enough as all the threads only read from it. A mutex lock would do the trick, although ideally it would only attempt to make such a lock if it already concluded the texture is not loaded.
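The "only lock if the texture is not loaded yet" idea is essentially double-checked locking. A minimal sketch, assuming a made-up LazyTexture class (this is not the engine's texture class, and the real loading code is replaced by a fill):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical texture wrapper that loads its pixels on first use.
class LazyTexture
{
public:
    LazyTexture(int width, int height) : mWidth(width), mHeight(height)
    {
        assert(width > 0 && height > 0);
    }

    // May be called from several threads; only the first caller loads.
    const std::uint8_t* GetPixels()
    {
        // Fast path: already loaded, so no lock is taken.
        if (!mLoaded.load(std::memory_order_acquire))
        {
            std::lock_guard<std::mutex> lock(mLoadMutex);
            // Check again under the lock: another thread may have finished
            // the load while this one was waiting.
            if (!mLoaded.load(std::memory_order_relaxed))
            {
                LoadPixels();
                mLoaded.store(true, std::memory_order_release);
            }
        }
        return mPixels.data();
    }

    int LoadCount() const { return mLoadCount; }

private:
    // Stand-in for decoding the image data.
    void LoadPixels()
    {
        mPixels.assign(static_cast<std::size_t>(mWidth) * mHeight, 0xFF);
        ++mLoadCount;
    }

    int mWidth;
    int mHeight;
    int mLoadCount = 0;
    std::atomic<bool> mLoaded{false};
    std::mutex mLoadMutex;
    std::vector<std::uint8_t> mPixels;
};
```

Once the flag is set, readers never touch the mutex, which matches the observation that GetPixels is safe as long as all threads only read.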
Graf Zahl wrote: The viewpoint variables should probably be put into some variable that gets returned by R_SetupFrame instead of storing it globally. So far I didn't bother because of the software renderer and its overdependence on global variables, but once everything has been neatly put away this should be done as well, but I guess it may be better if both ports actually get merged before that so that it's easier to work on that stuff in the future. The current split doesn't make it easy for me, because I basically cannot do anything at all for the software renderer, e.g. implementing the Doom64 sector colors there which should be easy to do except for the interpolation option.
Agree - it is better if I don't make changes to this part until after the merger. How/when do you suggest we do this? Main refactor work is more or less done, although there are of course always things that could be further improved. I think it is probably best we leave out the TC drawers for now until I find a better way to deal with the LLVM situation.
Graf Zahl wrote: One other thing I noticed while playing around with threads is that starting and ending threads while NVidia's GL driver is active can cause some bad performance degradation, easily nullifying all advantages. So if I ever add threads to the GL renderer they probably need to be kept for the lifetime of the program and not be created when needed and ended when rendering is done.
The threads I'm using only get launched once. They then use a condition variable to wait for the main thread to hand them the next batch of work. I only really stop them if the desired thread count changes (which only happens if r_multithreaded is toggled on or off).
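A stripped-down sketch of that pattern: threads are created once, sleep on a condition variable, and wake up for each batch. The WorkerPool class and its members are hypothetical names, not the QZDoom implementation, and every worker simply runs the same job with its own index.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool
{
public:
    explicit WorkerPool(int threadCount)
    {
        assert(threadCount > 0);
        for (int i = 0; i < threadCount; ++i)
            mThreads.emplace_back([this, i] { WorkerMain(i); });
    }

    // Threads are only stopped when the pool itself goes away.
    ~WorkerPool()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQuit = true;
        }
        mWake.notify_all();
        for (std::thread& t : mThreads)
            t.join();
    }

    // Runs job(workerIndex) once on every worker and blocks until all are done.
    void RunBatch(std::function<void(int)> job)
    {
        std::unique_lock<std::mutex> lock(mMutex);
        mJob = std::move(job);
        mPending = static_cast<int>(mThreads.size());
        ++mBatch;
        mWake.notify_all();
        mDone.wait(lock, [this] { return mPending == 0; });
    }

private:
    void WorkerMain(int index)
    {
        unsigned seenBatch = 0;
        std::unique_lock<std::mutex> lock(mMutex);
        for (;;)
        {
            // Sleep until a new batch is published or the pool shuts down.
            mWake.wait(lock, [&] { return mQuit || mBatch != seenBatch; });
            if (mQuit)
                return;
            seenBatch = mBatch;
            lock.unlock();
            mJob(index); // mJob is not modified while a batch is running
            lock.lock();
            if (--mPending == 0)
                mDone.notify_one();
        }
    }

    std::mutex mMutex;
    std::condition_variable mWake;
    std::condition_variable mDone;
    std::function<void(int)> mJob;
    unsigned mBatch = 0;
    int mPending = 0;
    bool mQuit = false;
    std::vector<std::thread> mThreads;
};
```

Changing the thread count in this sketch would mean destroying the pool and creating a new one, which is the only point where threads get started or stopped.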
-
- Lead GZDoom+Raze Developer
- Posts: 49188
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
Graf Zahl wrote: Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.
dpJudas wrote: When I ran the Very Sleepy profiler on 3dge, it blamed the Nvidia driver. When I then looked at the actual code, their way of batching draws basically involved queuing one 'unit' per wall drawn. It then sorted those units by texture and state setup, and finally drew them with a glBegin(GL_QUADS) for each unit. So 500 walls would mean 500 times glBegin + glVertexAttribute * 4 * 3 + glEnd (total 7,000 OpenGL calls), plus checks between each wall/unit to see if the state setup changed. On top of that, it does this on a subsector level, meaning more walls than GZDoom.
Get an older GZDoom and it works exactly like that. Don't fall for the urban myth that immediate mode is the root of all evil, I got that debunked years ago.
The things you mentioned were all once present, too, I gradually removed them over time, but if you grab an older GZDoom, version 1.3 or lower you'll have an engine that works mostly the same - glBegin/glEnd for each wall/subsector and still is a lot faster.
dpJudas wrote: I think part of the explanation is that the overhead builds up. But you have much more experience with the fixed function pipeline than I do, so you know better than me how big the overhead of the glBegin family is. There's of course also always the possibility that their clipper is buggy somehow, making them draw much more than what is needed (a problem softpoly currently has). I noticed the 3dge node builder created some errors in the bridge - if it made errors like that in the castle itself, maybe it ended up drawing far more than GZDoom does.
The actual overhead is the function calls. It just takes a bit longer to get the vertex data into a buffer, but at no point does it cause the frame rate to tank. It's just some gradual degradation, let's say 50 fps instead of 60 fps, because more code needs to be executed.
One thing I haven't checked yet is that GZDoom draws flats grouped by sector, i.e. it only sets up the material properties once per sector and then renders all visible subsectors of that sector. Doing this differently can actually make a difference on maps with heavily split sectors because it induces quite a few more state changes - but with the maps I tested this would at most be a factor of 3-4, not 20.
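To make the state-change argument concrete, here is a toy model. It assumes one material setup whenever the sector changes between consecutive flats; CountMaterialSetups and GroupBySector are invented helpers, and sorting is only a stand-in for whatever grouping the renderer really does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Each entry is the sector index of one visible subsector, in draw order.
// A material setup is counted whenever the sector differs from the previous one.
inline std::size_t CountMaterialSetups(const std::vector<int>& sectorOfSubsector)
{
    std::size_t setups = 0;
    for (std::size_t i = 0; i < sectorOfSubsector.size(); ++i)
    {
        if (i == 0 || sectorOfSubsector[i] != sectorOfSubsector[i - 1])
            ++setups;
    }
    return setups;
}

// Reorders the draw list so that all subsectors of a sector are adjacent.
inline std::vector<int> GroupBySector(std::vector<int> sectorOfSubsector)
{
    std::stable_sort(sectorOfSubsector.begin(), sectorOfSubsector.end());
    assert(std::is_sorted(sectorOfSubsector.begin(), sectorOfSubsector.end()));
    return sectorOfSubsector;
}
```

With a draw order like 1, 2, 1, 2, 1, 3 the ungrouped list needs six setups and the grouped one three, so the gain depends entirely on how badly the sectors are split.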
dpJudas wrote: It is the loading of textures that is the problem. Once the pixels have been loaded then the call to GetPixels is safe enough as all the threads only read from it. A mutex lock would do the trick, although ideally it would only attempt to make such a lock if it already concluded the texture is not loaded.
Precisely what I thought. I don't know if it makes sense for the software renderer to abstract the texture manager like I did in the GL renderer where I only use it as a store for the raw resources, the actual texture data gets managed by other classes. Unfortunately, when Randi wrote this code it was singlemindedly geared toward the precise requirements of the software renderer so the GL additions may appear a bit awkward as a result.
dpJudas wrote: Agree - it is better if I don't make changes to this part until after the merger. How/when do you suggest we do this? Main refactor work is more or less done, although there are of course always things that could be further improved. I think it is probably best we leave out the TC drawers for now until I find a better way to deal with the LLVM situation.
I think that LLVM is the major blocker for a full merge right now. How is that coming along?
dpJudas wrote: The threads I'm using only get launched once. They then use a condition variable to wait for the main thread to hand them the next batch of work. I only really stop them if the desired thread count changes (which only happens if r_multithreaded is toggled on or off).
Ok. I'll definitely invest time here to see if this can be made useful in the GL renderer as well, once I have more time when ZScript is further along. A map like Frozen Time spends approx. 7 ms per frame in code that should be somewhat multithreadable. Maybe we can get it to run at more than 60 fps on my system. If the renderer has more time to process this data it may also be easier to keep some of it preprocessed for quicker rendering.
-
-
- Posts: 3145
- Joined: Sat May 28, 2016 1:01 pm
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
Graf Zahl wrote: Get an older GZDoom and it works exactly like that. Don't fall for the urban myth that immediate mode is the root of all evil, I got that debunked years ago.
Okay, I'm not sure why those renderers perform as badly as they do then.
As for the myth, I think it comes from the fact that drawing static geometry lists from buffers already on the GPU really is much much faster. Back when the myth was born, people drew all their static meshes using those functions. The mistake people make (including myself) is to assume the speed difference is just as great between glBegin vs more modern dynamic mesh streaming methods.
Graf Zahl wrote: Precisely what I thought. I don't know if it makes sense for the software renderer to abstract the texture manager like I did in the GL renderer where I only use it as a store for the raw resources, the actual texture data gets managed by other classes. Unfortunately, when Randi wrote this code it was singlemindedly geared toward the precise requirements of the software renderer so the GL additions may appear a bit awkward as a result.
Hmm, yes. Maybe it is a good idea to move as much of the texture handling as possible into the software renderer, leaving the shared part to only manage the loading. Maybe if done right it could allow for async texture loading and sharing the upscalers.
Graf Zahl wrote: I think that LLVM is the major blocker for a full merge right now. How is that coming along?
I haven't really spent much time on it. I probably should dedicate my full attention to this problem.
Graf Zahl wrote: Ok. I'll definitely invest time here to see if this can be made useful in the GL renderer as well, once I have more time when ZScript is further along. A map like Frozen Time spends approx. 7 ms per frame in code that should be somewhat multithreadable. Maybe we can get it to run at more than 60 fps on my system. If the renderer has more time to process this data it may also be easier to keep some of it preprocessed for quicker rendering.
Getting GZDoom to run it at always above 60 fps would be really cool. That would mark mission accomplished for that map.
-
- Posts: 3886
- Joined: Fri Feb 08, 2008 9:15 am
- Preferred Pronouns: She/Her
- Operating System Version (Optional): (btw I use) Arch
- Graphics Processor: nVidia with Vulkan support
- Location: Vigo, Galicia
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
After building the latest QZDoom with Clang (because GCC throws an error *cough*), I decided to give that bridge a whirl myself. With my trusty i5-6400 and at 1080p, I get about 20 fps on truecolor, and 35 on paletted. r_scene_multithreaded is enabled, and also I built with -march=native -O3 because I wanted to squeeze this as hard as I could.
As a lil' bonus, in 320x200 paletted it gives me 110 FPS.
Considering how the bridge floors are made, it makes a lot of sense that it would spend a lot of time drawing there.
-
-
- Posts: 17936
- Joined: Fri Jul 06, 2007 3:22 pm
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
Frozen Time is where I'd optimize by using a compatibility hack to replace the ten gazillion midtextures with a simple 3D floor.
-
- Lead GZDoom+Raze Developer
- Posts: 49188
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
I once modified the map like that. FPS will go up from 20 to 24 with that, for the latest version.
-
- Posts: 1268
- Joined: Wed Jul 20, 2011 4:24 pm
Re: QZDoom - ZDoom with True-Color (Version 1.2.2 released!)
Indeed, replacing the fake bridge with a proper 3D floor one will save a lot of fps...
Sadly, the last time I tried that I broke half the map with a lot of missing sectors D:
EDIT: if you guys want to test some multithreading with dynamic lights and ZScript, I have an almost finished map with, ehm, more than 10,000 objects that interact with other lights each tic.