Well - dpJudas has struck again.
In QZDoom's latest dev build, the bridge scene in Frozen Time is now playable in the software renderer (provided you have a decent enough processor).
dpJudas wrote:About ZDoomGL, that speed is the same as 3dge is getting on my computer. This is one evil map.
Graf Zahl wrote:Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.
Graf Zahl wrote:About the texture manager, which part is a concern? I'd guess it's GetPixels and related things that can get concurrently accessed by different threads, isn't it? Wouldn't adding a mutex be the solution then, or do you need to protect some more things?
Graf Zahl wrote:The viewpoint variables should probably be put into some variable that gets returned by R_SetupFrame instead of storing it globally. So far I didn't bother because of the software renderer and its overdependence on global variables, but once everything has been neatly put away this should be done as well, but I guess it may be better if both ports actually get merged before that so that it's easier to work on that stuff in the future. The current split doesn't make it easy for me, because I basically cannot do anything at all for the software renderer, e.g. implementing the Doom64 sector colors there which should be easy to do except for the interpolation option.
Graf Zahl wrote:One other thing I noticed while playing around with threads is that starting and ending threads while NVidia's GL driver is active can cause some bad performance degradation, easily nullifying all advantages. So if I ever add threads to the GL renderer they probably need to be kept for the lifetime of the program and not be created when needed and ended when rendering is done.
dpJudas wrote:Graf Zahl wrote:Neither side does anything that's an obvious showstopper and yet those other ports completely tank into low single digit FPS while GZDoom and PrBoom have little problems running that map at decent speeds.
When I ran the Very Sleepy profiler on 3dge, it blamed the Nvidia driver. When I then looked at the actual code, their way of batching draws basically involved queuing one 'unit' per wall drawn. It then sorted those units by texture and state setup, and finally drew them with a glBegin(GL_QUADS) for each unit. So 500 walls would mean 500 times glBegin + glVertexAttribute * 4 * 3 + glEnd (total 7,000 OpenGL calls), plus checks between each wall/unit to see if the state setup changed. Add to that the fact that it does this on a subsector level, meaning more walls than GZDoom.
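The arithmetic behind the 7,000 figure can be written out as a small cost model. This is only a sketch of the counting in the post, not of 3dge's actual code: the function and constant names are made up for illustration, and the second function (one glBegin/glEnd pair per group of walls sharing state) is an assumed alternative added for comparison.

```cpp
#include <cstddef>

// Hypothetical cost model for the call count quoted above. The constants
// mirror the "4 * 3" in the post: 4 vertices per quad, 3 calls per vertex.
constexpr std::size_t kVerticesPerQuad = 4;
constexpr std::size_t kCallsPerVertex = 3;

// One glBegin/glEnd pair per wall unit, as described in the post.
constexpr std::size_t CallsPerWallUnit(std::size_t walls) {
    return walls * (1 + kVerticesPerQuad * kCallsPerVertex + 1);
}

// Assumed alternative: one glBegin/glEnd pair per group of walls that
// share the same texture and state, with the per-vertex calls unchanged.
constexpr std::size_t CallsPerStateGroup(std::size_t walls, std::size_t groups) {
    return groups * 2 + walls * kVerticesPerQuad * kCallsPerVertex;
}
```

With 500 walls, CallsPerWallUnit gives the 7,000 calls mentioned above. Under this model, merging the walls into 20 state groups would still leave 6,040 calls, since the per-vertex calls dominate the total; whether that translates into measurable time depends on the driver.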
I think part of the explanation is that the overhead builds up. But you have much more experience with the fixed function pipeline than I do, so you know better than me how big the overhead of the glBegin family is. There's of course also always the possibility that their clipper is buggy somehow, making them draw much more than what is needed (a problem softpoly currently has). I noticed the 3dge node builder created some errors in the bridge - if it made errors like that in the castle itself, maybe it ended up drawing far more than GZDoom does.
It is the loading of textures that is the problem. Once the pixels have been loaded then the call to GetPixels is safe enough as all the threads only read from it. A mutex lock would do the trick, although ideally it would only attempt to make such a lock if it already concluded the texture is not loaded.
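A minimal sketch of that idea, assuming a stand-in LazyTexture class rather than the real texture manager: the atomic flag is checked first, and the mutex is only taken when the texture looks unloaded. Every name here is hypothetical, and the "load" just fills a buffer.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a texture whose pixels are loaded on first use.
class LazyTexture {
public:
    LazyTexture(int width, int height) : width_(width), height_(height) {}

    // Safe to call from several threads. The mutex is only taken while the
    // texture still looks unloaded; afterwards callers just read.
    const std::vector<std::uint8_t>& GetPixels() {
        if (!loaded_.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Re-check under the lock: another thread may have finished loading.
            if (!loaded_.load(std::memory_order_relaxed)) {
                pixels_.assign(static_cast<std::size_t>(width_) * height_, 0xFF);
                ++loadCount_;
                loaded_.store(true, std::memory_order_release);
            }
        }
        return pixels_;
    }

    int LoadCount() const { return loadCount_; }

private:
    int width_;
    int height_;
    std::atomic<bool> loaded_{false};
    std::mutex mutex_;
    std::vector<std::uint8_t> pixels_;
    int loadCount_ = 0;  // only modified while holding mutex_
};

// Hits one texture from several threads and reports how often it was loaded.
int LoadCountAfterConcurrentAccess(int threadCount) {
    LazyTexture texture(64, 64);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&texture] { texture.GetPixels(); });
    for (auto& t : threads) t.join();
    return texture.LoadCount();
}
```

The helper at the end only exists to show that the load happens once no matter how many threads ask for the pixels at the same time.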
Agree - it is better if I don't make changes to this part until after the merger. How/when do you suggest we do this? Main refactor work is more or less done, although there are of course always things that could be further improved. I think it is probably best we leave out the TC drawers for now until I find a better way to deal with the LLVM situation.
The threads I'm using only get launched once. They use a condition variable to then wait for the main thread to start the next batch of work handed to them. I only really stop them if the desired thread count changes (only happens if r_multithreaded is toggled on or off).
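Roughly, that pattern looks like the sketch below. It is not the actual QZDoom code: WorkerPool, RunBatch and CountJobsRun are hypothetical names, and the job is simplified to one callback per worker per batch.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool: threads are created once and sleep between batches.
class WorkerPool {
public:
    explicit WorkerPool(int threadCount) {
        assert(threadCount > 0);
        for (int i = 0; i < threadCount; ++i)
            workers_.emplace_back([this, i] { WorkerMain(i); });
    }

    // Threads are only stopped when the pool itself goes away.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& t : workers_) t.join();
    }

    // Runs job(workerIndex) once on every worker and blocks until all finish.
    void RunBatch(std::function<void(int)> job) {
        std::unique_lock<std::mutex> lock(mutex_);
        job_ = std::move(job);
        pending_ = static_cast<int>(workers_.size());
        ++generation_;
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void WorkerMain(int index) {
        unsigned seen = 0;  // last batch number this worker has processed
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [&] { return stopping_ || generation_ != seen; });
            if (stopping_) return;
            seen = generation_;
            lock.unlock();
            job_(index);  // job_ is not replaced until this batch is complete
            lock.lock();
            if (--pending_ == 0) done_.notify_one();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable done_;
    std::function<void(int)> job_;
    unsigned generation_ = 0;
    int pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Runs several batches on one pool and counts how many jobs executed.
int CountJobsRun(int threadCount, int batches) {
    WorkerPool pool(threadCount);
    std::atomic<int> hits{0};
    for (int b = 0; b < batches; ++b)
        pool.RunBatch([&hits](int) { ++hits; });
    return hits.load();
}
```

In this sketch a change of thread count would be handled by destroying the pool and constructing a new one, which matches stopping the threads only when a setting like r_multithreaded is toggled.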
Graf Zahl wrote:Get an older GZDoom and it works exactly like that. Don't fall for the urban myth that immediate mode is the root of all evil, I got that debunked years ago.
Precisely what I thought. I don't know if it makes sense for the software renderer to abstract the texture manager like I did in the GL renderer where I only use it as a store for the raw resources, the actual texture data gets managed by other classes. Unfortunately, when Randi wrote this code it was singlemindedly geared toward the precise requirements of the software renderer so the GL additions may appear a bit awkward as a result.
I think that LLVM is the major blocker for a full merge right now. How is this coming along?
Ok. I'll definitely invest time here to see if this can be made useful in the GL renderer as well, once I have more time when ZScript is further along. A map like Frozen Time spends approx. 7 ms per frame in code that should be somewhat multithreadable. Maybe we can get it to run at more than 60fps on my system. If the renderer has more time to process this data it may also be easier to keep some of it preprocessed for quicker rendering.