[Added] Multithreaded rendering

Moderator: GZDoom Developers

Multithreaded rendering

Postby dpJudas » Wed Dec 07, 2016 2:38 am

Pull request: https://github.com/rheit/zdoom/pull/932

Adds the QZDoom multithreading framework to ZDoom.
dpJudas
 
 
 
Joined: 28 May 2016

Re: Multithreaded rendering

Postby Jitan » Wed Dec 07, 2016 2:54 am

Is this the real multithreaded rendering?

I hope this gets accepted real fast.
Jitan
 
Joined: 04 Oct 2011

Re: Multithreaded rendering

Postby Rachael » Wed Dec 07, 2016 3:15 am

And thus sounds the death knell for the remaining ASM code. (Other than the color matcher, maybe)
Rachael
^ walking stack of unfinished projects ^
Admin
 
Joined: 13 Jan 2004
Discord: Rachael#3767
Twitch ID: madamerachelle
Github ID: madame-rachelle

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 3:24 am

Obviously I will give this one a very thorough test, but if it really provides the capability to run the game in 4K at acceptable speeds it'd absolutely be worth it.
Graf Zahl
Lead GZDoom+Raze Developer
 
Joined: 19 Jul 2003
Location: Germany

Re: Multithreaded rendering

Postby dpJudas » Wed Dec 07, 2016 3:35 am

The performance gain depends heavily on the actual bottleneck. E1M1 gets a significant speed boost from this on my computer (134 fps @ 4k, start viewpoint), while the first map in KDiZD at 4K still struggles because the code feeding the drawers is holding it back (45 fps @ 4k, start viewpoint).

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 4:27 am

dpJudas wrote: while the first map in KDiZD at 4K still struggles because the code feeding the drawers is holding it back (45 fps @ 4k, start viewpoint).


Indeed. Which is why I consider the assembly code overrated anyway. The only places where it clearly shows its superiority are those where it isn't needed to begin with.

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 6:06 am

I just ran some tests and the results were quite interesting.

1. 64 bit builds are consistently faster than 32 bit builds, the single threaded versions by far more than the multithreaded ones, but of course 64 bit MT is the fastest of all.
2. The performance gain of both 32 and 64 bit MT builds is greatest on simple maps (not surprising).
3. In absolute terms the gain from multithreading is a nearly constant 3 ms in both 64 bit and 32 bit at 1920x1080 for me, no matter how complex the map is.

What does this tell us?

Well, I think some people will not like this but: The entire assembly story of gaining performance 'where it matters' is utterly and completely bogus. The only maps where the assembly could truly show off its 'power' are those where it completely DOESN'T matter! The rendering part is a nearly constant component of the entire render flow, and the more complex the map becomes, the less relevant it is. (Of course, the same is true for multithreading: the more complex the scene becomes, the less significant the raw drawing power becomes.)
Take Frozen Time, for example. I gained a measly 1 fps (11 to 12) between the old 32 bit single threaded renderer and the 64 bit multithreaded version - but it still ran 3 ms faster. It is only that at roughly 90 ms per frame, 3 ms doesn't register as any measurable improvement in frame rate.
This should also make it clear how pointless it is to measure performance in fps at all! I stopped doing that long ago. The only value that has usable properties for measuring improvements is milliseconds!
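
To put numbers on the fps versus milliseconds point, here is a small sketch of the conversion. The helper names fps_from_ms and fps_gain are made up for this illustration and are not part of ZDoom; the 3 ms and 90 ms figures are the ones quoted in the post.

```cpp
#include <cassert>

// Convert a frame time in milliseconds to frames per second.
double fps_from_ms(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}

// Frames per second gained when a frame becomes saved_ms shorter.
// The same saving produces very different fps deltas depending on
// how long the frame was to begin with.
double fps_gain(double frame_ms, double saved_ms)
{
    assert(saved_ms < frame_ms);
    return fps_from_ms(frame_ms - saved_ms) - fps_from_ms(frame_ms);
}
```

With a 3 ms saving, fps_gain(90.0, 3.0) comes out at roughly 0.38 fps, while fps_gain(10.0, 3.0) is almost 43 fps. The work saved is identical in both cases, which is why the millisecond figure is the one that can be compared across maps.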

So where does that lead us? I think it's clear: If we want to improve rendering performance on more complex maps, the most important thing is not boosting the drawer performance to the limit but structuring the whole thing so that larger components can both be optimized and laid out better for multithreading at a higher level in the chain. Focussing on the drawers will only hold progress back.
I cannot say that this surprises me: the biggest roadblock to performance improvements is all the global variable shit that's going on in there.

Re: Multithreaded rendering

Postby Rachael » Wed Dec 07, 2016 6:26 am

I guess the biggest question, more than the speed improvements this code provides, is how workable this code becomes after the merge. If it gets to a point where the software renderer is more manageable I'd say it's still a win-win for most people. The only people who lose out are those who are stuck on single-core processors - but honestly, those are 10+ years old now.

I've said it before - I'll say it again - having more than 2 people who know more than nilly dilly about the software renderer can only help everyone as a whole.

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 6:40 am

This code alone doesn't improve workability. The global variables are still there. But now that everything is C, all the hackery that dictated this approach in the first place is no longer needed.
The big issue with the renderer is that feeding optimized data to the assembly was the overarching design limiter for lots of stuff, and now that can be gradually removed.

There's really no need to sugarcoat it: This was a design from more than 20 years ago, and instead of looking for an entirely new approach, it only ever saw optimizations that remained within that constrained framework. No attempt was ever made to break out of the framework and see if doing it differently might work better. And the last true optimizations here date back to before I started working here. Nothing about the structure has changed in any fundamental way in over 10 years.

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 6:56 am

Here's another gem. I was just reviewing the DoBlending stuff to see if the assembly is worth keeping.

First the results:

Code:
Pure C: 43.073183 ms
MMX asm: 19.672721 ms
MMX intrinsics: 23.969307 ms
SSE2 intrinsics: 19.774505 ms
SSE2 intrinsics without fallback: 8.586062 ms
Unaligned
Pure C: 41.646474 ms
MMX asm: 20.875403 ms
MMX intrinsics: 25.511350 ms
SSE2 intrinsics: 20.675840 ms
SSE2 intrinsics without fallback: 10.082297 ms


Note that this runs 100000 calls of this function.
The assembly version is only used on non-SSE hardware and on SSE hardware without alignment. But oh the irony: Ripping that out doubles the speed of the SSE version; the fallback was apparently called unconditionally. On top of that, even the C version is far more than sufficient: it runs over 2000 calls in a millisecond on my system, and even a 10 year old computer could still process several hundred of these. And this only gets called once or twice per frame to set up the screen blend, plus once for each Sector_SetColor call (which, through lazy calculation, can be deferred until really needed). In plain English: It never becomes a factor, and the MMX intrinsics version is far, far more than sufficient for nearly all cases. It's another clear case where the assembly code is not worth the hassle it may bring along - not if it only matters for ancient systems and the alternative is good enough.
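
For anyone who wants to repeat this kind of measurement, here is a rough sketch of a timing harness. It is not the ZDoom code: blend_palette is a hypothetical plain scalar blend written for this example (0x00RRGGBB entries, alpha from 0 to 256), and time_blend_ms is an assumed helper name. The numbers it produces will not match the table above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical scalar blend: moves each 0x00RRGGBB entry toward (r, g, b).
// a = 0 keeps the source colour, a = 256 yields the target colour.
void blend_palette(const std::uint32_t* from, std::uint32_t* to, int count,
                   int r, int g, int b, int a)
{
    assert(a >= 0 && a <= 256);
    const int inv = 256 - a;
    for (int i = 0; i < count; ++i)
    {
        const int fr = (from[i] >> 16) & 0xFF;
        const int fg = (from[i] >> 8) & 0xFF;
        const int fb = from[i] & 0xFF;
        const std::uint32_t nr = (fr * inv + r * a) >> 8;
        const std::uint32_t ng = (fg * inv + g * a) >> 8;
        const std::uint32_t nb = (fb * inv + b * a) >> 8;
        to[i] = (nr << 16) | (ng << 8) | nb;
    }
}

// Times `calls` blends of a 256-entry palette and returns total milliseconds.
double time_blend_ms(int calls)
{
    std::vector<std::uint32_t> src(256, 0x00204060u), dst(256);
    std::uint32_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i)
    {
        blend_palette(src.data(), dst.data(), 256, 255, 0, 0, i & 0xFF);
        checksum += dst[i & 0xFF]; // use the result so the work is not optimized away
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile std::uint32_t sink = checksum; // keep the checksum observable
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_blend_ms(100000) gives a total in the same unit as the table. Dividing by the call count gives the per-call cost, which is the figure that matters for a function that runs only a few times per frame.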

The last thing to review is BestColor_MMX, and if that also fails the viability test I am going to remove all assembly entirely.

Re: Multithreaded rendering

Postby dpJudas » Wed Dec 07, 2016 7:05 am

It is probably worth mentioning that the multithreading framework in this PR could be improved to further boost MT performance in a couple of ways:

1) The way it works right now, the renderer collects all columns and spans to be rendered into a single list of commands, and then at the end it dispatches the list to the threads. This means the worker threads are dormant until the entire BSP has been traversed. For the more complex maps maybe 30% (95% in the Frozen Time example Graf used ;)) or more of the frame time is still spent on a single thread. The worker threads could start drawing stuff immediately as commands are added, although that requires improving how the list of commands to execute is managed.

2) There is nothing in the multithreading system that says it can only work with columns. I chose to execute only drawers on worker threads because it was the lowest hanging fruit. In principle, once the renderer determines a wall segment should be drawn by wallscan, the entire wallscan could be a command scheduled to be done by workers. Same for planes. The higher the level the commands operate at, the less data needs to be transferred to the workers for them to do their thing.
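
The collect-then-dispatch scheme from point 1 can be sketched roughly as follows. This is an illustration only, not the code from the PR: CommandQueue, DrawCommand and fill_column are made-up names, and splitting the work by interleaved rows (y % thread_count) is an assumption chosen here because it lets every worker run the full command list without locking.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical command: draws only the share that belongs to one worker.
using DrawCommand = std::function<void(int thread_index, int thread_count)>;

struct CommandQueue
{
    std::vector<DrawCommand> commands;

    // Called on the main thread while the scene is being traversed.
    void add(DrawCommand cmd) { commands.push_back(std::move(cmd)); }

    // Runs every queued command on every worker, in order, then clears the list.
    // Workers only start here, so they sit idle during the traversal.
    void finish(int thread_count)
    {
        assert(thread_count > 0);
        std::vector<std::thread> workers;
        for (int t = 0; t < thread_count; ++t)
            workers.emplace_back([this, t, thread_count] {
                for (const DrawCommand& cmd : commands)
                    cmd(t, thread_count);
            });
        for (std::thread& w : workers)
            w.join();
        commands.clear();
    }
};

// Example command: fill rows y1..y2-1 of column x. Each worker writes only the
// rows it owns, so no two threads ever touch the same pixel.
DrawCommand fill_column(std::vector<std::uint8_t>& frame, int width,
                        int x, int y1, int y2, std::uint8_t color)
{
    return [&frame, width, x, y1, y2, color](int thread_index, int thread_count) {
        for (int y = y1; y < y2; ++y)
            if (y % thread_count == thread_index)
                frame[y * width + x] = color;
    };
}
```

Because each pixel belongs to exactly one worker and every worker walks the list in the same order, overdraw comes out the same as in a single-threaded run. A higher-level command as described in point 2 would have the same shape, only with a whole wall or plane captured in the lambda instead of a single column.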

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 7:22 am

I fully understand that. You have to start somewhere to get things going after all. But at least the start is made.

I am now looking into removing the assembly stuff entirely. DoBlending already failed my tests: the improvements it brings are utterly irrelevant as it just isn't called often enough to justify it, and the intrinsics version, while a bit worse, is more than adequate. I think that with VC 2005 it was quite a bit worse.

But on my system it takes 100000 calls to gain 5 ms, which means that in any normal use, even on an old system where this code matters at all, you can still do 100+ blend operations without any speed hit. No map I know of redefines sector colors so massively that this would even remotely register as a problem.
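
The per-call cost behind that estimate is easy to work out from the totals posted earlier in the thread. The two helpers below are hypothetical names used only for this arithmetic.

```cpp
#include <cassert>

// Average cost of one call in nanoseconds, given a total over `calls` runs.
double ns_per_call(double total_ms, int calls)
{
    assert(calls > 0);
    return total_ms * 1.0e6 / calls;
}

// How many calls fit into one millisecond at that average cost.
double calls_per_ms(double total_ms, int calls)
{
    assert(total_ms > 0.0);
    return calls / total_ms;
}
```

For the pure C result, ns_per_call(43.073183, 100000) is about 431 ns and calls_per_ms(43.073183, 100000) is about 2322. The gap between the MMX asm and MMX intrinsics runs (23.969307 ms versus 19.672721 ms) works out to roughly 43 ns per call, so 100 blend operations in a frame would cost only about 4.3 microseconds extra on that machine.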

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 7:35 am

Ha ha. On my current system the improvement from BestColor_MMX is precisely: Zero! This is again just a relic that may be relevant for dinosaur systems but is not worth keeping in a forward-thinking approach.

I am now going to remove all assembly code that's still present.

Re: Multithreaded rendering

Postby Graf Zahl » Wed Dec 07, 2016 7:42 am

PR added, assembly removed - Time to celebrate! :mrgreen:

Re: Multithreaded rendering

Postby Nash » Wed Dec 07, 2016 7:46 am

Cool, one less requisite and SDK/developer tool to keep installed on my computer. ;)

[EDIT] Should the CMake files be updated too? Those Assembly checkboxes should be irrelevant now...?

Now, about that Build code...
Nash
AKA Nash Muhandes! Twitter/Facebook/Youtube: nashmuhandes
 
 
 
Joined: 27 Oct 2003
Location: Kuala Lumpur, Malaysia
Twitch ID: nashmuhandes
Github ID: nashmuhandes
