by Graf Zahl » Wed Dec 07, 2016 6:06 am
I just ran some tests and the results were quite interesting.
1. 64-bit builds are consistently faster than 32-bit builds. The gap is far larger for the single-threaded versions than for the multithreaded ones, but of course 64-bit MT is the fastest of all.
2. The relative performance gain of both the 32-bit and 64-bit MT builds is largest on simple maps (not surprising).
3. In absolute terms, the gain from multithreading is a nearly constant 3 ms in both 64-bit and 32-bit builds at 1920x1080 for me, no matter how complex the map is.
What does this tell us?
Well, I think some people will not like this, but: the entire assembly story of gaining performance 'where it matters' is utterly and completely bogus. The only maps where the assembly could truly show off its 'power' are those where it completely DOESN'T matter! The drawing part is a nearly constant component of the entire render flow, and the more complex the map becomes, the less relevant it is. (Of course, the same is true for multithreading: the more complex the scene becomes, the less significant the raw drawing power becomes.)
Take Frozen Time, for example. I gained a measly 1 fps (11 to 12) between the old 32-bit single-threaded renderer and the 64-bit multithreaded version. It still ran 3 ms faster, but at around 90 ms per frame that doesn't register as any measurable improvement in frame rate.
This should also make it clear how pointless it is to measure performance in fps at all! I stopped doing that long ago. The only value that has usable properties for measuring improvements is milliseconds per frame!
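To put numbers on that, here is a minimal sketch of the arithmetic. Only the 3 ms saving and the roughly 90 ms frame time come from my tests; the function names and the other frame times in the table are made up purely for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Convert a frame time in milliseconds to frames per second.
double FpsFromMs(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}

// FPS gained when a frame that took frame_ms becomes saved_ms faster.
double FpsGain(double frame_ms, double saved_ms)
{
    assert(saved_ms < frame_ms);
    return FpsFromMs(frame_ms - saved_ms) - FpsFromMs(frame_ms);
}

// Print what the same fixed saving looks like at several frame times.
// The frame times are illustrative values, not measurements.
void PrintGainTable(double saved_ms)
{
    const double frame_times_ms[] = { 10.0, 20.0, 45.0, 90.0 };
    for (double ms : frame_times_ms)
    {
        std::printf("%5.1f ms -> %5.1f ms: %+.2f fps\n",
                    ms, ms - saved_ms, FpsGain(ms, saved_ms));
    }
}
```

Calling PrintGainTable(3.0) shows the same 3 ms saving as a gain of well over 40 fps at 10 ms per frame, but less than half an fps at 90 ms per frame. The work saved is identical in both cases; only the fps number makes it look different.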
So where does that lead us? I think it's clear: if we want to improve rendering performance on more complex maps, the most important thing is not boosting the drawer performance to the limit but restructuring the whole thing so that larger components can be optimized and are better laid out for multithreading at a higher level in the chain. Focussing on the drawers will only hold progress back.
I cannot say that this surprises me. The biggest roadblock to performance improvements is all the global variable shit that's going on in there.
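As a rough illustration of what getting rid of the globals buys, here is a minimal sketch in which each worker gets its own state object instead of reading shared globals. This is not the actual renderer code: RenderContext, RenderSlice and RenderFrame are hypothetical names, and the "rendering" is just a fill that stands in for a larger chunk of the render chain.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-worker render state. The point is that nothing in
// here is a global, so several workers can run at the same time.
struct RenderContext
{
    int first_column = 0;                 // first screen column to draw
    int last_column = 0;                  // one past the last column
    int width = 0;
    int height = 0;
    std::vector<uint8_t>* framebuffer = nullptr;
};

// Stand-in for a larger render step: fills this context's columns.
void RenderSlice(const RenderContext& ctx, uint8_t color)
{
    for (int y = 0; y < ctx.height; ++y)
        for (int x = ctx.first_column; x < ctx.last_column; ++x)
            (*ctx.framebuffer)[y * ctx.width + x] = color;
}

// Split the screen into column ranges and render each on its own thread.
std::vector<uint8_t> RenderFrame(int width, int height, int num_threads)
{
    assert(width > 0 && height > 0 && num_threads > 0);
    std::vector<uint8_t> framebuffer(static_cast<size_t>(width) * height, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i)
    {
        RenderContext ctx;
        ctx.first_column = width * i / num_threads;
        ctx.last_column = width * (i + 1) / num_threads;
        ctx.width = width;
        ctx.height = height;
        ctx.framebuffer = &framebuffer;
        // Each thread gets its own copy of ctx; slices never overlap.
        workers.emplace_back(RenderSlice, ctx, static_cast<uint8_t>(i + 1));
    }
    for (auto& worker : workers)
        worker.join();
    return framebuffer;
}
```

With the state passed in explicitly, the split can happen anywhere in the chain, not just at the drawers. As long as the same values sit in globals, every worker would be stepping on the same data.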