Slope differences between 32bit and 64bit builds

Post by **edward850** » Sat Nov 15, 2014 12:53 pm

I wouldn't normally care about this, but as it seems Linux users typically build ZDoom in 64bit rather than 32bit, the behaviour between the two may need some close attention.
There's a difference in slope calculation between 32bit and 64bit. This demo (Sonic.wad (2011), functional with g92143f9) plays back as expected in 32bit, yet in 64bit it appears to change when jumping up a slope 3 minutes in, as seen 3 minutes into this comparison video.

The problem stems from the faux-float calculations used for slopes themselves. Because they require using exponents, 64bit ends up changing the method of calculation with a more "accurate" result, which comes out different from 32bit.

Edward-san and I have tried recreating the method with no success (finding the function to be insane), but I did find that changing the floats to doubles resolves the issue as far as I can debug. It seems to also favour the 32bit results, making previous demos suddenly work. I can't quite figure that one out.

Much better fix. Now I wonder if I'll actually get some sleep.

Post by **Edward-san** » Sat Nov 15, 2014 9:13 pm

I found an alternative, which is by using only fixed arithmetic, with no difference with the double version:

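Code:

		// Both products are done in 64-bit integers, so the result no longer depends on float rounding.
		// Pre-dividing the denominator by 2^24 leaves the quotient r with 24 fractional bits.
		SQWORD r_den = (SQWORD(ld->dx)*ld->dx + SQWORD(ld->dy)*ld->dy) / (1 << 24);
		SQWORD r_num = ((SQWORD(tm.x - ld->v1->x)*ld->dx) + (SQWORD(tm.y - ld->v1->y)*ld->dy));
		fixed_t r = (fixed_t)(r_num/r_den);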
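
For anyone who wants to poke at the integer-only version outside the engine, here is a minimal standalone sketch. It assumes `fixed_t` is a signed 32-bit 16.16 value and `SQWORD` is a signed 64-bit integer. `Line` and `project_onto_line` are hypothetical stand-ins for illustration, not the engine's actual declarations.

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t fixed_t;  // assumed: 16.16 fixed point
typedef int64_t SQWORD;   // assumed: signed 64-bit integer

// Hypothetical stand-in for the linedef fields used in the post.
struct Line {
    fixed_t x1, y1;  // first vertex (ld->v1->x, ld->v1->y)
    fixed_t dx, dy;  // delta from the first vertex to the second
};

// Returns how far along the line the point (px, py) projects,
// as a fraction with 24 fractional bits (0 = first vertex, 1 << 24 = second).
fixed_t project_onto_line(const Line& ld, fixed_t px, fixed_t py) {
    SQWORD r_den = (SQWORD(ld.dx) * ld.dx + SQWORD(ld.dy) * ld.dy) / (SQWORD(1) << 24);
    SQWORD r_num = SQWORD(px - ld.x1) * ld.dx + SQWORD(py - ld.y1) * ld.dy;
    // Very short lines make the scaled denominator zero; the caller must exclude them.
    assert(r_den != 0);
    return fixed_t(r_num / r_den);
}
```

With a 64-unit horizontal line starting at the origin, a point at x = 16 units gives `1 << 22`, which is 0.25 in this 24-bit fraction format. Because every step is integer arithmetic, the result should not depend on whether the build uses x87 or SSE.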

Post by **edward850** » Sat Nov 15, 2014 9:17 pm

Actually, the funny thing is the double version is different, although subtly.
The fixed_t version, oddly enough, matches.

Post by **Edward-san** » Sun Nov 16, 2014 4:35 am

Reorder the operands and add a comment regarding my solution.

Post by **GooberMan** » Sun Nov 16, 2014 9:33 am

There's likely plenty of these dotted about the engine. x64 compilers by default spit out SSE code, which gives different results to x87 code. Basically, those floats will only be calculated with 32 bits on 64-bit code builds, while a 32-bit code build will use the full 80 bits of x87 precision internally before storing in the 32 bit float. The double version on a 64-bit code build would calculate at 64 bit precision, hence the subtle differences.
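
The effect described above can be reproduced in isolation. This is a rough sketch, not ZDoom code: `narrow_eval`, `wide_eval` and `show_divergence` are made-up names, and `double` is used as a stand-in for the wider x87 registers, which is enough to show the divergence for these hand-picked inputs.

```cpp
#include <cassert>
#include <cmath>

// a*a - c with the product rounded to float before the subtraction,
// which is what scalar SSE float code does.
float narrow_eval(float a, float c) {
    volatile float prod = a * a;  // volatile forces a real 32-bit store
    return prod - c;
}

// Same expression with the intermediate kept in double, standing in for
// a wider register; only the final result is rounded to float.
float wide_eval(float a, float c) {
    double prod = double(a) * double(a);
    return float(prod - double(c));
}

void show_divergence() {
    const float a = 1.0f + std::ldexp(1.0f, -12);  // 1 + 2^-12, exact in float
    const float c = 1.0f + std::ldexp(1.0f, -11);  // 1 + 2^-11, exact in float
    // a*a is 1 + 2^-11 + 2^-24; the last term is lost when rounding to float.
    assert(narrow_eval(a, c) == 0.0f);
    // With a wide intermediate the 2^-24 term survives the subtraction.
    assert(wide_eval(a, c) == std::ldexp(1.0f, -24));
}
```

The two results differ only in a tiny low-order term, but in playsim code a difference like that can decide which side of a comparison a value lands on, and from there a demo drifts.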

It'll be a tricky one to completely handle going forward as more people use 64-bit code builds. It'll be very worthwhile investigating whether you can force your compiler to generate x87 instructions in 64 bit builds (I know at the very least that MS link will still link x87 instructions as I have a problem in completely and utterly independent code with 64-bit code generation using DMD going bonkers generating x87 code).

EDIT: This, of course, implies compatibility on other platforms (such as ARM) is going to be nigh on impossible.

Post by **NeuralStunner** » Sun Nov 16, 2014 10:08 am

Am I remembering incorrectly, or did someone get ZDoom sort-of working on an ARM already?

Post by **GooberMan** » Sun Nov 16, 2014 11:49 am

Most likely, but demo and multiplayer compatibility is another matter entirely.

Post by **Blzut3** » Sun Nov 16, 2014 12:19 pm

Can't say forcing x87 is a good idea. As you stated it wouldn't help with ARM, MIPS, and PowerPC compatibility, which is something we should have. It's good that these minor differences can manifest themselves in 32 vs 64-bit x86 as these platforms are commonly available for debugging.

Post by **Graf Zahl** » Sun Nov 16, 2014 12:22 pm

It's likely they'd even manifest themselves between 32bit x87 and 32 bit SSE2 if it's really the float math. The CMake project allows setting both with the recent MSVC compilers.

I wonder how GZDoom fares, because I switched floating point consistency from precise to fast because the GL renderer depends on floating point efficiency.

Post by **edward850** » Sun Nov 16, 2014 11:02 pm

I did notice that ZDoom in 64bit is being compiled with SSE apparently switched off (I can't find any set compiler flag, which is interesting, but DISABLE_SSE is defined which affects the internal node builder). If it really does change the result of floats, I might need to do some testing with large explosions and the like, as that and vectors are another key place where floats are used.
None of it seems different enough to actually matter, though. I haven't tested GZDoom between systems yet, which might need to be done if float precision was changed (that might manifest problems between CPUs alone).

The problem is that if more issues arise, explosions and other radius collection functions could be fixed by using long integer math, as with slopes, but I have no idea what could possibly be done with vectors. Especially, if anybody wants to increase the available map size space anytime soon.

Post by **Blzut3** » Mon Nov 17, 2014 1:02 am

edward850 wrote: but DISABLE_SSE is defined

It's possible this could use a better name, but the reason for this is to disable the run time detection of SSE. Since all x86-64 processors have SSE2, compilers default to optimizing for it and there's no reason to run time check for it. You'll notice that the SSE and non-SSE node builder functions are identical, it's just a difference in what compiler flags are used.

edward850 wrote: Especially, if anybody wants to increase the available map size space anytime soon.

Things like this really make me question what the point of increasing this limit would be. I believe it should be possible to fix those issues with a 64-bit fixed point type, much like what was done to fix this issue. But maps would still need to be within the 32-bit fixed point coordinate system, which is significantly larger than what is currently permissible with 32-bit fixed point math.

Post by **Graf Zahl** » Mon Nov 17, 2014 1:32 am

Using 64 bit fixed point is only ok for isolated use as it produces some significant overhead with 32 bit code.
I'd rather see the engine ported to true double precision floating point with some extra care to the few places where underflows need to be considered (e.g. adding velocity or friction)

Post by **edward850** » Mon Nov 17, 2014 2:03 am

Blzut3 wrote: But maps would still need to be within the 32-bit fixed point coordinate system, which is significantly larger than what is currently permissible with 32-bit fixed point math.

Graf Zahl wrote: Using 64 bit fixed point is only ok for isolated use as it produces some significant overhead with 32 bit code.

The idea would be to increase it for 64bit builds, which may start to become viable as people (hopefully) start moving away from Windows XP, as 32bit processors are already very hard to find nowadays. Mind you, people continue to hoard ancient GPUs, so who knows how long that will take.

As it is currently, ZDoom in 64bit isn't actually any slower. At least when you can get the assembler to compile. There seems to be something wrong with fixrtext which makes it trip up when compiling x64 builds (likely as it wasn't designed with them in mind).

Graf Zahl wrote: I'd rather see the engine ported to true double precision floating point with some extra care to the few places where underflows need to be considered (e.g. adding velocity or friction)

Do doubles have the same problem as floats with different results per processor type and implementation? Or is that just with over/underflows, like with what was found with the slope code?

Post by **Graf Zahl** » Mon Nov 17, 2014 2:27 am

If the IEEE specs are properly implemented, results should be identical. When using x87 math its internal precision has to be set to 64 bit for that. The main problem with the floats is that the x87 doesn't handle floats properly according to spec and MSVC adds a significant amount of cruft to force all intermediate results into 32 bit, and I believe that was the cause for the discrepancy here.

Post by **Edward-san** » Mon Nov 17, 2014 5:15 am

The 64 bit fixed point would make my 'SQWORD operands' idea applied to the inconsistency completely useless, until there's a way to declare a 128bit integer on all the systems. Or should we make a sort of 'arbitrary size integer' construct?
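
One portable way to get a 128-bit intermediate without a native 128-bit type is to split each operand into 32-bit halves. The sketch below is only an illustration of that idea under simplifying assumptions: it handles unsigned values only (signed operands like `SQWORD` would need sign handling on top), and `U128`, `mul_64x64` and `mul_shift` are invented names, not anything from ZDoom.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit unsigned value stored as two 64-bit halves.
struct U128 {
    uint64_t hi;
    uint64_t lo;
};

// Full 64x64 -> 128 bit unsigned multiply built from four 32x32 products.
U128 mul_64x64(uint64_t a, uint64_t b) {
    const uint64_t mask = 0xFFFFFFFFu;
    uint64_t a_lo = a & mask, a_hi = a >> 32;
    uint64_t b_lo = b & mask, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;  // contributes to bits 0..63
    uint64_t p1 = a_lo * b_hi;  // contributes to bits 32..95
    uint64_t p2 = a_hi * b_lo;  // contributes to bits 32..95
    uint64_t p3 = a_hi * b_hi;  // contributes to bits 64..127

    // Sum of the three terms that land in bits 32..63, with carries kept.
    uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);

    U128 r;
    r.lo = (mid << 32) | (p0 & mask);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

// (a * b) >> shift, e.g. a fixed-point multiply with `shift` fractional bits.
uint64_t mul_shift(uint64_t a, uint64_t b, unsigned shift) {
    assert(shift > 0 && shift < 64);
    U128 p = mul_64x64(a, b);
    assert((p.hi >> shift) == 0);  // the shifted result must fit in 64 bits
    return (p.hi << (64 - shift)) | (p.lo >> shift);
}
```

As a usage example, with 32 fractional bits `mul_shift(3 << 32, 5 << 31, 32)` multiplies 3.0 by 2.5 and returns 7.5 in the same format. Some compilers offer a built-in 128-bit integer on 64-bit targets, but that is not available everywhere, which is why the split-halves version is shown here.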
