Crash in voxel code with SSE2 and O3 with gcc 4.9

Posted: Sun Nov 05, 2017 1:51 pm
by drfrag
I'm experiencing an early crash in the voxel code with BD v21 and some other mods when compiling with SSE2 (-msse2 -mfpmath=sse) and -O3 using gcc 4.9.2. It happens with ZDoom32, but GZDoom shares the same code.
I've tracked it down to the inline function int GetShort(const unsigned char *foo) in m_swap.h.
@devs @_mental_: Do you know what's going on? Any ideas? Thanks.

Line 258 in FVoxel *R_LoadKVX(int lumpnum) in voxels.cpp is:

mipl->OffsetXY[j] = GetShort(rawmip + i + j * 2);
and then in m_swap.h:

// Data accessors, since some data is highly likely to be unaligned.
#if defined(_M_IX86) || defined(_M_X64) || defined(__i386__) 
inline int GetShort(const unsigned char *foo)
{
	return *(const short *)foo;
}
inline int GetInt(const unsigned char *foo)
{
	return *(const int *)foo;
}
inline int GetBigInt(const unsigned char *foo)
{
	return BigLong(GetInt(foo));
}
#else
inline int GetShort(const unsigned char *foo)
{
	return short(foo[0] | (foo[1] << 8));
}
// ... (GetInt and GetBigInt for other platforms omitted)
#endif
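The x86 branch above reads a short through a pointer cast from an arbitrary byte address. In C++ that is undefined behavior when the address is not suitably aligned (it also breaks strict aliasing), so at -O3 GCC is allowed to assume alignment when it vectorizes a loop built around GetShort. A minimal sketch of a well-defined alternative, assuming a little-endian x86 target like the one selected by the #if; GetShortSafe and GetIntSafe are hypothetical names, and this is not the fix adopted later in this thread:

```cpp
#include <cassert>  // only needed for the usage checks
#include <cstring>  // std::memcpy

// Hypothetical well-defined replacements for the x86 accessors in m_swap.h.
// Copying the bytes into a local with std::memcpy makes no alignment or
// aliasing assumption about foo, so the compiler has no license to assume
// that foo is aligned.
inline int GetShortSafe(const unsigned char *foo)
{
	short v;
	std::memcpy(&v, foo, sizeof v);
	return v;  // host byte order: little-endian on x86
}

inline int GetIntSafe(const unsigned char *foo)
{
	int v;
	std::memcpy(&v, foo, sizeof v);
	return v;
}
```

These would be called the same way as the originals, e.g. mipl->OffsetXY[j] = GetShortSafe(rawmip + i + j * 2);. With GCC the memcpy is normally compiled to a single unaligned load on x86, so the well-defined version should not cost anything in practice.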
The backtrace:

Program received signal SIGSEGV, Segmentation fault.
R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
C:\DEV\qzdoom\src\r_data\voxels.cpp:258:7781:beg:0x91ed41
>>>>>>cb_gdb:mipl = 0xc53b210
offsetsize = 8312
voxdatasize = 14141
mip = 2
j = <optimized out>
rawvoxel = 0x9ed8ff4 "TÖ\005"
slabs = {0x9ef81d8, 0x9f3e44e, 0x19a1, 0x5, 0x91b98b <R_InstallSprite(int)+459>}
n = 4030
lump = {
  Block = {
    Chars = 0x9ed8ff4 "TÖ\005", 
    static NullString = {
      Len = 0, 
      AllocLen = 2, 
      RefCount = 24896, 
      Nothing = "\000"
    }
  }
}
voxel = 0xc53b190
rawmip = 0x9f4d029 ">"
maxmipsize = 29106
i = <optimized out>
voxelsize = 505063
>>>>>>cb_gdb:lumpnum = 18118
>>>>>>cb_gdb:#0  R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
#1  0x0091f786 in R_LoadVoxelDef (lumpnum=18118, spin=0) at C:\DEV\qzdoom\src\r_data\voxels.cpp:330
#2  0x0091c3d2 in R_InitSpriteDefs () at C:\DEV\qzdoom\src\r_data\sprites.cpp:376
#3  0x0091e336 in R_InitSprites () at C:\DEV\qzdoom\src\r_data\sprites.cpp:943
#4  0x007b66a6 in P_Init () at C:\DEV\qzdoom\src\p_setup.cpp:4227
#5  0x00675686 in D_DoomMain () at C:\DEV\qzdoom\src\d_main.cpp:2521
#6  0x0042d38f in DoMain (hInstance=hInstance@entry=0x400000) at C:\DEV\qzdoom\src\win32\i_main.cpp:1034
#7  0x0042db36 in WinMain@16 (hInstance=0x400000, nothing=0x0, cmdline=0x2b243c "-file c:\\temp\\gzdoom\\bd21testnov01.pk3", nCmdShow=10) at C:\DEV\qzdoom\src\win32\i_main.cpp:1332
#8  0x00b1425b in main ()

Posted: Sun Nov 05, 2017 2:01 pm
by Graf Zahl
Have you verified that the voxel is properly formatted?

Posted: Sun Nov 05, 2017 2:36 pm
by drfrag
No, but the non-SSE2 ZDoom32 executable runs fine, and so does GZDoom (though not on this machine, due to the big texture bug).
How do I check the voxel? I don't think it's the voxel, since Castlevania also crashes (SSE2 version).
Edit: no crash with -O2 either.

Posted: Mon Nov 06, 2017 12:19 am
by _mental_
In short, just don’t use -O3.

To say anything definite I'd need to look at the assembly generated with -O3 and compare it with the -O2 output.
My bet is an unaligned address passed to an SSE instruction that requires an aligned one. GCC has a long history of bad SSE code generation.
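The hypothesis can be partly checked against the backtrace: gdb reports rawmip = 0x9f4d029, which is odd. A minimal sketch of that arithmetic; Misalignment and kRawMip are hypothetical names. It only shows that the base pointer is misaligned: the address actually read is rawmip + i + j * 2 and i was optimized out, so the trace alone cannot prove the faulting load was misaligned. Aligned SSE2 accesses such as movdqa (the _mm_load_si128 intrinsic) fault on any address that is not a multiple of 16.

```cpp
#include <cassert>  // only needed for the usage checks
#include <cstdint>

// Distance of an address past the previous multiple of 'alignment'
// (0 means the address is aligned to that boundary).
constexpr std::uintptr_t Misalignment(std::uintptr_t addr, std::uintptr_t alignment)
{
	return addr % alignment;
}

// Value of rawmip as printed by gdb in the backtrace (hypothetical name).
constexpr std::uintptr_t kRawMip = 0x9f4d029;

// Odd base address: not even 2-byte aligned for a short.
static_assert(Misalignment(kRawMip, 2) == 1, "rawmip is odd");
// 9 bytes past a 16-byte boundary, so an aligned (movdqa-style) load
// at exactly this address would fault.
static_assert(Misalignment(kRawMip, 16) == 9, "rawmip is not 16-byte aligned");
```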

You can try changing the optimization options for a few related functions, but this is a really tedious process.
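For GCC, one way to change options for a single function is the optimize attribute. A minimal sketch, assuming the difference between -O2 and -O3 here comes from the loop vectorizer (in GCC 4.9, -O3 enables -ftree-vectorize and -O2 does not). CopyOffsets is a hypothetical stand-in for the loop around line 258 of voxels.cpp, not the actual ZDoom code, and NO_VECTORIZE is a made-up macro name. Current GCC manuals describe the optimize attribute as intended for debugging rather than production code.

```cpp
#include <cassert>  // only needed for the usage checks

// GCC-only: disable the loop vectorizer for one function while the rest of
// the file keeps -O3. Other compilers get an empty macro.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

// Hypothetical stand-in for the OffsetXY loop in R_LoadKVX: copy 'count'
// little-endian 16-bit values out of a possibly unaligned byte buffer.
NO_VECTORIZE
void CopyOffsets(short *dst, const unsigned char *src, int count)
{
	for (int j = 0; j < count; ++j)
	{
		dst[j] = short(src[j * 2] | (src[j * 2 + 1] << 8));
	}
}
```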

Posted: Mon Nov 06, 2017 5:02 am
by drfrag
Thanks. -O2 would hurt performance too much. It's fixed: I just used set_source_files_properties in CMakeLists.txt to disable SSE2 for voxels.cpp. It's up. :) I guess it's time for a new release.
It fixes Castlevania as well, but the capped sky is still missing on the titlemap.
It might be a good idea to apply the fix for D3D and large textures as well, for the time being, until someone writes a scaler.

Posted: Mon Nov 06, 2017 6:17 am
by Graf Zahl
Why do you even bother with SSE? In the last 12 years I've never seen a hint that it actually increases performance unless intrinsics are used. Not a single one of my computers showed any advantage in the node builder, which existed in both x87 and SSE2 versions.
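Flags like -msse2 -mfpmath=sse mostly move scalar floating-point math from x87 to SSE registers, which is consistent with seeing little difference in ordinary code; larger gains typically come from code that processes several values per instruction. A minimal sketch of what using intrinsics looks like; AddShorts is a hypothetical function, not code from the ZDoom drawers. It uses the unaligned _mm_loadu_si128/_mm_storeu_si128 forms, which avoid the alignment fault suspected earlier in the thread, and falls back to plain C++ when __SSE2__ is not defined.

```cpp
#include <cassert>  // only needed for the usage checks
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Adds two arrays of 16-bit values with wraparound. With SSE2, eight lanes
// are handled per instruction; the loadu/storeu forms accept any address,
// unlike the aligned _mm_load_si128/_mm_store_si128.
void AddShorts(std::int16_t *dst, const std::int16_t *a, const std::int16_t *b, int count)
{
	int i = 0;
#if defined(__SSE2__)
	for (; i + 8 <= count; i += 8)
	{
		__m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
		__m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
		_mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), _mm_add_epi16(va, vb));
	}
#endif
	// Scalar tail (and the whole job on builds without SSE2).
	for (; i < count; ++i)
	{
		dst[i] = static_cast<std::int16_t>(a[i] + b[i]);
	}
}
```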

Posted: Mon Nov 06, 2017 6:30 am
by drfrag
SSE2 is used optionally for the truecolor renderer (hence the two executables). It provides a 40% performance increase on AMD and 50% on Intel, but the difference is much greater with the new LLVM drawers. In fact, the old C++ drawers are already pretty fast and SSE2 only matters for slow P4 CPUs.
Edit: now I see what you mean: you only use SSE2 for the software renderer. I'll do the same, then.