Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by drfrag » Sun Nov 05, 2017 2:51 pm

I'm getting an early crash in the voxel code with BD v21 and some other mods when compiling with SSE2 and -O3 (-msse2 -mfpmath=sse) using gcc 4.9.2. It happens with ZDoom32, but GZDoom shares the same code.
I've tracked it down to the inline function inline int GetShort(const unsigned char *foo) in m_swap.h.
@devs @_mental_: Do you know what's going on? Any ideas? Thanks.

Line 258 in FVoxel *R_LoadKVX(int lumpnum) in voxels.cpp is:
mipl->OffsetXY[j] = GetShort(rawmip + i + j * 2);

and then in m_swap.h:
// Data accessors, since some data is highly likely to be unaligned.
#if defined(_M_IX86) || defined(_M_X64) || defined(__i386__)
inline int GetShort(const unsigned char *foo)
{
   return *(const short *)foo;
}
inline int GetInt(const unsigned char *foo)
{
   return *(const int *)foo;
}
inline int GetBigInt(const unsigned char *foo)
{
   return BigLong(GetInt(foo));
}
#else
inline int GetShort(const unsigned char *foo)
{
   return short(foo[0] | (foo[1] << 8));
}
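
The x86 branch quoted above reads through a casted `short *` or `int *`, which is undefined behaviour in C++ when the address is not suitably aligned, even though plain x86 loads usually tolerate it. A minimal sketch of an alignment-safe alternative, assuming the same native-byte-order semantics as the cast version; `GetShortSafe` and `GetIntSafe` are hypothetical names for illustration, not functions from m_swap.h:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical alignment-safe accessors (not the actual m_swap.h code).
// std::memcpy has no alignment requirement, so the compiler may not assume
// that foo is 2- or 4-byte aligned when it optimizes or vectorizes callers.
inline int GetShortSafe(const unsigned char *foo)
{
    assert(foo != nullptr);
    std::int16_t value;
    std::memcpy(&value, foo, sizeof value); // native byte order, like the cast
    return value;                           // sign-extended to int
}

inline int GetIntSafe(const unsigned char *foo)
{
    assert(foo != nullptr);
    std::int32_t value;
    std::memcpy(&value, foo, sizeof value); // native byte order, like the cast
    return value;
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a single unaligned load, so this form should not cost more than the cast on x86. Whether it avoids the crash reported here is untested.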

The backtrace:
Program received signal SIGSEGV, Segmentation fault.
R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
C:\DEV\qzdoom\src\r_data\voxels.cpp:258:7781:beg:0x91ed41
>>>>>>cb_gdb:mipl = 0xc53b210
offsetsize = 8312
voxdatasize = 14141
mip = 2
j = <optimized out>
rawvoxel = 0x9ed8ff4 "TÖ\005"
slabs = {0x9ef81d8, 0x9f3e44e, 0x19a1, 0x5, 0x91b98b <R_InstallSprite(int)+459>}
n = 4030
lump = {
  Block = {
    Chars = 0x9ed8ff4 "TÖ\005",
    static NullString = {
      Len = 0,
      AllocLen = 2,
      RefCount = 24896,
      Nothing = "\000"
    }
  }
}
voxel = 0xc53b190
rawmip = 0x9f4d029 ">"
maxmipsize = 29106
i = <optimized out>
voxelsize = 505063
>>>>>>cb_gdb:lumpnum = 18118
>>>>>>cb_gdb:#0  R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
#1  0x0091f786 in R_LoadVoxelDef (lumpnum=18118, spin=0) at C:\DEV\qzdoom\src\r_data\voxels.cpp:330
#2  0x0091c3d2 in R_InitSpriteDefs () at C:\DEV\qzdoom\src\r_data\sprites.cpp:376
#3  0x0091e336 in R_InitSprites () at C:\DEV\qzdoom\src\r_data\sprites.cpp:943
#4  0x007b66a6 in P_Init () at C:\DEV\qzdoom\src\p_setup.cpp:4227
#5  0x00675686 in D_DoomMain () at C:\DEV\qzdoom\src\d_main.cpp:2521
#6  0x0042d38f in DoMain (hInstance=hInstance@entry=0x400000) at C:\DEV\qzdoom\src\win32\i_main.cpp:1034
#7  0x0042db36 in WinMain@16 (hInstance=0x400000, nothing=0x0, cmdline=0x2b243c "-file c:\\temp\\gzdoom\\bd21testnov01.pk3", nCmdShow=10) at C:\DEV\qzdoom\src\win32\i_main.cpp:1332
#8  0x00b1425b in main ()
drfrag
ZDoom32 and ZDoom LE developer.
 
Joined: 23 Apr 2004
Location: Spain

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by Graf Zahl » Sun Nov 05, 2017 3:01 pm

Have you verified that the voxel is properly formatted?
Graf Zahl
Lead GZDoom Developer
 
Joined: 19 Jul 2003
Location: Germany

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by drfrag » Sun Nov 05, 2017 3:36 pm

No, but the non-SSE2 ZDoom32 executable runs fine, and so does GZDoom (though not on this machine, due to the big texture bug).
How do I check the voxel? I don't think it's the voxel, since Castlevania also crashes (SSE2 version).
Edit: no crash with -O2 either.
drfrag
ZDoom32 and ZDoom LE developer.
 
Joined: 23 Apr 2004
Location: Spain

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by _mental_ » Mon Nov 06, 2017 1:19 am

In short, just don’t use -O3.

To say anything definite I'd need to look at the assembly generated with -O3 and compare it with the -O2 output.
My bet is an unaligned address being passed to an SSE instruction that requires an aligned one. GCC has a long history of bad SSE code generation.

You can try changing the optimization options for a few related functions, but this is a really tedious process.
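
For reference, a rough sketch of what lowering the optimization level for a single function can look like in GCC, using its documented `optimize` function attribute. `ReadOffsets` and the `LOWER_OPT` macro are hypothetical stand-ins, not code from voxels.cpp, and GCC's own documentation describes this attribute as a debugging aid rather than something for production builds:

```cpp
#include <cassert>
#include <cstddef>

// The optimize attribute is GCC-specific, so it is enabled only there.
#if defined(__GNUC__) && !defined(__clang__)
#define LOWER_OPT __attribute__((optimize("O2")))
#else
#define LOWER_OPT
#endif

// Hypothetical stand-in for an offset-reading loop like the one in R_LoadKVX.
// With LOWER_OPT, GCC compiles just this function at -O2 even when the rest
// of the file is built with -O3.
LOWER_OPT
void ReadOffsets(const unsigned char *raw, short *out, std::size_t count)
{
    assert(raw != nullptr && out != nullptr);
    for (std::size_t j = 0; j < count; ++j)
    {
        // Byte-wise little-endian read; works for any alignment of raw.
        out[j] = short(raw[j * 2] | (raw[j * 2 + 1] << 8));
    }
}
```

This only narrows down which function is affected; comparing the -O2 and -O3 assembly is still needed to confirm the cause.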
_mental_
 
 
 
Joined: 07 Aug 2011

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by drfrag » Mon Nov 06, 2017 6:02 am

Thanks. -O2 would hurt performance too much. It's fixed: I just used set_source_files_properties in CMakeLists.txt to not use SSE2 for voxels.cpp. It's up. :) I guess it's time for a new release.
This fixes Castlevania as well, but the capped sky is still missing for the titlemap.
Maybe it would be a good idea to apply the fix for D3D and large textures as well for the time being, until someone writes a scaler.
drfrag
ZDoom32 and ZDoom LE developer.
 
Joined: 23 Apr 2004
Location: Spain

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by Graf Zahl » Mon Nov 06, 2017 7:17 am

Why do you even bother with SSE? In the last 12 years I've never seen a hint that it actually increases performance unless intrinsics are used. Not a single one of my computers showed any advantage in the node builder, which existed in both x87 and SSE2 versions.
Graf Zahl
Lead GZDoom Developer
 
Joined: 19 Jul 2003
Location: Germany

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by drfrag » Mon Nov 06, 2017 7:30 am

SSE2 is used optionally for the truecolor renderer (two executables). It provides a 40% performance increase on AMD and 50% on Intel, but the difference is much greater with the new LLVM drawers. In fact the old C++ drawers are pretty fast already, and SSE2 only matters for slow P4 CPUs.
Edit: now I see what you mean, you only use SSE2 for the software renderer. I will do the same then.
drfrag
ZDoom32 and ZDoom LE developer.
 
Joined: 23 Apr 2004
Location: Spain

