Crash in voxel code with SSE2 and O3 with gcc 4.9

drfrag
Vintage GZDoom Developer
Posts: 3141
Joined: Fri Apr 23, 2004 3:51 am
Location: Spain

Post by drfrag »

I'm getting a crash early during startup with BD v21 and some other mods when compiling with SSE2 and O3 (-msse2 -mfpmath=sse) using gcc 4.9.2. The crash is in the voxel code. It happens with ZDoom32, but GZDoom shares the same code.
I've tracked it down to the inline function GetShort(const unsigned char *foo) in m_swap.h.
@devs @_mental_: Do you know what's going on? Any ideas? Thanks.

Line 258 in FVoxel *R_LoadKVX(int lumpnum) in voxels.cpp is:

Code:

mipl->OffsetXY[j] = GetShort(rawmip + i + j * 2);
and then in m_swap.h:

Code:

// Data accessors, since some data is highly likely to be unaligned.
#if defined(_M_IX86) || defined(_M_X64) || defined(__i386__) 
inline int GetShort(const unsigned char *foo)
{
	return *(const short *)foo;
}
inline int GetInt(const unsigned char *foo)
{
	return *(const int *)foo;
}
inline int GetBigInt(const unsigned char *foo)
{
	return BigLong(GetInt(foo));
}
#else
inline int GetShort(const unsigned char *foo)
{
	return short(foo[0] | (foo[1] << 8));
}
The backtrace:

Code:

Program received signal SIGSEGV, Segmentation fault.
R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
C:\DEV\qzdoom\src\r_data\voxels.cpp:258:7781:beg:0x91ed41
>>>>>>cb_gdb:mipl = 0xc53b210
offsetsize = 8312
voxdatasize = 14141
mip = 2
j = <optimized out>
rawvoxel = 0x9ed8ff4 "TÖ\005"
slabs = {0x9ef81d8, 0x9f3e44e, 0x19a1, 0x5, 0x91b98b <R_InstallSprite(int)+459>}
n = 4030
lump = {
  Block = {
    Chars = 0x9ed8ff4 "TÖ\005", 
    static NullString = {
      Len = 0, 
      AllocLen = 2, 
      RefCount = 24896, 
      Nothing = "\000"
    }
  }
}
voxel = 0xc53b190
rawmip = 0x9f4d029 ">"
maxmipsize = 29106
i = <optimized out>
voxelsize = 505063
>>>>>>cb_gdb:lumpnum = 18118
>>>>>>cb_gdb:#0  R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
#1  0x0091f786 in R_LoadVoxelDef (lumpnum=18118, spin=0) at C:\DEV\qzdoom\src\r_data\voxels.cpp:330
#2  0x0091c3d2 in R_InitSpriteDefs () at C:\DEV\qzdoom\src\r_data\sprites.cpp:376
#3  0x0091e336 in R_InitSprites () at C:\DEV\qzdoom\src\r_data\sprites.cpp:943
#4  0x007b66a6 in P_Init () at C:\DEV\qzdoom\src\p_setup.cpp:4227
#5  0x00675686 in D_DoomMain () at C:\DEV\qzdoom\src\d_main.cpp:2521
#6  0x0042d38f in DoMain (hInstance=hInstance@entry=0x400000) at C:\DEV\qzdoom\src\win32\i_main.cpp:1034
#7  0x0042db36 in WinMain@16 (hInstance=0x400000, nothing=0x0, cmdline=0x2b243c "-file c:\\temp\\gzdoom\\bd21testnov01.pk3", nCmdShow=10) at C:\DEV\qzdoom\src\win32\i_main.cpp:1332
#8  0x00b1425b in main ()
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49056
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by Graf Zahl »

Have you verified that the voxel is properly formatted?
drfrag
Vintage GZDoom Developer
Posts: 3141
Joined: Fri Apr 23, 2004 3:51 am
Location: Spain

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by drfrag »

No, but the non-SSE2 ZDoom32 executable runs fine, and so does GZDoom (though not on this machine, due to the big texture bug).
How do I check the voxel? I don't think it's the voxel, since Castlevania also crashes (SSE2 version).
Edit: no crash with O2 either.
_mental_
Posts: 3812
Joined: Sun Aug 07, 2011 4:32 am

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by _mental_ »

In short, just don’t use -O3.

To say anything more, I would need to look at the assembly generated with -O3 and compare it with -O2.
My bet is an unaligned address being passed to an SSE instruction that requires an aligned one. GCC has a long history of bad SSE code generation.
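
To illustrate the alignment hypothesis: the backtrace shows rawmip = 0x9f4d029, an odd address, so the cast to const short * in GetShort reads through a misaligned pointer. That is undefined behavior in C++, even though scalar x86 loads tolerate it. One plausible mechanism, not confirmed from the assembly in this thread, is that the -O3 vectorizer relies on 2-byte alignment and emits aligned SSE loads. Below is a minimal sketch of accessors that avoid the cast by copying bytes with std::memcpy. The names GetShortUnaligned and GetIntUnaligned are hypothetical and are not part of m_swap.h. Like the x86 branch they would replace, they return the value in host byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical alternative to GetShort: reads a 16-bit value from any
// address, aligned or not, without casting the pointer to short*.
inline int GetShortUnaligned(const unsigned char *foo)
{
	std::int16_t value;
	std::memcpy(&value, foo, sizeof(value)); // memcpy has no alignment requirement
	return value;                            // sign-extends, like the original
}

// Hypothetical alternative to GetInt, same idea for 32-bit values.
inline int GetIntUnaligned(const unsigned char *foo)
{
	std::int32_t value;
	std::memcpy(&value, foo, sizeof(value));
	return value;
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a single load on x86, so it should not cost more than the cast.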

You can try changing the optimization options for a few related functions, but this is a really tedious process.
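
As a rough sketch of what per-function options can look like with GCC: the optimize function attribute overrides the command-line optimization level for a single function. The macro VOXEL_OPT_O2 and the function ReadOffsets are hypothetical stand-ins, not code from voxels.cpp. The loop body reuses the byte-wise expression from the non-x86 branch of m_swap.h. Note that the GCC documentation describes this attribute as intended for debugging rather than production code, so treat it as a way to narrow down the problem, not as a fix.

```cpp
#include <cstddef>

// GCC-specific attribute. Other compilers get an empty macro.
#if defined(__GNUC__) && !defined(__clang__)
#define VOXEL_OPT_O2 __attribute__((optimize("O2")))
#else
#define VOXEL_OPT_O2
#endif

// Hypothetical stand-in for the offset-reading loop in R_LoadKVX.
// Built at -O2 under GCC even when the rest of the file uses -O3.
VOXEL_OPT_O2
void ReadOffsets(const unsigned char *raw, short *out, std::size_t count)
{
	for (std::size_t j = 0; j < count; ++j)
	{
		// Byte-wise little-endian read, safe for any address.
		out[j] = short(raw[j * 2] | (raw[j * 2 + 1] << 8));
	}
}
```

Moving the attribute from one function to the next while retesting is how the offending function would be isolated, which is why the process is tedious.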
drfrag
Vintage GZDoom Developer
Posts: 3141
Joined: Fri Apr 23, 2004 3:51 am
Location: Spain

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by drfrag »

Thanks. O2 would hurt performance too much. It's fixed: I just used set_source_files_properties in CMakeLists.txt so that voxels.cpp is not compiled with SSE2. It's up. :) I guess it's time for a new release.
This fixes Castlevania as well, but the capped sky is still missing for the titlemap.
Maybe it would be a good idea to apply the fix for D3D and large textures as well for the time being, until someone writes a scaler.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49056
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by Graf Zahl »

Why do you even bother with SSE? In the last 12 years I've never seen a hint that it actually increases performance unless intrinsics are used. Not a single one of my computers showed any advantage in the node builder, which existed in both x87 and SSE2 versions.
drfrag
Vintage GZDoom Developer
Posts: 3141
Joined: Fri Apr 23, 2004 3:51 am
Location: Spain

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

Post by drfrag »

SSE2 is used optionally for the truecolor renderer (two executables). It provides a 40% performance increase on AMD and 50% on Intel, but the difference is much greater with the new LLVM drawers. In fact, the old C++ drawers are pretty fast already, and SSE2 only matters for slow P4 CPUs.
Edit: now I see what you mean: you only use SSE2 for the software renderer. I will do the same then.