Crash in voxel code with SSE2 and O3 with gcc 4.9

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

by drfrag » Mon Nov 06, 2017 6:30 am

SSE2 is used optionally for the truecolor renderer (there are two executables). It gives a 40% performance increase on AMD and 50% on Intel, but the difference is much greater with the new LLVM drawers. In fact the old C++ drawers are pretty fast already, and SSE2 only matters for slow P4 CPUs.
Edit: now I see what you mean, you only use SSE2 for the software renderer. I will do the same then.

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

by Graf Zahl » Mon Nov 06, 2017 6:17 am

Why do you even bother with SSE? In the last 12 years I've never seen a hint that it actually increases performance unless intrinsics are used. Not a single one of my computers showed any advantage in the node builder, which existed in both x87 and SSE2 versions.

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

by drfrag » Mon Nov 06, 2017 5:02 am

Thanks. O2 would hurt performance too much. It's fixed: I just used set_source_files_properties in CMakeLists.txt so that voxels.cpp is not compiled with SSE2. It's up. :) I guess it's time for a new release.
This fixes Castlevania as well, but the capped sky is still missing for the titlemap.
Maybe it would be a good idea to apply the fix for D3D and large textures as well for the time being, until someone writes a scaler.

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

by _mental_ » Mon Nov 06, 2017 12:19 am

In short, just don’t use -O3.

To say anything definite I would need to look at the assembly generated with -O3 and compare it with the -O2 output.
My bet is on an unaligned address being passed to an SSE instruction that requires an aligned one. GCC has a long history of bad SSE code generation.
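To illustrate the alignment theory: the x86 branch of the GetShort accessor quoted in the first post casts a byte pointer to `const short *`, which promises the compiler a 2-byte aligned address. A portable way to read from a possibly unaligned address is to copy the bytes instead. The sketch below is only an illustration, not the fix that was applied in this thread (that was a build-flag change). The names `GetShortUnaligned` and `GetIntUnaligned` are hypothetical, and like the x86 branch of m_swap.h they read in host byte order, so they match little-endian file data only on little-endian machines.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical alternative to the cast-based accessor: copy the bytes into
// a properly aligned local instead of dereferencing a short pointer.
inline int GetShortUnaligned(const unsigned char *foo)
{
	std::int16_t value;
	std::memcpy(&value, foo, sizeof(value)); // no alignment requirement on foo
	return value;                            // sign-extended to int on return
}

// Same idea for 32-bit values.
inline int GetIntUnaligned(const unsigned char *foo)
{
	std::int32_t value;
	std::memcpy(&value, foo, sizeof(value));
	return value;
}
```

Compilers typically turn a fixed-size `memcpy` like this into a single load, so the cost should be comparable to the cast, but that is worth confirming in the generated assembly.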

You can try changing the optimization options for a few related functions, but this is a really tedious process.
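With GCC, one way to change the optimization level for a single function is the `optimize` function attribute. The sketch below assumes GCC; `HYPOTHETICAL_OPT_O2` and `CopyShorts` are made-up names standing in for whatever function in voxels.cpp would need the treatment, and the loop body is only a placeholder. Note that GCC's documentation describes this attribute as meant for debugging rather than production code, so treat it as a way to narrow the problem down, not as a permanent fix.

```cpp
#include <cstddef>

// Apply the attribute only when building with real GCC; other compilers
// get an empty macro so the code still builds.
#if defined(__GNUC__) && !defined(__clang__)
#define HYPOTHETICAL_OPT_O2 __attribute__((optimize("O2")))
#else
#define HYPOTHETICAL_OPT_O2
#endif

// Hypothetical stand-in for a loop like the one in R_LoadKVX: this one
// function is compiled at -O2 even if the rest of the file uses -O3.
HYPOTHETICAL_OPT_O2
void CopyShorts(short *dst, const unsigned char *src, std::size_t count)
{
	for (std::size_t j = 0; j < count; ++j)
	{
		// Assemble each little-endian 16-bit value byte by byte.
		dst[j] = short(src[j * 2] | (src[j * 2 + 1] << 8));
	}
}
```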

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

by drfrag » Sun Nov 05, 2017 2:36 pm

No, but the non-SSE2 ZDoom32 executable runs fine, and GZDoom does as well (though not on this machine, due to the big texture bug).
How do I check the voxel? I don't think it's the voxel, since Castlevania also crashes (SSE2 version).
Edit: no crash with O2 either.

Re: Crash in voxel code with SSE2 and O3 with gcc 4.9

by Graf Zahl » Sun Nov 05, 2017 2:01 pm

Have you verified that the voxel is properly formatted?

Crash in voxel code with SSE2 and O3 with gcc 4.9

by drfrag » Sun Nov 05, 2017 1:51 pm

I'm experiencing an early crash in the voxel code with BD v21 and some other mods when compiling with SSE2 and O3 (-msse2 -mfpmath=sse) using gcc 4.9.2. It happens with ZDoom32, but GZDoom shares the same code.
I've tracked it to the inline function int GetShort(const unsigned char *foo) in m_swap.h.
@devs @_mental_: Do you know what's going on? Any ideas? Thanks.

Line 258 in FVoxel *R_LoadKVX(int lumpnum) in voxels.cpp is:

Code:

mipl->OffsetXY[j] = GetShort(rawmip + i + j * 2);

and then in m_swap.h:

Code:

// Data accessors, since some data is highly likely to be unaligned.
#if defined(_M_IX86) || defined(_M_X64) || defined(__i386__) 
inline int GetShort(const unsigned char *foo)
{
	return *(const short *)foo;
}
inline int GetInt(const unsigned char *foo)
{
	return *(const int *)foo;
}
inline int GetBigInt(const unsigned char *foo)
{
	return BigLong(GetInt(foo));
}
#else
inline int GetShort(const unsigned char *foo)
{
	return short(foo[0] | (foo[1] << 8));
}

The backtrace:

Code:

Program received signal SIGSEGV, Segmentation fault.
R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
C:\DEV\qzdoom\src\r_data\voxels.cpp:258:7781:beg:0x91ed41
>>>>>>cb_gdb:mipl = 0xc53b210
offsetsize = 8312
voxdatasize = 14141
mip = 2
j = <optimized out>
rawvoxel = 0x9ed8ff4 "TÖ\005"
slabs = {0x9ef81d8, 0x9f3e44e, 0x19a1, 0x5, 0x91b98b <R_InstallSprite(int)+459>}
n = 4030
lump = {
  Block = {
    Chars = 0x9ed8ff4 "TÖ\005", 
    static NullString = {
      Len = 0, 
      AllocLen = 2, 
      RefCount = 24896, 
      Nothing = "\000"
    }
  }
}
voxel = 0xc53b190
rawmip = 0x9f4d029 ">"
maxmipsize = 29106
i = <optimized out>
voxelsize = 505063
>>>>>>cb_gdb:lumpnum = 18118
>>>>>>cb_gdb:#0  R_LoadKVX (lumpnum=lumpnum@entry=18118) at C:\DEV\qzdoom\src\r_data\voxels.cpp:258
#1  0x0091f786 in R_LoadVoxelDef (lumpnum=18118, spin=0) at C:\DEV\qzdoom\src\r_data\voxels.cpp:330
#2  0x0091c3d2 in R_InitSpriteDefs () at C:\DEV\qzdoom\src\r_data\sprites.cpp:376
#3  0x0091e336 in R_InitSprites () at C:\DEV\qzdoom\src\r_data\sprites.cpp:943
#4  0x007b66a6 in P_Init () at C:\DEV\qzdoom\src\p_setup.cpp:4227
#5  0x00675686 in D_DoomMain () at C:\DEV\qzdoom\src\d_main.cpp:2521
#6  0x0042d38f in DoMain (hInstance=hInstance@entry=0x400000) at C:\DEV\qzdoom\src\win32\i_main.cpp:1034
#7  0x0042db36 in WinMain@16 (hInstance=0x400000, nothing=0x0, cmdline=0x2b243c "-file c:\\temp\\gzdoom\\bd21testnov01.pk3", nCmdShow=10) at C:\DEV\qzdoom\src\win32\i_main.cpp:1332
#8  0x00b1425b in main ()
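
One detail in the backtrace is consistent with the alignment theory: rawmip is 0x9f4d029, an odd address. Both i and j are optimized out, so the exact address passed to GetShort is not visible, but j * 2 is always even, so whenever i is even every read in that loop lands on an odd address. The small check below just does that arithmetic; `IsAligned` and `kRawMip` are hypothetical names, and the only value taken from the thread is the rawmip address itself.

```cpp
#include <cstdint>

// True when `address` is a multiple of `alignment` (must be a power of two).
constexpr bool IsAligned(std::uintptr_t address, std::uintptr_t alignment)
{
	return (address & (alignment - 1)) == 0;
}

// Value of rawmip copied from the backtrace.
constexpr std::uintptr_t kRawMip = 0x9f4d029;

// An odd base address is not aligned for a 2-byte short, and therefore
// cannot be aligned for a 16-byte SSE load either.
static_assert(!IsAligned(kRawMip, 2), "rawmip is not 2-byte aligned");
static_assert(!IsAligned(kRawMip, 16), "rawmip is not 16-byte aligned");
```

This does not prove what the -O3 code does with the pointer; comparing the generated assembly is still the only way to confirm it.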