[LINUX/ARM] Segfault on exit

Posted: Thu Mar 29, 2018 1:37 pm
by vanfanel
Hi there,

I guess this is the right place to report.
I built GZDoom 3.3.0 on my Pi3 yesterday. Standard Raspbian with the latest packages, the default GCC 6.3.0, the latest stable SDL2 (2.0.8), and the stock kernel.
On exit, I get a segfault on a delete[] call at src/v_video.cpp, line 731.

If I comment out that line, I get a different segfault, and I can't quite tell where it's happening. Here's what I could find out with a debug version:

Code:

Thread 1 "gzdoom" received signal SIGSEGV, Segmentation fault.
0x76b3ade4 in malloc_consolidate (av=av@entry=0x76c09794 <main_arena>) at malloc.c:4221
4221   malloc.c: No such file or directory.
(gdb) bt
#0  0x76b3ade4 in malloc_consolidate (av=av@entry=0x76c09794 <main_arena>) at malloc.c:4221
#1  0x76b3d2e4 in _int_malloc (av=av@entry=0x76c09794 <main_arena>, bytes=bytes@entry=4096) at malloc.c:3488
#2  0x76b3f370 in __GI___libc_malloc (bytes=4096) at malloc.c:2928
#3  0x76d637a0 in operator new(unsigned int) () from /usr/lib/arm-linux-gnueabihf/
#4  0x00fc3f2c in VMFrameStack::Alloc(int) ()
#5  0x00fc3d40 in VMFrameStack::AllocFrame(VMScriptFunction*) ()
#6  0x00fc42e0 in VMCall(VMFunction*, VMValue*, int, VMReturn*, int) ()
#7  0x00b57ea4 in DObject::Destroy() ()
#8  0x00c95ab0 in AActor::DestroyAllInventory() ()
#9  0x00ca7740 in AActor::OnDestroy() ()
#10 0x00b57ebc in DObject::Destroy() ()
#11 0x00b62d00 in DThinker::DestroyThinkersInList(FThinkerList&) ()
#12 0x00b62c5c in DThinker::DestroyAllThinkers() ()
#13 0x00ce5da8 in P_FreeLevelData() ()
#14 0x00ce82f8 in ?? ()
#15 0x006c4f20 in call_terms() ()
#16 0x76afbdd4 in __run_exit_handlers (status=0, listp=0x76c094ac <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
    run_dtors=run_dtors@entry=true) at exit.c:83
#17 0x76afbe34 in __GI_exit (status=<optimized out>) at exit.c:105
#18 0x006cb15c in ST_Endoom() ()
#19 0x00e7dc3c in ?? ()
#20 0x00e7dc68 in ?? ()
#21 0x00e7d934 in ?? ()
#22 0x00fbc4c8 in VMExec_Unchecked::Exec(VMFrameStack*, VMOP const*, VMReturn*, int) ()
#23 0x00fbc56c in VMExec_Unchecked::Exec(VMFrameStack*, VMOP const*, VMReturn*, int) ()
#24 0x00fc4324 in VMCall(VMFunction*, VMValue*, int, VMReturn*, int) ()
#25 0x00e6e950 in DMenu::CallMenuEvent(int, bool) ()
#26 0x00e7073c in M_Responder(event_t*) ()
#27 0x00b33e80 in D_ProcessEvents() ()
#28 0x00b43570 in NetUpdate() ()
#29 0x00b46810 in TryRunTics() ()
---Type <return> to continue, or q <return> to quit---
#30 0x00b36374 in D_DoomLoop() ()
#31 0x00b3bcd8 in D_DoomMain() ()
#32 0x006c579c in main ()
(gdb) q
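
A backtrace like the one above can also be captured without an interactive session, which avoids the pager prompt in the middle of the output. This is only a minimal sketch: it assumes the binary is `./gzdoom` in the current directory, and the log file name is made up. The `()` and `??` frames above suggest that parts of the binary lack symbol information, so the result is only as detailed as the build allows.

```shell
#!/usr/bin/env bash
# Sketch: run the game under gdb in batch mode and print a backtrace when it stops.
# ./gzdoom and gzdoom-backtrace.log are assumed names, adjust them to your build.
# -batch executes the -ex commands in order and then exits; it also disables pagination.
gdb_cmd=(gdb -batch -ex "run" -ex "bt" --args ./gzdoom)

if command -v gdb >/dev/null 2>&1 && [ -x ./gzdoom ]; then
    "${gdb_cmd[@]}" 2>&1 | tee gzdoom-backtrace.log
else
    # Nothing to debug here, so just show the command that would be run.
    printf '%q ' "${gdb_cmd[@]}"
    echo
fi
```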

Also, GZDoom creates and destroys threads endlessly during execution. Here's an extract:

Code:
[Thread 0x720fe400 (LWP 10616) exited]
[New Thread 0x720fe400 (LWP 10617)]
[Thread 0x720fe400 (LWP 10617) exited]
[New Thread 0x720fe400 (LWP 10618)]
[Thread 0x720fe400 (LWP 10618) exited]
[New Thread 0x720fe400 (LWP 10619)]
[Thread 0x720fe400 (LWP 10619) exited]
[New Thread 0x720fe400 (LWP 10620)]
[Thread 0x720fe400 (LWP 10620) exited]
[New Thread 0x720fe400 (LWP 10621)]
[Thread 0x720fe400 (LWP 10621) exited]
[New Thread 0x720fe400 (LWP 10622)]
[Thread 0x720fe400 (LWP 10622) exited]
[New Thread 0x720fe400 (LWP 10623)]
[Thread 0x720fe400 (LWP 10623) exited]
[New Thread 0x720fe400 (LWP 10624)]
[Thread 0x720fe400 (LWP 10624) exited]
[New Thread 0x720fe400 (LWP 10625)]
[Thread 0x720fe400 (LWP 10625) exited]
[New Thread 0x720fe400 (LWP 10626)]
[Thread 0x720fe400 (LWP 10626) exited]
[New Thread 0x720fe400 (LWP 10627)]
[Thread 0x720fe400 (LWP 10627) exited]
[New Thread 0x720fe400 (LWP 10628)]
[Thread 0x720fe400 (LWP 10628) exited]
[New Thread 0x720fe400 (LWP 10629)]
[Thread 0x720fe400 (LWP 10629) exited]
[New Thread 0x720fe400 (LWP 10630)]
[Thread 0x720fe400 (LWP 10630) exited]
[New Thread 0x720fe400 (LWP 10631)]
[Thread 0x720fe400 (LWP 10631) exited]
[New Thread 0x720fe400 (LWP 10632)]
[Thread 0x720fe400 (LWP 10632) exited]
[New Thread 0x720fe400 (LWP 10633)]
[Thread 0x720fe400 (LWP 10633) exited]
[New Thread 0x720fe400 (LWP 10634)]
[Thread 0x720fe400 (LWP 10634) exited]
[New Thread 0x720fe400 (LWP 10635)]
[Thread 0x720fe400 (LWP 10635) exited]
[New Thread 0x720fe400 (LWP 10636)]
[Thread 0x720fe400 (LWP 10636) exited]
[New Thread 0x720fe400 (LWP 10637)]
[Thread 0x720fe400 (LWP 10637) exited]
[New Thread 0x720fe400 (LWP 10638)]
[Thread 0x720fe400 (LWP 10638) exited]
[New Thread 0x720fe400 (LWP 10639)]
[Thread 0x720fe400 (LWP 10639) exited]
[New Thread 0x720fe400 (LWP 10640)]
[Thread 0x720fe400 (LWP 10640) exited]
[New Thread 0x720fe400 (LWP 10641)]
[Thread 0x720fe400 (LWP 10641) exited]
[New Thread 0x720fe400 (LWP 10642)]
[Thread 0x720fe400 (LWP 10642) exited]
[New Thread 0x720fe400 (LWP 10643)]
[Thread 0x720fe400 (LWP 10643) exited]
[New Thread 0x720fe400 (LWP 10644)]
[Thread 0x720fe400 (LWP 10644) exited]
[New Thread 0x720fe400 (LWP 10645)]
[Thread 0x720fe400 (LWP 10645) exited]
[New Thread 0x720fe400 (LWP 10646)]
[Thread 0x720fe400 (LWP 10646) exited]
[New Thread 0x720fe400 (LWP 10647)]
[Thread 0x720fe400 (LWP 10647) exited]
[New Thread 0x720fe400 (LWP 10648)]
[Thread 0x720fe400 (LWP 10648) exited]
[New Thread 0x720fe400 (LWP 10649)]
[Thread 0x720fe400 (LWP 10649) exited]
[New Thread 0x720fe400 (LWP 10650)]
[Thread 0x720fe400 (LWP 10650) exited]
[New Thread 0x720fe400 (LWP 10651)]
[Thread 0x720fe400 (LWP 10651) exited]
[New Thread 0x720fe400 (LWP 10652)]
[Thread 0x720fe400 (LWP 10652) exited]
[New Thread 0x720fe400 (LWP 10653)]
[Thread 0x720fe400 (LWP 10653) exited]
[New Thread 0x720fe400 (LWP 10654)]
[Thread 0x720fe400 (LWP 10654) exited]
[New Thread 0x720fe400 (LWP 10655)]
[Thread 0x720fe400 (LWP 10655) exited]
[New Thread 0x720fe400 (LWP 10656)]
[Thread 0x720fe400 (LWP 10656) exited]
[New Thread 0x720fe400 (LWP 10657)]
[Thread 0x720fe400 (LWP 10657) exited]
[New Thread 0x720fe400 (LWP 10658)]
[Thread 0x720fe400 (LWP 10658) exited]
[New Thread 0x720fe400 (LWP 10659)]
[Thread 0x720fe400 (LWP 10659) exited]
[New Thread 0x720fe400 (LWP 10660)]
[Thread 0x720fe400 (LWP 10660) exited]
[New Thread 0x720fe400 (LWP 10661)]
[Thread 0x720fe400 (LWP 10661) exited]
[New Thread 0x720fe400 (LWP 10662)]
[Thread 0x720fe400 (LWP 10662) exited]

...This thread creation and destruction goes on, endlessly...

Re: [LINUX/ARM] Segfault on exit

Posted: Fri Mar 30, 2018 2:43 am
by _mental_
You should ask the Linux developers about the thread creation:
Code:
(gdb) b pthread_create
Breakpoint 1 at 0x7ffff78ae990: file pthread_create.c, line 505.
(gdb) c
[Switching to Thread 0x7fffe8032940 (LWP 4469)]

Thread 8 "gzdoom" hit Breakpoint 1, __pthread_create_2_1 (newthread=newthread@entry=0x7fffe8031e28, attr=attr@entry=0x5555573f2bb8, start_routine=start_routine@entry=0x7ffff76a3f30 <timer_sigev_thread>,
    arg=0x7fffbc001230) at pthread_create.c:505
505   pthread_create.c: No such file or directory.
(gdb) bt
#0  __pthread_create_2_1 (newthread=newthread@entry=0x7fffe8031e28, attr=attr@entry=0x5555573f2bb8, start_routine=start_routine@entry=0x7ffff76a3f30 <timer_sigev_thread>, arg=0x7fffbc001230)
    at pthread_create.c:505
#1  0x00007ffff76a3eb2 in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:120
#2  0x00007ffff78ae6ba in start_thread (arg=0x7fffe8032940) at pthread_create.c:333
#3  0x00007ffff64fb41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

As for the crashes, I cannot reproduce them on x64. You can try building with the WITH_ASAN CMake option if it's supported.

Re: [LINUX/ARM] Segfault on exit

Posted: Fri Mar 30, 2018 4:34 pm
by vanfanel
@_mental_: the game won't even run on the Pi when built with "-DWITH_ASAN=1"; it just takes up all the CPU according to top, and the window never shows up.
Don't you guys have a Pi3 to test on? I am willing to donate one.

Anyway, I built it on x86_64 with debug symbols and tried to run it under Valgrind, but there are too many errors to list. Could you run Valgrind on it, please?

Re: [LINUX/ARM] Segfault on exit

Posted: Sat Mar 31, 2018 1:36 am
by _mental_
Sorry, but I'm not interested in spending hours debugging issues of uncertain origin on a low-performance system like the Pi.
Without a rich dev environment it's a futile effort, IMHO.

Valgrind gives a shitload of complaints about libSDL2 code, and it's pretty useless without setting up fine-tuned filtering for its output.
I'm using Ubuntu 16.04 x64 inside a VM, and a bunch of the issues Valgrind reported come from the VirtualBox OpenGL layer.
We need a volunteer with a real Linux installation and the willingness to dig into such things.

Please note that if there really were that many issues, GZDoom would crash very often on Linux.
But it doesn't. Almost everything reported so far has been platform-independent errors, usually in recently changed code.
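
The output filtering mentioned above is usually done with a Valgrind suppression file. The following is only a rough sketch of that idea, not a tuned configuration: the file names, the suppression names and the `*/libSDL2*` object pattern are assumptions, and it hides just two kinds of report (`Memcheck:Cond` and `Memcheck:Leak`) whose call stack passes through libSDL2.

```shell
#!/usr/bin/env bash
# Sketch: write a small suppression file and use it if Valgrind is available.
# sdl2.supp, valgrind.log and ./gzdoom are assumed names.
supp=sdl2.supp
cat > "$supp" <<'EOF'
{
   sdl2_uninitialised_condition
   Memcheck:Cond
   ...
   obj:*/libSDL2*
}
{
   sdl2_leaks
   Memcheck:Leak
   match-leak-kinds: all
   ...
   obj:*/libSDL2*
}
EOF

# "..." matches any number of frames, so a report is suppressed as soon as
# some frame in its stack belongs to an object matching */libSDL2*.
if command -v valgrind >/dev/null 2>&1 && [ -x ./gzdoom ]; then
    valgrind --suppressions="$supp" --gen-suppressions=all \
             --log-file=valgrind.log ./gzdoom
fi
```

With `--gen-suppressions=all`, Valgrind prints a ready-made suppression entry after each remaining error, so the file can be extended step by step, for example for the VirtualBox OpenGL layer.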

Re: [LINUX/ARM] Segfault on exit

Posted: Sat Mar 31, 2018 2:45 am
by Rachael
Does this still happen if you run with "-nosound"?

Re: [LINUX/ARM] Segfault on exit

Posted: Sat Mar 31, 2018 11:23 am
by vanfanel
@Rachael: Yes, -nosound makes no difference regarding these segfaults on exit.
Are you doing the Pi3 builds?

Re: [LINUX/ARM] Segfault on exit

Posted: Sat Mar 31, 2018 1:17 pm
by Rachael
Yes, I am the one compiling the official builds, but my debugging skills are nowhere near on par with _mental_'s, or I would have looked into this issue myself.

I do not know what the problem is. I suspect the sound code is related, but I do not know if it's the actual cause.

Re: [LINUX/ARM] Segfault on exit

Posted: Sun Apr 01, 2018 8:29 am
by vanfanel
@Rachael: I have also tried QZDoom and I have the same problems on quit (I think that's your sourceport).
I believe I'll have to stick to old versions then... or use the old ZDoom or Chocolate Doom. The real problem with these segfaults on exit is that they effectively lock up the Pi: SDL2 is in control of the TTY keyboard, and when the game crashes on exit it never releases the keyboard.

Re: [LINUX/ARM] Segfault on exit

Posted: Sun Apr 01, 2018 2:30 pm
by Rachael
That's fine.

A little trick I've learned is to Ctrl-Z out of the app while in the affected terminal, and then do "kill -9 %1" to kill it.
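
The trick above relies on shell job numbers. Here is a small sketch of the same sequence that can be tried safely: `sleep` stands in for the hung game, and `kill -STOP` stands in for pressing Ctrl-Z (which really sends SIGTSTP from the keyboard). Both substitutions are assumptions made only so the example runs unattended.

```shell
#!/usr/bin/env bash
# Stand-in demo: "sleep" plays the role of the hung game.
sleep 300 &                  # becomes job 1 of this shell
pid=$!

kill -STOP "$pid"            # roughly what Ctrl-Z does to the foreground job
jobs -l                      # lists job 1 together with its PID
kill -9 %1                   # SIGKILL by job number, as in the post
wait "$pid" 2>/dev/null      # reap it; the status is 128 + the signal number
status=$?
echo "job 1 ended with status $status"
```

If the keyboard itself is stuck, Ctrl-Z never reaches the shell. Assuming SSH access to the Pi is set up, something like `pkill -9 gzdoom` from a remote session would be the equivalent.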

Re: [LINUX/ARM] Segfault on exit

Posted: Mon Apr 02, 2018 8:16 am
by _mental_
If you can reliably reproduce the problem with the recent code but it wasn't there before, could you please find the stable version where it first appeared?
Ideally, you can use git bisect to find the faulty commit, although I suspect this will require a lot of time.
Are you building on the Pi itself? Cross-compilation on a multi-core desktop PC should be much faster. I'm not 100% sure that it works at the moment though.
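
For reference, this is roughly how an automated bisect works. The sketch runs in a throwaway repository where the "regression" is just a line in a text file; the repository, the file names and the test command are all made up. For GZDoom the test command would instead be a script that builds the tree, runs the game, quits and reports whether the exit was clean, returning 125 for commits that fail to build so that git skips them.

```shell
#!/usr/bin/env bash
# Toy demonstration of "git bisect run" in a temporary repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "bisect-demo"
git config user.email "bisect-demo@example.invalid"

# Eight commits; the simulated regression lands in commit 5.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 5 ]; then
        echo "crashes on exit" > status.txt
    else
        echo "exits cleanly" > status.txt
    fi
    echo "$i" > version.txt
    git add status.txt version.txt
    git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)    # oldest commit, known to be good
git bisect start HEAD "$good" >/dev/null     # bad revision first, then good
# Test command exit status: 0 = good, 125 = skip, other values up to 127 = bad.
git bisect run grep -q "exits cleanly" status.txt >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```

Bisecting N commits needs about log2(N) test runs, so a range of a few hundred commits comes down to fewer than ten builds.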

Re: [LINUX/ARM] Segfault on exit

Posted: Mon Apr 02, 2018 8:40 am
by vanfanel
@_mental_: I have been doing precisely that, but it takes ~30 min to build on a Pi3 with -j6.
Also, I noticed that with the latest stable SDL2 (2.0.8), crashes on exit happen with every version back to 3.2.0...

Re: [LINUX/ARM] Segfault on exit

Posted: Mon Apr 02, 2018 8:51 am
by _mental_
The problem is that a potential memory overwrite may lead to an arbitrary number of other crashes.
Please do not alter the source code in any way, because we are really interested in finding the cause of the first segfault.

Also, I'm not quite sure which version was the first with ARM support.
I would suggest trying it first in order to rule out bugs in third-party libraries.

Re: [LINUX/ARM] Segfault on exit

Posted: Mon Apr 02, 2018 9:06 am
by Rachael
GZDoom had no ARM support for a very long time, so the bisect would have to go back into ZDoom's repository instead, or at least use ZDoom-specific commit numbers.

Re: [LINUX/ARM] Segfault on exit

Posted: Mon Apr 02, 2018 9:47 am
by vanfanel
I have gone as far as to build the last ZDoom sources from git:

...and it also crashes on quit with the latest stable SDL 2.0.8 on the Pi.

Re: [LINUX/ARM] Segfault on exit

Posted: Mon Apr 02, 2018 12:50 pm
by _mental_
Just out of curiosity, can you downgrade SDL to 2.0.4 or something like that?

Also, you can try building ZDoom 2.8, because it was the first stable version after the switch to SDL2.
I have no idea about the status of SDL1 on the Pi.