[4.0.0, possibly earlier] "Invalid Instruction: mov"

These bugs do plan to be resolved, when they can be.

Moderator: GZDoom Developers

Matt
Posts: 9696
Joined: Sun Jan 04, 2004 5:37 pm
Preferred Pronouns: They/Them
Operating System Version (Optional): Debian Bullseye
Location: Gotham City SAR, Wyld-Lands of the Lotus People, Dominionist PetroConfederacy of Saudi Canadia

Post by Matt »

Seemingly at random, I get this popup window and the game fails to start.

Does anyone have any idea what this even means?
phantombeta
Posts: 2119
Joined: Thu May 02, 2013 1:27 am
Operating System Version (Optional): Windows 10
Graphics Processor: nVidia with Vulkan support
Location: Brazil

Post by phantombeta »

That means something went wrong when jitting a function into x86_64 code. Does it say anything else?
Matt
Posts: 9696
Joined: Sun Jan 04, 2004 5:37 pm
Preferred Pronouns: They/Them
Operating System Version (Optional): Debian Bullseye
Location: Gotham City SAR, Wyld-Lands of the Lotus People, Dominionist PetroConfederacy of Saudi Canadia

Post by Matt »

Not that I'm aware of. It seems to be completely random, and it has never happened twice in a row when I run GZDoom again with exactly the same command line parameters.

EDIT: I suppose if it were completely random I'd have had a twofer by now...
Caligari87
Admin
Posts: 6191
Joined: Thu Feb 26, 2004 3:02 pm
Preferred Pronouns: He/Him

Post by Caligari87 »

I got it as well a few days ago and was unable to reproduce it.

8-)
Player701
Posts: 1686
Joined: Wed May 13, 2009 3:15 am
Graphics Processor: nVidia with Vulkan support

Post by Player701 »

Sorry for the bump. This bug apparently still exists as of GZDoom 4.2.4, because just a few days ago, during a routine test of my mod, GZDoom crashed immediately with the following error:

Invalid instruction: test <None>, <None>

EDIT: Here's another one I've got just now:

Invalid instruction: mov <None>, qword [rbp+16]

Since I resumed work on my project, I've also got a few address zero VM aborts, which, judging by the stack traces, happened in completely random places in my code where it was simply not possible for such an error to happen. This issue has already been reported here, but we couldn't see stack traces before. Now that I do see them, they don't tell me anything useful at all.

I've also got an unexpected JIT menu error once; it looks like it has already been reported here. Unfortunately, I don't have the exact error message, but if I get it again, I will post it in the corresponding thread.

It might be possible that all these issues are connected to each other somehow, since they all either are JIT-related or have started to appear since the JIT merge. However, they seem to be extremely rare, and there is no definite means to reproduce any of them reliably.

I can rig an automation script to repeatedly run GZDoom in a loop and leave it running overnight; if it crashes, the window with the error message will remain, and I will be able to report it here. This way, I can catch invalid instruction and address zero errors. However, I have no idea if the exact error messages/stack traces will be of any use to the developer team. What if I run a debug build of GZDoom instead? If it crashes like this, will it be possible to retrieve any useful debugging information from the process when it has already errored out?
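A loop like the one described above could be sketched roughly as follows. This is only an illustration, not a tested GZDoom harness: the function name `run_until_failure` and its parameters are made up here, and it assumes the command line is arranged so that a successful launch exits on its own (how to do that is not covered in this thread). A run that is still alive after the timeout is treated as a possible crash and is deliberately left running so the error window stays on screen.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs, timeout_s):
    """Launch cmd repeatedly; return (run_number, reason) on the first failure.

    A run counts as a failure if it exits with a non-zero code or is still
    alive after timeout_s seconds (for example, because an error dialog is
    blocking the exit). Returns None if every run finished cleanly.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.Popen(cmd)
        try:
            code = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Do not kill the process: the error window should stay visible.
            return run, "still running after timeout"
        if code != 0:
            return run, f"exit code {code}"
    return None


# Stand-in command for demonstration: a Python child process that exits cleanly.
# For a real overnight run, cmd would be the GZDoom command line under test.
print(run_until_failure([sys.executable, "-c", "pass"], max_runs=2, timeout_s=30))
```

The return value only says which run failed and why; the actual error text still has to be read from the window that was left open.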
phantombeta
Posts: 2119
Joined: Thu May 02, 2013 1:27 am
Operating System Version (Optional): Windows 10
Graphics Processor: nVidia with Vulkan support
Location: Brazil

Post by phantombeta »

Player701 wrote:Sorry for the bump. This bug apparently still exists as of GZDoom 4.2.4, because just a few days ago, during a routine test of my mod, GZDoom crashed immediately with the following error:

Invalid instruction: test <None>, <None>

EDIT: Here's another one I've got just now:

Invalid instruction: mov <None>, qword [rbp+16]

This is kinda known. The bug isn't easily reproduced, so it's pretty hard to try to fix it. This is also very likely to be a bug in the AsmJit library itself, rather than in GZDoom's JIT code.

Player701 wrote:Since I resumed work on my project, I've also got a few address zero VM aborts, which, judging by the stack traces, happened in completely random places in my code where it was simply not possible for such an error to happen. This issue has already been reported here, but we couldn't see stack traces before. Now that I do see them, they don't tell me anything useful at all.

Unfortunately, without a stack trace, there's no way to tell if it might be the same bug. Please do post it if you get one.

Player701 wrote:It might be possible that all these issues are connected to each other somehow, since they all either are JIT-related or have started to appear since the JIT merge. However, they seem to be extremely rare, and there is no definite means to reproduce any of them reliably.

Extremely unlikely. The bug this thread was made to report is a bug where it fails to output valid x86_64 code, while Nash's bug report seems to be something possibly related to order of execution that may already have existed before the JIT was added, but went unnoticed because the VM can be more lenient than the JIT when garbage data (especially pointers) is involved.
(Why this code generation bug happens is unknown, but it's known that the issue itself is pretty rare. Like I said above, it's most likely a bug in the AsmJit library itself.)

[Edit]: Moving this one to On Hold Bugs too so it doesn't get lost either.
Player701
Posts: 1686
Joined: Wed May 13, 2009 3:15 am
Graphics Processor: nVidia with Vulkan support

Post by Player701 »

phantombeta wrote:
Player701 wrote:Since I resumed work on my project, I've also got a few address zero VM aborts, which, judging by the stack traces, happened in completely random places in my code where it was simply not possible for such an error to happen. This issue has already been reported here, but we couldn't see stack traces before. Now that I do see them, they don't tell me anything useful at all.
Unfortunately, without a stack trace, there's no way to tell if it might be the same bug. Please do post it if you get one.

I can do that, but so far the error has always happened in mod code, so I'm not sure if such a stack trace would be of any use. I can, however, guarantee that this is not a mod bug. The code runs fine 99.9% of the time, and then once in a blue moon this VM abort happens. My code doesn't involve any kind of randomness that could result in such an error, and the exact place where it aborts is always different.
dpJudas
Posts: 3134
Joined: Sat May 28, 2016 1:01 pm

Post by dpJudas »

Those errors are most likely caused by an uninitialized variable, either in GZDoom's code or in asmjit. Unfortunately it will be very difficult to track down, as asmjit doesn't detect invalid instructions at the insertion point. The call stack when it throws the exception will therefore be useless.
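To illustrate why the call stack ends up useless, here is a toy model of the difference between validating an instruction when it is inserted and validating it later. It is not asmjit's API: `ToyEmitter`, `failure_stage`, and the `validate_on_emit` switch are invented for this sketch, and a missing operand is simply modeled as `None`.

```python
class ToyEmitter:
    """Toy instruction buffer; a stand-in, not asmjit's real interface."""

    def __init__(self, validate_on_emit):
        self.validate_on_emit = validate_on_emit
        self.pending = []

    @staticmethod
    def _check(mnemonic, operands):
        # An operand of None models a register that was never initialized.
        if any(op is None for op in operands):
            shown = ", ".join("<None>" if op is None else op for op in operands)
            raise ValueError(f"Invalid instruction: {mnemonic} {shown}")

    def emit(self, mnemonic, *operands):
        if self.validate_on_emit:
            self._check(mnemonic, operands)
        self.pending.append((mnemonic, operands))

    def finalize(self):
        # With deferred validation, the error only surfaces here, far away
        # from the code that queued the bad instruction.
        for mnemonic, operands in self.pending:
            self._check(mnemonic, operands)
        return len(self.pending)


def failure_stage(validate_on_emit):
    """Report which stage raises for an instruction with a missing operand."""
    emitter = ToyEmitter(validate_on_emit)
    try:
        emitter.emit("mov", None, "qword [rbp+16]")
    except ValueError:
        return "emit"
    try:
        emitter.finalize()
    except ValueError:
        return "finalize"
    return "none"
```

In the deferred case the exception is raised from `finalize`, so a stack trace taken at that moment says nothing about which caller produced the bad operand.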

I am working on a compiler backend in a private project that I may open source to use it in GZD. That should solve the problem along with some issues I don't like about our current JIT implementation. However, that still isn't quite ready yet.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49184
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Post by Graf Zahl »

Now you really made me curious... ;)
dpJudas
Posts: 3134
Joined: Sat May 28, 2016 1:01 pm

Post by dpJudas »

I have a script compiler that I originally wrote to output to LLVM. Then due to the issues we also had when I used it for GZD I wrote my own backend that was more or less API compatible with the IRBuilder in LLVM. So it is like a mini-llvm with the same general compiler strategy.

In my current version of the IR backend I'm using asmjit for register allocation and lowering it to x64 opcodes, but I'm working on writing my own register allocator and x64 asm writer. Once I'm done with that I'll have a complete compiler backend that can JIT with no external dependencies.

For GZD this means I can actually do certain things that I kind of hacked asmjit into doing: providing unwind info to the OS. It will also allow me to actually code optimization passes and get rid of the 256 virtual register limit that asmjit has. I wish I could have added these things to asmjit, but its code is written in such a way that I have no idea how to even begin writing a proper PR for it.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49184
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Post by Graf Zahl »

dpJudas wrote:... but its code is written in such a way that I have no idea how to even begin writing a proper PR for it.
Welcome to the club! I've got the same problem with a certain other project I'm working on (I guess you know what I mean ;)); it's also written in a way that makes it very, very hard to implement stuff in a sane manner.

Regarding JIT in general, it's really a shame that there's no way to create Visual Studio debugger info for scripted content - if that existed a lot more of the engine could be scriptified.
Nash
Posts: 17465
Joined: Mon Oct 27, 2003 12:07 am
Location: Kuala Lumpur, Malaysia

Post by Nash »

Something I failed to mention in the past; I noticed that these "mysterious errors" started popping up after the level rewrite was merged into master. And I mean the whole bunch - invalid instruction mov, unexplainable VM aborts, etc. I can say with 90% certainty that these things have never happened before said merge.

Edit for clarification: I am not blaming the level rewrite, perhaps it's not related at all... but I remember clearly _when_ these started manifesting.
dpJudas
Posts: 3134
Joined: Sat May 28, 2016 1:01 pm

Post by dpJudas »

Nash wrote:Something I failed to mention in the past; I noticed that these "mysterious errors" started popping up after the level rewrite was merged into master. And I mean the whole bunch - invalid instruction mov, unexplainable VM aborts, etc. I can say with 90% certainty that these things have never happened before said merge.

Edit for clarification: I am not blaming the level rewrite, perhaps it's not related at all... but I remember clearly _when_ these started manifesting.
I don't think the level rewrite can cause this. If anything, it was some other scripting backend related change during the same period that started it. The most likely candidates for the error are that A) the JitCompiler class receives a VM register index that is out of bounds, B) it itself fails to initialize an asmjit virtual register, or C) asmjit messes up its internal state.

We could add some validation for it in the JitCompiler, but I'd rather invest my time in my own IR backend. The unwind code in GZD is sort of a ticking time bomb in the sense that it's extremely low level, IMO should be done by asmjit, and doesn't seem likely to become a feature there unless I add it myself (which I can't).
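As a rough idea of what validation for cases A and B might look like, here is a small model written in Python for brevity. Nothing in it comes from GZDoom's JitCompiler: the `VirtualRegisters` class, its methods, and the error wording are all hypothetical. It rejects an out-of-bounds register index and a register that is used before it was created.

```python
class VirtualRegisters:
    """Hypothetical table mapping VM register indices to virtual registers."""

    def __init__(self, count):
        # None marks a slot whose virtual register was never created.
        self._slots = [None] * count

    def problem_with(self, index, must_be_defined):
        """Return a description of what is wrong with index, or None if fine."""
        if not 0 <= index < len(self._slots):
            return f"register index {index} out of bounds (0..{len(self._slots) - 1})"
        if must_be_defined and self._slots[index] is None:
            return f"register {index} used before it was initialized"
        return None

    def define(self, index, name):
        problem = self.problem_with(index, must_be_defined=False)
        if problem:
            raise ValueError(problem)
        self._slots[index] = name

    def use(self, index):
        # Failing here, at the point of use, keeps the faulty caller on the
        # call stack instead of deferring the error to code generation.
        problem = self.problem_with(index, must_be_defined=True)
        if problem:
            raise ValueError(problem)
        return self._slots[index]
```

Case C, where asmjit's own internal state goes wrong, would not be caught by a check of this kind.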
Graf Zahl wrote:Regarding JIT in general, it's really a shame that there's no way to create Visual Studio debugger info for scripted content - if that existed a lot more of the engine could be scriptified.
I'm actually not sure if that is impossible or not. Visual Studio is able to display the call stack for .net JIT code. The big question here is whether they implemented that in some .net specific way, or if it looks for an HMODULE header next to the function table. If it is the latter then it might be possible to give the functions names and reference source files.
Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49184
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Post by Graf Zahl »

Nash wrote:Something I failed to mention in the past; I noticed that these "mysterious errors" started popping up after the level rewrite was merged into master. And I mean the whole bunch - invalid instruction mov, unexplainable VM aborts, etc. I can say with 90% certainty that these things have never happened before said merge.

Edit for clarification: I am not blaming the level rewrite, perhaps it's not related at all... but I remember clearly _when_ these started manifesting.

I also do not believe that the level rewrite messed things up, it never interacts with the VM's innards. The only thing I know is that it revealed some major architectural issues with the event handling but that's on a far higher level.
phantombeta
Posts: 2119
Joined: Thu May 02, 2013 1:27 am
Operating System Version (Optional): Windows 10
Graphics Processor: nVidia with Vulkan support
Location: Brazil

Post by phantombeta »

Graf Zahl wrote:I also do not believe that the level rewrite messed things up, it never interacts with the VM's innards. The only thing I know is that it revealed some major architectural issues with the event handling but that's on a far higher level.
Not for the JIT errors, but it would explain the random VM abort at level start.

By the way, I think I may have identified that bloody menu error.
This line checks the previous opcode by doing "pc - 1". While usually there would be stuff before it, I'm guessing in some rare case (perhaps a function call with no arguments at the start of a function), it's the first thing in the code, so "pc - 1" ends up in random data, and that random data sometimes ends up being equal to OP_VTBL. It should probably have some check to see if "pc" is at the start of the code, I guess.
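The guard suggested above might look something like this. It is a model in Python rather than the engine's C++, and apart from the name `OP_VTBL`, everything in it (the list-of-strings code representation and both helper functions) is an assumption made for illustration. Python's negative indexing wraps around to the end of the list, which stands in loosely for reading unrelated memory before the start of the code.

```python
OP_VTBL = "VTBL"  # stand-in; in the engine this is an opcode enum value


def previous_opcode(code, pc):
    """Return the opcode just before index pc, or None at the start of code."""
    if pc <= 0:
        # Nothing precedes the first instruction, so do not read code[pc - 1].
        return None
    return code[pc - 1]


def call_follows_vtbl(code, pc):
    """True only if a real instruction precedes pc and that one is OP_VTBL."""
    return previous_opcode(code, pc) == OP_VTBL
```

Without the `pc <= 0` check, `call_follows_vtbl(["CALL", "RET", "VTBL"], 0)` would look at `code[-1]` and wrongly report a match; with it, the call at the very start of the code is never treated as following OP_VTBL.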
