Intermittent but fairly frequent crashes caused by memory access violation.

(This is a slightly modified version of an email I just sent to Unity support, ticket #32397.
I’m posting it here to see if any other Unity developers have ideas or can help.)

Hello,

We’re currently at our wits’ end with an intermittent crash bug that we can’t figure out.
It can occur at pretty much any time but seems to happen mostly when playing on a server (though unfortunately not often enough on our machines to be able to test effectively).
The machine itself may be a factor: it crashes more frequently on some machines, and some of our team and community have never had it crash, but we haven’t been able to find any specific drivers or configurations that seem to cause it.

The crash doesn’t generate any Windows crash logs, and the only thing in the Unity log is Crash!!! followed by the loaded DLLs; the stack trace is completely empty.

Microsoft Event Viewer only says that the application has hung and that the faulting module is ntdll.dll (which I guess is where Unity tries to access an address it isn’t allowed to).

By using Windows Debugging Tools we’ve found that the stack trace for the crash is:

Function
InterstellarMarines!RenderSettings::VirtualRedirectTransfer+30e1
InterstellarMarines!JobQueue::Exec+58
InterstellarMarines!JobQueue::ProcessJobs+8a
InterstellarMarines!JobQueue::WorkLoop+42
InterstellarMarines!Thread::RunThreadWrapper+2d
kernel32!BaseThreadInitThunk+e
ntdll!__RtlUserThreadStart+70
ntdll!_RtlUserThreadStart+1b

Somehow an access violation is caused by this.
In InterstellarMarines__PID__8908__Date__12_04_2015__Time_03_09_25PM__732__First chance exception 0XC0000005.dmp the assembly instruction at InterstellarMarines!RenderSettings::VirtualRedirectTransfer+30e1 in C:\0.5.23\InterstellarMarines.exe has caused an access violation exception (0xC0000005) when trying to read from memory location 0x00000253 on thread 12

Opening the dump files, we can see in the disassembly that:

GenerateCombinedDynamicVisibleListJob:
00C2B4A0 push ebp
00C2B4A1 mov ebp,esp
00C2B4A3 push 0
00C2B4A5 push 1CE8EA8h
00C2B4AA call profiler_begin (0E3D270h)
00C2B4AF mov eax,dword ptr [ebp+8]
00C2B4B2 add esp,8
00C2B4B5 cmp byte ptr [eax+4Ch],0
00C2B4B9 je GenerateCombinedDynamicVisibleListJob+49h (0C2B4E9h)
00C2B4BB mov ecx,dword ptr [eax+4ACh]
00C2B4C1 mov ecx,dword ptr [ecx+254h]
00C2B4C7 lea edx,[eax+4CCh]
00C2B4CD add eax,4B8h
00C2B4D2 push edx
00C2B4D3 push eax
00C2B4D4 lea eax,[ecx+2F0h]
00C2B4DA push eax
00C2B4DB mov eax,dword ptr [ecx+33Ch] <— This is where the access violation occurs.
00C2B4E1 call SetupShadowCullData+640h (0C2B2C0h)
00C2B4E6 add esp,0Ch
00C2B4E9 pop ebp
00C2B4EA jmp profiler_end (0E3DAD0h)
00C2B4EF int 3
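
One way to cross-check the marked instruction against the dump is to ask which base register value would make each pointer read in the listing touch the reported address 0x00000253. The following is only a sketch: it assumes the dump and the disassembly describe the same crash and that addresses wrap at 32 bits, and the names `READS` and `required_base` are made up for illustration.

```python
# Which base register value would make each read in the listing hit the
# faulting address from the dump? All numbers are copied from the post above.

MASK32 = 0xFFFFFFFF          # 32-bit process (eax/ecx/ebp registers)
FAULT_ADDRESS = 0x00000253   # "trying to read from memory location 0x00000253"

# (instruction address, base register, displacement) for the pointer reads
READS = [
    (0x00C2B4B5, "eax", 0x4C),    # cmp byte ptr [eax+4Ch],0
    (0x00C2B4BB, "eax", 0x4AC),   # mov ecx,dword ptr [eax+4ACh]
    (0x00C2B4C1, "ecx", 0x254),   # mov ecx,dword ptr [ecx+254h]
    (0x00C2B4DB, "ecx", 0x33C),   # mov eax,dword ptr [ecx+33Ch] (marked line)
]


def required_base(fault_address, displacement):
    """Base register value for which base + displacement == fault_address,
    using 32-bit wraparound arithmetic."""
    return (fault_address - displacement) & MASK32


for address, register, displacement in READS:
    base = required_base(FAULT_ADDRESS, displacement)
    print(f"{address:08X}: {register} would have to be 0x{base:08X}")
```

For the read at [ecx+254h] the required base is 0xFFFFFFFF, and for the marked read at [ecx+33Ch] it is 0xFFFFFF17. Comparing these values with the registers shown in the dump should tell which read actually faulted and what the bad pointer looked like.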

Unfortunately this doesn’t help us very much, since none of our own methods appear in it, and googling hasn’t turned up any answers either.
We’ve spent a few weeks on this issue but since it’s also a very intermittent bug it takes a long time to see if a potential fix works or not (and nothing we’ve tried so far has fixed it either).
Our previous update didn’t have this issue and we’re using the same version of Unity as that build (5.1.3p1) so we’re expecting that the issue is on our side.
We’ve tried reverting all assets and scripts with no luck. However, only our Assets and ProjectSettings folders are currently under version control, and we’ve been copying the project back and forth, so there could be Unity files that we haven’t been able to revert.
For the first time we’ve also been on different Unity-builds while developing so perhaps that might affect it (the build computer is/has been on the same version (5.1.3p1) the entire time but different members of the team have tried later versions).
We’ve also tried bisecting our commits (divide and conquer), but without luck.
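
Because the crash is intermittent, each step of a commit search like this needs several crash-free sessions before a build can be called good. As a rough sketch of the arithmetic, assuming sessions are independent and a broken build crashes with a fixed probability per session (the rates below are placeholders, not measurements, and `clean_runs_needed` is a made-up helper):

```python
import math


def clean_runs_needed(crash_probability, max_false_good):
    """Consecutive crash-free sessions needed so that a build which still
    crashes with `crash_probability` per session would pass all of them
    with probability at most `max_false_good`."""
    if not 0.0 < crash_probability < 1.0:
        raise ValueError("crash_probability must be between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be between 0 and 1")
    # P(no crash in n independent sessions) = (1 - p) ** n <= max_false_good
    return math.ceil(math.log(max_false_good) / math.log(1.0 - crash_probability))


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.02):  # assumed per-session crash rates
        runs = clean_runs_needed(p, 0.05)
        print(f"crash rate {p:.0%}: {runs} clean sessions per tested build")
```

The rarer the crash, the more sessions each tested commit needs, which is why a search over commits can fail to converge if builds are declared good after only a handful of runs.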

It crashes most often on Linux. We are not very experienced in Linux development, though, so debugging there hasn’t given us very much (and we’re not even a hundred percent sure it’s the same crash).
What it does tell us is:

“Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.”

and

“mmap() failed: Cannot allocate memory”.

Is there any help, guidance or advice you could offer?
Have you had any other reports of similar crashes (no info in the crash log, no Windows logs, just a crash)?

If there’s any info you need from us (logs, dump files, etc.), please just ask (the project is over 40 GB even without the Subversion files, though, so it might be a problem to send).
(For the Unity Answers crowd: unfortunately we can’t share the project with you, but I’ll try to answer any questions that arise as thoroughly as possible.)

Thank you very much for any possible help.

Nikolas Grandin,
Programmer,
Zero Point Software
@nikograndin

Without being an expert, I would try to:

Remove as many memory allocations as possible.

Make a list of what allocates memory.

Try to disable systems to rule them out.

Possibly export systems to an isolated project and test them there.

Code review “in-house systems” to check for blunders. Put logs on enter and exit of suspected functions.

Roll back to the last (for you) stable Unity version.

Update to next stable version and test.

Doubt everything.

Good luck.
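
The enter/exit logging tip can be sketched as a small wrapper. This is a language-neutral illustration written in Python, not Unity code; in a Unity project the same pattern would be written by hand in C#. The names `trace_calls` and `update_visibility` are hypothetical.

```python
import functools
import logging

# The default stream handler flushes after every record, so the last line
# written before a hard crash is not lost in a buffer.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("trace")


def trace_calls(func):
    """Log one line when func is entered and one line when it exits."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("enter %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on a normal return and on exceptions, but not when the
            # process dies in native code, so an "enter" line without a
            # matching "exit" line marks the call that was in flight.
            log.info("exit %s", func.__name__)
    return wrapper


@trace_calls
def update_visibility(frame):  # hypothetical suspect function
    return frame + 1


if __name__ == "__main__":
    update_visibility(1)
```

The important design point is that each line is flushed as it is written; buffered logging can hide the very lines that matter when the process dies.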

Thank you for the tips and help.

Looking into all the memory allocations and removing/disabling them is definitely a step we will need to take at this time.

We’ve tried disabling everything that is new for the crashing update but with no luck (as well as reverting back).

The project is unfortunately very big and quite a few systems are too interconnected to be able to easily export or disable completely but we might have to bite that bullet as well.

Today we’re waiting for the 5.3 release to see if that helps. The stable 5.2 releases unfortunately have a few bugs (reported, and most of them fixed in patch releases) that affect us, so we can’t use those.

This bug has taught us to doubt everything to the point where I’m not sure what reality is :)
Every time we think we have a lead or testing seems to go well, the bug smacks us back down to earth and below.

Again, thank you very much for the tips.

The crash is happening in the Unity->Umbra interface. You should focus your efforts on creating a reliable repro using just your Umbra setup (so you can cut a lot of complexity out).

@nikograndin
So by searching for this term: GenerateCombinedDynamicVisibleListJob I came to this post here. Have you had any success with it? I think I’m having the exact same bug, and it is related to this error message, which shows up in the editor right before Unity crashes: results->dynamicBounds.empty(). That can happen at any time in the game, after 5 min or 20 min. There’s also a forum thread regarding this issue:
http://forum.unity3d.com/threads/results-dynamicbounds-empty-error.325138/
In the output log I’m getting this in the stack trace part:

================

0x00F12B42 (Game) [c:\buildslave\unity\build\runtime\camera\shadowculling.cpp:602] GenerateCombinedDynamicVisibleListJob
0x00F74B7B (Game) [c:\buildslave\unity\build\runtime\jobs\internal\jobqueue.cpp:345] JobQueue::Exec
0x00F74CE1 (Game) [c:\buildslave\unity\build\runtime\jobs\internal\jobqueue.cpp:717] JobQueue::ProcessJobs
0x011140ED (Game) [c:\buildslave\unity\build\runtime\threads\thread.cpp:40] Thread::RunThreadWrapper
0x7674338A (kernel32) BaseThreadInitThunk
0x77259A02 (ntdll) RtlInitializeExceptionChain

===================

as well as an access violation pretty much like yours.
If you have figured something out, I would really appreciate it if you could share it, because I’m pretty lost here…

We are having similar crashes in our game as well. Has anyone discovered a fix for this? Most of our crashes have a stack trace like this:

SetupShadowCullData(class Camera &,class Vector3f const &,struct ShaderReplaceData const &,struct SceneCullingParameters const *,struct ShadowCullData &) Unknown
GenerateCombinedDynamicVisibleListJob(struct CullResults *) Unknown
JobQueue::Exec(struct JobInfo *,__int64,int) Unknown
JobQueue::ProcessJobs(void *,bool *) Unknown
JobQueue::WorkLoop(void *) Unknown
Thread::RunThreadWrapper(void *) Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

Have you fixed this? We’re hitting the same crash in our game.