Heap Corruption Detection

We're looking for guidance on strategies for finding the cause of a crash we are getting. Suggestions for making it reproduce more often would also help, because it is inconveniently rare.

The bug seems to involve heap corruption (without our debug checks it eventually ends in a hard crash). Our code only detects the error when our safety checks read the sentinels at the boundaries of memory we have allocated. Because of Burst we use quite a lot of unsafe code, including our own container types, but we have added our own bounds checks to those and they do not trigger.

We've never hit it while debugging or in the Editor, so our only strategies so far have been (1) adding logs and (2) disabling code and comparing the results. We have also added logs showing that it is not happening on the main thread (we check the memory and log on the main thread, and the error occurs between two of those checks).
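
For reference, a sentinel ("canary") scheme like the one described above might look roughly like the C sketch below. It is hypothetical, not the poster's code: guarded_malloc, guarded_check and guarded_check_all are made-up names, and C stands in for the C#/UnsafeUtility code. Each block gets a fill pattern on both sides, the user pointer stays 16-byte aligned, and a small registry lets a checkpoint scan every live block, which is the "check on the main thread and log" approach. The registry is not thread-safe, so a version used from jobs would need a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. Layout of one raw block:
   [header, padded to ALIGN][front sentinel][user bytes][back sentinel] */
#define ALIGN        16u    /* the alignment mentioned in the thread */
#define HEADER_SPACE 16u    /* room for GuardHeader, a multiple of ALIGN */
#define GUARD_SIZE   16u    /* sentinel bytes per side, a multiple of ALIGN */
#define PREFIX       (HEADER_SPACE + GUARD_SIZE)
#define GUARD_BYTE   0xFDu  /* arbitrary fill pattern */
#define MAX_LIVE     1024   /* capacity of the toy registry */

typedef struct { size_t size; } GuardHeader;
_Static_assert(sizeof(GuardHeader) <= HEADER_SPACE, "header must fit");

static uint8_t *g_live[MAX_LIVE];  /* user pointers of live blocks */
static size_t   g_live_count;

void *guarded_malloc(size_t size) {
    size_t total = PREFIX + size + GUARD_SIZE;
    total = (total + ALIGN - 1) / ALIGN * ALIGN;  /* aligned_alloc wants a multiple */
    if (g_live_count == MAX_LIVE) return NULL;
    uint8_t *raw = aligned_alloc(ALIGN, total);
    if (raw == NULL) return NULL;
    ((GuardHeader *)raw)->size = size;
    memset(raw + HEADER_SPACE, GUARD_BYTE, GUARD_SIZE);  /* front sentinel */
    uint8_t *user = raw + PREFIX;                        /* still ALIGN-aligned */
    memset(user + size, GUARD_BYTE, GUARD_SIZE);         /* back sentinel */
    g_live[g_live_count++] = user;
    return user;
}

/* True if both sentinels of one block are intact. */
bool guarded_check(const void *p) {
    const uint8_t *user = p;
    const uint8_t *raw = user - PREFIX;
    size_t size = ((const GuardHeader *)raw)->size;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (raw[HEADER_SPACE + i] != GUARD_BYTE || user[size + i] != GUARD_BYTE)
            return false;
    }
    return true;
}

/* Checkpoint: number of live blocks whose sentinels are damaged. */
size_t guarded_check_all(void) {
    size_t bad = 0;
    for (size_t i = 0; i < g_live_count; i++)
        if (!guarded_check(g_live[i])) bad++;
    return bad;
}

void guarded_free(void *p) {
    if (p == NULL) return;
    for (size_t i = 0; i < g_live_count; i++) {
        if (g_live[i] == p) {               /* unregister (swap-remove) */
            g_live[i] = g_live[--g_live_count];
            break;
        }
    }
    free((uint8_t *)p - PREFIX);
}
```

Such a checkpoint only tells you that something wrote out of bounds between two checks, not which code did it, and a stray write into the middle of a block is not detected at all. For overruns, a guard page (as in the debug allocator described later in the thread) faults on the offending write itself, so the crash dump points at the culprit.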

Because we have never seen this in the Editor, and ENABLE_UNITY_COLLECTIONS_CHECKS is not defined outside the Editor, we lose some out-of-range checks we wanted to use. As a workaround, we made a local copy of the Collections package and removed the Conditional attribute from those check functions.
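
The Collections check methods are typically marked [Conditional("ENABLE_UNITY_COLLECTIONS_CHECKS")], and the C# compiler drops every call to such a method when that symbol is not defined; C's assert() disappears the same way under NDEBUG. A minimal C analogue of the workaround is a check that isn't tied to any build symbol, sketched below. ALWAYS_CHECK, IntSpan, span_get and span_set are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Always-on check: unlike assert(), this is NOT removed when NDEBUG is
   defined, so it stays active in release/player builds. */
#define ALWAYS_CHECK(cond, msg)                                      \
    do {                                                             \
        if (!(cond)) {                                               \
            fprintf(stderr, "check failed: %s (%s:%d)\n", (msg),     \
                    __FILE__, __LINE__);                             \
            abort();                                                 \
        }                                                            \
    } while (0)

/* Hypothetical minimal container standing in for a native list. */
typedef struct {
    int   *data;
    size_t length;
} IntSpan;

static int span_get(IntSpan s, size_t index) {
    ALWAYS_CHECK(index < s.length, "index out of range");
    return s.data[index];
}

static void span_set(IntSpan s, size_t index, int value) {
    ALWAYS_CHECK(index < s.length, "index out of range");
    s.data[index] = value;
}
```

The cost is a compare and branch per access, which is usually acceptable in a diagnostic build. The abort happens on whichever thread performs the bad access, which matters here since the corruption appears to originate off the main thread.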

It appears to happen only when the target framerate is high (>= 240 fps). That may simply make it more likely because more frames run in total, but we haven't managed to reproduce it at all without that setting.

During GC we see lots of first-chance access violation exceptions on the finalizer thread (“0xC0000005: Access violation writing location 0x0000000000000000.”). Is that expected, or is it something we should investigate?

Any suggestions would be helpful, thanks.

Do you happen to know which pool of memory gets corrupted? Is it engine-allocated native memory? The managed heap? System-allocated memory (as in, not allocated by Unity)?

The access violations on the finalizer thread are not expected at all. Could these be null reference exceptions?

Thanks for the reply!

Basically, the corruption we find is in memory that our code allocates with UnsafeUtility.Malloc() for our own native containers (at least, that's the memory we are checking).

In case it matters, I think we have been using an alignment of 16.

As for the GC aspect: I was trying to catch the original issue with WinDbg, and the access violation exceptions appear to be handled, but WinDbg still displays them as I pasted above. I mentioned it in case it's relevant.

Memory from UnsafeUtility.Malloc() is allocated by our memory manager, which supports running in a “debug” mode. That mode does several things, including:

  1. It allocates an extra page after the allocation, marks that page inaccessible, and places the allocated region immediately before it. If you accidentally index past the end of the allocated region, this ensures the access results in a crash.
  2. When memory is freed, it only decommits the pages rather than releasing the address range back to the OS for reuse. Future allocations therefore cannot reuse the same addresses, so any access to memory after it has been freed also results in a crash.
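
The two behaviors above can be illustrated with a POSIX sketch using mmap, mprotect and madvise. This is not Unity's implementation, which the thread doesn't show; on Windows the corresponding calls would be VirtualAlloc, VirtualProtect with PAGE_NOACCESS, and VirtualFree with MEM_DECOMMIT. debug_alloc, debug_free and DebugBlock are hypothetical names, and align is assumed to be a power of two no larger than the page size.

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS / madvise under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one debug allocation. */
typedef struct {
    uint8_t *base;    /* start of the whole mapping */
    size_t   mapped;  /* data pages + one guard page, in bytes */
} DebugBlock;

/* Allocates `size` bytes so that the (rounded) region ends exactly where an
   inaccessible guard page begins. Returns NULL on failure. */
void *debug_alloc(size_t size, size_t align, DebugBlock *out) {
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (size + align - 1) / align * align;  /* keep alignment */
    size_t data_pages = rounded == 0 ? 1 : (rounded + ps - 1) / ps;
    size_t mapped = (data_pages + 1) * ps;

    uint8_t *base = mmap(NULL, mapped, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    uint8_t *guard = base + data_pages * ps;
    if (mprotect(guard, ps, PROT_NONE) != 0) {  /* the non-accessible page */
        munmap(base, mapped);
        return NULL;
    }
    out->base = base;
    out->mapped = mapped;
    return guard - rounded;  /* indexing past the rounded end hits the guard */
}

/* Debug free: remove all access and give the physical pages back, but keep
   the address range mapped so no later allocation can reuse it. A stale
   pointer into it now faults instead of silently reading or writing new data. */
int debug_free(DebugBlock *blk) {
    if (mprotect(blk->base, blk->mapped, PROT_NONE) != 0) return -1;
    return madvise(blk->base, blk->mapped, MADV_DONTNEED);
}
```

Two trade-offs follow from this layout. Rounding the size up to the alignment can leave up to align - 1 bytes of slack before the guard page, where a small overrun goes unnoticed, and writes before the start of the block are not caught because there is only a trailing guard page. Each allocation also costs at least two pages plus a few system calls, which helps explain why such a mode is slow.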

You can enable this mode by passing “-debugallocator” as a command-line argument to the built game. It also works in the Editor, but I wouldn't recommend it there because it makes the Editor extremely slow.

Thanks for the info. I think I'd heard of that parameter before, but for some reason didn't realise it could be used in the built game. I'll try it that way and see what happens.

Aside from that, I've been trying to make the crash reproduce more often, but apart from raising the target framerate I haven't had much success.

I have been running with that command-line argument, but I haven't seen any difference (then again, I haven't reproduced the original crash since). Is there any way to verify that it's actually doing something? It feels slower, but I can't be sure it's really behaving differently. Or should it now simply crash as soon as it deallocates something invalid?

I did get one crash dump today, but I'm not sure whether it's related (I think it happened while the application was shutting down). I can send it if you might be able to get more information from it.

I can take a look at that dump.

As for checking whether it's doing anything: perhaps try writing through a pointer just past the end of some allocated array? With the debug allocator active, that should crash.
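
A check like this works because its outcome is known in advance: with a guard page behind the block, a write one element past the end should fault on that exact write, whereas without it the write usually lands silently in neighboring memory. The C sketch below illustrates the idea on POSIX only (it uses fork, so it is not something to paste into a Unity player): it runs the bad write in a child process so the caller can observe the crash instead of dying from it. setup_guarded_region and the probe functions are hypothetical stand-ins for allocating through the engine with -debugallocator. If the allocator rounds sizes up for alignment (an assumption about Unity's implementation), use a size that is a multiple of the alignment, or the one-past-the-end write may land in padding and not fault.

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `probe` in a child process. Returns 1 if the child was killed by
   SIGSEGV (or SIGBUS, which some platforms raise for protected pages),
   0 if it exited normally, -1 on error. */
static int crashes_with_fault(void (*probe)(void)) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) { probe(); _exit(0); }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) &&
           (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}

/* Hypothetical region whose last byte sits right before a PROT_NONE page. */
static volatile uint8_t *g_region;
static size_t g_len;

static int setup_guarded_region(size_t len) {
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    if (len == 0 || len > ps) return -1;
    uint8_t *base = mmap(NULL, 2 * ps, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;
    if (mprotect(base + ps, ps, PROT_NONE) != 0) {  /* guard page */
        munmap(base, 2 * ps);
        return -1;
    }
    g_region = base + ps - len;
    g_len = len;
    return 0;
}

static void write_last_byte(void)    { g_region[g_len - 1] = 0xAA; }  /* in bounds */
static void write_one_past_end(void) { g_region[g_len] = 0xAA; }      /* hits guard */
```

In the game itself the equivalent would be a deliberate one-past-the-end write into a block from UnsafeUtility.Malloc(): an immediate crash at that write suggests the debug allocator is active, while silently surviving it suggests it is not.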

Thanks. Where is a good place to upload the dump? It's around 2 GB.

Put it in a zip or 7z archive; that should shrink it by about 10-20x. Then upload it to any file-sharing service (Dropbox, Google Drive, OneDrive, etc.) and PM me the link.