Game Crashes because of Memory Issues - but not because of our Scripts

Hello everyone,

We are a studio with a released game and an active player base.
These crashes have not been reproduced in the studio, but we get regular reports from players.

We have a system to automatically upload crash reports. I went through the 50 most recent crash reports and for every report I noted which additional info I found in the log that could be responsible. These are either from the stacktrace or logs that happened just before the crash occurred:

Again in text, for easier copy-paste:

ONE ISSUE
15 no stacktrace or other info
11 system out of memory (managed memory or nativearrays or computebuffers are <3 gbs almost always)
6 Renderer.GetMaterialArray
4 stacktrace only in unity code, not in our c# code, no other info
4 d3d11 out of memory (code 0x8007000e)
2 custom native library issue
1 custom native library issue
1 custom native library issue
1 d3d11 device loss (code 80070057)
COMBINED ISSUES
2 system out of memory AND UnityEngine.Terrain:set_materialTemplate
2 system out of memory AND d3d11 out of memory (code 0x8007000e)
1 Renderer.GetMaterialArray AND d3d11 out of memory (code 0x8007000e)

While I am interested in fixing all of these issues, it seems like at least 20 out of the 50 crashes are clearly related to memory. This also matches our player reports that from time to time, the game balloons to 30GB of memory usage. Normal memory usage is only a couple of GB.

In order to investigate the abnormal memory usage, we have started logging memory statistics.
We use System.GC.GetTotalMemory(false) to figure out how much managed memory our C# code uses. It’s always between 0.5GB and 2GB, no abnormal behaviour to be found.

This function, however, does not track ComputeBuffers and NativeArrays (we do not use any other native Unity collections or unsafe allocations anywhere else). So we have created a memory management facade class and do all allocations through that class. It tracks all allocations and disposes and gives us an overview of the amount of native memory we have allocated. That always seems to stay below 1GB, so no abnormal behaviour there either.
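For readers who want to set up something similar, here is a minimal sketch of such a tracking facade. It is not our actual class: the name `TrackedNativeMemory` and its methods are hypothetical, and it assumes every NativeArray and ComputeBuffer in the project is created and released only through these methods.

```csharp
using System.Threading;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using UnityEngine;

// Hypothetical facade: counts the bytes of all NativeArrays and ComputeBuffers
// that were allocated through it and not yet released.
public static class TrackedNativeMemory
{
    static long s_bytes;

    // Currently tracked bytes (thread-safe read).
    public static long AllocatedBytes => Interlocked.Read(ref s_bytes);

    public static NativeArray<T> AllocArray<T>(int length, Allocator allocator) where T : struct
    {
        // Allocate first so a failed allocation does not corrupt the counter.
        var array = new NativeArray<T>(length, allocator);
        Interlocked.Add(ref s_bytes, (long)length * UnsafeUtility.SizeOf<T>());
        return array;
    }

    public static void Free<T>(ref NativeArray<T> array) where T : struct
    {
        if (!array.IsCreated) return;
        Interlocked.Add(ref s_bytes, -(long)array.Length * UnsafeUtility.SizeOf<T>());
        array.Dispose();
    }

    public static ComputeBuffer AllocBuffer(int count, int stride)
    {
        var buffer = new ComputeBuffer(count, stride);
        Interlocked.Add(ref s_bytes, (long)count * stride);
        return buffer;
    }

    public static void Free(ComputeBuffer buffer)
    {
        if (buffer == null) return;
        Interlocked.Add(ref s_bytes, -(long)buffer.count * buffer.stride);
        buffer.Release();
    }
}
```

Logging `TrackedNativeMemory.AllocatedBytes` next to `System.GC.GetTotalMemory(false)` gives the two numbers described above. Anything Unity allocates internally is invisible to a counter like this.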

We do have one native C++ library in use. Without going into detail, it seems unlikely that this is the culprit (well known, well tested 3rd party lib and was used before the crash reports came in).

We tried logging total process memory consumption, but so far haven’t been successful, see here. If you have any tips on how to do that, feel free to let us know.

So now the question is what other kinds of memory does Unity allocate and how to track down what is creating the issue? From all the graphics related errors it seems that maybe Unity is holding too many textures or allocating and not releasing by mistake? We do use the virtual texturing system, maybe it’s an issue there?

So please, if you have any ideas at all about how to find out these things, share your thoughts:

  • Besides managed memory, ComputeBuffers and NativeArrays, what other memory does Unity allocate?
  • How to find stats about Unity memory allocation at runtime so we can log them and observe unusual behaviour?
  • The Memory module in the Profiler window as well as the Memory Profiler package seem not to be available at runtime. Are parts of them accessible through scripting maybe?
  • How to get more information from crash reports, especially since we can’t reproduce them in the studio. We can outfit our game with all kinds of logging and reporting capabilities if necessary.
  • Do you know any way to report total process memory usage that works in Unity?

If you have any questions about the information above, let me know and I will clarify and edit.

Thanks so much for any thought, tip, or advice.

This is what a typical out of memory log looks like before a crash:

The only suggestion I can offer is this: whatever you declare your minimum system requirements to be, have an old crappy laptop that meets those specs, and test with that (as best you can).

Other than that tip, I have no information on your issue, sorry :frowning:

If your users get crashes and not you then one thing to do is to check the GFX drivers and see if they are up to date.

Whatever you are contemplating loading is requesting a single 335mb chunk of memory.

That’s a LOT of memory on just about any system!!!

You can use the profiler to find out where you’re spending all this memory.

Window → Analysis → Profiler

As already suggested above, if you intend to run on generally-available hardware, then it is important to test on some down-market hardware from time to time. If you only test on high-end gaming hardware, you can really only expect your game to run on that type of hardware.

Thanks everyone for all the suggestions so far!

Yes, you are right that we need to test on lower end systems from time to time. The issue is, even users with beefy hardware get these issues (very high memory usage - I guess they crash less often though).
And when the game typically requires only a couple gigabytes and then spontaneously balloons to 30GB or more, that is a sign of a bug, not normal operation - we’d need to reproduce or gather information about this allocation bug, no matter which system.

In the screenshot it says 33,570,818 B, which should be a 33MB texture.
I assume it’s a heightmap for the terrain (4097x4097 pixels with 2 bytes per pixel is exactly 33,570,818). We do sometimes change the terrain shape during runtime.
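The arithmetic behind that guess, as a small standalone check (plain C#, no Unity needed; the 2 bytes per sample is my assumption of a 16-bit heightmap):

```csharp
using System;

static class HeightmapSize
{
    static void Main()
    {
        const long resolution = 4097;     // heightmap samples per side
        const long bytesPerSample = 2;    // assumed 16-bit height values
        long bytes = resolution * resolution * bytesPerSample;
        Console.WriteLine(bytes);         // prints 33570818
    }
}
```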

Unfortunately, I don’t think I can use the profiler with a built game. We have not yet been able to reproduce the issue ourselves, but we get regular crash reports.

It would be interesting to log the [ALLOC DEFAULT] used and reserved memory during runtime.

We know Unity tracks it because it appears in the logs when there is a memory issue.
Apparently since Unity 2021 LTS you can configure the allocators’ behaviour and more importantly, you get some stats after the game quits: Forum Thread
But in this forum thread people have asked for a function to get this info during runtime, and it doesn’t seem like it has been exposed (yet?).

You’re right, my bad… I only glanced at it and counted the digits, and counted the “B” as an “8”. :slight_smile:

Another thing to mention: I just watched the Unite 2016 video on ScriptableObjects, and it mentions that if you don’t destroy a ScriptableObject properly (references?), the C++ side gets deleted but not the C# side, so the data hangs around in the heap and never gets garbage collected.

Not sure of your code structure, but could that be related?

Ohh, when I started playing around with using textures as data, I did see the warnings about a texture being too big. I’ll have to find the post where someone described a clever way of cutting the image (within the loader) and processing it in chunks. It sounded like way too much work for me, lol, so I just kept my stuff small.

Update:

MartinTilo replied in the previously linked thread about memory profiling with a tip to record memory usage at runtime:

I’ll integrate these reports and see if we can narrow down the problematic area. Keeping this open for now just in case there are any other tips :slight_smile:

As you mentioned, yes, allocator specific counters are not available through API or ProfilerRecorder, but the counters “Total Used Memory” and “Total Reserved Memory” are available, also in release builds, and do encompass the Default Allocator.
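A minimal sketch of logging those two counters with ProfilerRecorder. The class name `MemoryCounterLogger` and the 10 second interval are just illustrative assumptions; only the counter names come from the text above.

```csharp
using Unity.Profiling;
using UnityEngine;

// Hypothetical logger: writes Unity's total used/reserved memory to the log periodically.
public class MemoryCounterLogger : MonoBehaviour
{
    const float LogIntervalSeconds = 10f; // arbitrary choice for this sketch
    const long BytesPerMB = 1024 * 1024;

    ProfilerRecorder totalUsed;
    ProfilerRecorder totalReserved;
    float nextLogTime;

    void OnEnable()
    {
        totalUsed = ProfilerRecorder.StartNew(ProfilerCategory.Memory, "Total Used Memory");
        totalReserved = ProfilerRecorder.StartNew(ProfilerCategory.Memory, "Total Reserved Memory");
    }

    void OnDisable()
    {
        // Recorders hold native resources and must be disposed.
        totalUsed.Dispose();
        totalReserved.Dispose();
    }

    void Update()
    {
        if (Time.unscaledTime < nextLogTime) return;
        nextLogTime = Time.unscaledTime + LogIntervalSeconds;

        // Skip if a counter is not available on this platform or build type.
        if (!totalUsed.Valid || !totalReserved.Valid) return;

        Debug.Log($"Total: {totalUsed.LastValue / BytesPerMB}MB / {totalReserved.LastValue / BytesPerMB}MB");
    }
}
```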

23 GB in that allocator alone is probably a bit much. And that is not just compute buffers (which also have a GFX alloc side iirc) or NativeCollections. It also encompasses the native memory for all sorts of Unity Objects (GameObjects & Components, Assets, Managers) and the CPU side of graphics allocations (i.e. the CPU readable copy retained for read/write enabled Assets).

You can only attach the Profiler and Memory Profiler to a Development build, and some of the APIs, including those for recording profiler data to disk or taking a memory snapshot, are also only available in development builds. But they are available in those builds and not just the Editor (unless that’s what you meant).

If you have a user that’s happy to assist, you could give them a development build where they, or your memory stats monitoring code, could trigger a snapshot to disk via this API. They can then send you this snapshot for investigation.
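A rough sketch of what triggering such a snapshot could look like. The helper name `SnapshotHelper`, the file naming and the output folder are assumptions for illustration. The namespace below is the one used in newer Unity versions (2022.2 and later, as far as I know); in 2021 LTS the same class lives under UnityEngine.Profiling.Memory.Experimental.

```csharp
using System;
using System.IO;
using Unity.Profiling.Memory;
using UnityEngine;

// Hypothetical helper: writes a memory snapshot to disk from a development build.
public static class SnapshotHelper
{
    public static void Capture(string label)
    {
        // Snapshots are only supported in development builds and the Editor.
        if (!Debug.isDebugBuild) return;

        string file = $"{label}_{DateTime.Now:yyyyMMdd_HHmmss}.snap";
        string path = Path.Combine(Application.persistentDataPath, file);

        MemoryProfiler.TakeSnapshot(
            path,
            (writtenPath, success) => Debug.Log($"Snapshot {(success ? "saved" : "failed")}: {writtenPath}"),
            CaptureFlags.ManagedObjects | CaptureFlags.NativeObjects);
    }
}
```

Your memory stats monitoring code could call something like `SnapshotHelper.Capture("high_memory")` once when a counter crosses a threshold, so the player only has to send the resulting file.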

It’d be a bit easier if you could reproduce higher memory usage locally though. It might not crash on your machines but maybe some part of the issue can be deduced from analysing memory snapshots around locally reproducible memory spikes. Looking at the details of a snapshot is more likely to point out some potential issues like leaks than just looking at high-level counters.

I think you are referring to some of the unexpected side-effects of how Unity Objects behave in memory, and about Leaked Managed Shells, but you’re confusing some matters here.

Yes, if you instantiate, new or otherwise create an instance of an object whose managed type inherits from UnityEngine.Object (but not from GameObject or Component, those live in Scenes and will be unloaded with those, which usually takes care of them “leaking” or at least makes it obvious enough), in other words Asset Type objects, then you are very much responsible for calling Destroy on it.

If you don’t, you leak their Managed Shell AND their native backing memory.

If nothing references either the shell or the native object, then Resources.UnloadUnusedAssets will remove them from memory. That process is time consuming (more so than a simple Destroy) and does not happen automatically as you run out of memory, but only on destructive scene unloads and when called explicitly. So leaking these native objects is a pretty simple way to run out of native memory and crash.

Then, even if the object is Destroyed, you might still hold managed references to the Shell and leak that. It’s usually a small amount of memory but always with the potential to hold managed references to other Unity Objects that still do have native memory and won’t be unloaded because of the leaked shell. Version 1.1.x of the Memory Profiler makes it easy to search for these leaked shells, but I suspect that in this case, un-destroyed dynamically created Asset objects are more likely to be an issue.

This could be e.g. old versions of your recalculated heightmap, Materials or similar. Might be worth taking two or more snapshots around areas of memory growth and comparing them to see if anything sticks out.
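To illustrate the pattern with a sketch (the class `RuntimeTextureOwner` and its `Rebuild` method are made up for this example, not taken from the project): whoever creates an Asset type object at runtime keeps a reference to it and destroys the old instance before replacing it.

```csharp
using UnityEngine;

// Hypothetical owner of a texture that gets regenerated at runtime.
public class RuntimeTextureOwner : MonoBehaviour
{
    Texture2D current;

    public void Rebuild(int size)
    {
        // Without this Destroy, every rebuild would leave the previous
        // texture's native memory allocated.
        if (current != null) Destroy(current);

        current = new Texture2D(size, size, TextureFormat.R16, false);
    }

    void OnDestroy()
    {
        // The texture is not part of the scene, so it has to be cleaned up explicitly.
        if (current != null) Destroy(current);
    }
}
```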

First of all thanks a lot MartinTilo for all the info and suggestions!

Using the snapshot API to easily get detailed info from a helpful playtester is a great idea, I’ll probably continue with that tomorrow.

In the meantime, I have got some crash reports and associated memory logs using the profiler API. I have a handful of open questions about this.

So this is one of the logs just before a crash:

Memory (2023/08/07 7:02:42 PM)
managed: 1511, native: 143
Total: 28,141MB / 40,721MB
GC: 1,392MB / 1,714MB
Gfx: 0MB / 0MB
Audio: 0MB / 0MB
Video: 0MB / 0MB
Profiler: 51,332MB / 75,289MB
System: 47,319MB0

Explanation:

  • managed is GC.GetTotalMemory(false). Matches Unity’s GC used size pretty closely usually.
  • native is our own tracked memory usage of all allocated ComputeBuffers and NativeArrays, both are in MB
  • All the other stats are from the ProfilerRecorder API, first number is used memory, second number is reserved memory

My questions:

  • the profiler memory usage seems to be waaaay over the top - and also inconsistent with “Total” or “System” memory usage. Is there a reason for this or might it be a bug?
    Note: Every Update() I call Reset() and Start() on every recorder to clear it of any old data, because I’m just interested in the current usage.
  • Gfx memory does not seem to be tracked in release builds. This is also written in the 2021.3 documentation, but the 2020.2 docs claim that release builds do support it. Was this capability removed or are the 2020.2 docs wrong?

Also, from the logs, it seems to me now that the memory usage is slowly building up over time. From player reports I got the idea that it would jump suddenly. I’m keeping my eyes open and will try to get a memory snapshot from a player. If you have any other ideas, let me know and thanks for the help so far everyone :slight_smile:

I just had another thought, just throwing it out there:
One of the most common things to find in our crash stack traces (6 out of 50) is Renderer.GetMaterialArray().

Like here:

0x0000029A301A6C01 (Mono JIT Code) (wrapper managed-to-native) UnityEngine.Renderer:GetMaterialArray (UnityEngine.Renderer)
0x0000029A301A6B2B (Mono JIT Code) UnityEngine.Renderer:get_materials
0x0000029A301A630B (Mono JIT Code) Building:SetOverrideMaterial (UnityEngine.Material,UnityEngine.Color)

In Unity, it’s probably line 1159 here:

So I thought that this would only hold the references to the MeshRenderer’s materials (mr is a MeshRenderer).
And I thought I wouldn’t need to call Destroy on it because of that. But maybe the materials get duplicated here?

In this function we get the material array, swap out some of the materials with some cached ones as needed by the game situation and at the end set the mesh renderer material array to our modified material array to update it.

Would appreciate a second opinion by a material/memory expert :smile:

This function is called maybe every minute or so as a result of player action, but for tens of thousands of objects. Small leaks could add up.

Yes. While behaviorally this is rather unintuitive, the API documentation on .material and .materials is relatively clear on that matter:

Using these APIs to get and set the materials ONLY changes that renderer (unlike .sharedMaterial) and instantiates a copy of those materials on getting them (iirc not on setting? Something to try out and check with the memory Profiler. You can also set the material’s .name property for easier debugging).

This is kind of a known issue with these properties. The typical Unity memory leak, if you will. (There’s a reason Material Count is an item in the Memory Profiler module.)

It also aligns with a slow trickling leak as materials aren’t huge in terms of memory usage. They can however hold on to textures.
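One way to avoid the copies, sketched under the assumption that the override materials are cached and meant to be shared between renderers: read and write .sharedMaterials instead of .materials. The class name and parameters below are hypothetical and do not match the signature from the stack trace.

```csharp
using UnityEngine;

// Hypothetical variant of the override function that avoids instantiating materials.
public static class OverrideMaterialExample
{
    public static void SetOverrideMaterial(Renderer renderer, int slot, Material cachedOverride)
    {
        // sharedMaterials returns the array of material references
        // without creating per-renderer copies of the materials.
        Material[] mats = renderer.sharedMaterials;
        if (slot < 0 || slot >= mats.Length) return;

        mats[slot] = cachedOverride;
        renderer.sharedMaterials = mats;
    }
}
```

Material instances that were already created by earlier .materials calls are not cleaned up by this change. They would still need to be destroyed, or picked up by Resources.UnloadUnusedAssets once nothing references them.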

The 2020.2 docs are wrong.
Regarding the Profiler memory stat, I think I remember a bug about that. But technically the Allocator tracked by that counter is not used in Release. It might have accidentally tracked a fallback allocator in a weird and broken way instead.

Yes, it seems very probable that this is the cause / one cause of all of this. Thanks for the help!

I do remember learning something like that in the beginning, but I think in my mind it worked like a struct. Yes I knew it was a copy, but I didn’t remember / get that it’s tracked in the background and needs to be destroyed.

I’ll post again if anything else comes up regarding this, but thanks again for the help, it was really nice!