Big performance drop due to GC Collect

Hello, I have a class which mimics the original Unity Mesh class. Its purpose is to allow mesh work in a separate thread, since the Mesh class is not thread-safe. I noticed a big performance drop when I used my custom Combine method, and when I profiled the issue I saw that the main culprit is the garbage collector.

This is the combine method:

public void Combine(TS_Mesh newMesh)
        {
            Vector3[] newVertices = new Vector3[_vertices.Length + newMesh._vertices.Length];
            Vector3[] newNormals = new Vector3[_normals.Length + newMesh._normals.Length];
            Vector2[] newUvs = new Vector2[_uv.Length + newMesh._uv.Length];
            Color[] newColors = new Color[_colors.Length + newMesh._colors.Length];
            Vector4[] newTangents = new Vector4[_tangents.Length + newMesh._tangents.Length];
            int[] newTriangles = new int[_triangles.Length + newMesh._triangles.Length];

            _vertices.CopyTo(newVertices, 0);
            newMesh._vertices.CopyTo(newVertices, _vertices.Length);

            _normals.CopyTo(newNormals, 0);
            newMesh._normals.CopyTo(newNormals, _normals.Length);

            _uv.CopyTo(newUvs, 0);
            newMesh._uv.CopyTo(newUvs, _uv.Length);

            _colors.CopyTo(newColors, 0);
            newMesh._colors.CopyTo(newColors, _colors.Length);

            _tangents.CopyTo(newTangents, 0);
            newMesh._tangents.CopyTo(newTangents, _tangents.Length);

            for(int i = 0; i < newTriangles.Length; i++)
            {
                if (i < _triangles.Length) newTriangles[i] = _triangles[i];
                else  newTriangles[i] = (newMesh._triangles[i - _triangles.Length] + _vertices.Length);
            }

            for(int i = 0; i < newMesh._subMeshes.Count; i++)
            {
                if(i >= _subMeshes.Count) subMeshes.Add(newMesh.subMeshes[i]);
                else
                {
                    int[] newTris = new int[_subMeshes[i].Length + newMesh._subMeshes[i].Length];
                    _subMeshes[i].CopyTo(newTris, 0);
                    for(int n = 0; n < newMesh._subMeshes[i].Length; n++)
                    {
                        newTris[_subMeshes[i].Length + n] = newMesh._subMeshes[i][n] + _vertices.Length;
                    }
                    _subMeshes[i] = newTris;
                }
            }
            _vertices = newVertices;
            _normals = newNormals;
            _uv = newUvs;
            _colors = newColors;
            _tangents = newTangents;
            _triangles = newTriangles;
            _hasUpdate = true;
        }

I think the reason the GC is working so hard is that I create new temporary arrays in order to expand the current vertex/normal/uv (and so on) arrays, and then the old arrays just get replaced by the new references. What is the workaround for this? How can I reduce the workload for the GC? I could use Lists, but I know they are slower than arrays.

You might consider not discarding the previous _vertices, _normals, etc. arrays, but keeping them for the next time you call Combine, and only increasing their size when needed (eventually they will reach the maximum size you use and stay there; before that there will be some GC calls when they grow). It will increase overhead in calls like Mesh.vertices = _vertices, since there will be zeroed vertices at the end, but it will decrease GC pressure. Without testing there's no way to tell whether overall performance will go up or down.

You can do that because, afaik, Mesh.vertices = _vertices will copy the _vertices array rather than keep a reference to it. So you can just reuse that array without discarding it.
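A minimal sketch of the grow-only idea described above (the class name `GrowableVertexBuffer` and the plain `float[]` stand-in for Unity's `Vector3[]` are illustrative assumptions, not code from this thread):

```csharp
// Illustrative sketch: reuse one backing array and track a logical
// Count, reallocating only when the data outgrows the capacity.
// The GC only gets work on the (increasingly rare) growth steps.
public class GrowableVertexBuffer
{
    private float[] _data = new float[0]; // stand-in for Vector3[]

    // Logical number of elements actually in use (the tail is zeroed slack).
    public int Count { get; private set; }

    public float[] Data { get { return _data; } }

    public void EnsureCapacity(int needed)
    {
        if (needed <= _data.Length) return;                 // reuse as-is, no allocation
        int newCap = System.Math.Max(needed, _data.Length * 2);
        float[] bigger = new float[newCap];                 // the only allocation point
        System.Array.Copy(_data, bigger, Count);
        _data = bigger;                                     // old array becomes garbage once
    }

    public void Append(float[] extra)
    {
        EnsureCapacity(Count + extra.Length);
        System.Array.Copy(extra, 0, _data, Count, extra.Length);
        Count += extra.Length;
    }
}
```

After a few doublings the capacity plateaus and repeated Combine calls stop allocating entirely; the trade-off, as noted above, is the zeroed slack you carry into `Mesh.vertices = _vertices`.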

I can’t recommend Lists in this case, because to use Mesh.vertices = list you would need to call the list’s ToArray() method, which creates a new array and triggers the GC again.

Basically, watch every new call as much as possible and don’t make it if you can do without it - that’s pretty much the only way of fighting the GC.

Another possible reason for slow GC is that the arrays are large and get placed on the Large Object Heap. You might also consider making smaller meshes / not combining them.
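A quick back-of-envelope check of when that happens (assuming .NET's documented 85,000-byte Large Object Heap threshold and Unity's Vector3 being three 4-byte floats; the class name `LohMath` is just for illustration):

```csharp
// Arrays of 85,000 bytes or more are allocated on the Large Object Heap,
// which is collected less often (gen 2 only) and historically not compacted.
public static class LohMath
{
    public const int LohThresholdBytes = 85000;
    public const int Vector3SizeBytes = 12; // 3 floats * 4 bytes

    // Number of Vector3 elements at which a vertex array lands on the LOH.
    public static int LohVertexThreshold()
    {
        return LohThresholdBytes / Vector3SizeBytes; // ~7083 vertices
    }
}
```

So a combined mesh of only about 7,000 vertices already puts its vertex array on the LOH, which is why large combines make collections noticeably more expensive.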

Hi Teravisor,
Thanks for the quick answer! I know these arrays are not passed by reference when I assign them to the Mesh class, and I’m actually doing exactly what you said: I keep those vertices, normals, uvs, triangles, tangents, etc. while combining all of the meshes, and only after that do I write them to the real Mesh object. So for example, if I have to combine 10 meshes, I combine them using the TS_Mesh class, then modify them, and finally write everything to a Mesh object.

So basically what you’re saying is that as long as I make new arrays to expand the old ones, the GC will stay busy? That’s pretty bad, because I can’t get away with fewer combinations in this particular case.

Yes, it will be GCing the old arrays. You do, however, always do … newVertices = new … when you could have cached the previous array and reused it. It’s tricky to do and requires a bit of replanning of how TS_Mesh keeps its data.

Also, if you’re doing several combines of the same mesh with others, you might consider adding methods like Combine(TS_Mesh newMesh1, TS_Mesh newMesh2), or Combine(params TS_Mesh[] newMeshes), which allocate only once for several combines.

Hmm, what do you mean by that exactly? If I cached newVertices along with _vertices in order to reuse them, wouldn’t that just keep an additional instance of the array around (since _vertices = newVertices at the end of the method) and not really achieve anything more? I need to grow the _vertices array with each combine, so I need to make new arrays that are bigger than the current _vertices array. If I kept newVertices for reuse, I’d still have to make a new array in order to expand _vertices, right? Maybe I’m missing the point here.

And yes, that actually sounds like a plan - having a Combine which takes a TS_Mesh array instead of a single TS_Mesh object and then allocating the new arrays only once, sized to fit the sum of the combined vertices. Thanks for that tip! I’ll try it and see if it gives a better result.

You could save _vertices in some pool and reuse it later - like object pooling in Unity, just for already-allocated arrays. It’s a desperate measure, though, and in a lot of cases it will only hurt performance by hogging RAM. It also requires you to manually decide how and when to free memory, it has overhead in general, and… it’s not beautiful code.

Also, if each of your meshes started with, say, 30,000 spare vertices allocated up front, it would take quite some time before a combine has to grow the array (and every combine that only partially fills it saves the GC one allocated object). That is inefficient when setting the Mesh data and for total RAM usage, though.

And the completely desperate way (if nothing else has helped) is creating an unmanaged array of bytes and carving it up into the arrays you need. That requires unsafe code and tons of hacks, or going to C++ (which has problems with compiling the lib for all platforms). Then you decide yourself when and how to free the memory.

You can make the array bigger than you need so you can always reuse it, and return an int that gives the true length.

Another thing I’ve done is create a TempArrayPool that I can ask to give me an array of a specific length, which I return to the pool when I’m done. This way I only need to create an array if I don’t already have an array of that length. This may not work for some applications that use millions of differently sized arrays.
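A sketch of that length-keyed pool idea (the class name `TempArrayPool` comes from the post above, but this Rent/Return API and its internals are illustrative assumptions, not the poster's actual code):

```csharp
using System.Collections.Generic;

// Keeps returned arrays in stacks keyed by exact length, so asking for a
// length you've used before reuses an old array instead of allocating.
public class TempArrayPool<T>
{
    private readonly Dictionary<int, Stack<T[]>> _free =
        new Dictionary<int, Stack<T[]>>();

    // Returns a pooled array of exactly this length, or allocates a new one.
    public T[] Rent(int length)
    {
        Stack<T[]> stack;
        if (_free.TryGetValue(length, out stack) && stack.Count > 0)
            return stack.Pop(); // reuse: no allocation, no future GC work
        return new T[length];   // pool miss: the only allocation point
    }

    // Hand the array back when done so the next Rent of this length can reuse it.
    // Note: contents are NOT cleared, the caller must overwrite stale data.
    public void Return(T[] array)
    {
        Stack<T[]> stack;
        if (!_free.TryGetValue(array.Length, out stack))
            _free[array.Length] = stack = new Stack<T[]>();
        stack.Push(array);
    }
}
```

As noted above, this only pays off when the same lengths recur; with an ever-growing combined mesh every Rent misses the pool, which is exactly the caveat raised in the next reply.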

Gotcha, both!
Well yeah, array pooling wouldn’t be a very big performance win here, since I’m using differently sized arrays all the time, and if I have to build that just for this single piece of code (I don’t need the functionality anywhere else), it would probably cost more time than it saves.

The idea with the bigger array is also interesting. The only problem is that… well, it’s a bigger array, so it will take up more memory, and I’ll still have to cache it along with a length index. But if the multi-mesh Combine method doesn’t help, I’ll try that one too.

Thank you both for the answers! I’m off to work, will post back results!

I wrote an overload of the Combine method which takes a TS_Mesh array, and using it almost completely eliminated the GC overhead.

It’s a little bit messy right now and I’m sure it can be optimized further, but this worked pretty well for me:

public void Combine(TS_Mesh[] newMeshes)
        {
            int newVerts = 0;
            int newTris = 0;
            List<int> newSubs = new List<int>();
            for(int i = 0; i < newMeshes.Length; i++)
            {
                newVerts += newMeshes[i].vertexCount;
                newTris += newMeshes[i].triangles.Length;
                for(int n = 0; n < newMeshes[i].subMeshes.Count; n++)
                {
                    if (n >= newSubs.Count) newSubs.Add(newMeshes[i].subMeshes[n].Length);
                    else newSubs[n] += newMeshes[i].subMeshes[n].Length;
                }
            }
            Vector3[] newVertices = new Vector3[_vertices.Length + newVerts];
            Vector3[] newNormals = new Vector3[_normals.Length + newVerts];
            Vector2[] newUvs = new Vector2[_uv.Length + newVerts];
            Color[] newColors = new Color[_colors.Length + newVerts];
            Vector4[] newTangents = new Vector4[_tangents.Length + newVerts];
            int[] newTriangles = new int[_triangles.Length + newTris];
            List<int[]> newSubmeshes = new List<int[]>();
            for(int i = 0; i < newSubs.Count; i++)
            {
                newSubmeshes.Add(new int[newSubs[i]]);
                if (i < _subMeshes.Count) newSubs[i] = _subMeshes[i].Length;
                else newSubs[i] = 0;
            }
            newVerts = vertexCount;
            newTris = _triangles.Length;
            _vertices.CopyTo(newVertices, 0);
            _normals.CopyTo(newNormals, 0);
            _uv.CopyTo(newUvs, 0);
            _colors.CopyTo(newColors, 0);
            _tangents.CopyTo(newTangents, 0);
            _triangles.CopyTo(newTriangles, 0);

            for (int i = 0; i < newMeshes.Length; i++)
            {
                newMeshes[i]._vertices.CopyTo(newVertices, newVerts);
                newMeshes[i]._normals.CopyTo(newNormals, newVerts);
                newMeshes[i]._uv.CopyTo(newUvs, newVerts); // was _uv.Length, which is wrong past the first mesh
                newMeshes[i]._colors.CopyTo(newColors, newVerts);
                newMeshes[i]._tangents.CopyTo(newTangents, newVerts);

                for (int n = newTris; n < newTris + newMeshes[i]._triangles.Length; n++)
                {
                    newTriangles[n] = newMeshes[i]._triangles[n - newTris] + newVerts;
                }


                for (int n = 0; n < newMeshes[i].subMeshes.Count; n++)
                {
                    for (int x = newSubs[n]; x < newSubs[n] + newMeshes[i]._subMeshes[n].Length; x++)
                    {
                        newSubmeshes[n][x] = newMeshes[i].subMeshes[n][x - newSubs[n]] + newVerts;
                    }
                    newSubs[n] += newMeshes[i]._subMeshes[n].Length;
                }
                newTris += newMeshes[i].triangles.Length;
                newVerts += newMeshes[i].vertexCount;
            }

            _vertices = newVertices;
            _normals = newNormals;
            _uv = newUvs;
            _colors = newColors;
            _tangents = newTangents;
            _triangles = newTriangles;
            _subMeshes = newSubmeshes;
            _hasUpdate = true;
        }