Graphics.RenderMeshInstanced generates GC and is slower than Graphics.DrawMeshInstanced?

Hey there. I have two snippets of code, both doing the same rendering. For the first, I’m using
Graphics.RenderMeshInstanced(), because it has support for passing in NativeArray, which is convenient for my use case (heavy job system usage). Unfortunately, I’ve noticed that doing it this way generates garbage every frame and takes over twice as long as the ‘old’ method??

// 9.3 KB garbage per frame, ~0.25ms
for (var splitIndex = 0; splitIndex < trsData.Length; ++splitIndex)
{
    //..
   
    var renderParams = new RenderParams(material);
        renderParams.matProps = propertyBlock;
        renderParams.lightProbeUsage = unityLightProbeUsage;
        renderParams.shadowCastingMode = shadowsMode;
   
    Graphics.RenderMeshInstanced(renderParams, mesh, 0, renderGroup, renderGroup.Length, 0);
}

Here’s an example of using Graphics.DrawMeshInstanced(), with a cached copy of the NativeArray copied into a Matrix4x4. It’s much faster and does not generate garbage.

// 0 KB garbage per frame, ~0.10ms
for (var splitIndex = 0; splitIndex < trsDataFallback.Length; ++splitIndex)
{
    //..

    Graphics.DrawMeshInstanced(mesh, 0, material, renderGroup, renderGroup.Length,
        propertyBlock, shadowsMode, true, layer, null, unityLightProbeUsage);
}

Surely, this is a bug in the new API? If anyone has any ideas as to what is going wrong here, please let me know.

1 Like

Yes, it’s generating a lot of GC because they’ve made the method generic and they’re trying to Parse your struct layout on each call using Marshal.OffsetOf (reason for GC), who knows why. Everybody wanted a NativeArray overload of the DrawMeshInstanced since 2018 but yet we get this. No replies from graphics team in years. We’re using the old DrawMeshInstanced so still have to pay the cost of copying matrices to temporary array.

1 Like

Whoever is responsible for this needs to be held accountable.

3 Likes

It explains why in the docs: Unity - Scripting API: Graphics.RenderMeshInstanced

It doesn’t explain why they need to do it every call instead of caching it per type.

BTW, if you want to write to a managed array from a job, you can pin it. I’m doing this because CommandBuffer’s DrawMeshInstanced still only accepts a Matrix4x4[ ]. To do this, set up a list of pins for each frame in a manager class (eg your render pipeline if doing a custom SRP):

private readonly List<GCHandle> _arrayHandles = new();
public T* pin<T>(T[] array) where T : unmanaged
{
    if(array == null || array.Length == 0)
        throw new Exception("Array must be non-null and non-empty");
    GCHandle gcHandle = GCHandle.Alloc(array, GCHandleType.Pinned);
    _arrayHandles.Add(gcHandle);
    return (T*) gcHandle.AddrOfPinnedObject();
}

When scheduling a job, pin it with that method and put the pointer into the job like this

private unsafe struct ConfigureInstanceDataJob : IJob {
    [NativeDisableUnsafePtrRestriction] private Matrix4x4* pointMatrices;

And then at the end of frame (don’t forget!!!)

foreach(GCHandle pinned in _arrayHandles)
    pinned.Free();
_arrayHandles.Clear();
1 Like

I’m aware of this method, but it has two drawbacks:

  • requires “unsafe”, which I think makes asset store buyers wary, and adds an extra step in documentation for any plugins using this method
  • cannot be Burst compiled (correct me if I’m wrong)

It can be burst compiled with pointers. But, yes, it requires unsafe code.

The problem is you can’t convert Pointers or Spans or NativeArrays back to regular C# array without allocation. The DrawMeshInstanced only accepts that.
Even if they added a startIndex parameter that would solve it.

yes can confirm garbage collection for RenderMeshInstanced. however it seems to run faster than drawmeshinstanced on my tests. fps spikes down due to gc. is there any indication for when this will be fixed?
UPDATE
drawmeshinstanced runs at the same speed the issue was converting my nativearray to matrix4x4 array. drawmeshinstanced has zero gc and runs at same speed

After I converted my code to use RenderMeshInstanced I had to revert everything, the GC work is so intense it doubles the execution time.

I wish I could rely on unity stuff :frowning:

1 Like

The first thing I noticed is you are creating params every loop. If you take this out of your loop, I should think the GC will be zero.

void Start()
{
      renderParams = new RenderParams(material);
}

...

for (var splitIndex = 0; splitIndex < trsData.Length; ++splitIndex)
{
    //..
  
        renderParams.matProps = propertyBlock;
        renderParams.lightProbeUsage = unityLightProbeUsage;
        renderParams.shadowCastingMode = shadowsMode;
  
    Graphics.RenderMeshInstanced(renderParams, mesh, 0, renderGroup, renderGroup.Length, 0);
}

Every time you use new for an object instance that exists, you replace it and therefore the GC needs to clean up the old object from memory.

Further to that, if every render is going to use the same values, then you can take that out of the loop too. Anything which stays the same for every loop, doesn’t need to be in the loop, and can be initialised just before.

void Start()
{
      renderParams = new RenderParams(material);
}
...

renderParams.matProps = propertyBlock;
renderParams.lightProbeUsage = unityLightProbeUsage;
renderParams.shadowCastingMode = shadowsMode;

for (var splitIndex = 0; splitIndex < trsData.Length; ++splitIndex)
{
    //..
 
    Graphics.RenderMeshInstanced(renderParams, mesh, 0, renderGroup, renderGroup.Length, 0);
}

Garbage Collection, hope this helps

myString = new string “Bob”

Mem 001-003 = “Bob”

myString = new string “Cat” (you wouldn’t do this, but just to explain new)

Mem 001-003 Cleanup
Mem 004-006 = “Cat” ‘new’ is saying you want new memory allocated, even though it’s the same size

myString = “Bob”

Mem 004-006 = “Bob” Same size no need to cleanup

myString = “Bobby”

Mem 004-006 Cleanup
Mem 007-011 = “Bobby” As Bobby is 5 letters, a new location is required, hence GC on the old

are you sure this will make it 0 GC? I don’t want to test it again and find out it’s still allocating. Also the material changes, so unless the RenderMeshInstanced will make a copy of the params, I cannot do what you are suggesting.

You would just need to add the material change in the loop if it changes. The problem of allocation is when you allocate new memory. I would hope that it grabs the RenderParams by reference in the function

Sorry I want just to be clear here. You are not saying this because I do new RenderParams right? Because RenderParams is a struct, not a class and therefore new RenderParams doesn’t allocate by itself.

*When you create a struct object using the new operator, it gets created and the appropriate constructor is called. Unlike classes, structs can be instantiated without using the new operator. If you do not use new, the fields will remain unassigned and the object cannot be used until all of the fields are initialized.

When you are using new RenderParams, you don’t have to fill in all the other values, they are set as a default. But it is still ‘creating’ a new struct, meaning the old one is marked for deletion. If you are using a struct persistently, then create that outside of where you use it, Class level: RenderParams myRP (allocated, uninitialised, not usable), Start: myRP = new RenderParams (allocate new, set as defaults) , Loop: renderParams.matProps = propertyBlock; (change a property)
When you change a part of the struct, you are not allocating new memory because it already exists from the class level, stored in memory. I hope I’m right anyway!

Here’s a test I made ages ago, 1million blades of grass, 170+fps, zero allocation. This is part of the construct, although I’m using RenderMeshIndirect here which is a little complex to setup, but showing the setup I use (cut out all the extra code)

public class DrawGrass2 : MonoBehaviour
{
    RenderParams renderParams;

public void Start()
{
        // Init with a material
        renderParams = new RenderParams(material);
        // Set receive shadows
        renderParams.receiveShadows = true;
        // Set shadow casting mode
        renderParams.shadowCastingMode = UnityEngine.Rendering.ShadowCastingMode.Off;
        // MaterialPropertyBlock so you can access the custom StructuredBuffer in the shader
        renderParams.matProps = new MaterialPropertyBlock();
        // Send PositionsBuffer to the GPU ShaderBuffer
        renderParams.matProps.SetBuffer("_PerInstanceData", GB_positionsBuffer);
}

    public void Update()
    {
        if (meshRenderer.isVisible)
        {
            // Instances to draw
            GB_commandData[0].instanceCount = (uint)drawInstances;

            GB_commandBuffer.SetData(GB_commandData);
      
            Graphics.RenderMeshIndirect(in renderParams, mesh, GB_commandBuffer);
        }
    }
}

Ps. you can’t see the count/verts etc. in Statistics, because it’s Indirect. Good thing with Indirect is no limit on the number of instances

Graphics.RenderMeshIndirect behaves differently than RenderMeshInstanced though. It’s possible that the problem is linked only to RenderMeshInstanced

And you might be right about it not allocating if you use new, it may just reset the struct since it calls the default constructor. But the fact is in your code, you are creating a new struct by saying RenderParams/var renderParams = new RenderParams(); You are saying a new one exists, in every loop, so it would be marked for deletion at the end of every loop.

It makes more sense if you think of it as:

allocate new RenderParams renderParams = default RenderParams();

I have used it somewhere but can’t remember where. And I might have encountered this issue too, reading the comments above