DrawMeshInstanced without data transfer overhead each frame (CommandBuffer?)

Hi there,

I am building a custom system for drawing lots of the same objects simultaneously, using MaterialPropertyBlock to customize how each object looks. However, most of the settings will be exactly the same from frame to frame, with only occasional slight adjustments to specific instances at runtime.

My concern is that all this data is getting pushed to the GPU every single frame, which doesn’t seem very efficient to me when there aren’t many changes happening.

My question is: can I re-use a CommandBuffer from one frame to another without the overhead of pushing new data to the GPU each frame?

Thanks

You can use DrawMeshInstancedIndirect and keep the per-item data in a structured buffer.
This way all the instance data lives on the GPU and there is no extra copy each frame.

When you need to update some per-item data, you can use a compute shader to update only the affected indices in the structured buffer.
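
For anyone wanting something concrete, here is a minimal sketch of that setup. The component name, the InstanceData layout, and the "_InstanceBuffer" shader property are assumptions made for illustration; the material's shader has to declare a matching StructuredBuffer and index it by instance ID for this to draw anything useful.

```csharp
using UnityEngine;

// Hypothetical component: InstanceData and "_InstanceBuffer" are assumptions
// for this sketch, not names taken from the thread.
public class IndirectInstanceRenderer : MonoBehaviour
{
    // Per-instance data; must match the struct declared in the shader.
    struct InstanceData
    {
        public Vector4 positionAndScale; // xyz = position, w = uniform scale
        public Vector4 color;
    }

    public Mesh mesh;
    public Material material;            // shader reads _InstanceBuffer per instance
    public int instanceCount = 100000;

    ComputeBuffer instanceBuffer;
    ComputeBuffer argsBuffer;
    Bounds bounds = new Bounds(Vector3.zero, Vector3.one * 1000f);

    void Start()
    {
        // Fill and upload the per-instance data once.
        var data = new InstanceData[instanceCount];
        for (int i = 0; i < instanceCount; i++)
        {
            data[i].positionAndScale = new Vector4(
                Random.Range(-100f, 100f), 0f, Random.Range(-100f, 100f), 1f);
            data[i].color = Color.white;
        }
        instanceBuffer = new ComputeBuffer(instanceCount, sizeof(float) * 8);
        instanceBuffer.SetData(data);
        material.SetBuffer("_InstanceBuffer", instanceBuffer);

        // Five uints: index count, instance count, start index, base vertex, start instance.
        uint[] args =
        {
            mesh.GetIndexCount(0), (uint)instanceCount,
            mesh.GetIndexStart(0), mesh.GetBaseVertex(0), 0
        };
        argsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint),
                                       ComputeBufferType.IndirectArguments);
        argsBuffer.SetData(args);
    }

    void Update()
    {
        // No per-frame upload: both buffers already live on the GPU.
        Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, argsBuffer);
    }

    void OnDestroy()
    {
        instanceBuffer?.Release();
        argsBuffer?.Release();
    }
}
```

The only per-frame CPU cost in this sketch is the draw call itself; the buffers are touched again only when the instance data or count changes.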

Woah, really? Can anyone be so kind as to point me to an example?

As you can see in the example, the compute buffers are only updated in Update() if the instance count changes. Otherwise it renders the same instances directly from the GPU every frame, with essentially no CPU work and no copy to the GPU.

That part I see, but what about using a Compute Shader to update individual values as opposed to re-uploading the whole buffer each time it changes?

Let's say you have a compute buffer of structs holding the per-item data for the 100,000 instances you render.
You then want to update 100 of them with new data. You know the index in the large buffer/array of each item you want to update.

Make a new compute buffer with a count of 100 whose elements hold the new data plus the index of the original instance in the large buffer.

Then make a compute shader and assign both the large buffer and the new (100-item) compute buffer to it.
Dispatch the kernel so that it runs 100 threads, one per update.

When the compute shader runs, each thread gets its own index and reads the struct at that index from the (100-item) compute buffer.
From that struct it gets the index into the large buffer and overwrites the per-instance data there.

This way you only copy the changed data for the 100 items to the GPU, and the update of the large structured buffer happens on the GPU.
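
The steps above could look roughly like the sketch below. All names (InstanceUpdater, ApplyUpdates, _Instances, _Updates, _UpdateCount) and the struct layout are assumptions for illustration. The kernel is shown in the comment at the top; note that the large buffer is declared as RWStructuredBuffer there so the kernel can write to it, while the small update buffer can stay read-only.

```csharp
using UnityEngine;

/* Assumed compute shader (HLSL), stored in a .compute asset:

#pragma kernel ApplyUpdates

struct InstanceData   { float4 positionAndScale; float4 color; };
struct InstanceUpdate { uint index; InstanceData data; };

RWStructuredBuffer<InstanceData> _Instances; // large buffer, written by the kernel
StructuredBuffer<InstanceUpdate> _Updates;   // small buffer, read-only
uint _UpdateCount;

[numthreads(64, 1, 1)]
void ApplyUpdates(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _UpdateCount) return;        // skip padding threads in the last group
    InstanceUpdate u = _Updates[id.x];
    _Instances[u.index] = u.data;            // scatter into the large buffer
}
*/

// Hypothetical helper that owns a reusable update buffer.
public class InstanceUpdater
{
    public struct InstanceData   { public Vector4 positionAndScale; public Vector4 color; }
    public struct InstanceUpdate { public uint index; public InstanceData data; }

    const int ThreadsPerGroup = 64;          // must match numthreads in the kernel
    readonly ComputeShader shader;
    readonly ComputeBuffer updateBuffer;
    readonly int kernel;

    public InstanceUpdater(ComputeShader shader, int maxUpdates)
    {
        this.shader = shader;
        kernel = shader.FindKernel("ApplyUpdates");
        // Stride: one uint plus two float4 values = 36 bytes.
        updateBuffer = new ComputeBuffer(maxUpdates, sizeof(uint) + sizeof(float) * 8);
    }

    // Uploads the first 'count' updates and applies them to 'instances' on the GPU.
    public void Apply(ComputeBuffer instances, InstanceUpdate[] updates, int count)
    {
        if (count <= 0) return;
        updateBuffer.SetData(updates, 0, 0, count);   // only the changed items are copied

        shader.SetBuffer(kernel, "_Instances", instances);
        shader.SetBuffer(kernel, "_Updates", updateBuffer);
        shader.SetInt("_UpdateCount", count);

        // Round up so every update gets a thread.
        int groups = (count + ThreadsPerGroup - 1) / ThreadsPerGroup;
        shader.Dispatch(kernel, groups, 1, 1);
    }

    public void Release()
    {
        updateBuffer.Release();
    }
}
```

With 100 updates and 64 threads per group, this dispatches two groups (128 threads), and the bounds check makes the extra 28 threads do nothing. If the same instance index appears twice in one batch, which write wins is not defined, so it is probably best to deduplicate on the CPU first.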

Pardon me for barging in on this discussion, but do you happen to know if there are any limitations to this approach? Can all GPUs with compute shader support do this?

I think it should work on all of them. There are some per-platform limits on how many buffers you can bind to a compute shader, but I think they all support at least 8.

More specifics can be found here: https://docs.unity3d.com/Manual/class-ComputeShader.html

Some features are platform dependent.

How can I update the large buffer? I think I have to use RWStructuredBuffer for the large buffer to get write access?