Performance cost of DrawMeshInstancedIndirect when the instance count is zero

I compute the instance count on the GPU. When that count is zero, the GPU should not render anything. However, because the CPU does not know how many instances will be drawn, it still has to call DrawMeshInstancedIndirect. In theory, a draw with zero instances should cost about the same as not calling the API at all, but I have observed a significant performance difference between calling and not calling DrawMeshInstancedIndirect, so I suspect that the call itself transfers extra data to the GPU.

For example, if there are twenty types of instanced objects in the scene, I have to call DrawMeshInstancedIndirect twenty times, even when some of those draws end up with zero instances, and the impact on performance is noticeable. Does each call to DrawMeshInstancedIndirect involve transferring additional data to the GPU, even when nothing is rendered?
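
To make the scenario concrete, here is a minimal sketch of the kind of setup described above, for a single object type. The class name, field names, and bounds are placeholders for illustration, and the compute pass that would write the instance count on the GPU is left out; only the Unity calls themselves (ComputeBuffer, Graphics.DrawMeshInstancedIndirect, and the Mesh index queries) are real API.

```csharp
using UnityEngine;

// Hypothetical component: one indirect draw for one instanced object type.
// The draw is issued every frame because the CPU cannot see the GPU-side count.
public class IndirectDrawSketch : MonoBehaviour
{
    public Mesh mesh;          // placeholder: assigned in the Inspector
    public Material material;  // placeholder: assumed to use an instancing-capable shader

    private ComputeBuffer argsBuffer;
    private readonly uint[] args = new uint[5];

    // Placeholder bounds, assumed large enough to contain all instances.
    private readonly Bounds bounds = new Bounds(Vector3.zero, Vector3.one * 1000f);

    void Start()
    {
        // Indirect arguments: index count per instance, instance count,
        // start index location, base vertex location, start instance location.
        args[0] = (uint)mesh.GetIndexCount(0);
        args[1] = 0; // instance count; assumed to be overwritten on the GPU (not shown here)
        args[2] = (uint)mesh.GetIndexStart(0);
        args[3] = (uint)mesh.GetBaseVertex(0);
        args[4] = 0;

        argsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint),
                                       ComputeBufferType.IndirectArguments);
        argsBuffer.SetData(args);
    }

    void Update()
    {
        // Called unconditionally, even if the GPU has written 0 into args[1].
        Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, argsBuffer);
    }

    void OnDestroy()
    {
        argsBuffer?.Release();
    }
}
```

With twenty object types, the Update method would issue twenty such calls, each with its own arguments buffer (or its own offset into a shared one), which is the case where the overhead becomes noticeable.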