I’m writing a custom renderer for Quest 2 and currently have two working solutions:
A) Using Graphics.RenderMeshInstanced
- For each unique mesh call RenderMeshInstanced once
- Custom per-instance data provided to the shader via a StructuredBuffer
→ 1 draw call per unique mesh
→ 1 command submitted to the GPU per unique mesh
→ CPU performance is decent, GPU performance is great
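To make the draw count of solution A concrete, here is a minimal, engine-agnostic sketch of the batching step: group scene instances by mesh so each unique mesh becomes one instanced draw, and reorder the per-instance data so each draw reads a contiguous slice of one shared buffer. The names (`InstancedDraw`, `build_instanced_draws`, `first_instance`) are hypothetical and not Unity API; how the slice offset reaches the shader (for example, a per-draw shader property) is an assumption left open here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InstancedDraw:
    mesh_id: str         # which unique mesh this draw renders
    first_instance: int  # offset of this draw's data in the shared per-instance buffer
    instance_count: int  # number of instances drawn by this call


def build_instanced_draws(instances):
    """Group (mesh_id, per_instance_data) pairs into one draw per unique mesh.

    Returns (draws, packed): the draws in first-seen mesh order, and the
    per-instance data reordered so each draw's instances are contiguous.
    """
    by_mesh = defaultdict(list)  # preserves insertion order (Python 3.7+)
    for mesh_id, data in instances:
        by_mesh[mesh_id].append(data)

    draws, packed = [], []
    for mesh_id, items in by_mesh.items():
        draws.append(InstancedDraw(mesh_id, len(packed), len(items)))
        packed.extend(items)
    return draws, packed
```

For a scene like `[("rock", 0), ("tree", 1), ("rock", 2), ("bush", 3), ("tree", 4)]` this yields three draws (rock, tree, bush), matching the "1 draw call per unique mesh" count above, with the data reordered to `[0, 2, 1, 4, 3]`.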
B) Using Graphics.RenderMeshIndirect
- Combine all unique meshes into one big atlas mesh (at editor time)
- Create an IndirectDrawIndexedArgs entry for each unique mesh
- Call Graphics.RenderMeshIndirect once with a buffer of IndirectDrawIndexedArgs
- Custom per-instance data provided to the shader via a StructuredBuffer or cbuffer (tried both)
→ 1 command submitted to the GPU for ALL meshes
→ On Vulkan this should be MDI (Multi-Draw Indirect)? However, it doesn't seem to be quite that: in RenderDoc I see 1 draw command with sub-commands for each unique mesh.
→ CPU performance is amazing, GPU performance is bad.
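For reference, this is roughly how the per-mesh argument entries for an atlas mesh can be derived. It is a minimal sketch, assuming each source mesh is appended to the atlas in order and keeps its indices local (starting at 0). Each entry is packed as five 32-bit fields in the order of Vulkan's `VkDrawIndexedIndirectCommand` (`indexCount`, `instanceCount`, `firstIndex`, `vertexOffset`, `firstInstance`), which as far as I know matches the field order of Unity's `GraphicsBuffer.IndirectDrawIndexedArgs`. `SourceMesh` and `build_atlas_args` are hypothetical helpers, not Unity API.

```python
import struct
from dataclasses import dataclass

# One indexed indirect command: 5 little-endian 32-bit fields = 20 bytes,
# in VkDrawIndexedIndirectCommand order; vertexOffset is signed in Vulkan.
INDEXED_ARGS = struct.Struct("<IIIiI")


@dataclass(frozen=True)
class SourceMesh:
    name: str
    vertex_count: int
    index_count: int     # indices are local to this mesh (start at 0)
    instance_count: int  # how many instances of this mesh to draw


def build_atlas_args(meshes):
    """Return (args_bytes, atlas_vertex_count, atlas_index_count).

    Each command points at its mesh's index range in the atlas (firstIndex),
    rebases the mesh's local indices (vertexOffset), and gets its own slice
    of a shared per-instance buffer (firstInstance).
    """
    out = bytearray()
    first_index = vertex_offset = first_instance = 0
    for m in meshes:
        out += INDEXED_ARGS.pack(m.index_count, m.instance_count,
                                 first_index, vertex_offset, first_instance)
        first_index += m.index_count
        vertex_offset += m.vertex_count
        first_instance += m.instance_count
    return bytes(out), vertex_offset, first_index
```

With two meshes (100 vertices / 300 indices / 2 instances, then 50 / 120 / 3), the second command starts at index 300, vertex 100 and instance 2. In Vulkan, a nonzero `firstInstance` in an indirect command requires the `drawIndirectFirstInstance` device feature, and a `drawCount` greater than 1 requires `multiDrawIndirect`.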
Quest 2 results (Vulkan) for a scene with ~200 draw calls & ~50k vertices:
- Solution A is MUCH faster on the GPU, but slower on the CPU
- In RenderDoc the number of draw calls is the same
- For A, the vertex and index buffers are bound for every draw call; for B, they are bound only once, in the first draw call (the atlas mesh gets bound)
- The buffers differ between A (Instanced) and B (Indirect): A seems to use buffers in the ScratchBuffer Page.
Questions:
- What’s the ScratchBufferPage? How do I use it, and when should I use it?
- Why is the Indirect draw so much slower? Is this entirely because of the buffer layout?
- Why does B call vkCmdBindDescriptorSets for EACH sub-command when it always binds the exact same data?
Note:
I have also tried BRG (the BatchRendererGroup API). Its performance is almost the same as solution B, and because of the added complexity and similar performance I did not pursue that path. BRG seems to be an easier abstraction over MDI (correct me if I’m wrong).