DrawIndexedIndirect vs DrawIndexed (Vulkan)

I’m writing a custom renderer for Quest 2 and currently have two working solutions:

A) Using Graphics.RenderMeshInstanced

  • For each unique mesh call RenderMeshInstanced once
  • Custom per instance data provided in shader with StructuredBuffer
    → 1 Drawcall per unique mesh
    → 1 command submit to GPU per unique mesh
    → CPU performance is decent, GPU performance is great

B) Using Graphics.RenderMeshIndirect

  • Combine all unique meshes into one big atlas mesh (at editor time)
  • Create IndirectDrawIndexedArgs for each unique mesh
  • Call Graphics.RenderMeshIndirect once with buffer of IndirectDrawIndexedArgs
  • Custom per instance data provided in shader with StructuredBuffer or cbuffer (tried both)
    → 1 command submit to GPU for ALL meshes
    → On Vulkan this should be MDI (multi-draw indirect)? It doesn’t seem to be exactly that, though: in RenderDoc I see 1 draw command with a sub-command for each unique mesh.
    → CPU performance is amazing, GPU performance is bad.
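
For reference, here is roughly how the per-mesh indirect args for the atlas in B can be laid out. This is a plain C++ sketch, not my Unity code: `DrawIndexedIndirectArgs` just mirrors the five 32-bit fields of Vulkan's `VkDrawIndexedIndirectCommand`, while `SubMesh` and `buildIndirectArgs` are hypothetical names made up for illustration. It assumes each mesh's indices are stored relative to its own first vertex (so `vertexOffset` rebases them into the atlas) and that per-instance data is packed contiguously in draw order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order and sizes as VkDrawIndexedIndirectCommand (5 x 32 bits).
struct DrawIndexedIndirectArgs {
    std::uint32_t indexCount;     // indices per instance
    std::uint32_t instanceCount;  // instances of this mesh
    std::uint32_t firstIndex;     // start of this mesh in the atlas index buffer
    std::int32_t  vertexOffset;   // added to every index before vertex fetch
    std::uint32_t firstInstance;  // start of this mesh's per-instance data
};
static_assert(sizeof(DrawIndexedIndirectArgs) == 20,
              "must match the 20-byte indirect command layout");

// Hypothetical description of one unique mesh before packing into the atlas.
struct SubMesh {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
};

// Builds one indirect command per unique mesh, with running offsets
// into the shared (atlas) vertex, index, and instance buffers.
std::vector<DrawIndexedIndirectArgs> buildIndirectArgs(const std::vector<SubMesh>& meshes) {
    std::vector<DrawIndexedIndirectArgs> args;
    args.reserve(meshes.size());

    std::uint32_t firstIndex = 0;
    std::int32_t  vertexOffset = 0;
    std::uint32_t firstInstance = 0;

    for (const SubMesh& m : meshes) {
        assert(m.indexCount % 3 == 0 && "triangle lists only in this sketch");
        args.push_back({m.indexCount, m.instanceCount, firstIndex, vertexOffset, firstInstance});
        firstIndex    += m.indexCount;
        vertexOffset  += static_cast<std::int32_t>(m.vertexCount);
        firstInstance += m.instanceCount;
    }
    return args;
}
```

With true MDI the whole array would be consumed by a single draw, with the atlas vertex and index buffers bound once; what the driver actually does with it on Quest 2 is exactly what I am unsure about.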

Quest 2 results (Vulkan) for a scene with ~200 drawcalls & ~50k vertices:

  • Solution A is MUCH faster on the GPU, but slower on the CPU
  • In RenderDoc the number of drawcalls is the same
  • For A the vertex buffer & index buffer are bound every drawcall; for B they are bound once, in the first drawcall (the atlas mesh gets bound)
  • The buffers are different for A (Instanced) & B (Indirect); A seems to be using buffers in the ScratchBuffer Page.

Questions:

  • What is the ScratchBufferPage? How and when should it be used?
  • Why is the Indirect draw so much slower? Is this entirely because of the buffer layout?
  • Why does B call vkCmdBindDescriptorSets for EACH sub-command when it always binds the exact same data?

Note:
I have also tried BRG (the BatchRendererGroup API). Its performance is almost the same as solution B; given the added complexity and similar performance, I did not pursue that path. BRG seems to be an easier abstraction over MDI (correct me if I’m wrong).