GPU driven rendering with SRP: No DrawProceduralIndirectNow for CommandBuffers?

I am currently in the process of writing a custom GPU driven SRP and have a problem with reducing SetPass calls as there seem to be no methods for drawing meshes via a CommandBuffer without calling SetPass each time.

TL;DR: I need a DrawProceduralIndirectNow equivalent for CommandBuffers so that I can render lots of different meshes with only one SetPass call.

The algorithm:
I combine all my meshes into one big mesh (“mesh-atlas”) and send it to the GPU via GraphicsBuffers. Rendering is then done by creating an args buffer with the correct vertex/index offsets and calling CommandBuffer.DrawProceduralIndirect with the matching args offset for each mesh inside the “mesh-atlas”, always supplying the same material and pass.
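To make the args-buffer part concrete, here is a minimal, engine-independent sketch of how the per-mesh records and their byte offsets could be computed. The names `AtlasDrawArgs`, `MeshAtlasLayout`, `Build` and `ArgsByteOffset` are made up for illustration and are not Unity API. The struct just mirrors the five uint fields of an indexed indirect draw record, and the sketch assumes that indices are copied into the atlas unchanged (mesh-local) and that every mesh starts at instance 0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of an indexed indirect draw record (five uints).
public struct AtlasDrawArgs
{
    public uint indexCountPerInstance;
    public uint instanceCount;
    public uint startIndex;
    public uint baseVertexIndex;
    public uint startInstance;
}

public static class MeshAtlasLayout
{
    // One record is 5 uints = 20 bytes.
    public const int ArgsStrideBytes = 5 * sizeof(uint);

    // Byte offset of the record for the given mesh inside the args buffer.
    public static int ArgsByteOffset(int meshIndex) => meshIndex * ArgsStrideBytes;

    // meshSizes[i] holds the vertex and index count of the i-th source mesh,
    // in the order the meshes were appended to the atlas.
    public static AtlasDrawArgs[] Build(
        IReadOnlyList<(int vertexCount, int indexCount)> meshSizes,
        uint instanceCount)
    {
        var args = new AtlasDrawArgs[meshSizes.Count];
        uint vertexCursor = 0; // first vertex of the current mesh in the atlas
        uint indexCursor = 0;  // first index of the current mesh in the atlas

        for (int i = 0; i < meshSizes.Count; i++)
        {
            args[i].indexCountPerInstance = (uint)meshSizes[i].indexCount;
            args[i].instanceCount = instanceCount;
            args[i].startIndex = indexCursor;
            // Indices are assumed to stay mesh-local, so the base vertex
            // shifts them to this mesh's vertices inside the atlas.
            args[i].baseVertexIndex = vertexCursor;
            args[i].startInstance = 0; // assumption: no per-mesh instance offset

            vertexCursor += (uint)meshSizes[i].vertexCount;
            indexCursor += (uint)meshSizes[i].indexCount;
        }
        return args;
    }
}
```

For two meshes with (4 vertices, 6 indices) and (8 vertices, 36 indices), the second record gets startIndex 6 and baseVertexIndex 4, and `ArgsByteOffset(1)` is 20. In Unity, the array would presumably be uploaded to the args GraphicsBuffer with SetData, and the byte offset passed as the argsOffset of each CommandBuffer.DrawProceduralIndirect call.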

The problem:
At the moment, whenever I call CommandBuffer.DrawProceduralIndirect, the total SetPass count increases by one, even if the material/mesh combination and pass index stay the same. I clearly do not want that: I want to render most of my scene with one SetPass call. This seems like a bug to me, as there is no need to call SetPass again.

The workaround(?)
I work around this issue by not calling CommandBuffer.DrawProceduralIndirect in my SRP’s Render() function, but calling Graphics.DrawProceduralIndirectNow in RenderPipelineManager.endCameraRendering instead. This works and I can now render everything with only one SetPass call.
But I have the strong feeling that this is not the correct way to do it and that I will get into trouble in the long run when trying to draw the shadow pass or transparent objects, and when firing compute shaders and async GPU fences in between to handle everything else (GPU culling, light binning, decals, tiled rendering, etc.).
From my understanding of Unity, this should all be done in a CommandBuffer to make sure everything happens in the correct order. “Manually” calling Graphics.DrawProceduralIndirectNow in between sounds very risky to me.

Am I overlooking something? Is there a way to draw meshes with a GPU driven/indirect approach via CommandBuffers that does not call SetPass every time?

If you are doing GPU driven rendering, why do you need to call DrawProceduralIndirect more than once per render pass with the same material?

Thanks for your reply:)

Afaik Unity does not expose MultiDrawIndirect, so I have to emulate it by manually calling DrawProceduralIndirect for each mesh.
This results in driver overhead, but I have to do that anyway, as Intel GPUs do not support MultiDrawIndirect (afaik).
Once I have that working, I can think about creating native plugins for MultiDrawIndirect access on PCs with Nvidia/AMD GPUs as well as consoles.
Or am I missing something here? :slight_smile:

Ah ok, if you are only culling at a per-object level, then yes makes sense. I was assuming you were wanting to do finer culling (cluster, triangle), in which case you would only need 1 draw call per material, but that’s a different strategy.

As for why 1 set pass per draw call (instead of per material), I have experienced similar, iirc, and would also be interested to hear what’s going on.

bump

Any news on this from Unity?

Others need this too. See here:

Or skip the proposed method and implement Multi Draw Indirect / ExecuteIndirect for CommandBuffers instead and emulate it with DrawProceduralIndirectNow if Multi Draw Indirect / ExecuteIndirect is not supported on a platform.

Pretty please? :slight_smile:

Maybe check out BatchRendererGroup:

Each DrawProceduralIndirect call is one GPU draw call. Unity doesn’t (and can’t) batch them together, it’s up to you to make sure you draw as much stuff as possible in a single DrawProceduralIndirect call.

Thanks for posting these :slight_smile:
I had a quick look at BRG in the past and it seems to be just an abstraction layer on top of DrawMeshInstanced that gets rid of GameObjects for rendering.
We are not using GameObjects (except for UI). We are also not using DOTS for that matter, but a custom ECS solution.
We also created our own abstraction layer for DrawMeshInstancedIndirect in the past, but with our next project we want to go one step further by using DrawProceduralIndirect, or MultiDrawIndirect / ExecuteIndirect if Unity implements it.

I will have another look at BRG and will also look closely at the threads you posted. Maybe I missed something. Thanks again. :slight_smile:

Thanks for your answer.
Please note that I am talking about SetPass calls and not batches. :slight_smile:
When using CommandBuffer.DrawProceduralIndirect I get one SetPass call each time I use it. Note that the material is always the same, so there should be no need to call SetPass again.
If I use Graphics.DrawProceduralIndirectNow I only get 1 SetPass call for everything, which is what I want, but there is no DrawProceduralIndirectNow for CommandBuffers.
Please have a look at the attached screenshots to see what I mean :slight_smile:

[Screenshot: Graphics.DrawProceduralIndirectNow.JPG]
First screenshot shows Graphics.DrawProceduralIndirectNow with only 1 SetPass call for all meshes. (Another one for the sky)

[Screenshot: CommandBuffer.JPG]
Second screenshot shows CommandBuffer.DrawProceduralIndirect with 67 SetPass calls for all meshes and another one for the sky.

Ah, I see. SetPass is basically Unity’s term for sending material parameters to the GPU before a draw call. I’m actually surprised it only does one for multiple Graphics.DrawProceduralIndirectNow calls.

Yes, this is a major advantage of that function, as you save the SetPass call if the material/pass does not change between calls. You basically have to tell Unity manually which material and pass it should use, which saves a lot of CPU time if you have many different meshes. :slight_smile:
MultiDrawIndirect / ExecuteIndirect would be even faster.

From the DrawProceduralIndirectNow documentation

I had a look at BRG again and it is not what I am looking for, as it is a CPU driven approach (at least at the moment).

HOWEVER, while reading through the second thread, I stumbled upon Graphics.RenderMeshIndirect, which is mentioned there by user Jes28, and it seems this is Unity’s wrapper for MultiDrawIndirect/ExecuteIndirect. If this is the case, this is a big step forward! I would then still need a version for CommandBuffers, but this would already be great news.

I will give this a try now and report back :slight_smile:

Just tried it and it seems to do what MultiDrawIndirect/ExecuteIndirect is supposed to do :slight_smile:

On DX11 it uses a fallback/software emulation, which is to be expected, as DX11 does not officially support it (afaik). There are vendor-specific DX11 extensions for MDI, though.
But since we are not planning to use DX11, this is ok for us.

I have not yet checked my sample on PS5 and Series X, but I expect them to also support this in hardware, so no software emulation should be needed.

The only thing missing now is to implement RenderMeshIndirect for CommandBuffers.
Pretty please? :slight_smile:

[Screenshots: DX12, Vulkan, DX11, and the Frame Debugger (Vulkan)]

All in all, this is pretty cool and will probably save us from having to implement Mesh Cluster Rendering

I don’t think that RenderMeshIndirect is the same as MDI. RenderMeshIndirect can only render one mesh multiple times whereas MDI can render different meshes with one draw call.

The only difference from RenderMeshInstanced is that baseVertexIndex, indexCountPerInstance, instanceCount, startIndex and startInstance come from a compute buffer. You can have multiple instanced draw calls in the compute buffer, but they all use the same mesh and shader.

This should be possible in DX11 as well.

This method also exists for CommandBuffers; it’s just named slightly differently:

You render multiple meshes with RenderMeshIndirect by supplying a single “mega-mesh”/“mesh-atlas” which contains all the meshes you want to render. You then pick the correct mesh from the mega-mesh by writing the correct offsets into the GraphicsBuffer and setting commandCount to the number of different meshes you want to render.
So basically you bind all the mesh data once and then pick the individual meshes via the GraphicsBuffer and commandCount supplied to RenderMeshIndirect.

Quick and dirty example for that:

// Fragment from a setup method. meshes (List<Mesh>), instanceCount (uint), mergedMesh,
// multiDrawCommands and multiDrawCommandsBuffer are assumed to be fields of the surrounding class.
multiDrawCommandsBuffer = new GraphicsBuffer(GraphicsBuffer.Target.IndirectArguments, meshes.Count, GraphicsBuffer.IndirectDrawIndexedArgs.size);
multiDrawCommands = new GraphicsBuffer.IndirectDrawIndexedArgs[meshes.Count];

// Destroy the previous merged mesh, if any, before building a new one
if (mergedMesh != null)
{
    UnityEngine.Object.Destroy(mergedMesh);
}
mergedMesh = new Mesh();

// Count the total vertices and indices so the merged arrays can be sized up front
int vertexCount = 0;
int indexCount = 0;
foreach (Mesh m in meshes)
{
    vertexCount += m.vertexCount;
    indexCount += m.triangles.Length;
}

// Fill the merged arrays and create one draw command per source mesh
Vector3[] vertices = new Vector3[vertexCount];
int[] indices = new int[indexCount];
Vector3[] normals = new Vector3[vertexCount];
int currentVertexCount = 0;
int currentIndexCount = 0;
for (int i = 0; i < meshes.Count; i++)
{
    Mesh m = meshes[i];
    int[] meshIndices = m.triangles; // cached: each property access returns a new copy
    Array.Copy(m.vertices, 0, vertices, currentVertexCount, m.vertexCount);
    Array.Copy(meshIndices, 0, indices, currentIndexCount, meshIndices.Length);
    Array.Copy(m.normals, 0, normals, currentVertexCount, m.vertexCount);

    // Indices stay mesh-local; baseVertexIndex shifts them to this mesh's vertices in the atlas
    multiDrawCommands[i].baseVertexIndex = (uint)currentVertexCount;
    multiDrawCommands[i].indexCountPerInstance = (uint)meshIndices.Length;
    multiDrawCommands[i].instanceCount = instanceCount;
    multiDrawCommands[i].startIndex = (uint)currentIndexCount;
    multiDrawCommands[i].startInstance = (uint)(i * instanceCount);

    currentVertexCount += m.vertexCount;
    currentIndexCount += meshIndices.Length;
}
//mergedMesh.SetVertices(vertices);
//mergedMesh.SetIndices(indices, MeshTopology.Triangles, 0);
//mergedMesh.SetNormals(normals);
mergedMesh.vertices = vertices;
mergedMesh.triangles = indices;
mergedMesh.normals = normals;
mergedMesh.RecalculateTangents();

// Upload the draw commands to the GPU
multiDrawCommandsBuffer.SetData(multiDrawCommands);

You can also fill the GraphicsBuffer on the GPU via a ComputeShader if you want to do culling on the GPU.

And then call this somewhere else to actually render everything in one go:

RenderParams rp = new RenderParams(material);
rp.worldBounds = new Bounds(Vector3.zero, 10000 * Vector3.one); // use tighter bounds for better frustum culling
rp.matProps = new MaterialPropertyBlock();

Graphics.RenderMeshIndirect(rp, mergedMesh, multiDrawCommandsBuffer, meshes.Count);

And a quick and dirty test shader to render everything, which is basically the one from the documentation. Please note that this is still using the old CG syntax and not the “new” HLSL syntax which should be used for the SRP:

Shader "Custom/UberTweaked"
{
    Properties
    {
  
    }

    SubShader
    {
        Tags {
            "RenderType" = "Opaque"
            "LightMode" = "SRPDefaultUnlit"
        }

        Pass
        {
            CGPROGRAM
            #pragma target 4.5
            #pragma vertex vert
            #pragma fragment frag

            #define UNITY_INDIRECT_DRAW_ARGS IndirectDrawIndexedArgs
            #include "UnityIndirect.cginc"

            struct appdata
            {
                float4 vertex : POSITION;
                float3 normals : NORMAL;
                uint svInstanceID : SV_InstanceID;
                //uint svVertexID : SV_VertexID;
            };

            struct v2f
            {
                float4 pos : SV_POSITION;
                float4 color : COLOR0;
                float3 worldNormal : TEXCOORD0;
            };

            v2f vert(appdata v)
            {
                InitIndirectDrawArgs(0);
                v2f o;
                uint cmdID = GetCommandID(0);
                uint instanceID = GetIndirectInstanceID(v.svInstanceID);
                float4 wpos = mul(unity_ObjectToWorld, v.vertex + float4( (instanceID%10) * 15, cmdID * 8, (int)(instanceID / 10) * 15, 0));

                o.pos = mul(UNITY_MATRIX_VP, wpos);
                o.color = v.vertex / 10; // debug color derived from the object-space position

                o.worldNormal = mul((float3x3)unity_ObjectToWorld, v.normals.xyz).xyz;
                return o;
            }

            float4 frag(v2f i) : SV_Target
            {
                float4 color = 1;
                color.xyz = dot(normalize(i.worldNormal.xyz), normalize(float3(1, 1, 0)));
         
                return color;
            }
            ENDCG
        }
    }
}

I also had a look at the documentation again and chances are pretty high that it uses MDI under the hood on platforms that support it.
From the documentation:

Good job! It would be even better if it could render the entire static scene with one draw call, though, and I’m not sure how that would work with different materials.

Hi, very interesting thread :slight_smile:
Have you found a way to implement InitIndirectDrawArgs(), GetCommandID() and GetIndirectInstanceID() in HLSL? I’m using URP, but I could not find anything in the documentation that would suggest how to properly implement this. Everything seems to work fine if I include “UnityIndirect.cginc” in my HLSL code, but that does not seem like the right thing to do.

Maybe you can create a “UnityIndirect.hlsl” and copy into it.

I suppose I could do that. I was hoping there would be a more officially supported way of doing it. But it’ll do for now. Thanks :slight_smile:

Wow: probably the most useful post ever in this forum / thank you so much Sir dotmos for the incredible help !
I still need to figure out how to pass a Matrix4x4 for each individual item, but this should be manageable. Not the first time I regret not having invested time in understanding shaders…
Again, super mega useful stuff!!!

Any chance of getting that working in HDRP and Shader Graph?