New BatchRendererGroup API for 2022.1

Hi Everyone!

We in the Hybrid Renderer team are excited to share some news on what we have been working on for the last year.

Background
The Hybrid Renderer is a way for Unity to use Entities and associated data for rendering without requiring you to round trip to GameObjects. Since its initial implementation, the Unity Engine side of this, the BatchRendererGroup (BRG), has not been very easy to use unless you know a lot of the implementation details.

Because the BRG provides a way to potentially issue more draw calls at a much lower CPU cost, there have naturally been requests from you on how to use this interface. As of Unity 2022.1, the BRG has been fully rewritten, which we hope makes it both more flexible and easier to use.

Where the BRG previously consisted of a set of pre-built batches with the same mesh and material, it now allows you to build your own draw commands in the OnPerformCulling callback or in (Burst compiled) jobs. Each draw command is like a DrawMeshInstanced call and can draw multiple instances with the same material and mesh, with the difference that they are much more efficient and flexible.

Performance example
Draw commands using the BRG are considerably cheaper than traditional GameObject rendering with regard to the actual draw submission cost on the main and render threads. Direct comparison is tricky because responsibility for some of the work (culling, draw setup, and so on) moves to you as the user.

As an example, this URP test scene of roughly 24k GameObjects with varying meshes and materials has a baseline RenderCameraStack profiler marker of about 17ms on an AMD 3970X. This is all serial main thread time, and the render thread takes about the same time because the main thread can’t feed it quickly enough.

In a scene where the GameObjects have been converted to draws in a BRG, the same profiler marker takes about 0.8ms on the main thread and about 0.2ms on the render thread.

Previously you could do this with Graphics.DrawMeshInstanced of course, but it would require you to upload a lot of matrices every frame and to write custom shaders for any per-instance data you would like to override. With the new BRG it is possible to write shaders that support both this and regular GameObjects, and we support URP/Lit and HDRP/Lit (among others) out of the box. Take these measurements with a grain of salt, but they do show that it’s possible to render much more efficiently with the new API.

Getting Started
First, your project needs to use a Scriptable Render Pipeline (URP/HDRP or custom) as the new BRG interface is fully built upon the SRP Batcher. You then need to disable stripping of DOTS Instancing variants by setting “Build-time stripping of BatchRendererGroup variants” (under “Project Settings”, “Graphics”, “Shader Stripping”) to “Keep all”. The project also needs to have “Allow unsafe code” enabled (under “Project Settings”, “Script Compilation”).

Currently OpenGL, GLES and WebGL are not supported.

The following code is adapted from the SimpleExample test scene and script available in the URP BRG test project in the Graphics repository.

BatchRendererGroup Object
Next up you need to create the BatchRendererGroup and provide an OnPerformCulling callback method.

void Start()
{
    m_BRG = new BatchRendererGroup(OnPerformCulling, IntPtr.Zero);
    // Register resources
    // Create batch
}

This callback is the main entry for the BatchRendererGroup and will be called by Unity internals whenever visible objects are to be culled.

public unsafe JobHandle OnPerformCulling(
  BatchRendererGroup rendererGroup,       // The BRG associated with this callback
  BatchCullingContext cullingContext,     // Immutable input data for the operation
  BatchCullingOutput cullingOutput,       // Mutable output data (draw commands)
  IntPtr userContext)                     // Context user can pass (IntPtr.Zero here)
{
    // Do draw command generation work here
    return new JobHandle(); // A job handle saying when the output actually is ready
}

Register Resources
The data output from the culling callback are all unmanaged types. This means that you can’t directly reference objects such as instances of Mesh and Material, and instead, you need to register them with the BRG before you use them. In a real world project you should do this dynamically, but this example uses Start:

    m_MeshID = m_BRG.RegisterMesh(mesh);
    m_MaterialID = m_BRG.RegisterMaterial(material);

The BRG object holds meshes and materials and updates its internal representation once per frame (after it processes the culling callback).

Create Batch
Apart from meshes and materials, the only way to add data to a specific draw call is through a batch and its associated metadata. The SRP shader code looks at the metadata for specific properties, determines whether the property is in batch data or in the material constant buffer (high bit set or not), and loads the data for the instance from the correct location. We provide a utility (UNITY_ACCESS_DOTS_INSTANCED_PROP_WITH_DEFAULT) in the render pipelines core package for this. For more information on how this works, please see the linked manual draft.

Note: the layout and implementation here is just one way to do it, and it is the way we have implemented it for the Hybrid Renderer. You can use the metadata values in any way you find suitable with custom shaders or a modified URP/HDRP.

First you need to create a buffer big enough to hold on to all the data you intend to store. If you outgrow this size later there is an API to update the buffer for a specific batch.

    m_InstanceData = new GraphicsBuffer(GraphicsBuffer.Target.Raw,
                                        kBufferSizeInInts,
                                        sizeof(int));
    // Fill data to buffer (see attached example code)
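The fill step is elided above, so here is a rough sketch of how the byte addresses used in the next step (byteAddressObjectToWorld, byteAddressWorldToObject, byteAddressColor) could be derived. Everything in it is an assumption made for illustration: the helper name InstanceBufferLayout, the 64-byte header, matrices packed as 3x4 floats (48 bytes), and colors stored as four floats (16 bytes). The actual SimpleExample script may lay out its buffer differently.

```csharp
using System;

// Hypothetical helper (not part of Unity): computes byte offsets for three
// per-instance arrays stored back to back in one raw buffer.
public static class InstanceBufferLayout
{
    public const uint kHeaderBytes = 64;                             // assumed padding at the start
    public const uint kBytesPerPackedMatrix = 3 * 4 * sizeof(float); // 48, assumed 3x4 float packing
    public const uint kBytesPerColor = 4 * sizeof(float);            // 16, assumed float4 color

    public readonly struct Offsets
    {
        public readonly uint ObjectToWorld, WorldToObject, Color, TotalBytes;

        public Offsets(uint objectToWorld, uint worldToObject, uint color, uint totalBytes)
        {
            ObjectToWorld = objectToWorld;
            WorldToObject = worldToObject;
            Color = color;
            TotalBytes = totalBytes;
        }

        // Element count for a buffer created with a stride of sizeof(int).
        public int SizeInInts => (int)(TotalBytes / sizeof(int));
    }

    public static Offsets Compute(uint numInstances)
    {
        uint objectToWorld = kHeaderBytes;
        uint worldToObject = objectToWorld + numInstances * kBytesPerPackedMatrix;
        uint color = worldToObject + numInstances * kBytesPerPackedMatrix;
        uint total = color + numInstances * kBytesPerColor;
        return new Offsets(objectToWorld, worldToObject, color, total);
    }
}
```

Under these assumptions, the SizeInInts value for your instance count would play the role of kBufferSizeInInts in the GraphicsBuffer constructor above.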

Next up you need to set up the metadata for the batch you want to create. In this example, there are three shader properties stored in the buffer: ObjectToWorld, WorldToObject and BaseColor. The high bit (0x80000000) is set for these so they will read per-instance data.

    var metadata = new NativeArray<MetadataValue>(3, Allocator.Temp);
    metadata[0] = new MetadataValue { NameID = Shader.PropertyToID("unity_ObjectToWorld"), Value = 0x80000000 | byteAddressObjectToWorld, };
    metadata[1] = new MetadataValue { NameID = Shader.PropertyToID("unity_WorldToObject"), Value = 0x80000000 | byteAddressWorldToObject, };
    metadata[2] = new MetadataValue { NameID = Shader.PropertyToID("_BaseColor"), Value = 0x80000000 | byteAddressColor, };
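To make the high-bit convention concrete, the sketch below encodes and decodes a metadata value in plain C#. The MetadataBits class and its method names are hypothetical helpers written for this illustration; only the 0x80000000 flag and the "flag OR byte address" packing come from the code above.

```csharp
using System;

// Hypothetical helper (not part of Unity) illustrating the packing used above.
public static class MetadataBits
{
    // High bit set: the shader reads this property per instance from the batch buffer.
    public const uint kPerInstanceBit = 0x80000000u;

    // Combine the flag with the byte address of the property's array in the buffer.
    public static uint PerInstance(uint byteAddress) => kPerInstanceBit | byteAddress;

    // True if the value marks a per-instance property.
    public static bool IsPerInstance(uint value) => (value & kPerInstanceBit) != 0;

    // Strip the flag to recover the byte address.
    public static uint ByteAddress(uint value) => value & ~kPerInstanceBit;
}
```

Because the top bit is reserved for the flag, byte addresses packed this way have to fit in the remaining 31 bits.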

And finally you can create this batch.

    m_BatchID = m_BRG.AddBatch(metadata, m_InstanceData.bufferHandle);

As with mesh and material registration, you should create these batches when needed, but this example uses Start.

Culling Callback
Now that everything is set up it is finally time to fill in the draw command generation part in the culling callback.

First up you need to allocate memory for the output. This is necessary as you as a user must know how many commands and ranges you will output. In this example, it means the draw commands, the draw ranges, and the visible instances. Always allocate memory using Allocator.TempJob as Unity frees this memory on the backend, potentially on another thread, later on.

    var drawCommands = (BatchCullingOutputDrawCommands*)cullingOutput.drawCommands.GetUnsafePtr();

    // Alignment for the allocations below; long alignment suits all three arrays.
    int alignment = UnsafeUtility.AlignOf<long>();

    drawCommands->drawCommands = (BatchDrawCommand*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawCommand>(), alignment, Allocator.TempJob);
    drawCommands->drawRanges = (BatchDrawRange*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawRange>(), alignment, Allocator.TempJob);
    drawCommands->visibleInstances = (int*)UnsafeUtility.Malloc(kNumInstances * sizeof(int), alignment, Allocator.TempJob);
    drawCommands->drawCommandPickingInstanceIDs = null; // Picking is not handled

This example creates a single draw command, in a single range, that renders three instances. It doesn’t set up any sorting positions, but you can use these in your own implementation to sort draw commands.

    drawCommands->drawCommandCount = 1;
    drawCommands->drawRangeCount = 1;
    drawCommands->visibleInstanceCount = kNumInstances;

    drawCommands->instanceSortingPositions = null; // No sorting
    drawCommands->instanceSortingPositionFloatCount = 0;

You need to set up each of the draw calls with a contiguous range of instance indices in the visible instances array. This example just does one draw call and writes out the values 0, 1, and 2. This is the index to be used for a specific instance to look up the data in a specific batch, even though the array is shared between many draw calls referencing multiple batches.

    for (int i = 0; i < kNumInstances; ++i)
        drawCommands->visibleInstances[i] = i;
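With more than one draw call, each call gets its own contiguous slice of the shared visible instances array. The following plain C# sketch shows that bookkeeping with managed arrays instead of the unsafe pointers used in the callback. VisibleInstancePacking and its Range type are hypothetical names for this illustration; Offset and Count stand in for visibleOffset and visibleCount.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Unity): packs per-draw instance indices
// into one flat array and records the slice each draw call owns.
public static class VisibleInstancePacking
{
    public readonly struct Range
    {
        public readonly int Offset; // stands in for visibleOffset
        public readonly int Count;  // stands in for visibleCount

        public Range(int offset, int count)
        {
            Offset = offset;
            Count = count;
        }
    }

    public static Range[] Pack(IReadOnlyList<int[]> perDrawInstances, out int[] visibleInstances)
    {
        int total = 0;
        foreach (int[] instances in perDrawInstances)
            total += instances.Length;

        visibleInstances = new int[total];
        var ranges = new Range[perDrawInstances.Count];

        int cursor = 0;
        for (int i = 0; i < perDrawInstances.Count; ++i)
        {
            int[] instances = perDrawInstances[i];
            // Indices are relative to each draw's own batch, so values may repeat.
            Array.Copy(instances, 0, visibleInstances, cursor, instances.Length);
            ranges[i] = new Range(cursor, instances.Length);
            cursor += instances.Length;
        }
        return ranges;
    }
}
```

For two draws with instance lists {0, 1, 2} and {0, 1}, the flat array becomes {0, 1, 2, 0, 1} and the second draw starts at offset 3 with a count of 2.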

Draw commands contain the information Unity needs to render with the appropriate mesh, material, and batch. The code below sets up the offset and count of the visible instances, as well as some other data Unity needs for rendering.

    drawCommands->drawCommands[0].visibleOffset = 0;
    drawCommands->drawCommands[0].visibleCount = kNumInstances;
    drawCommands->drawCommands[0].batchID = m_BatchID;
    drawCommands->drawCommands[0].materialID = m_MaterialID;
    drawCommands->drawCommands[0].meshID = m_MeshID;
    drawCommands->drawCommands[0].submeshIndex = 0;
    drawCommands->drawCommands[0].splitVisibilityMask = 0xff;
    drawCommands->drawCommands[0].flags = 0;
    drawCommands->drawCommands[0].sortingPosition = 0;

Finally, you can set up the draw range. It’s just one in this case. A real project scene would use commands with different filtering settings to split the instances into different ranges.

    drawCommands->drawRanges[0].drawCommandsBegin = 0;
    drawCommands->drawRanges[0].drawCommandsCount = 1;
    drawCommands->drawRanges[0].filterSettings = new BatchFilterSettings { renderingLayerMask = 0xffffffff, };
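As a sketch of how several commands could be split into ranges, the plain C# below groups commands by a filter key so that each range covers a contiguous run of commands sharing the same settings. DrawRangeGrouping is a hypothetical helper, and the single uint key is a stand-in for whatever combination of filter settings your project distinguishes; Begin and Count stand in for drawCommandsBegin and drawCommandsCount.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Unity): orders commands by filter key and
// emits one range per run of equal keys.
public static class DrawRangeGrouping
{
    public readonly struct Range
    {
        public readonly int Begin;      // stands in for drawCommandsBegin
        public readonly int Count;      // stands in for drawCommandsCount
        public readonly uint FilterKey; // stand-in for the range's filter settings

        public Range(int begin, int count, uint filterKey)
        {
            Begin = begin;
            Count = count;
            FilterKey = filterKey;
        }
    }

    // 'order' receives the original command indices in their new, grouped order.
    public static List<Range> Group(uint[] commandFilterKeys, out int[] order)
    {
        // OrderBy is a stable sort, so commands with equal keys keep their relative order.
        order = Enumerable.Range(0, commandFilterKeys.Length)
                          .OrderBy(i => commandFilterKeys[i])
                          .ToArray();

        var ranges = new List<Range>();
        int begin = 0;
        for (int i = 1; i <= order.Length; ++i)
        {
            bool endOfRun = i == order.Length ||
                            commandFilterKeys[order[i]] != commandFilterKeys[order[begin]];
            if (endOfRun)
            {
                ranges.Add(new Range(begin, i - begin, commandFilterKeys[order[begin]]));
                begin = i;
            }
        }
        return ranges;
    }
}
```

You would write the commands to the output array in the returned order, then emit one draw range per entry in the list.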

This simple example returns a default JobHandle, with all draw command generation happening on the main thread. To optimize performance, most of the work in this function should run in a Burst job. If you use Burst jobs, Unity builds a job chain with the returned job handle here to filter, prepare, and execute draws matching filter settings from draw renderers and draw shadows commands in the SRP.

Additional user responsibilities
The downside of a more raw and flexible draw submission system is that you have to set up a lot more data. The example above only sets up matrices and colors, but for more complex scenes it will be your responsibility to set up any global illumination data such as light probes and light maps.
Unity can’t do this since it no longer has any information about which instance is where; it’s all provided using an opaque GPU buffer.

Known issues
Some mobile devices might have a GPU performance regression compared to game objects. This can be due to the SSBO data loading path used in these examples (and by the current Hybrid Renderer). We are investigating different approaches to solve this without an explosion of shader variants.

Resources
Our test projects for URP and HDRP live in the Graphics repo.

The Unity Manual page for BatchRendererGroup

Future work
In the coming year, the Hybrid Rendering team will land improvements to the Hybrid Renderer package to use this interface, as well as work on interface improvements and bug fixes on the core Unity side of things.

Support for GLES3.1 and higher is on our roadmap, and we are also looking at supporting equivalents to Graphics.DrawProcedural and Graphics.DrawProceduralIndirect.



Please give me a real world example of why I need to re-register these? Probably just need more peripheral information why I do these things, not I should do these things…

I’m guessing just to alter the rendered mesh or material whenever it might be needed? For example with grass: never, in my case?

I will try to clarify this in the post later, but the main reason is that Mesh and Material are managed types, and this whole interface is written to be Burst compatible, which rules out managed types.
Registering also allows us to sidestep some very slow setup and teardown costs per frame and generate a better usable view of Mesh and Material up front.

I agree that ideally Mesh and Material would be directly usable but we are not there yet.

Hi @joelv
Good step forward, thanks :)

This new API looks very similar to MultiDrawIndirect (MDI).

Can you clarify the similarities and differences between MDI, MeshShaders, and the Unity BRG API?

  • Can we issue one draw call to render many instances of different submeshes of one mesh in different locations?

  • Can we use modern techniques like culling with a compute buffer → generate a draw commands buffer, and then issue one BRG to render from that buffer?

  • Is this API a step toward MeshShaders, or do you plan to rewrite the BRG once again to support MeshShaders?

  • Can we use a meshlets approach through the BRG?

  • When can we expect Shader Graph support for this?

Thanks :)

Glad you like it.

So to answer your questions: Yes it is a bit like MDI but it is still a CPU draw loop. It will allow you to switch Mesh (including submesh) and Material between the draws in this loop (each draw command).

But currently this does not interact nicely with GPU-generated draws: the draw commands exposed by the BRG are generated and consumed on the CPU. We are looking into supporting the equivalent of DrawProcedural, which could mean at least almost full MDI support depending on what the underlying graphics API supports. In our prototype this becomes a new type of DrawCommand that allows you to provide the compute buffers needed.

Mesh shaders do not really fit into our roadmap as it is now, but if/once the shader pipeline and device backends for them are planned out we will be sure to support them.

And finally, yes, this works with Shader Graph, at least if used in an HDRP/URP context where we have written the required code generation.

Hope this answers your questions. We can’t commit to how the API will evolve but we will continue to improve it.

Also, in a real world project you would likely have new objects created dynamically during the game.
For example, let’s say you want to instantiate a projectile when shooting at something. If the projectile uses some materials or meshes which were unknown on startup, then you would need to register them dynamically to the BatchRendererGroup to be able to perform the rendering.
Currently you can’t render anything with a BRG using materials or meshes which haven’t been registered beforehand.
In the same way, deregistration should also happen dynamically so that the data associated with unused registered materials/meshes is freed up. The details regarding when and how you register/deregister the resources are up to you.

In this post, we talk about a simple example scene in which all the materials and meshes are known on startup. So technically you can just gather all of them in the Start() function, then register everything and be done with it. But in real world projects it usually won’t be that simple. The example in this post is mainly for illustration purposes.
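One possible way to organize dynamic registration and deregistration is reference counting: register a resource on first use and unregister it when its last user goes away. The sketch below is plain C# and purely illustrative. RefCountedRegistry is a hypothetical helper, not a Unity type, and it takes the register and unregister operations as delegates so that it makes no assumptions about the BRG itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Unity): registers a resource on first use
// and unregisters it once the last user releases it.
public sealed class RefCountedRegistry<TResource, TId>
{
    private sealed class Entry
    {
        public TId Id;
        public int Count;
    }

    private readonly Func<TResource, TId> m_Register;
    private readonly Action<TId> m_Unregister;
    private readonly Dictionary<TResource, Entry> m_Entries = new Dictionary<TResource, Entry>();

    public RefCountedRegistry(Func<TResource, TId> register, Action<TId> unregister)
    {
        m_Register = register ?? throw new ArgumentNullException(nameof(register));
        m_Unregister = unregister ?? throw new ArgumentNullException(nameof(unregister));
    }

    // Returns the ID for the resource, registering it if this is the first user.
    public TId Acquire(TResource resource)
    {
        if (!m_Entries.TryGetValue(resource, out Entry entry))
        {
            entry = new Entry { Id = m_Register(resource), Count = 0 };
            m_Entries.Add(resource, entry);
        }
        entry.Count++;
        return entry.Id;
    }

    // Drops one reference and unregisters the resource when none remain.
    public void Release(TResource resource)
    {
        if (!m_Entries.TryGetValue(resource, out Entry entry))
            throw new InvalidOperationException("Resource was never acquired.");

        if (--entry.Count == 0)
        {
            m_Unregister(entry.Id);
            m_Entries.Remove(resource);
        }
    }
}
```

In a Unity project, the delegates might be m_BRG.RegisterMesh and m_BRG.UnregisterMesh (or the material equivalents), with the projectile from the example above calling Acquire when it spawns and Release when it is destroyed.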

Thanks, as you can imagine the performance gains here don’t really make this optional for any serious published game, or for use in VR (Vulkan at the moment if Quest2).

I use URP currently.

My projects are currently MonoBehaviour and GameObject based, but my need for the BRG API is for rendering static level geometry, details, and things that I need a lot of but can’t afford the overhead (but have predictable characteristics so don’t need to be GameObjects). So if you could keep it in mind for future examples (accelerating common static level rendering tasks), I would really appreciate it and probably learn a lot.

Ideally I’d like to replace all my rendering with this if I can, and am willing to build some functions to make life easier. Anything along those lines like a strategy or decent setup to build on top for dev QOL that you could advise would be great.

I aim to try and get the most out of this valuable performance gain!

And:

An example for that too if possible. I sound awfully selfish but this feature is a big deal, especially if like me, you’re not doing an ECS project but a regular one with a heck of a lot of draw calls (large open world rendering on rubbish hardware).

Thanks regardless as this was a much needed feature.

I’m wondering if we can get a decent HDRP example with helpful comments like the URP example. Like hippocoder, I too would like to replace all my rendering with this ASAP.

Thanks

More questions:

  • Can we expect a simple for loop of draw calls (without state changes) when we have the same material and the same mesh, just a different submesh and per-instance data offset in the buffer?
    All DrawIndexed commands would be identical apart from their parameters, so it should be super fast.

  • Unity has DrawProcedural, which is the equivalent of glDrawElementsIndirect.
    Will Unity expose a MultiDrawProcedural that is the equivalent of glMultiDrawElementsIndirect, and if so, when?

Hi!
Do you plan to work on a new renderer using this interface for GameObjects/MeshRenderer without any dependency to DOTS, as seen in https://twitter.com/SebAaltonen/status/1407661348197175299 ?

It is very difficult to raise Draw, which is originally in the low-level area, to the script level, but the reason for entrusting this to the front-end stage is not to leave Occlusion Culling to the developer’s discretion.

Currently I do think mesh switching is a bit costly, even submesh. It means some additional data encoded per draw in the backend and some buffers to be bound. We are looking into optimizing this.

I am not aware of any plans for exposing that API. However it might happen in context of the BatchRendererGroup at some point.

The test scripts in the repository linked in the original post actually originate from Sebastian’s hackweek project. We are maintaining these but currently they are not feature complete (missing light probe and lightmap support, missing static batching support).

Our team’s focus is the Hybrid Renderer and we can’t spread ourselves too thin at the moment. I would love to be able to reimplement full GameObject rendering on top of BRG though, but it’s nothing I can promise =)

We have a branch here where we have been experimenting with allowing static objects to render via BRG. This is super super experimental but can serve as a bigger example of how to put a lot more things into the BRG. It also uses some engine APIs that have not made it into a release yet, so it will likely not just work or even compile for you, but the entry point is the MonoBehaviour here.

It’s likely a pretty okay starting point if you want to make your own ‘fast BRG for my static objects’. Note that we also do some stuff for objects that are moving but rigid, so they can also work through BRG. This just leaves some things like skinned meshes and effects going through the GameObject path. Note: don’t look into the deferred material stuff; that’s even more experimental, builds on top of the BRG GameObject rendering, and is even less stable than the super experimental static objects in BRG.

Specifically, if you take a look at that branch, ignore anything that is turned on by setting this to true:

public bool EnableDeferredMaterials = false;

Oh boy, this will be epic if it ever makes into prod. +1 from me, I pretty much would like to see this developed properly.

Is this similar to Vulkan multithreaded rendering?

In a way you can think of it like that: multiple threads can build draw commands. However, the commands you provide here are CPU draw commands. They need to be issued to the GPU using some low-level API calls later on. These can happen from multiple threads as well if you have graphics jobs enabled.

You should consider giving this a much higher priority if you ask me. The tech seems amazing and would allow us to do more. Getting something into the hands of people sooner will be better.

The manual page was just published.
Please have a look for more details on how this all works.
