Hi Everyone!
We in the Hybrid Renderer team are excited to share some news on what we have been working on for the last year.
Background
The Hybrid Renderer is a way for Unity to use Entities and associated data for rendering without requiring you to round trip to GameObjects. Since its initial implementation, the Unity Engine side of this, the BatchRendererGroup (BRG), has not been very easy to use unless you know a lot of the implementation details.
Because the BRG can potentially issue more draw calls at much lower CPU cost, there have naturally been requests from you on how to use this interface. As of Unity 2022.1, the BRG has been fully rewritten, which we hope makes it both more flexible and easier to use.
Where the BRG previously consisted of a set of pre-built batches with the same mesh and material, it now allows you to build your own draw commands in the OnPerformCulling callback or in (Burst compiled) jobs. Each draw command is like a DrawMeshInstanced call and can draw multiple instances with the same material and mesh, with the difference that they are much more efficient and flexible.
Performance example
Draw commands using the BRG are considerably cheaper than traditional GameObject rendering in terms of draw submission cost on the main and render threads. A direct comparison is tricky because responsibility for some of the work (culling, draw setup and so on) moves to you as the user.
As an example, this URP test scene of roughly 24k GameObjects with varying meshes and materials has a baseline RenderCameraStack profiler marker of about 17ms on an AMD 3970X. This is all serial main thread time, and the render thread takes about the same time because the main thread can't feed it quickly enough.
In a scene where the GameObjects have been converted to draws in a BRG, the same profiler marker takes about 0.8ms on the main thread and about 0.2ms on the render thread, which is roughly 20 times less main thread time.
Previously you could do this with Graphics.DrawMeshInstanced of course, but it would require you to upload a lot of matrices every frame and to write custom shaders for any per-instance data you would like to override. With the new BRG it is possible to write shaders that support both this path and regular GameObjects, and we support URP/Lit and HDRP/Lit (among others) out of the box. Take these measurements with a grain of salt, but they do show that it's possible to render much more efficiently with the new API.
Getting Started
First, your project needs to use a Scriptable Render Pipeline (URP/HDRP or custom) as the new BRG interface is fully built upon the SRP Batcher. You then need to disable stripping of DOTS Instancing variants by setting "Build-time stripping of BatchRendererGroup variants" (under "Project Settings", "Graphics", "Shader Stripping") to "Keep all". The project also needs to have "Allow unsafe code" enabled (under "Project Settings", "Script Compilation").
Currently OpenGL, GLES and WebGL are not supported.
The following code is adapted from the SimpleExample test scene and script available in the URP BRG test project in the Graphics repository.
BatchRendererGroup Object
Next up you need to create the BatchRendererGroup and provide an OnPerformCulling callback method.
void Start()
{
    m_BRG = new BatchRendererGroup(OnPerformCulling, IntPtr.Zero);
    // Register resources
    // Create batch
}
This callback is the main entry point for the BatchRendererGroup, and Unity calls it internally whenever visible objects are to be culled.
public unsafe JobHandle OnPerformCulling(
    BatchRendererGroup rendererGroup,   // The BRG associated with this callback
    BatchCullingContext cullingContext, // Immutable input data for the operation
    BatchCullingOutput cullingOutput,   // Mutable output data (draw commands)
    IntPtr userContext)                 // Context user can pass (IntPtr.Zero here)
{
    // Do draw command generation work here
    return new JobHandle(); // A job handle saying when the output actually is ready
}
Register Resources
All data output from the culling callback consists of unmanaged types. This means that you can't directly reference objects such as instances of Mesh and Material. Instead, you need to register them with the BRG before you use them. In a real-world project you should do this dynamically, but this example uses Start:
m_MeshID = m_BRG.RegisterMesh(mesh);
m_MaterialID = m_BRG.RegisterMaterial(material);
The BRG object holds meshes and materials and updates its internal representation once per frame (after it processes the culling callback).
Create Batch
Apart from meshes and materials, the only way to add data to a specific draw call is through a batch and its associated metadata. The SRP shader code looks at the metadata for specific properties, determines whether the property is in batch data or in the material constant buffer (high bit set or not), and loads the data for the instance from the correct location. We provide a utility (UNITY_ACCESS_DOTS_INSTANCED_PROP_WITH_DEFAULT) in the render pipelines core package for this. For more information on how this works, please see the linked manual draft.
Note: the layout and implementation here is just one way to do it, namely the one we implemented for the Hybrid Renderer. You can use the metadata values in any way you find suitable with custom shaders or a modified URP/HDRP.
First you need to create a buffer big enough to hold all the data you intend to store. If you outgrow this size later, there is an API to update the buffer for a specific batch.
m_InstanceData = new GraphicsBuffer(GraphicsBuffer.Target.Raw,
    kBufferSizeInInts,
    sizeof(int));
// Fill data to buffer (see attached example code)
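The snippet above does not show how kBufferSizeInInts or the byte addresses used later in the metadata are derived. The following is a minimal sketch of one possible layout, not the layout of the attached example: it assumes matrices are stored as packed 3x4 floats (48 bytes), colors as float4 (16 bytes), and a zeroed 64-byte region at the start of the buffer. The class name BrgBufferLayout and all constants in it are hypothetical.

```csharp
using System;

public static class BrgBufferLayout
{
    // Assumed element sizes (not taken from the post).
    public const int kSizeOfPackedMatrix = sizeof(float) * 4 * 3; // 48 bytes
    public const int kSizeOfFloat4 = sizeof(float) * 4;           // 16 bytes
    // Assumed zeroed region at the start of the buffer.
    public const int kLeadingPadding = 64;

    // Returns the byte address of each property array and the buffer size in ints.
    public static (uint objectToWorld, uint worldToObject, uint color, int sizeInInts)
        Compute(int numInstances)
    {
        if (numInstances < 0) throw new ArgumentOutOfRangeException(nameof(numInstances));

        // Each property is stored as one contiguous array, one element per instance.
        uint byteAddressObjectToWorld = kLeadingPadding;
        uint byteAddressWorldToObject =
            byteAddressObjectToWorld + (uint)(kSizeOfPackedMatrix * numInstances);
        uint byteAddressColor =
            byteAddressWorldToObject + (uint)(kSizeOfPackedMatrix * numInstances);

        int totalBytes = (int)byteAddressColor + kSizeOfFloat4 * numInstances;
        // Round up to whole ints, since the buffer is created with a stride of sizeof(int).
        int sizeInInts = (totalBytes + sizeof(int) - 1) / sizeof(int);

        return (byteAddressObjectToWorld, byteAddressWorldToObject, byteAddressColor, sizeInInts);
    }
}
```

Under these assumptions, three instances need 400 bytes, so the buffer would be created with 100 ints.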
Next up you need to set up the metadata for the batch you want to create. In this example, there are three shader properties stored in the buffer: ObjectToWorld, WorldToObject and BaseColor. The high bit (0x80000000) is set for these so they will read per-instance data.
var metadata = new NativeArray<MetadataValue>(3, Allocator.Temp);
metadata[0] = new MetadataValue { NameID = Shader.PropertyToID("unity_ObjectToWorld"), Value = 0x80000000 | byteAddressObjectToWorld, };
metadata[1] = new MetadataValue { NameID = Shader.PropertyToID("unity_WorldToObject"), Value = 0x80000000 | byteAddressWorldToObject, };
metadata[2] = new MetadataValue { NameID = Shader.PropertyToID("_BaseColor"), Value = 0x80000000 | byteAddressColor, };
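To make the high-bit convention concrete, here is a small sketch of how such a value can be packed and read back on the CPU. It only illustrates the bit arithmetic described above; the actual decoding happens in shader code, and the MetadataBits helper and its method names are hypothetical.

```csharp
using System;

public static class MetadataBits
{
    // High bit marks a property as per-instance data in the batch buffer.
    public const uint kPerInstanceBit = 0x80000000u;

    // Combines a byte address with the per-instance flag.
    public static uint PerInstance(uint byteAddress)
    {
        // The address must fit in the lower 31 bits, or it would collide with the flag.
        if ((byteAddress & kPerInstanceBit) != 0)
            throw new ArgumentOutOfRangeException(nameof(byteAddress));
        return kPerInstanceBit | byteAddress;
    }

    // True when the value refers to per-instance data.
    public static bool IsPerInstance(uint value) => (value & kPerInstanceBit) != 0;

    // Strips the flag and returns the byte address stored in the lower 31 bits.
    public static uint ByteAddress(uint value) => value & ~kPerInstanceBit;
}
```

For example, MetadataBits.PerInstance(96) yields 0x80000060, and ByteAddress recovers 96 from it.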
And finally you can create this batch.
m_BatchID = m_BRG.AddBatch(metadata, m_InstanceData.bufferHandle);
As with mesh and material registration, you should create these batches when needed, but this example uses Start.
Culling Callback
Now that everything is set up it is finally time to fill in the draw command generation part in the culling callback.
First up you need to allocate memory for the output. You have to do this yourself because only you know how many commands, ranges, and instances you will output. In this example, that means the draw commands, the draw ranges, and the visible instances. Always allocate this memory using Allocator.TempJob, as Unity frees it on the backend later on, potentially on another thread.
int alignment = UnsafeUtility.AlignOf<long>(); // Assumed alignment; not shown in the original snippet
var drawCommands = (BatchCullingOutputDrawCommands*)cullingOutput.drawCommands.GetUnsafePtr();
drawCommands->drawCommands = (BatchDrawCommand*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawCommand>(), alignment, Allocator.TempJob);
drawCommands->drawRanges = (BatchDrawRange*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawRange>(), alignment, Allocator.TempJob);
drawCommands->visibleInstances = (int*)UnsafeUtility.Malloc(kNumInstances * sizeof(int), alignment, Allocator.TempJob);
drawCommands->drawCommandPickingInstanceIDs = null; // Picking is not handled
This example creates a single draw command, in a single range, that renders three instances. It doesn't set up any sorting positions, but you can use these in your own implementation to sort draw commands.
drawCommands->drawCommandCount = 1;
drawCommands->drawRangeCount = 1;
drawCommands->visibleInstanceCount = kNumInstances;
drawCommands->instanceSortingPositions = null; // No sorting
drawCommands->instanceSortingPositionFloatCount = 0;
You need to set up each of the draw calls with a contiguous range of instance indices in the visible instances array. This example just does one draw call and writes out the values 0, 1, and 2. Each value is the index that an instance uses to look up its data in its batch, even though the array is shared between many draw calls referencing multiple batches.
for (int i = 0; i < kNumInstances; ++i)
    drawCommands->visibleInstances[i] = i;
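With more than one draw call, each draw gets its own contiguous slice of the shared array, and the values inside a slice remain indices local to that draw's batch. The following is a sketch of that bookkeeping in plain C#, without any Unity types; DrawSlice and VisibleInstancePacking are hypothetical stand-ins for the visibleOffset and visibleCount fields of a draw command.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the offset/count pair stored in a draw command.
public struct DrawSlice
{
    public int visibleOffset;
    public int visibleCount;
}

public static class VisibleInstancePacking
{
    // Packs per-draw lists of batch-local instance indices into one shared array
    // and records which contiguous slice belongs to each draw.
    public static (int[] visibleInstances, DrawSlice[] slices) Pack(
        IReadOnlyList<int[]> perDrawIndices)
    {
        int total = 0;
        foreach (var indices in perDrawIndices)
            total += indices.Length;

        var visibleInstances = new int[total];
        var slices = new DrawSlice[perDrawIndices.Count];

        int offset = 0;
        for (int d = 0; d < perDrawIndices.Count; ++d)
        {
            int[] indices = perDrawIndices[d];
            // Copy this draw's indices into its slice of the shared array.
            Array.Copy(indices, 0, visibleInstances, offset, indices.Length);
            slices[d] = new DrawSlice { visibleOffset = offset, visibleCount = indices.Length };
            offset += indices.Length;
        }
        return (visibleInstances, slices);
    }
}
```

For instance, packing the index lists {0, 1, 2} and {0, 2} produces the shared array {0, 1, 2, 0, 2}, with the second draw using offset 3 and count 2.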
Draw commands contain the information Unity needs to render with the appropriate mesh, material, and batch. The code below sets up the offset and count of the visible instances, as well as some other data Unity needs for rendering.
drawCommands->drawCommands[0].visibleOffset = 0;
drawCommands->drawCommands[0].visibleCount = kNumInstances;
drawCommands->drawCommands[0].batchID = m_BatchID;
drawCommands->drawCommands[0].materialID = m_MaterialID;
drawCommands->drawCommands[0].meshID = m_MeshID;
drawCommands->drawCommands[0].submeshIndex = 0;
drawCommands->drawCommands[0].splitVisibilityMask = 0xff;
drawCommands->drawCommands[0].flags = 0;
drawCommands->drawCommands[0].sortingPosition = 0;
Finally, you can set up the draw range. It's just one in this case. A real project scene would use commands with different filtering settings to split the instances into different ranges.
drawCommands->drawRanges[0].drawCommandsBegin = 0;
drawCommands->drawRanges[0].drawCommandsCount = 1;
drawCommands->drawRanges[0].filterSettings = new BatchFilterSettings { renderingLayerMask = 0xffffffff, };
This simple example returns a default JobHandle, with all draw command generation happening on the main thread. To optimize performance, most of the work in this function should run in Burst jobs. If you use Burst jobs, Unity chains the returned job handle into a job chain that filters, prepares, and executes the draws matching the filter settings from the draw renderers and draw shadows commands in the SRP.
Additional user responsibilities
The downside of a more raw and flexible draw submission system is that you have to set up a lot more data. The example above only sets up matrices and colors, but for more complex scenes it is your responsibility to set up any global illumination data such as light probes and light maps.
Unity can't do this because it no longer has any information about which instance is where. All of that is provided through an opaque GPU buffer.
Known issues
Some mobile devices might have a GPU performance regression compared to game objects. This can be due to the SSBO data loading path used in these examples (and by the current Hybrid Renderer). We are investigating different approaches to solve this without an explosion of shader variants.
Resources
Our test projects for URP and HDRP live in the Graphics repo.
The Unity Manual page for BatchRendererGroup
Future work
In the coming year, the Hybrid Rendering team will land improvements to the Hybrid Renderer package to use this interface, as well as work on interface improvements and bug fixes on the core Unity side of things.
Support for GLES3.1 and higher is on our roadmap, and we are also looking at supporting equivalents to Graphics.DrawProcedural and Graphics.DrawProceduralIndirect.