Experiments with BatchRendererGroup on mobile devices

Hi!

I ran some tests of Entities Graphics on a low-end Android device and I'm a bit frustrated with the results: skinned mesh rendering is broken (on a GLES 3.1 device), performance is worse than default URP, and there are the Forward+ only and Linear only limitations. By the way, on a high-end Android phone with Vulkan support it runs smoothly at 60 FPS.

So, as our current project aims to support low-end devices with at least GLES 3.1, I decided to try building a simple rendering system based on BRG.

My first step was to make sure BRG can render a large number of animated skinned meshes at an acceptable framerate (30 FPS). After some experimenting, I got it rendering 250 animated skinned meshes at ~35 FPS on a low-end GLES 3.1 device (Xiaomi Mi A1). Performance is more or less comparable to rendering the same scene with the default URP renderer (legacy Animation, no GPU skinning). Each skinned mesh has 52 bones, 2000 vertices, and 1 bone per vertex. This step was the most complex, but I'm quite satisfied with the results.

Rendering plain mesh renderers is relatively easy, with no problems so far. After completing the remaining tasks like LOD and frustum culling, I finally got a simple rendering system that successfully replaced our current ugly hybrid approach (based on game objects). Currently the only thing left to fix is sorting meshes with transparency.

Also, one thing to note: ShaderGraph-based shaders perform much worse with my implementation, so I used hand-written fragment shaders with lightweight PBR.

BRG is very nice tech, as it lets you do more flexible rendering. The next steps are to add per-instance point lighting support and lightmap support.

As I mentioned above, I'm currently trying to solve sorting issues with transparent meshes. I tried to implement basic sorting like in this example:

https://github.com/Unity-Technologies/Graphics/blob/master/Tests/SRPTests/Packages/com.unity.testing.brg/Scripts/RenderBRG.cs

  • I issue draw commands for transparent meshes with an instance count greater than one (so, not one draw call per transparent instance). At some positions and camera angles, transparent meshes are still drawn in the wrong order (between two separate draw calls). I assume this happens because sorting is performed only within a draw call, and not between draw commands? If this assumption is correct, I guess I should split a draw call with multiple instances into one draw call per instance.
  • Should instanceSortingPositions data be used only for sorting transparent meshes? Or can it also be helpful for opaque and alpha-tested geometry, for example to reduce overdraw?
  • There is an allDepthSorted field in DrawRange.filterSettings. It is not clear to me how and when to use it. Does it mean it is more efficient to split non-sorted and sorted geometry between two draw ranges?



Looks like Unity needs to put a lot of resources into optimizing Entities Graphics on low-end Android devices as much as possible. It should at least match GameObject performance, if not do much better.


  • That is correct, the sorting happens at the draw command level, since that is the only way to get correct interleaving with GameObjects. Unity will also respect the instance order within the draw command itself, in case you want to do some more coarse grained sorting.
  • The primary use case is transparencies, but the feature is not specific to transparencies in any way. You could use it to try to reduce overdraw. For opaque overdraw purposes, it is also possible to use the BatchDrawCommand.sortingPosition field without the depth sorting flag enabled, in which case Unity will sort the draw command using the literal value of the field as a proxy depth (e.g. setting a value of -2 will cause the draw command to be sorted using -2.0 as the depth value). This can be used as a hint to get some draw commands to render earlier to reduce overdraw, but is only useful for opaques where accurate ordering is not required for correct output.
  • The purpose of this field is for the user to signal that the given draw range contains only depth sorted draw commands, which allows Unity to skip the entire range when rendering a pass that does not use depth sorting. The use of this field is not required for correctness (the output will be exactly the same either way), but it may bring a small CPU benefit if you are using a lot of depth sorted draws. Most users will probably not need to care about this flag.
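To make the first point concrete, here is a minimal, language-agnostic sketch (Python, not Unity API) of what splitting transparent instances into one draw command each looks like once the commands are ordered back to front. Unity performs this sorting itself when the depth sorting flag is set; the sketch only illustrates the resulting order. The names `DrawCommand`, `view_depth` and `split_and_sort` are hypothetical, and measuring depth along the camera's forward axis is an assumption, not something confirmed in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DrawCommand:
    """Hypothetical stand-in for a draw command that renders one instance."""
    instance_id: int
    sorting_position: Vec3  # world-space position used as the sort key


def view_depth(position: Vec3, camera_position: Vec3, camera_forward: Vec3) -> float:
    """Depth of a point along the camera's forward axis (assumed sort metric)."""
    return sum((p - c) * f for p, c, f in zip(position, camera_position, camera_forward))


def split_and_sort(instance_positions: List[Vec3],
                   camera_position: Vec3,
                   camera_forward: Vec3) -> List[DrawCommand]:
    """Emit one command per transparent instance, farthest first."""
    commands = [DrawCommand(i, pos) for i, pos in enumerate(instance_positions)]
    # Back-to-front order: larger depth is drawn earlier so nearer
    # transparent surfaces blend over farther ones.
    commands.sort(
        key=lambda cmd: view_depth(cmd.sorting_position, camera_position, camera_forward),
        reverse=True,
    )
    return commands


if __name__ == "__main__":
    positions = [(0.0, 0.0, 2.0), (0.0, 0.0, 10.0), (0.0, 0.0, 5.0)]
    ordered = split_and_sort(positions, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
    print([cmd.instance_id for cmd in ordered])
```

The trade-off is the one raised in the question: one command per instance gives up instancing for transparent geometry in exchange for correct ordering between instances.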

Ah, ok, now it's much clearer for me. Thanks for the quick response!



Any chance you could elaborate on what you did for this? I also rewrote skinned mesh rendering, but I have been targeting more of the high-end.

Basically, I store the SkinMatrix arrays for each instance in the same batch buffers, which gives fast reads. I tried storing them in a global skin matrix buffer, but that is noticeably slower. I use small 16 KB buffers for each batch (about 6-7 skinned mesh instances per batch in my case, so fewer bones means better instancing).
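As a rough back-of-the-envelope check of those numbers, the sketch below estimates how many skinned instances fit into one batch buffer. The matrix layout is an assumption (the post does not state it): 48 bytes corresponds to a packed 3x4 float matrix and 64 bytes to a full 4x4 float matrix. The function name and the `per_instance_extra_bytes` parameter are hypothetical.

```python
BATCH_BUFFER_BYTES = 16 * 1024  # 16 KB per batch, as described above
BONES_PER_MESH = 52             # bone count mentioned earlier in the thread


def instances_per_batch(buffer_bytes: int,
                        bones: int,
                        bytes_per_bone_matrix: int,
                        per_instance_extra_bytes: int = 0) -> int:
    """Whole instances that fit when each stores its own skin matrix array."""
    bytes_per_instance = bones * bytes_per_bone_matrix + per_instance_extra_bytes
    return buffer_bytes // bytes_per_instance


if __name__ == "__main__":
    # Assumed packed 3x4 float matrices (48 bytes per bone).
    print(instances_per_batch(BATCH_BUFFER_BYTES, BONES_PER_MESH, 48))
    # Assumed full 4x4 float matrices (64 bytes per bone).
    print(instances_per_batch(BATCH_BUFFER_BYTES, BONES_PER_MESH, 64))
    # Half the bones roughly doubles the instances per batch.
    print(instances_per_batch(BATCH_BUFFER_BYTES, BONES_PER_MESH // 2, 48))
```

Under the 3x4 assumption this lands close to the 6-7 instances reported, and it shows why fewer bones per mesh translates directly into more instances per batch.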

Also, I'm not using compute shaders to upload instance data, only LockBufferForWrite.
