The Graphics.DrawMesh and Graphics.DrawMeshInstanced functions internally add an ImmediateRenderer node to a render queue that is used for the current frame, then clear that node out at the end of the frame. This means that every Graphics.DrawMesh call has to be repeated every frame.
In our test case (52251), which has been sitting in Unity enterprise support for several months, we submitted a change to the internal code that allows you to cache these calls between frames. In the example scene provided, this removes 38ms of CPU time per frame. We also tested a change similar to the one made in Graphics.DrawMeshInstanced, where you pass an array of matrices instead of a single matrix to reduce C#-to-C++ call overhead, but our results show that this accounts for only a small fraction of the CPU time used. Most of the savings come from not having to resubmit the calls each frame. The test case we submitted allows you to A/B these results and see them for yourself.
Enterprise support recently closed our ticket with the note "your change has been included in Unity 5.5". This is clearly not the case, and since we have been unable to get a response on this issue after months of going through the official channels, I'm hoping that someone responsible for the DrawMeshInstanced changes monitors these forums and can give us some feedback. Being able to use DrawMesh with instancing is nice and all, but it's not very useful if it just leaves you bottlenecked on the CPU. Additionally, DrawMeshInstanced only works on high-end devices, whereas being able to cache the submit calls and clear them later allows any device to use Graphics.DrawMesh in an extremely efficient way.
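To make the per-frame cost described in this post concrete, here is a minimal C++ sketch of an immediate-style queue. It is only a toy model built from the description above, not Unity's internal code: `DrawRequest`, `ImmediateQueue`, and `submitsForImmediate` are hypothetical names, and the request carries a position instead of a full matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data one draw call queues up.
struct DrawRequest {
    int meshId;
    int materialId;
    float x, y, z;  // position only; a real call would carry a 4x4 matrix
};

// Toy model of a queue whose nodes live for exactly one frame.
class ImmediateQueue {
public:
    // Analogue of one draw submission: append a node for this frame only.
    void submit(const DrawRequest& r) {
        nodes_.push_back(r);
        ++totalSubmits_;
    }

    // Report how many nodes were queued this frame, then drop them all.
    std::size_t endFrame() {
        std::size_t drawn = nodes_.size();
        nodes_.clear();
        return drawn;
    }

    std::size_t totalSubmits() const { return totalSubmits_; }

private:
    std::vector<DrawRequest> nodes_;
    std::size_t totalSubmits_ = 0;
};

// Submit `objects` requests on each of `frames` frames and return the
// total number of submit calls the caller had to make.
std::size_t submitsForImmediate(std::size_t objects, std::size_t frames) {
    ImmediateQueue q;
    for (std::size_t f = 0; f < frames; ++f) {
        for (std::size_t i = 0; i < objects; ++i) {
            q.submit({static_cast<int>(i), 0, 0.f, 0.f, 0.f});
        }
        q.endFrame();
    }
    return q.totalSubmits();
}
```

In this model, anything not resubmitted after `endFrame()` simply disappears, so the number of submissions grows with objects times frames even when nothing in the scene changes.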
I have a procedural game world with plants, trees, rocks, etc. that does not use the Unity terrain for rendering.
Would this change allow me to skip using normal GameObjects with a MeshRenderer and just pass in an array of matrices to render multiple instances, skipping the extra CPU work of traversing the object hierarchy every frame?
At what time does culling of objects happen? Will off-screen meshes in the list still be processed?
That would be awesome for the grass rendering system in the game we're working on.
By the way, if you're interested:
It would be awesome if you didn't have to submit them each frame. I really hope "DrawMeshInstanced" will solve some of the performance issues.
But by the way, "high end devices"? You mean that for mobile devices, right? This instancing tech has existed for at least 12 years on desktop PCs... Since DX9, right?
When you call Graphics.DrawMesh, it just inserts the relevant data structure used for rendering into the Queue and clears it out before the next frame; culling/batching/etc still happen as normal.
You can do that now in 5.5 using Graphics.DrawMeshInstanced (but not Graphics.DrawMesh); the only difference is that you have to do this every frame. For our use case, this was prohibitive (38ms), and it wasn't available on our platforms (mobile).
I understand. It would help a lot to have a system to "keep" this in the render loop.
Do you think that this prohibitive cost is caused by the submission itself, or by the fact that on every call to "DrawMesh" the data passed in is copied instead of being used directly? For example, if you send a MaterialPropertyBlock with every call to "DrawMesh", all the data from it is copied instead of being used as-is.
Do you think it would be faster if it didn't copy the data? Or do you think the submission itself is the bottleneck?
So am I correct to assume that the persistent calls would help with objects that don't move between frames, and that for moving (animated) objects DrawMesh would still be slower than using GameObjects?
@zeroyao - I think a command-buffer-like approach would probably make for a more Unity-like API. An int-based ID system is fine for our uses, but it is a little, well, un-Unity-like. It was just far simpler for us to implement (since we're not as familiar with the source) than changing the command buffer system to work for this use case. Either way, something that solves the use case at similar performance would be amazing.
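For readers wondering what an int-based ID system for cached draw calls might look like, here is a minimal C++ sketch. It is purely illustrative and assumes nothing about the actual change submitted to Unity: `PersistentQueue`, `add`, `remove`, and `endFrame` are hypothetical names, and the request again carries a position instead of a full matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for the data one draw call queues up.
struct DrawRequest {
    int meshId;
    int materialId;
    float x, y, z;  // position only; a real call would carry a 4x4 matrix
};

// Toy model of a queue whose entries persist until explicitly removed.
class PersistentQueue {
public:
    // Register a draw once; the returned int is the caller's handle to it.
    int add(const DrawRequest& r) {
        int id = nextId_++;
        nodes_.emplace(id, r);
        return id;
    }

    // Drop a cached entry; returns false if the ID is unknown.
    bool remove(int id) { return nodes_.erase(id) > 0; }

    // Ending a frame leaves the cached entries in place and only reports
    // how many entries are still queued.
    std::size_t endFrame() const { return nodes_.size(); }

private:
    std::unordered_map<int, DrawRequest> nodes_;
    int nextId_ = 1;
};
```

The trade-off mentioned in the post shows up even in this toy version: the caller has to keep track of raw integer handles and remember to call `remove`, which is simple to implement but less idiomatic than an object-based API.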
Yeah, not viable: it doesn't dynamically batch (1 draw call per mesh rendered), and you can't insert it into the normal rendering pathways (depth, shadow, drawing, etc.), only before or after a given operation.
I agree we should have a more performant way to draw meshes repeatedly with dynamic batching. And if anybody will look into the Graphics.DrawMesh code anyway, it might also be interesting to finally get a way to set the sorting order when submitting a mesh.
Replying to follow this! It would be great if the persistent DrawMesh calls could "remember" the result of the dynamic batching operation too, and reuse it. There is CPU overhead for dynamic batching, but if our meshes are persistent and we have a thousand small meshes to draw, it would be awesome if it didn't have to recompute the dynamic batching every frame for these persistent meshes! Maybe it will already be doing this, I can't say since this has not been released yet, but I just thought I'd voice my suggestion anyway.
I have been doing some tests with billboards and instancing, and I was wondering the same thing. As you can see in the images, the number of saved draw calls is huge, but I am still wondering what the CPU overhead is. By the way, the billboards are objects in the scene on Unity 5.4.0f3 (not using the Graphics.DrawMesh API).
Sure, here they are (cpu profiler and frame debugger):
I am just curious whether there is such a thing as too many batched calls, and where the sweet spot lies between having more instances of lower-poly objects and fewer instances of higher-poly objects.
Either way, I can't wait to test the new DrawMeshInstancedPersistent.
I am curious too. Thinking about it further, it's probably impossible to "remember the result", because objects could move in and out of the view frustum and be filled entirely. You wouldn't want those rendered, so you'd have to recompute dynamic batching each frame for each camera. Just speculating; we should discuss dynamic batching related issues elsewhere, I guess, so we don't derail this thread.
Nice looking stuff, but if you plan to draw grass, I highly recommend Unity 5.5 with its "DrawMeshInstanced" method. It improved my grass rendering by a factor of 5 to 10 (from 30FPS to 300FPS).