Graphics.DrawMesh/DrawMeshInstanced fundamentally bottlenecked by the CPU

The Graphics.DrawMesh and DrawMeshInstanced functions internally add an ImmediateRenderer node to a render queue used each frame, then clear that node out at the end of the frame. This means that every Graphics.DrawMesh call has to be reissued every frame.

In our test case (52251), which has been sitting in Unity enterprise support for several months, we submitted a change to the internal code that lets you cache these calls between frames. In the example scene provided, this saves 38ms per frame. We also tested a change similar to the one made in Graphics.DrawMeshInstanced, where you pass an array of matrices instead of a single matrix to reduce C#->C++ transition time, but our results show that this accounts for only a small fraction of the CPU time used. The majority of the savings comes from not having to resubmit the calls each frame. The submitted test case lets you A/B these results and see them for yourself.
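To make the difference concrete, here is a minimal toy model in Python. It is not Unity's internal code: `ToyRenderQueue`, `draw_mesh_persistent`, `remove_persistent`, and the integer handle are all hypothetical names chosen for illustration. It only counts how many submissions user code has to make when the queue is cleared every frame versus when entries are cached between frames.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyRenderQueue:
    """Toy stand-in for a per-frame render queue (an assumption, not Unity's code)."""
    immediate: list = field(default_factory=list)    # cleared at the end of every frame
    persistent: dict = field(default_factory=dict)   # survives until explicitly removed
    _ids: count = field(default_factory=count)       # hypothetical integer handles
    submissions: int = 0                             # how many times user code submitted

    def draw_mesh(self, mesh, matrix):
        # Immediate path: the entry lives for one frame only.
        self.immediate.append((mesh, matrix))
        self.submissions += 1

    def draw_mesh_persistent(self, mesh, matrix):
        # Cached path: submit once, keep a handle so it can be removed later.
        handle = next(self._ids)
        self.persistent[handle] = (mesh, matrix)
        self.submissions += 1
        return handle

    def remove_persistent(self, handle):
        self.persistent.pop(handle, None)

    def end_frame(self):
        # Everything queued is "drawn", then only the immediate entries are dropped.
        drawn = len(self.immediate) + len(self.persistent)
        self.immediate.clear()
        return drawn


def run(frames, instances, persistent):
    """Return (total submissions, meshes drawn per frame) for one strategy."""
    queue = ToyRenderQueue()
    if persistent:
        for i in range(instances):
            queue.draw_mesh_persistent("grass", i)
    drawn = []
    for _ in range(frames):
        if not persistent:
            for i in range(instances):
                queue.draw_mesh("grass", i)
        drawn.append(queue.end_frame())
    return queue.submissions, drawn
```

Both strategies draw the same number of meshes per frame, but the immediate path pays the submission cost `frames * instances` times while the cached path pays it `instances` times. The model says nothing about how expensive a single submission is in Unity; the 38ms figure above comes from our test scene, not from this sketch.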

Enterprise support recently closed our ticket as “your change has been included in Unity 5.5”. This is clearly not the case, and since we have been unable to get a response on this issue after months of going through the official channels, I’m hoping that someone responsible for the DrawMeshInstanced changes monitors these forums and can give us some feedback on this issue. Being able to use DrawMesh with instancing is nice and all, but it’s not very useful if it’s just going to leave you bottlenecked on the CPU. Additionally, DrawMeshInstanced only works on high-end devices, whereas being able to cache the submit calls and clear them later allows any device to use Graphics.DrawMesh in an extremely efficient way.

Hi jbooth,
The proper authorities have been informed. It might take a while, though, given the upcoming weekend.

Thanks LeonhardP!

I have a procedural game world with plants, trees, rocks etc that is not using the Unity terrain to render.
Would this change allow me to skip using normal GameObjects with MeshRenderer and just pass in an array of matrices to render multiple instances, skipping the extra CPU work of traversing the object hierarchy every frame?

At what time does culling of objects happen? Will off-screen meshes in the list still be processed?

That would be awesome for the grass rendering system in the game we’re working on.

By the way if you’re interested:

It would be awesome if you wouldn’t have to submit them each frame. I really hope ‘DrawMeshInstanced’ will solve some of the performance issues.

But by the way, ‘high end devices’? You mean that for mobile devices, right? This instancing tech has existed for at least 12 years on desktop PCs… Since DX9, right?

When you call Graphics.DrawMesh, it just inserts the relevant data structure used for rendering into the queue and clears it out before the next frame; culling, batching, etc. still happen as normal.

You can do that now in 5.5 using Graphics.DrawMeshInstanced (but not Graphics.DrawMesh), the only difference is that you have to do this every frame. For our use case, this was prohibitive (38ms), and wasn’t available on our platforms (mobile).

I understand. It would help a lot to have a system to “keep” this in the render loop.

Do you think that this prohibitive cost is caused by the submission itself, or by the fact that on every call to ‘DrawMesh’ the data passed in is copied instead of being used directly? For example, if you pass a MaterialPropertyBlock on every call to ‘DrawMesh’, all the data from it is copied instead of being used as-is.

Do you think it would be faster if it didn’t copy the data? Or do you think the submission itself is the bottleneck?

Hey @jbooth_1 ,

I’ll talk to the team to see if it’s okay to add DrawMeshPersistent and DrawMeshInstancedPersistent.

So am I correct to assume that the persistent calls would help with objects that don’t move between frames, and that with moving (animated) objects DrawMesh would still be slower than using GameObjects?

@zeroyao - I think a command-buffer-like approach would likely make for a more Unity-like API. Using an int-based ID system is fine for our uses but a little, well, un-Unity-like. It was far simpler for us to do (since we’re not as familiar with the source) than changing the command buffer system to work for this use case. Either way, something that solves the use case at similar performance would be amazing.

@livo_k: Pretty much, yes.

@jbooth_1 Have you tried Rendering.CommandBuffer.DrawMesh?

@zeroyao :

Yeah, not viable: it doesn’t dynamically batch (1 draw call per mesh rendered), and you can’t insert it into the normal rendering pathways (depth, shadow, drawing, etc.), only before or after a given operation.

I agree we should have a more performant way to draw meshes repeatedly with dynamic batching. And if anybody is going to look into the Graphics.DrawMesh code anyway, it might also be a good time to finally get a way to set the sorting order when submitting a mesh.

Replying to follow this! It would be great if the persistent DrawMesh calls could ‘remember’ the result of the dynamic batching operation too, and reuse it. There is CPU overhead for dynamic batching, but if our meshes are persistent and we have a thousand small meshes to draw, it would be awesome if it didn’t have to recompute the dynamic batching every frame for these persistent meshes! Maybe it will already do this; I can’t say since it has not been released yet, but I just thought I’d voice my suggestion anyway. :)

I have been doing some tests with billboards and instancing and I was wondering the same thing. As you can see in the images, the number of saved draw calls is huge, but I am still wondering what the CPU overhead is. Btw, the billboards are objects in the scene in Unity 5.4.0f3 (not using the Graphics.DrawMesh API).
[Attached images: InstancingBillboards_NoImageEffects, InstancingBillboards]

If you could show a screenshot of the profiler view with rendering expanded, that would be helpful.

Sure, here they are (CPU profiler and frame debugger):
[Attached images: InstancingBillboards_Profiler, InstancingBillboards_FrameDebugger]
I am just curious whether there is an issue with too many batched calls, and what the sweet spot is between having more instances of lower-poly objects or fewer instances of higher-poly objects.
Either way, I can’t wait to test the new DrawMeshInstancedPersistent ;)

I am curious too. Thinking about it further, it’s probably impossible to “remember the result”, because objects could move in and out of the view frustum and be culled entirely. You wouldn’t want those rendered, so you’d have to recompute dynamic batching each frame for each camera. Just speculating. We should probably discuss dynamic batching issues elsewhere so we don’t derail this thread.
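A rough way to see why a cached batching result would go stale is the toy sketch below, in Python. It is only an illustration under heavy assumptions: a 1D interval stands in for the view frustum, batching is reduced to chunking the visible list, and `visible_instances` and `build_batches` are made-up helper names, not anything from Unity.

```python
def visible_instances(positions, view_min, view_max):
    """Indices of instances whose position falls inside a 1D 'frustum' interval."""
    return [i for i, x in enumerate(positions) if view_min <= x <= view_max]


def build_batches(visible, max_batch_size):
    """Group the visible instance indices into batches of at most max_batch_size."""
    return [visible[i:i + max_batch_size]
            for i in range(0, len(visible), max_batch_size)]


# The instance list is persistent: it does not change between frames or cameras.
positions = [0, 5, 10, 15, 20]

# Two cameras (or one camera on two different frames) see different subsets.
batches_a = build_batches(visible_instances(positions, 0, 10), 2)
batches_b = build_batches(visible_instances(positions, 8, 20), 2)
```

Even though `positions` never changes, `batches_a` and `batches_b` differ because the visible set differs, so in this model the batches have to be rebuilt whenever the view changes.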

Nice looking stuff, but if you plan to draw grass, I highly recommend Unity 5.5 with its ‘DrawMeshInstanced’ method. It improved my grass rendering by a factor of 5 to 10 (from 30 FPS to 300 FPS).
