MeshInstanceRenderSystem (poor) performance

Just did a quick test and I'm finding MeshInstanceRenderSystem is much slower (2x) than MeshRenderer

Using MeshRenderers (415 'fps', 2.4ms cpu, 0.7ms render)

Using MeshInstanceRenderSystem (187 'fps', 5.4ms cpu, 1.9ms render)

As you can see, batches, tris, verts and resolution are identical; it's just much slower to process with the pure implementation.

I think we need to wait for Graphics.DrawMeshInstanced (which MeshInstanceRenderSystem uses internally) to fully support native containers. In your situation it looks like the cost of the data-copy hack outweighs the performance gain from ECS's fast iteration.

(I remember someone on this forum speeding up the copy in that area.)
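To illustrate what I mean by the copy hack, the idea is roughly this (just a sketch, not the actual package code): the instance matrices have to be copied out of native memory into a managed array before Graphics.DrawMeshInstanced will accept them, in batches of at most 1023.

```csharp
using Unity.Collections;
using UnityEngine;

// Sketch of the per-frame copy that DrawMeshInstanced currently forces on you.
public class InstancedBatchDrawer
{
    const int kMaxInstancesPerBatch = 1023;                       // DrawMeshInstanced limit per call
    readonly Matrix4x4[] _managedMatrices = new Matrix4x4[kMaxInstancesPerBatch];

    public void Draw(Mesh mesh, Material material, NativeArray<Matrix4x4> transforms)
    {
        for (int start = 0; start < transforms.Length; start += kMaxInstancesPerBatch)
        {
            int count = Mathf.Min(kMaxInstancesPerBatch, transforms.Length - start);

            // This copy is the overhead: the API only takes managed Matrix4x4[],
            // so the native data has to cross over every frame.
            for (int i = 0; i < count; i++)
                _managedMatrices[i] = transforms[start + i];

            Graphics.DrawMeshInstanced(mesh, 0, material, _managedMatrices, count);
        }
    }
}
```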

It's weird that the debug info shows Saved by batching: 0. This might mean that each and every one of your meshes is drawn individually (most probably because they don't share the same MeshInstanceRenderer shared component).

The mesh instance renderer excels at drawing the same mesh with the same material, 1023 instances at a time. If your meshes are not the same or do not share the same material, you're better off with the normal Unity mesh renderer.
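For example, the kind of setup where it shines looks roughly like this (a sketch against the current preview API; depending on the version you may also need a TransformMatrix/LocalToWorld component, and MeshInstanceRenderer's exact fields may differ):

```csharp
using Unity.Entities;
using Unity.Mathematics;
using Unity.Rendering;
using Unity.Transforms;
using UnityEngine;

// Sketch: 10,000 entities all referencing the SAME MeshInstanceRenderer shared
// component value, so they can be drawn with instancing in groups of 1023.
public class SpawnInstancedCubes : MonoBehaviour
{
    public Mesh cubeMesh;          // one mesh shared by every entity
    public Material cubeMaterial;  // one material, with "Enable GPU Instancing" ticked

    void Start()
    {
        var entityManager = World.Active.GetOrCreateManager<EntityManager>();
        var renderer = new MeshInstanceRenderer { mesh = cubeMesh, material = cubeMaterial };

        for (int i = 0; i < 10000; i++)
        {
            var entity = entityManager.CreateEntity();
            entityManager.AddComponentData(entity, new Position { Value = new float3(i % 100, 0, i / 100) });
            entityManager.AddSharedComponentData(entity, renderer); // same shared value => instanced together
        }
    }
}
```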

This is a screenshot of what the stats window should look like when instancing is applied correctly (I'm drawing 10,000 cubes with the same material and mesh):

[Attachment: MeshInstanceRenderer.PNG]


Of course there is no batching; it's a randomly generated voxel world.

While each chunk might share a material, they all have their own mesh.

Then that explains the poor performance. You're better off using the built-in mesh renderer or creating your own renderer with Graphics.DrawMesh :).

Like I mentioned above, the instanced mesh renderer's purpose is to draw lots of the same thing at once. A use case might be drawing trees on a non-Unity terrain.
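The bare-bones version of that custom route would be something along these lines (a sketch only; the chunk arrays are placeholders):

```csharp
using UnityEngine;

// Sketch: submit each chunk's baked mesh with Graphics.DrawMesh every frame.
// Unity still frustum-culls and sorts these like ordinary renderers.
public class ChunkDrawSubmitter : MonoBehaviour
{
    public Material chunkMaterial;
    public Mesh[] chunkMeshes;      // placeholder: one baked mesh per chunk
    public Vector3[] chunkOrigins;  // placeholder: world-space origin per chunk

    void Update()
    {
        for (int i = 0; i < chunkMeshes.Length; i++)
        {
            var matrix = Matrix4x4.Translate(chunkOrigins[i]);
            Graphics.DrawMesh(chunkMeshes[i], matrix, chunkMaterial, gameObject.layer);
        }
    }
}
```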

This is basically my conclusion from this test. But I was hoping for a pure ECS approach, I guess I can look at my own solution and see if it'd actually do better than the mesh renderer.

Why do they all have their own mesh? Looks to me like you are just using some kind of box-based algorithm.

Have you ever played minecraft?

that’s 64 chunks [meshes] randomly generated (currently just using perlin noise for testing)
Each chunk is 32x32x32 in size (though I’ll probably push this back up to 64x32x64) and not a single chunk is identical to another.

How would you propose to share meshes?

@Necromantic probably said that because (not counting the texture) the gridded grass top gives the impression of identical cube meshes stacked together to make the different heights and shapes, but looking at the wireframe they are actually different meshes of different sizes, probably with a tiled texture. And if that were the case, we would see multiple grass tops from the side of a cliff.

So the solution for sharing meshes is to construct everything from the same cube mesh. It might mean a lot more tris because of overlapping vertices inside the ground that we would never see, but batched performance might be better. (And if you're going to have a digging mechanic, then having vertices inside the ground is required anyway.)

For texture variation you might use a MaterialPropertyBlock. The current MeshInstanceRenderer system sets it to null. You could copy the default rendering system code and modify it so that, for example, grass tops are rendered differently from the rest of the ground.
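As a rough illustration (not the actual system code), a modified draw call could feed per-instance data through the property block; the shader property name here is made up:

```csharp
using UnityEngine;

// Sketch: per-instance variation via MaterialPropertyBlock, e.g. a texture index per
// instance. The shader would have to declare _TexIndex as an instanced property.
public static class InstancedVariationExample
{
    static readonly MaterialPropertyBlock s_Props = new MaterialPropertyBlock();

    public static void Draw(Mesh mesh, Material material, Matrix4x4[] matrices, float[] texIndices, int count)
    {
        s_Props.Clear();
        s_Props.SetFloatArray("_TexIndex", texIndices); // hypothetical instanced shader property
        Graphics.DrawMeshInstanced(mesh, 0, material, matrices, count, s_Props);
    }
}
```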

I must not be understanding, because how could you possibly turn this into shared meshes?

[Attachment: upload_2018-6-27_21-42-25.png]

Each chunk is 32m x 32m, making that image roughly 320m x 567m.

You can already see that it's highly optimized compared to the naive approach of generating individual blocks - about 10x fewer verts.

[Attachment: upload_2018-6-27_21-44-42.png]

If you were to build this out of a single repeating mesh, it'd be 5.8 million batched draw calls.

Also, even if two sections are identical, there are going to be hundreds of textures, which would give the meshes different UVs. This is handled very nicely with Texture2DArrays because they allow repeating UVs.
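Building the array itself is simple enough (a sketch, assuming every block texture has the same size and format; the helper and property names are just illustrative):

```csharp
using UnityEngine;

// Sketch: pack the block textures into one Texture2DArray so every chunk can share a
// single material while each quad keeps tiling UVs and picks its layer by index.
public static class BlockTextureArrayBuilder
{
    public static Texture2DArray Build(Texture2D[] blockTextures, Material chunkMaterial)
    {
        var first = blockTextures[0];
        var array = new Texture2DArray(first.width, first.height, blockTextures.Length,
                                       first.format, true);

        for (int layer = 0; layer < blockTextures.Length; layer++)
            Graphics.CopyTexture(blockTextures[layer], 0, array, layer); // copies all mips of the source

        chunkMaterial.SetTexture("_BlockTextures", array); // illustrative shader property name
        return array;
    }
}
```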

Maybe a more randomly generated version makes it clearer

[Attachment: upload_2018-6-27_21-52-59.jpg]


OK, that was fast. Yes, 5.8 million would likely lose to a non-instanced dynamically batched call.

In that case, how about exploiting the scale? I see that all of the meshes are rectangles of different sizes. If you used the same unit rectangle with varying scale, it might be possible to do instanced drawing. (But again, the current MeshInstanceRendererSystem uses a 1,1,1 scale in the matrix, so you would need some work, and you might need to counter-scale the texture UVs, or something...)
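Something like this for each instance matrix (sketch of the idea only):

```csharp
using UnityEngine;

// Sketch: instance one unit quad and put the merged face's size into the scale part
// of its matrix. The stock system writes unit-scale matrices, so this needs a custom
// draw path (and probably UV adjustment in the shader to keep the texture tiling).
public static class FaceMatrixUtil
{
    public static Matrix4x4 MakeFaceMatrix(Vector3 position, Quaternion rotation, Vector2 faceSize)
    {
        return Matrix4x4.TRS(position, rotation, new Vector3(faceSize.x, faceSize.y, 1f));
    }
}
```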

Each mesh is the same size (currently 32x32x32)

Within each mesh the faces vary because of the optimization that merges faces with the same texture into large repeating quads to reduce verts - this is called greedy meshing (see Jason Gedge - Greedy Voxel Meshing).
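A toy sketch of that merge step on a single 2D face mask, just to show the idea (not my actual code):

```csharp
using System.Collections.Generic;

// Greedy merge over one chunk slice: grow a rectangle of identical, unvisited cells
// as wide then as tall as possible, emit one quad for it, and mark the cells used.
public static class GreedyMask
{
    public struct Quad { public int x, y, w, h, tile; }

    // mask[x, y] holds a tile/texture id, 0 = empty.
    public static List<Quad> Merge(int[,] mask)
    {
        int sizeX = mask.GetLength(0), sizeY = mask.GetLength(1);
        var used = new bool[sizeX, sizeY];
        var quads = new List<Quad>();

        for (int y = 0; y < sizeY; y++)
        for (int x = 0; x < sizeX; x++)
        {
            int tile = mask[x, y];
            if (tile == 0 || used[x, y]) continue;

            // Extend right while the tile matches and the cell is free.
            int w = 1;
            while (x + w < sizeX && !used[x + w, y] && mask[x + w, y] == tile) w++;

            // Extend down while every cell in the next row also matches.
            int h = 1;
            while (y + h < sizeY && RowMatches(mask, used, x, y + h, w, tile)) h++;

            for (int dy = 0; dy < h; dy++)
                for (int dx = 0; dx < w; dx++)
                    used[x + dx, y + dy] = true;

            quads.Add(new Quad { x = x, y = y, w = w, h = h, tile = tile });
        }
        return quads;
    }

    static bool RowMatches(int[,] mask, bool[,] used, int x, int y, int w, int tile)
    {
        for (int dx = 0; dx < w; dx++)
            if (used[x + dx, y] || mask[x + dx, y] != tile) return false;
        return true;
    }
}
```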

I can already reduce draw calls by using larger chunks (64x64x64 or bigger). But the thing is, performance gets worse because I can't cull as much.

There is the possibility of creating shared meshes for quads that share textures, but there's little point. Apart from being a much more complex algorithm, it'd probably reduce performance afterwards anyway. As you can see from the diagram, there are very few quads of the same size, so you have to go pretty far to start merging. Again, this would mean less is culled; I'd be drawing a lot of stuff outside the player's field of view.

The entire world is also dynamic at runtime: blocks are added and removed, and whole chunks potentially need regenerating on the fly every frame. Just because two parts match when the world is first generated doesn't mean they will still match 10 seconds later.

Not only that, but each chunk would need to know about all the other chunks - this gets rid of one of the major performance improvements, parallelism. By separating everything into chunks, I can generate each chunk simultaneously with the job system. This entire world took under a second to generate. Uploading the new meshes to the GPU takes 4x longer than culling faces and optimizing them, before calculating verts, UVs, etc.
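Roughly, each chunk is built by its own independent job, something like this (illustrative field names, not my actual job):

```csharp
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Sketch: one job per chunk, reading only that chunk's blocks, so all chunks can be
// scheduled and built in parallel (schedule one per chunk, then JobHandle.CompleteAll).
struct BuildChunkMeshJob : IJob
{
    [ReadOnly] public NativeArray<byte> blocks;   // this chunk's 32*32*32 block ids
    public NativeList<float3> vertices;           // output vertices for this chunk
    public NativeList<int> triangles;             // output indices for this chunk

    public void Execute()
    {
        // Real version: cull hidden faces, greedy-merge the visible ones, emit verts/UVs/indices.
        // Placeholder body: walk this chunk's blocks; nothing here touches another chunk.
        for (int i = 0; i < blocks.Length; i++)
        {
            if (blocks[i] == 0) continue;                      // skip air
            int x = i & 31, y = (i >> 5) & 31, z = i >> 10;    // assumes 32x32x32 layout
            vertices.Add(new float3(x, y, z));                 // stand-in for actual quad emission
        }
    }
}
```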

And after saying all this, I don’t even need more performance - it runs amazingly.
The whole point of this post was just that mesh renderers seem to perform better than MeshInstanceRenderSystem, at least when no instancing is involved.

[quote="tertle, post:13"]
The whole point of this post was just that mesh renderers seem to perform better than MeshInstanceRenderSystem, at least when no instancing is involved.
[/quote]

Which is to be expected


In another post, when I started experimenting with ECS, I was informed (I think by mikeacton) that MeshInstanceRenderer isn't managing things properly yet and there is a lot of overhead. I understood that it will eventually be worked on; right now it serves the need of drawing something on screen.
That's why, if you read around, you'll find people have done implementations of their own systems - simpler, cleaner and more performant. Sorry, I don't have links at the ready, but at the time that wasn't my main concern.