CPU bottleneck rendering 380K verts on a high-end PC

Hi everyone,

I am trying to pinpoint what is holding back my rendering.
I am procedurally generating puzzle pieces, about 2000 of them. They all have 6 sides and 2 materials, and the scene is currently empty besides that. No code running, just those pieces.

My worst-case scenario for this scene is that I want all 2000 of them on screen at once, none connected, so all sides are enabled and rendering. It uses 1 material for all fronts, and 1 for all backs and sides. Each mesh renderer has 6 submeshes, and these are dynamically batched into 36 draw calls or so.
The scene sits at around 380K verts, and for some reason the CPU on my PC takes 11ms just to get these pieces going, while my GPU sits idle most of the time, putting in 4ms of work per frame.
Since my goal is to run this at 90FPS on this high-end machine (a budget of about 11.1ms per frame), the 11ms without any code or physics at work is kind of killing it right now.

All materials are mobile diffuse, and there is only 1 light in the scene, a directional one. The graphics settings have anti-aliasing at 8x (it doesn’t seem to matter much). Everything else is either minimal or off. No shadows.

I tried a ton of other player settings and such. They do not change anything whatsoever, except that disabling dynamic batching makes it a bit worse.
CPU is nearly 50% utilized. I think that this part just isn’t multithreaded.

PC stats:
GTX 980
16GB RAM
i5 4690K 3.50 GHz
Unity 5.3.4f1

Here are some screenshots to illustrate the scene: hierarchy and game view, statistics, profiler GPU, profiler CPU timeline, profiler CPU hierarchy.

Am I missing something obvious? I have run scenes with millions of triangles without issue on better settings, with hard shadows and plenty of textures. What gives?
Something is keeping my CPU busy processing these puzzle pieces, and I do not get what exactly it is.

Any help would be greatly appreciated.
Sincerely,
Vince

Too many separate pieces. Use the Mesh class to create everything in a few large meshes, instead of thousands of tiny meshes.

–Eric

Yes, that is all fine and dandy. But I am trying to create a puzzle here. These are all going to be dynamic. They all need the front and back separate because of the separate texture/material. The sides use the same material as the back, but they are separate because, when pieces are joined together, the sides cause horrible pixelated artifacts on the seams. So I disable them when they get connected.

To be clear, each GameObject is 1 piece, which has 1 mesh renderer, which has 6 submeshes. And I can’t really seem to get around that given what I want to achieve.

If that turns out to be the issue, I would like to know why or how the CPU struggles with this, and whether I can work my way around it somehow by doing some sort of wonky-ass batching of my own every frame.
Moving any of the load to the GPU would be the best thing, but that implies that I have a say in the matter. At this point I do not really have that.

Thanks for the feedback so far.

Yes, I know. That doesn’t change anything.

You have a crazy number of separate meshes here (over 13K). Dynamic batching helps a bit as you note, but dynamically batching all those objects every frame takes a lot of time itself, plus you have stuff like frustum culling which is also not free. It would be faster to cull a few objects instead of thousands.

Use the Mesh class. There’s nothing about it that prevents dynamic content, it just becomes more complex to program. Although you would of course set it up so that once you have the system in place, you would have simple functions to move pieces around and so on.

–Eric

So you are telling me to use the Mesh class to batch stuff myself every frame (move a piece, recalculate the whole mesh around that, etc.)? Or to change the meshes at runtime when splits are needed, for example for the sides?
I am not really sure what you are getting at with “use the Mesh class”, since I am already using it to generate this whole puzzle, edges and all, from scratch.

If it is either of the above, I think the first would be way too slow. The second would probably be doable, but it would be quite a mess if you make a mistake when creating all the mesh parts.

I kind of assumed that using submeshes on a single mesh renderer would allow it to do some of this smart stuff for me internally. Sadly, that is not the case then?

Thanks for the insight,
Vince

You don’t need to recalculate the whole mesh, just move some verts around. It’s extremely fast. Submeshes don’t really help; it’s almost the same as having separate objects.

–Eric

Hi,
I am very interested in this too.
How exactly would you use the Mesh class in this situation?
Is there any example or code available?
Kind regards,
Ippokratis

As a really basic example, say you have a mesh that you use to make two “separate” quads (but in the same mesh), 8 verts total, 4 for each quad. You want to move the first quad over by 1 unit on the x axis, so you loop through verts 0-3 and add Vector3(1, 0, 0) to each one, then upload the vertex array. That’s it.
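To make the two-quad example concrete, here is a minimal sketch of the index math in plain Python (not Unity code). The helper name `translate_range` and the vertex positions are made up for illustration. In Unity the "upload" step would be handing the modified data back to the mesh, for example by assigning `mesh.vertices` or calling `SetVertices`.

```python
# Illustrative sketch only: one shared vertex list holding two quads,
# 4 verts each. translate_range is a hypothetical helper name.
def translate_range(verts, start, count, offset):
    """Add offset to verts[start:start + count] in place."""
    ox, oy, oz = offset
    for i in range(start, start + count):
        x, y, z = verts[i]
        verts[i] = (x + ox, y + oy, z + oz)

verts = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),  # quad 0
    (3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 1.0, 0.0), (3.0, 1.0, 0.0),  # quad 1
]

# Move the first quad (verts 0-3) by 1 unit on the x axis.
translate_range(verts, 0, 4, (1.0, 0.0, 0.0))
```

Only the four verts of the first quad change; the second quad and the triangle indices are untouched.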

–Eric

I really need to find a way to figure out which mesh verts I would need to edit. Until now I was happy that I could identify what is what and where during generation, when it was all separate submeshes and 1 huge set of vertices. But I would need a whole lot more coordination just to get a piece to move. I suppose I will back up the current version, and then try this out.

On this point, I will still need to disable sides of my pieces at some point. How do I go about it? Just remove the triangles from the huge int array? Or is there a better way?

The pieces all have the same number of verts? Then it’s just trivial math. Otherwise have an array with the number of verts/piece, and it’s still pretty simple.
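A rough sketch of that bookkeeping for the unequal case, in plain Python rather than Unity code: a running sum over the per-piece vert counts gives each piece's start index in the shared vertex array. The counts below are hypothetical, and `piece_offsets` is just an illustrative name.

```python
# Illustrative sketch only: compute where each piece's verts start in one
# big shared vertex array, given the number of verts per piece.
def piece_offsets(vert_counts):
    """Return the start index of each piece (a running sum of the counts)."""
    offsets = []
    total = 0
    for count in vert_counts:
        offsets.append(total)
        total += count
    return offsets

vert_counts = [24, 30, 18]            # hypothetical verts per piece
offsets = piece_offsets(vert_counts)  # piece i owns verts[offsets[i] : offsets[i] + vert_counts[i]]
```

If every piece had the same count `n`, the start of piece `i` would simply be `i * n` and no table would be needed.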

Source control. :wink: It’s asking for trouble to not have any safety net.

You can use Lists now with SetVertices etc., so that’s feasible (with arrays there would be a lot of garbage generated because of having to make new arrays when the size changes). Or you can just set the verts for the unwanted parts to something like Vector3.zero; degenerate triangles aren’t rendered anyway and that would probably make things simpler.
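A minimal sketch of the collapse-to-a-point idea, again in plain Python and not Unity code. It assumes the part being hidden has its own verts that are not shared with the front or back, and it keeps a copy of the original positions so the part can be shown again. The helper names are made up for illustration.

```python
# Illustrative sketch only: hide part of a mesh by collapsing its verts to
# one point, which makes every triangle that uses only those verts degenerate.
def hide_range(verts, start, count):
    """Collapse verts[start:start + count] to the origin; return the originals."""
    saved = verts[start:start + count]
    for i in range(start, start + count):
        verts[i] = (0.0, 0.0, 0.0)
    return saved

def restore_range(verts, start, saved):
    """Put previously saved positions back in place."""
    verts[start:start + len(saved)] = saved

verts = [(float(i), 1.0, 2.0) for i in range(8)]  # hypothetical data
original = list(verts)
saved = hide_range(verts, 4, 4)    # hide the part stored in verts 4-7
restore_range(verts, 4, saved)     # show it again later
```

The array length never changes, so this avoids the resizing problem mentioned above.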

–Eric

They do not, but the math should still be simple. It just becomes a mess to figure out where it went wrong when you miscount by 1 or more in such math!

I know, I just don’t have a private repo unless I use the one at work. (This is a spare-time project.)

This sounds like a good idea. If I use the Vector3.zero method, I could even still use an array without trouble. Only SetTriangles would maybe be an issue. I will figure that out when I get there.

A colleague of mine asked whether this would be feasible with a custom shader that takes just the top side of the piece and extrudes the sides and the backside on the shader end instead of in mesh/code. I do not have a lot of shader knowledge, but if this would fare even better, I might consider learning shaders just to achieve this. I also wonder how both methods hold up on mobile devices.

Thanks for all the info. Really learned a lot!

The triangles would stay the same actually; the only thing you need to touch is the verts.

–Eric

Huh, interesting. Wish they would write down edge cases like this in their documentation. Obviously it would be near impossible to know them all or write them all out. But it is useful to know a 0,0,0 vert stack would just get ignored.

Thanks again,
Vince

Any degenerate triangle; the coords aren’t important. It’s a good idea to have some knowledge of how GPUs and graphics APIs work outside Unity, since they can’t (and really shouldn’t try to) cover everything.
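To illustrate what "degenerate" means here, a small plain-Python sketch (not Unity code, helper name made up): a triangle is degenerate when its area is zero, which happens when all three corners coincide or lie on one line, wherever in space that is.

```python
import math

# Illustrative sketch only: area of a 3D triangle via the cross product.
def triangle_area(a, b, c):
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

collapsed = triangle_area((5.0, 5.0, 5.0), (5.0, 5.0, 5.0), (5.0, 5.0, 5.0))  # all corners equal
collinear = triangle_area((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0))  # corners on one line
normal = triangle_area((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))     # ordinary triangle
```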

–Eric

This. So, what about a Unity tutorial?!! :smile:

It’s nothing specific to Unity; plenty of places to learn.

–Eric

Like Eric says, use the Mesh API: you store the positions of all your pieces in arrays instead of GameObjects, which is far more efficient.
If it’s 2D and you don’t want to deal with the mesh API, try Matt Rix’s Futile framework.

I think I will manage just fine. I can easily reduce the number of meshes per piece from 6 to 2, and then just write code to collapse edges on request. If that doesn’t get me the speed I want, I will need to go for full 3D mesh editing, and I am wondering whether rotating and positioning those will be efficient enough if you have 1000 pieces to move around using just vertices :slight_smile:
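For reference, a rough plain-Python sketch (not Unity code) of what rotating one piece through its verts could look like, assuming the pieces lie flat and turn around the z axis about a pivot such as the piece center. The helper name and data are made up. The work is proportional to the number of verts in the piece being moved, not to the whole mesh.

```python
import math

# Illustrative sketch only: rotate verts[start:start + count] around the
# z axis about a 2D pivot (px, py), in place.
def rotate_range_z(verts, start, count, pivot, angle_rad):
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    px, py = pivot
    for i in range(start, start + count):
        x, y, z = verts[i]
        dx, dy = x - px, y - py
        verts[i] = (px + dx * c - dy * s, py + dx * s + dy * c, z)

piece = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
rotate_range_z(piece, 0, 4, (1.5, 0.5), math.pi / 2)  # quarter turn about the center
```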