Mobile Performance - Object Limits?

I’m building a game for the Gear VR, so it needs to render twice and hit 60fps on a Samsung Note 4. I’m running into performance trouble, and I’m posting here since I think the problem might extend to mobile projects in general.

I’m trying to build a game that involves a lot of small, simple objects in view at once. Mindful of the performance limitations of mobile VR development, I started out by building a stress test scene. So far, it seems that having large numbers of rendered objects in the scene is a huge performance killer. My current test involves 2000 trivial quads (1k per eye). Even though this involves only 8k vertices, and even when they’re all batched into a couple of draw calls (either dynamically or statically), this is enough to bring the framerate well below 60FPS.
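
For concreteness, the stress test is essentially just a spawner along these lines (a minimal sketch, not my exact script; the class and field names are made up):

```csharp
using UnityEngine;

// Spawns a grid of primitive quads so the renderer count can be dialled up or down.
// Sketch only: names and layout are illustrative, not the actual project script.
public class QuadStressTest : MonoBehaviour
{
    public int quadCount = 2000;   // total MeshRenderers in the scene
    public Material unlitMaterial; // one shared, simple unlit material so batching can kick in

    void Start()
    {
        int perRow = Mathf.CeilToInt(Mathf.Sqrt(quadCount));
        for (int i = 0; i < quadCount; i++)
        {
            GameObject quad = GameObject.CreatePrimitive(PrimitiveType.Quad);
            Destroy(quad.GetComponent<Collider>()); // no physics needed for this test
            quad.GetComponent<Renderer>().sharedMaterial = unlitMaterial;
            quad.transform.position = new Vector3((i % perRow) * 0.5f, (i / perRow) * 0.5f, 10f);
            quad.transform.parent = transform;
        }
    }
}
```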

So: Is it common for ~2k objects to drag down performance like this, even if they’re trivial and fully batched?

If that’s just a limitation of Unity on the Note 4, I’ll deal with it, but this strikes me as weird. Especially because I can have much more geometric complexity, and many more draw calls, and still run fast! So I’m thinking/hoping/praying that there’s something wrong, and this problem might just go away after a quick fix.

I’ve already checked most of the obvious potential culprits. The scene has no extraneous elements, the shaders are simple, the Quality Settings are correct, and the dev software is up to date (Unity 4.6.3f1, Mobile SDK 0.5.0, Oculus Runtime 0.5.0.1). The profiler shows that the biggest problem is CPU rendering time (and also that I’m running into the Gfx.WaitForPresent problem discussed in this thread). I’ve also noticed that the reported framerate tends to sit at either 30fps or 60fps with very little in between, almost as if it’s trying to V-Sync, even though V-Sync is definitely disabled in the settings.

Here’s a screenshot of the profiler results. Note the 2000 calls to FFWD Lights, even though the scene only includes unlit materials. (Is that normal?)

Also, a single-camera screenshot of the stress test itself (note that fill rate shouldn’t be a problem here) and another of the Oculus Remote Monitor (not sure how to read this info, really).

If I just needed to render quads, I’d happily combine them into a single mesh (which works fine) and call it a day, but unfortunately this is just a stand-in for more complex moving objects that can’t be combined. Halp!
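
(For reference, the “combine into a single mesh” fallback is just the standard CombineMeshes approach, roughly like the sketch below — not my exact code. It only works here because the test quads never move, and the total stays well under the 65k-vertices-per-mesh limit.)

```csharp
using UnityEngine;

// Collapses all child quads into one mesh on one renderer (static content only).
// Rough sketch; assumes every child shares the same material and that this root
// sits at the origin with no rotation or scale.
public class CombineChildQuads : MonoBehaviour
{
    void Start()
    {
        MeshFilter[] filters = GetComponentsInChildren<MeshFilter>();
        CombineInstance[] combine = new CombineInstance[filters.Length];
        Material shared = filters[0].GetComponent<Renderer>().sharedMaterial;

        for (int i = 0; i < filters.Length; i++)
        {
            combine[i].mesh = filters[i].sharedMesh;
            combine[i].transform = filters[i].transform.localToWorldMatrix;
            filters[i].gameObject.SetActive(false); // hide the originals
        }

        Mesh combined = new Mesh();
        combined.CombineMeshes(combine); // keep total vertices under 65k

        gameObject.AddComponent<MeshFilter>().sharedMesh = combined;
        gameObject.AddComponent<MeshRenderer>().sharedMaterial = shared;
    }
}
```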

Isn’t that complaining about overdraw, or something?

Anyway, it seems like your GPU is running a few frames behind the draw requests, as you know.
The thread you linked to half-intimates that this is an intermittent bug in Unity 4.x, so I am bound to ask whether you have tested in 5.0.1 (unless there is something I don’t know about the Rift support that requires the older version).

The other thought is that this is simply natural batching overhead for such a massive batch, and maybe splitting it up into smaller sub-meshes - JUST TO TEST - would not hurt. Get a feel for where it falls off the curve.

I recall batching a few thousand (very trivial) houses (EDIT: added some test pics from the editor) - doing that did hit a bit of a performance dip past a certain point - and they can’t have been much more than a few triangles each.

Maybe simplify the test numbers a lot and just do some exponential tests, starting at, say, 100 tris/quads/whatever, and see where it starts to fall down.

It’s possible to happily bundle a couple of thousand batched objects - a couple - before paying the price, but 8000 seems too high (especially for mobile).

I was crunching 2500 draw calls down to about 200, I think, with success, but without an exact match on the device… it’s all guesswork and conjecture.

Thought: can you redefine your patches to be 2x2 swatches? 2x2x2500 enough?

So to sum up: breaking up the load on the GPU and CPU side by managing the batching (as best you can) is probably the best pathway here, with some tests along the way to pin down the actual bottleneck values.

Thanks for responding, twobob!

I don’t think overdraw is the problem. Not only does the problem still occur with opaque materials, but also I’ve done FAR worse things with overdraw and still hit a solid 60fps on this platform.

I don’t know whether the Gfx.WaitForPresent is a bug or an indication of a slow GPU, but either way, the profiler is still reporting a lot of CPU rendering time, so I still have to solve that problem at the very least. If it’s a bug, it’s still reported in Unity 5.0.1. And anyway, I can’t yet upgrade to Unity 5 due to a memory leak in the VR stuff. :-/

I’ve done some of this testing already. It seems that the cost scales linearly with the number of rendered objects: ~500 quads take ~5ms, ~1000 quads take ~10ms, ~2000 quads take ~20ms.

I’m not sure what you mean by “splitting into smaller sub-meshes”. Right now, each quad is a separate GameObject, so it can’t be split any smaller. Combining them into larger meshes works great, but that’s not an option outside of this simple stress test; my actual game involves more complex moving parts, so combining the meshes and adjusting the vertices would be far too slow.
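
(For context, the kind of “one big mesh, adjust the vertices yourself” approach I’m ruling out would look roughly like the sketch below — hypothetical names, and the per-frame vertex rewrite and re-upload at the end is exactly the cost I’m worried about.)

```csharp
using UnityEngine;

// The "combine the meshes and adjust the vertices" approach described above.
// Sketch only: 'quadPositions' is a hypothetical array driven by gameplay code.
[RequireComponent(typeof(MeshFilter), typeof(MeshRenderer))]
public class ManualQuadMesh : MonoBehaviour
{
    public Vector3[] quadPositions = new Vector3[1000];
    Mesh mesh;
    Vector3[] vertices;

    void Start()
    {
        // Build one mesh with four vertices and two triangles per quad.
        int n = quadPositions.Length;
        vertices = new Vector3[n * 4];
        int[] triangles = new int[n * 6];
        for (int q = 0; q < n; q++)
        {
            int v = q * 4, t = q * 6;
            triangles[t + 0] = v + 0; triangles[t + 1] = v + 2; triangles[t + 2] = v + 1;
            triangles[t + 3] = v + 2; triangles[t + 4] = v + 3; triangles[t + 5] = v + 1;
        }
        mesh = new Mesh();
        mesh.vertices = vertices;
        mesh.triangles = triangles;
        GetComponent<MeshFilter>().sharedMesh = mesh;
    }

    void Update()
    {
        // Rewrite the four corners of every quad around its current position...
        for (int q = 0; q < quadPositions.Length; q++)
        {
            Vector3 p = quadPositions[q];
            int v = q * 4;
            vertices[v + 0] = p + new Vector3(-0.5f, -0.5f, 0f);
            vertices[v + 1] = p + new Vector3( 0.5f, -0.5f, 0f);
            vertices[v + 2] = p + new Vector3(-0.5f,  0.5f, 0f);
            vertices[v + 3] = p + new Vector3( 0.5f,  0.5f, 0f);
        }
        // ...then re-upload the whole array. This per-frame work is the cost in question.
        mesh.vertices = vertices;
        mesh.RecalculateBounds();
    }
}
```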

To be clear, my problem is not polygon count (I can render >50k polygons with no trouble if they’re a single object). When I talk about “rendering 1k quads” I refer to the primitive Quad object in Unity. The problem seems to be the number of rendered objects, irrespective of their complexity.

Was your project on mobile or PC? If mobile, then I feel sure that my test ought to be running faster than it is.

In my case, it’s not 8000 batches. There are only 2000 MeshRenderers (of 4 vertices and 2 triangles each), which are batched down to 3 draw calls. Before doing this test, I felt sure that the device would be able to handle this scene. What could be slowing it down so much?

Are they moving, then?

Can you send me the demo?

(I don’t have a rift but I could at least run it, right?)

I was thinking about baking a single quad “model” with 32 (or 64 or something manageable) possible rotations into it.
Then instancing 8000 of those models and switching out to the ones with the correct rotation via some sort of very efficient int-based enum (or something), shifting all the weight CPU-side.

This limits your rotations but reduces your footprint.
sort of like:
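
A rough sketch, with made-up names - pre-bake rotated copies of the quad mesh once, then each object just swaps in the variant for its rotation index:

```csharp
using UnityEngine;

// Pre-bakes N rotated copies of a quad mesh so objects can pick a rotation by index
// instead of rotating transforms. Made-up names; just illustrating the shape of the idea.
public static class BakedQuadRotations
{
    public static Mesh[] Bake(Mesh sourceQuad, int steps)
    {
        Mesh[] variants = new Mesh[steps];
        Vector3[] src = sourceQuad.vertices;

        for (int i = 0; i < steps; i++)
        {
            // Rotate the source vertices into this variant's fixed orientation.
            Quaternion rot = Quaternion.Euler(0f, 0f, 360f * i / steps);
            Vector3[] verts = new Vector3[src.Length];
            for (int v = 0; v < src.Length; v++)
                verts[v] = rot * src[v];

            Mesh m = new Mesh();
            m.vertices = verts;
            m.uv = sourceQuad.uv;
            m.triangles = sourceQuad.triangles;
            m.RecalculateNormals();
            variants[i] = m;
        }
        return variants;
    }
}

// Per object, when its rotation changes: meshFilter.sharedMesh = variants[rotationIndex];
```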

Let’s face it, crazy problems sometimes require crazy solutions.

Given a better, more exact idea of what you want to achieve, perhaps I could come up with something better.

Looking at the project would be ideal.

Here’s a simple demo project that illustrates the problem for me.

It should work for deploying to any Android device. You can deactivate the main camera and activate the OVRCameraRig if you’re working on Gear VR like me, but the single-camera version also reproduces this issue.

Obviously, the particular device will matter a lot. You can change the number of quads spawned if you want to test with different numbers.

Cheers. I will chuck it on a few things and see what collapses.
Shame about the 4.x requirement. That could be a showstopper - or not - depending on the state of feature retro-fitting on the old series, I suppose. It was my feeling that the old series was to be put on life support and little else once the 5.0 push was complete.

I will just double-check: it’s static batching we are talking about, yeah? “Which are batched” via what, exactly?
Dynamic batching is hellishly slow in comparison, if I recall correctly.

I was talking about Pro static batching in my examples above…
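
For runtime-spawned objects like the quads in the demo, static batching has to be requested explicitly after spawning - something like this rough sketch, assuming they are all parented under one root (names made up):

```csharp
using UnityEngine;

// Objects instantiated at runtime aren't included in the build-time static batching,
// so they have to be combined explicitly. Note: once combined, they can no longer
// move individually.
public class StaticBatchAfterSpawn : MonoBehaviour
{
    public GameObject quadRoot; // hypothetical parent holding all the spawned quads

    // Call this once, after all the quads have been spawned under quadRoot.
    public void CombineNow()
    {
        StaticBatchingUtility.Combine(quadRoot);
    }
}
```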

This same thing happens to me!! Draw calls are low (only around 40 batches) and tris/verts are low (around 30k), but there are a lot of objects in the scene and it makes it lag. :( Any fix for this?