Yeah, it’s going to be a small saving everywhere (doing less work is always better on all platforms), but understandably it’s most pronounced on older devices like the iPad, iPhone 4, iPod 4 and 3GS, where it can make a big difference, so I generally try to avoid triangle resubmission at all costs.
Saying that, testing on the actual devices is the only way to know for sure if your code changes are benefiting performance. What’s faster on PC / Mac is not necessarily faster on an iDevice.
Last time I investigated this, I found double buffering (and just updating meshes in general) caused issues on certain Android HW. Can’t remember what specifically though, it was quite a long time ago. In 2D Toolkit, we don’t update index buffers where it isn’t necessary, and found it did speed things up quite a bit. I’ll have to investigate this again to see where we stand with different hardware configs and the latest Unity.
Edit: I’ve not done many tests on other devices (not many on PC/Mac either); my main target is mobile, and historically this has mainly been iOS devices. I wouldn’t be surprised if the Flash and NaCl builds have completely different perf characteristics.
It would be good to have some sort of official recommendations and expected behavior from the Unity guys; it’s not much fun having hacks in for different devices and configs.
I’ve done some tests on iPhone 4 (slowest device I’ve got handy), and the results are mixed at best. This was hacked into the current version of 2D Toolkit - I might do some more isolated tests later to confirm these findings.
I already have a Vector3[] for mesh positions, and all other attributes, including indices, are cached locally.
1. Updating just the changed triangle data (rather than updating all attributes and resubmitting tri data) is by far the quickest, but has large performance spikes occasionally.
2. Double buffered, changing all data with a Mesh.Clear() followed by filling up the new mesh, is significantly slower than the above and takes up a lot more script time, but doesn’t spike as much as 1.
3. Using armv6 (GLES1.1), spikes when just changing triangle data almost never happen, and this seems to be the quickest of the lot.
4. Double buffering with GLES1.1 doesn’t seem to give any performance gains; it has similar performance characteristics to 2.
Looks like the best option for GLES2 is the double buffered one, but it’ll need the tri index caching code in there for it to be comparable to 1. I suppose one takeaway from this is that resubmitting all attributes and indices is expensive on iOS.
The fact there’s such a difference in behaviour between the GLES1.1 and GLES2 paths is worrying.
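For anyone following along, a minimal sketch of double buffering with cached triangle indices might look like the following. It is not the 2D Toolkit implementation: the class name, quad layout and animation are made up for illustration, and it assumes a fixed vertex count so the indices only need to be assigned once.

```csharp
using UnityEngine;

// Hypothetical example component, not 2D Toolkit code.
[RequireComponent(typeof(MeshFilter))]
public class DoubleBufferedQuads : MonoBehaviour
{
    public int quadCount = 16;              // assumed fixed for the mesh's lifetime

    readonly Mesh[] buffers = new Mesh[2];  // the two meshes we alternate between
    int current;                            // index of the mesh shown this frame
    Vector3[] basePositions;                // rest positions, cached locally
    Vector3[] positions;                    // positions written each frame
    MeshFilter meshFilter;

    void Start()
    {
        meshFilter = GetComponent<MeshFilter>();
        basePositions = new Vector3[quadCount * 4];
        positions = new Vector3[quadCount * 4];
        Vector2[] uvs = new Vector2[quadCount * 4];
        int[] triangles = new int[quadCount * 6];

        for (int q = 0; q < quadCount; q++)
        {
            int v = q * 4, t = q * 6;
            basePositions[v + 0] = new Vector3(q, 0f, 0f);
            basePositions[v + 1] = new Vector3(q, 1f, 0f);
            basePositions[v + 2] = new Vector3(q + 0.9f, 1f, 0f);
            basePositions[v + 3] = new Vector3(q + 0.9f, 0f, 0f);
            uvs[v + 0] = new Vector2(0f, 0f);
            uvs[v + 1] = new Vector2(0f, 1f);
            uvs[v + 2] = new Vector2(1f, 1f);
            uvs[v + 3] = new Vector2(1f, 0f);
            triangles[t + 0] = v; triangles[t + 1] = v + 1; triangles[t + 2] = v + 2;
            triangles[t + 3] = v; triangles[t + 4] = v + 2; triangles[t + 5] = v + 3;
        }

        for (int i = 0; i < buffers.Length; i++)
        {
            Mesh m = new Mesh();
            m.vertices = basePositions;
            m.uv = uvs;
            m.triangles = triangles;        // indices are submitted here and nowhere else
            buffers[i] = m;
        }
        meshFilter.sharedMesh = buffers[current];
    }

    void LateUpdate()
    {
        // Stand-in animation: bob every quad up and down.
        Vector3 offset = new Vector3(0f, Mathf.Sin(Time.time), 0f);
        for (int i = 0; i < positions.Length; i++)
            positions[i] = basePositions[i] + offset;

        // Write into the mesh that was not displayed last frame, then swap.
        current = 1 - current;
        Mesh target = buffers[current];
        target.vertices = positions;        // same vertex count, so the indices stay valid
        target.RecalculateBounds();
        meshFilter.sharedMesh = target;
    }

    void OnDestroy()
    {
        foreach (Mesh m in buffers)
            if (m != null) Destroy(m);
    }
}
```

Whether this beats a single mesh on a given device and GLES path is exactly the kind of thing that needs profiling on hardware.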
Unikon, any hints for 2Dtk users who have this problem? I’m getting intermittent periods of really high load from CreateVBO, with no particular pattern that I can discern.
Wait a bit if possible, I’m prepping a version which double buffers internally, and does some slightly more intelligent resubmitting to avoid the situation described above, and generally reduce the number of CPU branches in there. If you have very few animated sprites, I have that naive version described above which does away with almost all the CreateVBO overhead at the cost of updating mesh data - it only becomes significant with large numbers of animated sprites (around 135 in my case). Send me a PM/support email if you’d like to try it out.
Well, that’s one of the mysteries answered for iOS: why 1.1 has always been faster than 2.0. However, on Android it’s looking like 2.0 is faster. So you’re looking for Unity to solve this rather than workarounds. I don’t see why this is our problem to solve. Hopefully, now that Unity knows about it, there will be good progress on that front.
Oh, I was wrong. Yesterday I did a quick test and it seemed to work, but today I ran a proper test and confirmed that it has not been fixed. On iPad 1, Mesh.CreateVBO takes 39-42 ms each frame and drops the FPS from 30 to 15.
How does it look when you test it on a device that is not underspecced on RAM and GPU like the iPad1 / iTouch4? Is it equally bad on the 3GS devices or the 4S / iPad2 generations?
That drop is nothing to worry about, though, as it looks like a VSync-based drop, so that would be the first place to check.
Did you hook up the Unity Pro profiler while running on the device to make sure that you really need that much time, and that it’s not just wait-for-VSync filling up?
The same applies to the iPad1 too, naturally, as it’s also a drop by a ‘natural fraction’ of 60, i.e. 60 → 30 → 15 → 7 / 8.
Yep, VSync is disabled on both devices (VSync is set to “Don’t Sync”). The profiler shows the drop is caused only when using morphers that call CreateVBO each frame.
I’m trying to make a game with many procedural trails from objects - the gameplay is based on these trails.
This CreateVBO bug is really annoying in my case.
I have tried two workarounds posted here: double buffering and dynamic batching path.
I don’t have Unity Pro and cannot see profiler details, so I used iPhone profiler stats (mainly cpu-player and render) to compare results. Tested on iPhone 4.
Two meshes (240 verts each), nothing changes, everything is static. render: 0.5
Two meshes (240 verts each), all vertex colors are changed via colors32 each frame. render: 9.2
Same meshes with double buffering. render: 6.1
Same meshes with dummy zero triangles to force dynamic batching. batching: 0.4 render: -0.1
In the last case the numbers are good, but there are also huge visible spikes: cpu-player max shows 50.
Combining all together (double buffering + forcing batching) removes some spikes, but not completely.
All these things are very disappointing for me; it looks like Unity is definitely not suited to dynamic geometry games on mobile. I’m even thinking of changing game engine for this project…
I have been experimenting with procedural meshes on iOS but my FPS is between 20 and 30 and occasionally flicks to 5. When should I be calling Mesh.MarkDynamic in the lifecycle of a procedural mesh? I just stuck it after the mesh is first created.
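On the lifecycle question: the Unity scripting reference describes Mesh.MarkDynamic as something to call before assigning vertices, so once, straight after creating the mesh, should be the right place. A minimal sketch, with a made-up class name and a single quad standing in for the real geometry:

```csharp
using UnityEngine;

// Hypothetical example: one quad whose vertices change every frame.
[RequireComponent(typeof(MeshFilter))]
public class DynamicQuad : MonoBehaviour
{
    Mesh mesh;
    readonly Vector3[] vertices = new Vector3[4];

    void Awake()
    {
        mesh = new Mesh();
        mesh.MarkDynamic();                 // once, before the first vertex assignment
        WriteVertices(0f);
        mesh.vertices = vertices;
        mesh.triangles = new int[] { 0, 1, 2, 0, 2, 3 };
        GetComponent<MeshFilter>().sharedMesh = mesh;
    }

    void Update()
    {
        WriteVertices(Mathf.Sin(Time.time));
        mesh.vertices = vertices;           // MarkDynamic is not called again here
        mesh.RecalculateBounds();
    }

    // Fills the cached array with a unit quad shifted vertically by y.
    void WriteVertices(float y)
    {
        vertices[0] = new Vector3(0f, y, 0f);
        vertices[1] = new Vector3(0f, y + 1f, 0f);
        vertices[2] = new Vector3(1f, y + 1f, 0f);
        vertices[3] = new Vector3(1f, y, 0f);
    }
}
```

Whether MarkDynamic actually helps on a given device is a separate question and worth measuring with and without the call.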
So the project I was working on finished early in the year, and in the end I got around the biggest problems of dynamic meshes in a number of sneaky ways (under Unity 3.5).
For text quad rendering I continued to use the double buffer technique and took lots of steps to make sure any vertex data submission was minimised (couldn’t avoid this with score counters or timers, which need updating all the time), so that was always expensive.
For all other GUI elements I engineered our GUI system to use a skinned mesh instead of the normal mesh class. All GUI vertex, bone and skinning data was generated dynamically at runtime, with vertex information bound to those bones. The benefit of this was that, instead of calculating and resubmitting vertex data to the mesh to move an element, I simply moved the bone the element was attached to.
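A rough sketch of the idea, for anyone who wants to try it: one bone per GUI element, with each quad fully weighted to its own bone. The class name, element layout and animation are hypothetical, and this is not the actual GUI system described above.

```csharp
using UnityEngine;

// Hypothetical example: each quad is rigidly bound to its own bone, so moving
// a bone Transform moves the quad without touching the mesh data.
public class SkinnedQuadGui : MonoBehaviour
{
    public int elementCount = 4;   // assumed to be at least 1
    public Material material;      // assign in the inspector
    Transform[] bones;

    void Start()
    {
        int n = elementCount;
        Vector3[] vertices = new Vector3[n * 4];
        Vector2[] uvs = new Vector2[n * 4];
        BoneWeight[] weights = new BoneWeight[n * 4];
        int[] triangles = new int[n * 6];
        Matrix4x4[] bindPoses = new Matrix4x4[n];
        bones = new Transform[n];

        for (int e = 0; e < n; e++)
        {
            // One bone per element, parented under this object.
            Transform bone = new GameObject("Element" + e).transform;
            bone.parent = transform;
            bone.localRotation = Quaternion.identity;
            bone.localScale = Vector3.one;
            bone.localPosition = new Vector3(e * 1.5f, 0f, 0f);
            bones[e] = bone;
            // Bind pose: the bone's inverse transform relative to the mesh root.
            bindPoses[e] = bone.worldToLocalMatrix * transform.localToWorldMatrix;

            int v = e * 4, t = e * 6;
            Vector3 c = bone.localPosition;   // quad centre in mesh space
            vertices[v + 0] = c + new Vector3(-0.5f, -0.5f, 0f);
            vertices[v + 1] = c + new Vector3(-0.5f, 0.5f, 0f);
            vertices[v + 2] = c + new Vector3(0.5f, 0.5f, 0f);
            vertices[v + 3] = c + new Vector3(0.5f, -0.5f, 0f);
            uvs[v + 0] = new Vector2(0f, 0f);
            uvs[v + 1] = new Vector2(0f, 1f);
            uvs[v + 2] = new Vector2(1f, 1f);
            uvs[v + 3] = new Vector2(1f, 0f);
            for (int i = 0; i < 4; i++)
            {
                BoneWeight w = new BoneWeight();
                w.boneIndex0 = e;             // fully weighted to this element's bone
                w.weight0 = 1f;
                weights[v + i] = w;
            }
            triangles[t + 0] = v; triangles[t + 1] = v + 1; triangles[t + 2] = v + 2;
            triangles[t + 3] = v; triangles[t + 4] = v + 2; triangles[t + 5] = v + 3;
        }

        Mesh mesh = new Mesh();
        mesh.vertices = vertices;
        mesh.uv = uvs;
        mesh.triangles = triangles;
        mesh.boneWeights = weights;
        mesh.bindposes = bindPoses;

        SkinnedMeshRenderer skin = gameObject.AddComponent<SkinnedMeshRenderer>();
        skin.sharedMesh = mesh;
        skin.bones = bones;
        skin.sharedMaterial = material;
    }

    void Update()
    {
        // Stand-in animation: slide the first element; no vertex data is resubmitted.
        bones[0].localPosition = new Vector3(0f, Mathf.Sin(Time.time), 0f);
    }
}
```

Rotation and scale work the same way through each bone's localRotation and localScale.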
Color information had to be submitted as normal (using the faster Color32), HOWEVER I avoided any submissions of colour data for alpha by having the shader take the alpha from the z position of the vertex.
So in the GLSL shader there is a single line that assigns the vertex’s z position to the output alpha.
It makes sense given that GUI typically has no depth, so the position.z of a vertex was free to abuse.
This meant that all the fancy rotation, position, scale and alpha effects of our GUI elements were in effect free (other than the minimal cost of it being a skinned mesh, but overall that was still cheaper than a dynamic mesh).
I tried to speed this up even further by using the double buffer trick on skinned meshes, but this resulted in significantly worse performance (unlike with a normal mesh).
Now I’m back again and we’re moving over to Unity 4.0+, so it felt like a good time to revisit this post and evaluate how things have changed since last year.
Okay, first off, the test conditions:
iPod 4, iOS 5.1, Unity 4.1.5, blank scene with just the mesh generator script and a camera.
At Start() I create 500 quads’ worth of vertex data, and store that information in local arrays.
Every frame I simply apply those same arrays unchanged back to the mesh (effectively resubmitting vertex data).
So CPU cost is how long that bit of code takes.
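A sketch of what such a test script could look like. This is a reconstruction under assumptions, not the original script: the class name, field names and quad layout are invented, and the toggles only mirror the variants compared in this post.

```csharp
using UnityEngine;

// Hypothetical reconstruction of the test setup, not the original script.
[RequireComponent(typeof(MeshFilter))]
public class MeshResubmitTest : MonoBehaviour
{
    public enum ClearMode { None, KeepLayout, DropLayout }

    const int QuadCount = 500;
    public bool markDynamicOnce;
    public bool markDynamicEveryFrame;
    public bool useColor32 = true;
    public bool resubmitTriangles = true;
    public ClearMode clearMode = ClearMode.None;

    Mesh mesh;
    Vector3[] vertices;
    Vector2[] uvs;
    Color[] colors;
    Color32[] colors32;
    int[] triangles;

    void Start()
    {
        vertices = new Vector3[QuadCount * 4];
        uvs = new Vector2[QuadCount * 4];
        colors = new Color[QuadCount * 4];
        colors32 = new Color32[QuadCount * 4];
        triangles = new int[QuadCount * 6];

        for (int q = 0; q < QuadCount; q++)
        {
            int v = q * 4, t = q * 6;
            float x = q % 25, y = q / 25;   // 25 x 20 grid of quads
            vertices[v + 0] = new Vector3(x, y, 0f);
            vertices[v + 1] = new Vector3(x, y + 0.9f, 0f);
            vertices[v + 2] = new Vector3(x + 0.9f, y + 0.9f, 0f);
            vertices[v + 3] = new Vector3(x + 0.9f, y, 0f);
            uvs[v + 0] = new Vector2(0f, 0f);
            uvs[v + 1] = new Vector2(0f, 1f);
            uvs[v + 2] = new Vector2(1f, 1f);
            uvs[v + 3] = new Vector2(1f, 0f);
            for (int i = 0; i < 4; i++)
            {
                colors[v + i] = Color.white;
                colors32[v + i] = new Color32(255, 255, 255, 255);
            }
            triangles[t + 0] = v; triangles[t + 1] = v + 1; triangles[t + 2] = v + 2;
            triangles[t + 3] = v; triangles[t + 4] = v + 2; triangles[t + 5] = v + 3;
        }

        mesh = new Mesh();
        if (markDynamicOnce) mesh.MarkDynamic();
        Submit(true);
        GetComponent<MeshFilter>().sharedMesh = mesh;
    }

    void Update()
    {
        if (markDynamicEveryFrame) mesh.MarkDynamic();
        bool cleared = clearMode != ClearMode.None;
        if (cleared) mesh.Clear(clearMode == ClearMode.KeepLayout);
        // A cleared mesh has no indices left, so they must be submitted again.
        Submit(resubmitTriangles || cleared);
    }

    // Applies the cached, unchanged arrays back to the mesh.
    void Submit(bool withTriangles)
    {
        mesh.vertices = vertices;
        mesh.uv = uvs;
        if (useColor32) mesh.colors32 = colors32;
        else mesh.colors = colors;
        if (withTriangles) mesh.triangles = triangles;
    }
}
```

The timings themselves would still come from the profiler; the script only generates the per-frame load.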
I tested:
1. Normal vs MarkDynamic once vs MarkDynamic every frame.
2. Normal vs double buffering.
3. Color (with triangles) vs Color32 (with triangles).
4. Clearing a mesh with Clear(true) vs Clear(false) before submitting vertex information.
Using the profiler, I noted the average CPU time of the script, and of Mesh.CreateVBO, Mesh.SubmitVBO and Mesh.DrawVBO, for each test case.
The results!
1. Winner - A. MarkDynamic makes everything slower.
2. Winner - B, double buffered meshes (but the margin isn’t as massive any more).
3. Winner - B. Color32 is faster than Color; triangle submission is slow whichever you use.
4. Winner - A. Use Clear(true) if you HAVE to clear the mesh, but avoid Clear() calls like the plague.
As always, you should test results in your own code yourself before taking them as gospel, but here’s my summary of the results:
Mesh class when submitting geometry every frame in 4.0 is A LOT faster than 3.5 ( especially CreateVBO )
Mesh.MarkDynamic() is SLOWER, so unless someone knows the use case for it, I wouldn’t bother.
Color32 is faster than Color
Never ever Clear() a mesh every frame.
Avoid triangle submission every frame if you can, try caching your triangles at startup.
Double buffering meshes is still the fastest (but the margins are nowhere near as big as before, ~1ms)
Not tested here but I recommend using a skinned mesh for elements that only move, scale and rotate
Hope this is of help to someone, many moons later.
Steven.