How to efficiently copy transform data into an array

Hey, so we are writing our own UI system since UGUI is pretty slow for us on mobile. The hardest parts are done, but the “hardest” part seems to be what should be the easiest.

I am not sure how to quickly copy the localToWorldMatrix of a lot of transforms into an array. After profiling, it seems there is a lot of Unity overhead when trying to do this. If I were writing this in my own application I am pretty sure I could get this operation down to around 0.05 ms or less, but with Unity we are sitting around 0.2 ms. Please, if you are going to comment that “0.2 ms is already fast”, don’t. We are running in a highly time-critical environment (mobile) and we need to shave off time anywhere we can.

Right now the only option I can think of is to write a custom Transform kind of system just for our UI system, but this seems like overkill.

Please, I would love any kind of help. I really need to get this figured out ASAP.

Just so you know, the entire UnityEngine.UI system is available in source control from the package manager (and from github) and you can look at it and probably see there isn’t going to be a lot of fat.

If there are specific widgets you use in there that are too slow because of the general-purpose assumptions built into them, you might be able to get some benefit by tweaking them, but since so much of mobile performance bumps against pixel fill bounds, it’s unlikely to get you much benefit.

Instead, if you have a massively-complex UI, the time-honored way of breaking it into the static and the dynamic parts and placing those in different hierarchies is likely to give you the best improvements, at least based on the number of times I have optimized UI this way.

Look to optimize drawcalls this way too, as well as reducing transform updates. Use the Frame Debugger to see what all gets drawn each call, and that can give you some great insight into how to batch big chunks of identical draws into a single pass. Obviously good sprite packing helps a lot here too. In my experience the general Unity sprite packer can get you about 80-90% of the way, and for any more you need to hand-pack your sprites, probably using a third-party tool.

And of course, get your pixel overdraw down down down. Use the overdraw view in the Scene to see what your major offenders are. Often with a big UI you’ll have several near-full-screen pieces that can either go away or get baked down into a pre-made background for huge wins.

Here is a starting point:

https://learn.unity.com/tutorial/optimizing-unity-ui

Hey I appreciate the effort, but last time I checked, only part of the UI is available on github. The core of it is still locked away in a DLL. I may be wrong though.

Either way, a few things:

  1. fill rate has nothing to do with CPU performance. We are CPU bound on UI, not GPU bound.
  2. using multiple canvases has helped a bit, but not much. We are already optimized about as much as we can be there.
  3. our new system is significantly faster than Unity’s, it just has fewer features. For comparison, UGUI can cost up to or even over 3 ms, whereas ours is about 0.5 ms or less with more objects. It still needs some work done, but I don’t anticipate it costing anywhere near the cost of UGUI.
  4. we have fewer than 30 batches from UGUI, and yes, sprite packing helps a ton here. With our new system, we will probably have fewer than 10 draw calls.

This probably sounds a bit combative. I do appreciate your help, it’s just that we know we want to use a custom UI. UGUI is beyond saving for us, I think. All of our research has shown it will just be too slow for what we want to do. Our new system has fewer draw calls and less CPU work, but I would like to optimize it further.

We know what we are doing (for the most part, hehe). In fact right now even with a complex game, UGUI is our single biggest slowdown.

I am really just looking for a way to get faster access to a list of the world matrices.

Can you determine from the profiler if the CPU cost issue is from getting the data marshaled from the native engine over to the C# side? Or is it another matrix transmogrification taking place on the C# side that is killing you?

Likely the answer is going to lie in some kind of clever C#-side caching and careful limitation as to what you recalculate each frame, limiting it to truly only the data that is “dirtied.” It may be further helped by your own knowledge of what you can and can’t get away with ignoring / assuming in your specific problem context.
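
As a rough illustration of that "only recalculate what was dirtied" idea, here is a minimal sketch in plain C++ rather than Unity code. Every name in it (`MatrixCache`, `Node2D`, `setPosition`, `flush`) is hypothetical, and it assumes a flat set of nodes with no parenting, so a world matrix is just the node's own translation and uniform scale. The point is that writes push an index onto a dirty list, so the per-frame cost scales with the number of changes instead of polling every object.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none of this is a Unity API.
using Mat4 = std::array<float, 16>;  // column-major 4x4

struct Node2D {  // assumed minimal UI node: position plus uniform scale
    float x = 0.0f, y = 0.0f;
    float scale = 1.0f;
};

class MatrixCache {
public:
    explicit MatrixCache(std::size_t count)
        : nodes_(count), world_(count), isDirty_(count, 1) {
        // Everything starts dirty so the first flush builds every matrix.
        dirty_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) dirty_.push_back(i);
    }

    // All writes go through here so the cache knows what changed.
    void setPosition(std::size_t i, float x, float y) {
        assert(i < nodes_.size());
        nodes_[i].x = x;
        nodes_[i].y = y;
        markDirty(i);
    }

    // Rebuild only the dirtied matrices; returns how many were rebuilt.
    std::size_t flush() {
        const std::size_t rebuilt = dirty_.size();
        for (std::size_t i : dirty_) {
            world_[i] = build(nodes_[i]);
            isDirty_[i] = 0;
        }
        dirty_.clear();
        return rebuilt;
    }

    // Contiguous array, ready to be copied or uploaded in one go.
    const std::vector<Mat4>& matrices() const { return world_; }

private:
    void markDirty(std::size_t i) {
        // The flag keeps an index from being queued twice per frame.
        if (!isDirty_[i]) {
            isDirty_[i] = 1;
            dirty_.push_back(i);
        }
    }

    static Mat4 build(const Node2D& n) {
        Mat4 m{};  // zero-initialised
        m[0] = n.scale;
        m[5] = n.scale;
        m[10] = 1.0f;
        m[15] = 1.0f;
        m[12] = n.x;  // translation lives in the last column
        m[13] = n.y;
        return m;
    }

    std::vector<Node2D> nodes_;
    std::vector<Mat4> world_;
    std::vector<unsigned char> isDirty_;
    std::vector<std::size_t> dirty_;
};
```

With 1000 nodes, the first `flush()` rebuilds all 1000 matrices, a second `flush()` with no writes rebuilds none, and moving one node twice in a frame still rebuilds only that one matrix.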

If you really do have a bazillion objects, you might need to reach for something like the ECS system to get them all up and going at the rate you want.

Yeah, right now I am checking if the transform.hasChanged property (https://docs.unity3d.com/ScriptReference/Transform-hasChanged.html) is true, and only then updating the localToWorldMatrix from the Transform. If I look at a C++ profiler, I can see a lot of time is wasted on Unity overhead when accessing the transform.hasChanged property. A lot of it is null checking and random stuff like that. I tried turning off null checks for the method that accesses hasChanged, but that didn’t help.

So right now, yeah, I am basically thinking my best option is just to make my own version of Transform. But I mean come on… :face_palm:

We only have about 1000 objects we need to keep track of in the worst case scenario.
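
For a sense of scale, a bare-bones "own version of Transform" for roughly 1000 objects can be a few flat arrays. The sketch below is plain C++ with hypothetical names (`TransformStore`, `add`, `updateWorld`), not anything from Unity. It assumes parents are always added before their children, so one linear pass can compute every world matrix as parent world times local.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none of this is a Unity API.
// Column-major 4x4: element (row r, column c) is stored at [c * 4 + r].
using Mat4 = std::array<float, 16>;

inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};  // zero-initialised accumulator
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int k = 0; k < 4; ++k)
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
    return out;
}

inline Mat4 translation(float x, float y, float z) {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    m[12] = x;
    m[13] = y;
    m[14] = z;
    return m;
}

struct TransformStore {
    static constexpr int kNoParent = -1;

    std::vector<int> parent;  // parent index, always lower than the child's index
    std::vector<Mat4> local;  // local matrices, written by the UI code
    std::vector<Mat4> world;  // contiguous output array

    // Returns the new node's index. Parents must be added before children.
    int add(int parentIndex, const Mat4& localMatrix) {
        const int index = static_cast<int>(parent.size());
        assert(parentIndex < index);
        parent.push_back(parentIndex);
        local.push_back(localMatrix);
        world.push_back(localMatrix);
        return index;
    }

    // One linear pass: a parent's world matrix is already final
    // by the time any of its children are visited.
    void updateWorld() {
        for (std::size_t i = 0; i < parent.size(); ++i) {
            const int p = parent[i];
            world[i] = (p == kNoParent) ? local[i]
                                        : multiply(world[p], local[i]);
        }
    }
};
```

Because `world` is one contiguous vector, handing the matrices to a renderer is a single block copy rather than a per-object property read. This sketch recomputes everything each pass; whether that is cheap enough at 1000 objects, or needs dirty tracking on top, would have to be measured.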

Are those 1000 objects all interactable on a tiny mobile screen?

It might be easier to just roll your own mesh-marshaling code… can I ask what the actual use is?

I trivially made a 300-item local cloud of moving air streak particles around my Jetpack Kurt game, all as a single drawcall mesh. I couldn’t use particles because I wanted some explicit realtime controls (wind movement during FixedUpdate) that proved difficult to integrate with the ParticleSystem.

All told it wasn’t a lot of code and it is pretty far down the list of computational resource usage…

Main Mote System module:

https://gist.github.com/kurtdekker/e3117bef40fbf329d87c2bfd3c0121d2