So I finally started digging into the new UI Toolkit, doing some basic tests and stuff, and I’m genuinely loving the principles and structure of it. Yeah, it’s a bit clunky in places, but I can accept that as it’s still in preview and there are always going to be things you need to work around, but
Hundreds of kBs of allocations any time the UI changes? What are you doing here? How does anyone at Unity think this is a viable way of creating games? Sure, it’s nice that it allocates 0 bytes when the UI is static, but I don’t know about anyone else: most games aren’t static.
This makes the UI unusable if you have any sense of quality around framerate. On my last game, we worked our asses off to get allocations down to nearly 0 during gameplay. I estimate we allocate less than 1kB in a 5-10 minute level across gameplay, interactions, characters, projectiles, effects, animated UI and more. But here’s the new UI system thrashing megabytes of allocations in that same time period, like it’s amateur hour over here.
I honestly can’t believe that after so many years, and even a recent push for DOTS and “performance by default”, there are still programmers at Unity who are oblivious to how much of a killer garbage collection is for games made in Unity. So what’s going on here?
UIToolkit has tonnes of potential. I really want to reiterate that I’m genuinely excited about this paradigm. I absolutely don’t want to write it off, but in its current state it’s DOA. There’s no point in adding extra features to make UI “nicer” if the system it’s built upon is fundamentally not fit for purpose. Outside of spawning new VisualElements (which, given the API, should be something we can pool on our end), and maybe some initial setup, the UI system should not be allocating anything at runtime.
Edit: I only profiled in the editor, so this could be one of those situations where Unity thrashes allocations in the editor without flagging it in any way. I’ll look into that at some point, but if that is the case, it should be obvious that this is still a huge problem and time sink for end users targeting 0 allocations.
It’s hard to understand what’s going on without knowing what kind of operation you are doing. Some operations will allocate, but have fast paths available that you can enable through usage hints.
A common example is transforming VisualElements. By default, the geometry will be updated on the main thread and may need to allocate extra memory to avoid clashing with what’s being used by the GPU. In that situation, you can set usageHint = DynamicTransform on that VisualElement, which will let the GPU do the transform on the geometry (and also avoid extra allocations).
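For reference, setting that hint in code looks something like the sketch below. This is my own illustration of the pattern, not code from the thread; `root`, `x` and `y` are placeholders.

```csharp
using UnityEngine;
using UnityEngine.UIElements;

public static class DynamicTransformExample
{
    // 'root' is assumed to be something like a panel's rootVisualElement.
    public static VisualElement CreateMovingIcon(VisualElement root)
    {
        var icon = new VisualElement();
        // Hint that this element's transform changes often, so the renderer
        // can apply the transform on the GPU instead of regenerating
        // geometry on the CPU each time it moves.
        icon.usageHints = UsageHints.DynamicTransform;
        root.Add(icon);
        return icon;
    }

    // Per-frame movement then avoids the CPU-side geometry rebuild
    // (and the extra allocations that can come with it).
    public static void Move(VisualElement icon, float x, float y)
    {
        icon.transform.position = new Vector3(x, y, 0f);
    }
}
```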
It’s hard to get high-performance by default with such a complex UI system, but we are working on it. We are working on the animation system that will enable the proper hints automatically when we detect they would be beneficial. This should remove the burden from the user for most common situations.
We will also improve other areas. For example, changing the tint color of VisualElements is not particularly efficient (and may allocate). We are working to push these updates to the GPU. Changing the opacity is already optimized that way.
If you can share your test project, that would be appreciated.
Thanks for your response, but I’m afraid it doesn’t really give me much hope.
I don’t have anything too useful to share right now, but will be happy to share some things later if I end up using UIToolkit properly and have worked through some more of the principles involved.
However, what I’m reading from your reply is that 0 allocations is probably an unachievable goal for UIToolkit without:
Very limited motion (which is kind of nonsense in most games)
Lots of additional work to flag elements that move (which with game UI is often everything because appearing / disappearing transitions normally have some motion in them).
I note in the official documentation that the number of dynamic usageHints is limited differently per platform. So I guess it’s going to be basically impossible to ensure good memory behaviour. All the time-saving we might gain from improved workflows will be lost trying to blindly massage a black-box implementation into good behaviour (including duplicate testing across multiple hardware targets) every time we iterate on some UI. I always expect Unity to be somewhat out of touch with real-world game development, but this is quite excessive.
“We are working on the animation system that will enable the proper hints automatically when we detect they would be beneficial.” - if this means keyframe animators, then that’s not “most common situations”. More commonly, in my experience, UI motion is tweened or scripted in some way. Even if it does include CSS-style tweening, there’s still an undisclosed, platform-variant limit on dynamic visual elements, so this can’t be considered a solution. Just more black-box complication and more cognitive load while we waste time trying to massage the system to behave.
Unfortunately, it really sounds like this poor memory behaviour is going to be a fundamental flaw in the UI and we probably just have to not use the feature if we are trying to make a high quality end product. Which is honestly a really bad outcome for us and for you.
And I repeat from my previous post: The UI layout system should not be allocating to the managed heap at all. If there is need for allocation, push that to native. I don’t care if there is a CPU cost to handling in native, marginally slower all the time is a thousand times more acceptable than the frequent CPU spikes from garbage collection on a system allocating hundreds of kBs in the update loop.
I want to clarify that even if “everything” moves on screen, that doesn’t mean everything has to get the DynamicTransform hint. Only the topmost moving element needs the hint; its children will inherit the dynamic transform and will be “skinned” on the GPU as well. If the children also move relative to the parent, they can of course get their own DynamicTransform hint.
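In code, that inheritance means only the moving container gets the hint. A minimal sketch of the pattern (my own names, not from the thread):

```csharp
using UnityEngine;
using UnityEngine.UIElements;

public static class HintInheritanceExample
{
    // Only the moving container needs DynamicTransform; its children are
    // transformed on the GPU along with it.
    public static VisualElement BuildSlidingPanel(VisualElement root)
    {
        var panel = new VisualElement();
        panel.usageHints = UsageHints.DynamicTransform; // the only hint needed

        // Children inherit the dynamic transform; no per-child hint required
        // unless a child also moves relative to 'panel'.
        panel.Add(new Label("Title"));
        panel.Add(new Button(() => Debug.Log("clicked")) { text = "OK" });

        root.Add(panel);
        return panel;
    }

    // Animating the parent moves the whole subtree without CPU geometry work.
    public static void Slide(VisualElement panel, float t)
    {
        panel.transform.position =
            Vector3.Lerp(new Vector3(-200f, 0f, 0f), Vector3.zero, t);
    }
}
```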
The documentation needs to be updated. We used to have severe storage limitations on lower-end devices, hence the warning in the DynamicTransform docs. We have improved this situation by using textures as storage on these devices (instead of a memory-limited constant buffer).
No, I was referring to animation tweens. Getting high-performance tweens for transforms, colors, etc. is our focus right now.
This is our goal as well. I don’t know if we’ll get there, but we are trying to.
OK, thank you very much for the follow up - that is a lot more reassuring. I guess there’s always a difficulty with documentation for preview things. I think I saw somewhere else that the documentation will be getting a full pass for 1.0?
Additional notes on the documentation for usageHints:
DynamicTransform - I didn’t really read this as being helpful with the allocations I was seeing. If you fully understand what’s under the hood, it’s probably very clear, but to newcomers it’s not. It’s also not clear that it affects all children too (partly because GroupTransform exists).
GroupTransform - I think I understand the usage of this now, but the text here is very hard to parse.
Finally, unrelated, but while I have your ear: UI Builder doesn’t play nice with Perforce. It lets you make changes but silently fails when you hit Ctrl-S, and it needs a manual checkout of all the files to work properly. It should just check out the files as soon as you start making changes. Should I file a formal bug report for this?
I’m seeing 470B of allocations on most frames, but not all (which is the most confusing thing) and it’s showing up in the Renderchain.UpdateVisuals, and deep profiling shows it in TextCoreHandle.DrawText().
So should I take from this that any textual changes in UI are inherently going to allocate?
I’ve noted your observations regarding the UsageHint docs, we’ll try to make that clearer.
I would guess that this label is using a dynamic font; the sporadic allocations would occur when new characters are used. Using a static (pre-baked) font asset should help in that regard, but I’ll have a look to make sure we aren’t doing something weird. Thanks for pointing this out.
Could be that it’s a dynamic font - I’ve not touched fonts at all yet, so default settings all the way. Though the string being pushed (check the code snippet) only changes in terms of the number: there are only 10 possible characters that could appear there, so I don’t think that’s the issue. Unless it’s “forgetting” characters frame-to-frame, in which case your hypothesis would match what I’m seeing 100%.
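To be clear about the shape of the update, it’s roughly the pattern below. This is an illustrative sketch with made-up names, not the actual snippet; caching the last value at least avoids rebuilding the string on frames where nothing changed, though it can’t help on frames where the number actually ticks.

```csharp
using UnityEngine.UIElements;

public class ScoreDisplay
{
    readonly Label _label;
    int _lastValue = int.MinValue;

    public ScoreDisplay(Label label) => _label = label;

    // Called once per frame with the current value.
    public void Update(int value)
    {
        if (value == _lastValue) return; // skip redundant text updates
        _lastValue = value;
        _label.text = value.ToString();  // still allocates a string per change
    }
}
```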
Reparenting elements seems to be very heavy on allocations too.
Both on the actual change of the hierarchy under VisualElement.Add()
I’ve set every element here as dynamic, to rule that out as an issue. 40kB+ to move a label in the hierarchy is wild.
And maybe there’s some workaround for this one too, but how many context-specific workarounds will we need to learn? And how limited is the usable toolset going to be once we discover how to tread through the allocation minefield that is being laid out?
I’m now just back to feeling that using UIElements is going to be a constant and incredibly tedious struggle at every turn. I cannot imagine how this can be considered fit for purpose, especially given the PR push around how easy, painless and optimized this new system is. I’m sorry if this is harsh, but UI development in Unity has always been a massive pain point, and improving the tools at the cost of making the end product effectively unusable is a huge disappointment.
Reparenting is (very) expensive and requires a re-layout and some visibility recomputations. This is not a very optimized code-path at this time (but will improve over time), so it’s best for performance to keep the hierarchy static and to show/hide part of it. We are also trying to get rid of the managed allocations.
The best thing you can do to help would be to share a project that represents what you consider the common use-cases that should just work, run fast, and without allocations. We have our own test cases that we optimize for, but having our users share their use cases helps us prioritize which real-world features to optimize first.
Could you clarify, if this test was also in the editor? If so, could you please compare these with build performance?
This thread is terrifying … I have been building my UI in this system based on the promise of - as you mentioned - easy to use and performant system. I somehow got over the “ease of use” part, fighting the system along the way. But this performance is shocking.
Not sure what more a user can do to help beyond the OP’s example. We would all expect such simple usage, setting a label’s text, to have no performance impact at all. Wouldn’t you? Or is such a simple example not included in your internal list of things that should work fast and clean out of the box? Perhaps the other way around would be more beneficial: if you could provide a list of the usage examples you optimized for internally, we would know how and what to use, and what to avoid. A detailed roadmap would also help: knowing which usages we can rely on to be performant in the future means we can start building with them now.
It may feel obvious to some users, but we have to make sure our labels render glyphs correctly on all platforms, in all languages, for every font style. We cannot simply pre-bake an ASCII font atlas in Arial Regular and call it a day. We certainly have work to do to make the default use-case more performant though, I’ll admit that: the user currently has to massage UI Toolkit more than necessary to get decent performance.
As for “Not sure what more can a user do to help other than the OP’s example”: sharing examples here is helpful, but to get the full story we need an actual project. I can only speculate on how the VisualElement hierarchy is set up, which version of Unity and UI Toolkit he is using, and whether the C# is running in Debug or Release. The measured times on my machine were significantly different from what was reported here.
For sure, there are rough corners that we are improving as we speak. There are still some low-hanging fruits that we have to tackle, but to share a few examples, in the past weeks we’ve introduced:
Dynamic atlassing of textures, so that we can draw massive hierarchies in a single draw call (this used to be a major bottleneck)
Multi-texturing to avoid breaking batches on texture changes
On-GPU evaluation of TextCore properties (outlines/shadows)
We’re actively working on:
Visibility optimisations (so that hide/show should become very cheap)
Reducing allocations across the board
etc.
We should be in a much better position in the next couple of releases. But again, please share your use-cases. It’s easy to find horrendous performance in a toy example with degenerate use-cases, but we’re trying to improve real-world use-cases first.
Thanks for the response. Sounds reassuring enough! Have you thought about compiling some “best practices” page in the manual to help guide our learning paths along your efforts to optimize the system?
I use .CloneTree and .Add heavily to build the hierarchy and noticed that a more static hierarchy approach was suggested. Does this mean, for example, that when building a list of nested visual elements where I anticipate around 5 to 15 elements, it’s better to build a static hierarchy with 20 elements and turn them on/off?
Because, if the plan is to make .Add very performant in future, I’d rather keep the dynamic way and wait for the performance to catch up. To make these decisions though, a timeframe would be very helpful.
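By the static approach I mean something like the sketch below: pre-build the maximum expected number of rows once, then toggle display instead of calling Add/Remove at runtime. This is my own sketch with made-up names, under the assumption that a `VisualTreeAsset` row template is available.

```csharp
using System.Collections.Generic;
using UnityEngine.UIElements;

public class RowPool
{
    readonly List<VisualElement> _rows = new List<VisualElement>();

    // One-time setup cost: clone every row up front and hide them all.
    public RowPool(VisualElement container, int capacity, VisualTreeAsset rowTemplate)
    {
        for (int i = 0; i < capacity; i++)
        {
            var row = rowTemplate.CloneTree();
            row.style.display = DisplayStyle.None;
            container.Add(row);
            _rows.Add(row);
        }
    }

    // Show the first 'count' rows and hide the rest; no reparenting
    // or element construction while the game is running.
    public void SetVisibleCount(int count)
    {
        for (int i = 0; i < _rows.Count; i++)
            _rows[i].style.display = i < count ? DisplayStyle.Flex : DisplayStyle.None;
    }
}
```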
The point is that we want to use this system and thus are forced to make some decisions. However, many times it’s not obvious to us that a decision is even needed. To clarify, we were presented a system that magically one-draw-calls all the UI and is centered around a responsive style-sheet authoring tool - as an answer to all of the common UI problems and challenges in previous Unity UI system. It’s ok to learn that there are caveats and temporary workarounds needed, however upon discovering this, we suddenly question everything. And it would be great to have some guidance in the manner of “Ok, these are the goals, until we reach them, use this, do this, avoid that.”
So the thing here is that I’m just getting started with UI Toolkit and dabbling with the tools. I’m not in production right now, and I don’t have any real-world use-case for you. And I will admit, what I’ve been trying is not the most obvious in-game UI example. I was actually attempting to rebuild a debug helper overlay that was previously implemented in IMGUI and gets used every single day. It also needs more refactoring than I anticipated to fit a retained model rather than an immediate model (hence the weirdness of reparenting etc., which were intermediate steps). But while it’s not a bog-standard main menu, it is a real thing, not a “degenerate toy”.
And yes, it will always be possible to create things that perform badly, regardless of the tools. That’s on us. But when it comes to internal allocations, they need to get down to (nearly) 0 on your end before it’s even possible for us to produce anything that consistently performs well. The foundation is, at this point, unfit for purpose. And that’s got nothing to do with whether any particular use case is right or wrong, it’s a core concept thing. If I’m not adding new objects to the UI, then I expect there to be no allocations. And yes that’s very tough, I know, but it’s also essential to be viable within this engine.
As I said way up above - I’d be happy to share something later when I have something. But right now, creating real world examples really means several days or weeks of work - to produce production-representative UI just for the sake of testing. But I do appreciate it’s frustrating to hear complaints but not have concrete examples to work on so I’ll stop poking at this now.
If there is one thing to take away from this, please let it be that these allocations are a fundamental issue and requiring end users to limit ourselves to some subset of cookie-cutter UI and also have encyclopaedic knowledge of hidden implementation details that we can’t possibly know is not a solution.
And I do hear and understand that you are continuously optimizing, but many of your comments focus on specific use cases, and I notice that you often conflate generalized performance with the very specific issue of managed allocations. These things make it hard to feel entirely reassured.
Yes, absolutely. This is something that is very much lacking at the moment.
Longer term, we’re also thinking about adding more performance information in the UI Toolkit Debugger window to help steer the users into more performant API use.
Adding new elements to the hierarchy will always incur some geometry-generation cost to rebuild the visuals, but it shouldn’t incur managed allocations: the geometry is allocated in GPU-mapped memory and shouldn’t show up as managed allocations. Apart from text elements, that is. Text is a special snowflake in that regard, but should get better soon.
As for keeping a static hierarchy with show/hide, we have a nasty bug at the moment where setting visible=false followed by visible=true will essentially behave the same as adding new elements to the hierarchy. We are working to fix this in UI Toolkit in 2021.2, and this is how I expect the following operations to perform:
opacity=0 / opacity=1 should be practically “free”, only minor GPU data to update
visible=false / visible=true should be cheap, but may require walking down the hierarchy to validate the visibility of children
display=None / display=Flex should be cheap
RemoveFromHierarchy / Add may require some geometry regeneration / layout computation
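In code, those options map to roughly the helpers below. This is a sketch with my own names; the relative costs assume the fixes described above have landed, and behaviour in current previews may differ.

```csharp
using UnityEngine.UIElements;

public static class HideShowExamples
{
    // Cheapest: element keeps its layout slot and stays in the hierarchy;
    // only minor GPU-side data changes.
    public static void FadeOut(VisualElement e) => e.style.opacity = 0f;

    // Cheap: element keeps its layout slot but is not drawn (may require
    // walking the hierarchy to validate children's visibility).
    public static void Hide(VisualElement e) => e.visible = false;

    // Cheap: element gives up its layout slot but stays in the hierarchy.
    public static void Collapse(VisualElement e) =>
        e.style.display = DisplayStyle.None;

    // Heaviest: triggers layout computation and geometry regeneration
    // when the element is re-added.
    public static void Detach(VisualElement e) => e.RemoveFromHierarchy();
}
```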
In all cases, we’ll be working hard to avoid useless allocations.
That’s true. This thread was all about allocation, and I diverged from that topic. I just wanted to clarify that we understand the importance of avoiding allocations. Seeing one in the profiler always raises an eyebrow in our team, and for that reason we very much want our users to let us know when something suspicious occurs in that regard.
This really feels like a classic XY problem to me. The goal should not be reducing managed allocations to zero. That would be nice, but the real goal should be using a garbage collector that isn’t stuck in 2005, or whenever Mono was first implemented. Allocations are not expensive on .NET Core / .NET 5. They’re not expensive in Java or JS or Unreal Engine. They should be a couple of pointer bumps and some occasional moving/compaction.
Instead of finding ways to get users (and internal teams like UI toolkit, apparently) to avoid allocations, Unity needs to make allocations less expensive.
Both can be true. Different games have different requirements. For some, any garbage collect pass is simply a no-no, for others, it doesn’t matter.
Ideally, the internal tools (such as UI Toolkit) wouldn’t allocate managed memory, and at the same time, the garbage collector would be faster. As discussed in this thread, we’re working on the former, and there’s ongoing work by the scripting team to help with the latter (such as the incremental garbage collector).