source code released: [Source Code] DotsUI - open source UI framework for DOTS
Motivation
The current Unity UI solution is very powerful but struggles with performance (especially when instantiating many objects). DOTS looks like a great fit for this problem, so I decided to recreate Unity’s RectTransform with maximum performance in mind. Now it’s time to share my results and thoughts after a few weeks with ECS.
There’s one thing I have to make clear before I start: currently there is no way to write a “pure” UI in DOTS (there is no value-type Mesh, Texture, Sprite, Font, Material, or CommandBuffer). I call it pure because the only GameObject in the scene is a Camera. All UI controls are entities, with SCDs for the types listed above.
UI System Design
Rect transforms
I implemented a simplified RectTransform system. RectTransform is defined as:
public struct RectTransform : IComponentData
{
    public float2 AnchorMin;
    public float2 AnchorMax;
    public float2 Position;
    public float2 SizeDelta;
    public float2 Pivot;
}
I skipped scale and rotation in the first iteration to keep things simple. Parenting is a copy-pasted Parent System from Entities’ Transforms (I replaced its components with my own). RectTransformSystem is fully jobified: every canvas is calculated on its own worker thread. The canvas job recursively walks the children tree and fills WorldSpaceRect. Since there is no rotation, the rect can be defined as two float2 values:
public struct WorldSpaceRect : IComponentData
{
    public float2 Min;
    public float2 Max;
}
The RectTransform system supports both fixed pixel size and physical size (calculated from Screen.dpi). My results match Unity’s CanvasScaler 1:1.
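As a rough illustration of how a WorldSpaceRect can be resolved from a RectTransform plus the parent’s rect, here is a minimal plain-C# sketch. It follows my reading of UGUI’s anchor semantics; System.Numerics.Vector2 stands in for float2, and the RectResolver helper is a hypothetical name, not part of DotsUI:

```csharp
using System;
using System.Numerics; // Vector2 stands in for Unity.Mathematics.float2

// Plain-C# stand-ins for the ECS components, so the sketch compiles outside Unity.
public struct RectTransform
{
    public Vector2 AnchorMin, AnchorMax, Position, SizeDelta, Pivot;
}

public struct WorldSpaceRect
{
    public Vector2 Min, Max;
}

public static class RectResolver
{
    // Resolve a child's world-space rect from its parent's rect:
    //   1. the anchors map to two corners inside the parent rect,
    //   2. SizeDelta is added on top of the anchored span,
    //   3. Position offsets the pivot point from the anchor area's pivot point.
    public static WorldSpaceRect Resolve(WorldSpaceRect parent, RectTransform rt)
    {
        Vector2 parentSize = parent.Max - parent.Min;
        Vector2 anchorMin = parent.Min + rt.AnchorMin * parentSize;
        Vector2 anchorMax = parent.Min + rt.AnchorMax * parentSize;

        Vector2 size = (anchorMax - anchorMin) + rt.SizeDelta;
        Vector2 pivotPos = anchorMin + rt.Pivot * (anchorMax - anchorMin) + rt.Position;

        WorldSpaceRect rect;
        rect.Min = pivotPos - rt.Pivot * size;
        rect.Max = rect.Min + size;
        return rect;
    }
}
```

A fully stretched child (anchors 0..1, zero SizeDelta) resolves to exactly the parent rect, which is the property the canvas job relies on when recursing.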
Sprites
I keep sprites in an SCD:
public struct SpriteImage : ISharedComponentData
{
    public Sprite Value;
}
Sprite vertices and triangles are calculated in jobs. The job takes WorldSpaceRect and SpriteVertexData as input:
public struct SpriteVertexData
{
    public float4 Outer;
    public float4 Inner;
    public float4 Padding;
    public float4 Border;
    public float PixelsPerUnit;
}
Sprite vertices and indices are stored in DynamicBuffers. This takes full advantage of multithreading and doesn’t require job sync points.
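To make the buffer filling concrete, here is a minimal sketch of appending one untextured quad per rect. This is plain C# (Vector2 instead of Unity’s vertex types, managed Lists standing in for DynamicBuffers), and SpriteMeshSketch is a hypothetical name; a real 9-slice job would emit up to nine such quads, using the Outer/Inner/Border values from SpriteVertexData for positions and UVs:

```csharp
using System;
using System.Collections.Generic;
using System.Numerics; // Vector2 stands in for float2

public struct WorldSpaceRect { public Vector2 Min, Max; }

public static class SpriteMeshSketch
{
    // Append one solid quad for a rect into vertex/index lists
    // (the lists stand in for the per-entity DynamicBuffers).
    // Four corner vertices, two triangles referencing them.
    public static void AppendQuad(WorldSpaceRect r, List<Vector2> vertices, List<int> indices)
    {
        int b = vertices.Count; // base index for this quad
        vertices.Add(new Vector2(r.Min.X, r.Min.Y));
        vertices.Add(new Vector2(r.Max.X, r.Min.Y));
        vertices.Add(new Vector2(r.Max.X, r.Max.Y));
        vertices.Add(new Vector2(r.Min.X, r.Max.Y));
        indices.AddRange(new[] { b, b + 2, b + 1, b, b + 3, b + 2 });
    }
}
```

Because each entity appends into its own buffer, many such jobs can run in parallel without touching shared state.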
SDF Fonts
I borrowed TextMeshPro’s SDF fonts and materials and recreated a simple text mesh batcher with the most important features. Currently, it supports:
- Alignment (left/center/right, top/middle/bottom)
- Different font sizes
- Word wrapping
- Bold style
- Font color
It’s enough for most of my needs. Mesh generation is similar to the Sprite system. It’s executed after RectTransformSystem, reads data from WorldSpaceRect, and writes vertices/indices to DynamicBuffers. Text jobs run in parallel to sprite batching jobs (no sync points required).
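The word-wrapping part can be sketched as a greedy algorithm: put as many words on a line as fit, then break. This plain-C# version uses a character count in place of the real per-glyph advances that come from the SDF font asset, and the names are mine, not DotsUI’s:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TextWrapSketch
{
    // Greedy word wrap: append words until the next one would overflow,
    // then start a new line. A real implementation would sum per-glyph
    // advance widths from the font asset instead of counting characters.
    public static List<string> Wrap(string text, int maxCharsPerLine)
    {
        var lines = new List<string>();
        var line = new StringBuilder();
        foreach (var word in text.Split(' '))
        {
            // Width if we appended this word (plus one space separator).
            int needed = line.Length == 0 ? word.Length : line.Length + 1 + word.Length;
            if (line.Length > 0 && needed > maxCharsPerLine)
            {
                lines.Add(line.ToString());
                line.Clear();
            }
            if (line.Length > 0) line.Append(' ');
            line.Append(word);
        }
        if (line.Length > 0) lines.Add(line.ToString());
        return lines;
    }
}
```

Each wrapped line then becomes a row of glyph quads, generated the same way as the sprite quads above.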
UI Mesh batching
Mesh batching is done in two stages (two jobs). The first stage creates a persistent HashMap&lt;Entity, int&gt; with material IDs. A MaterialID is just the SCD index of a sprite or text font. Since the SCD index is accessible from a job, this stage is scheduled and the next job is prepared immediately. The second stage takes the hashmap from the previous job as input and walks the parent->children tree to build dynamic buffers with vertices, indices, and submeshes. A new submesh is created whenever the next entity’s MaterialID differs from the previous one. A SubMesh contains the material type (sprite or text) and the MaterialID. This job is also scheduled without sync points.
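The submesh-splitting rule from the second stage can be sketched like this. It is plain C#, with flat arrays standing in for the hashmap lookup and the dynamic buffers, and BatcherSketch is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

public struct SubMesh
{
    public int MaterialId;  // SCD index of the sprite atlas or font material
    public int IndexStart;  // offset into the combined index buffer
    public int IndexCount;
}

public static class BatcherSketch
{
    // Entities arrive already ordered by the hierarchy traversal.
    // Start a new submesh whenever the material id changes from the
    // previous entity; otherwise extend the current one.
    public static List<SubMesh> BuildSubMeshes(int[] materialIds, int[] indexCounts)
    {
        var result = new List<SubMesh>();
        int cursor = 0;
        for (int i = 0; i < materialIds.Length; i++)
        {
            if (result.Count == 0 || result[result.Count - 1].MaterialId != materialIds[i])
                result.Add(new SubMesh { MaterialId = materialIds[i], IndexStart = cursor, IndexCount = 0 });

            var sm = result[result.Count - 1];
            sm.IndexCount += indexCounts[i];
            result[result.Count - 1] = sm;
            cursor += indexCounts[i];
        }
        return result;
    }
}
```

Note that the split depends on traversal order: the same four entities produce three submeshes if a different material sits between two identical ones, which is why UGUI-style sorting can sometimes batch better.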
Render system
This is the last stage of the rendering process. The render system builds Unity meshes and CommandBuffers from the previously batched vertices. I can create one command buffer for all canvases or one CommandBuffer per canvas. Multiple command buffers are faster when mixing static and dynamic canvases (frequently updated rect transforms), while a single command buffer is better for debugging and testing.
Canvases are sorted according to their sorting IDs. Vertices are copied from dynamic buffers to managed Lists and pushed to the GPU. I’m using hacks with NoAllocHelpers to avoid GC spikes. After that, I build the command buffers. This process is quite simple: it’s just an iteration over submeshes and calls to DrawMesh with a material and a MaterialPropertyBlock. Since the MaterialPropertyBlock is copied into the CommandBuffer, I can reuse one instance for all submeshes (no GC alloc).
The render system is single-threaded because of the Mesh and CommandBuffer classes. It is very hard to improve this system further.
Input system
Mouse and Touch
I parse mouse clicks and touches from the Input class and translate them into a DOTS-friendly array of structs. Mouse clicks are stored as touches with negative finger IDs (left click and the first touch share the same ID). I run the hierarchy tree traversal in a job (again, one thread per canvas). Results are stored in a NativeArray and then sorted in another job (by canvas sorting ID). These jobs require .Complete(), because I have to know which entity is focused (for keyboard input).
All inputs from mouse and touches are stored in a DynamicBuffer. I also add an empty component to flag entities with filled buffers.
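The per-canvas part of that traversal boils down to point-in-rect tests over the draw order. A minimal sketch, in plain C# (Vector2 instead of float2; HitTestSketch is my name for it, not DotsUI’s):

```csharp
using System;
using System.Numerics; // Vector2 stands in for float2

public struct WorldSpaceRect { public Vector2 Min, Max; }

public static class HitTestSketch
{
    // Rects arrive in hierarchy/draw order, so later entries are drawn
    // on top. The last rect containing the pointer wins; -1 means miss.
    public static int HitTest(Vector2 point, WorldSpaceRect[] rects)
    {
        int hit = -1;
        for (int i = 0; i < rects.Length; i++)
        {
            if (point.X >= rects[i].Min.X && point.X <= rects[i].Max.X &&
                point.Y >= rects[i].Min.Y && point.Y <= rects[i].Max.Y)
                hit = i; // keep overwriting: topmost element wins
        }
        return hit;
    }
}
```

Running one such scan per canvas in its own job, then sorting the winners by canvas sorting ID, matches the two-job scheme described above.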
Keyboard
Keyboard events are gathered from the Event.PopEvent() method. I tried to use the new Unity Input System, but I gave up (it’s still in active development and lacks low-level documentation). Keyboard inputs are added to the DynamicBuffer of the focused entity.
Event system
This one is tricky. I couldn’t find a satisfactory solution for user-defined events. The natural “ECS style” is flag components plus entity queries, but a common button system doesn’t know about user-defined types. I could attach my own flags permanently, but that would create many chunks for what are basically the same archetypes. I’m still thinking about a pure ECS solution to this problem. For testing purposes, I implemented a simple delegate-based system that keeps a mapping from entity to delegate. In OnUpdate, I query for all buttons with a “click” component and call the delegate with the entity as the argument. Since the old UnityEngine.UI is heavily based on delegates, this approach makes it easier to switch from the old UI to DOTS. Example usage:
World.GetOrCreateSystem<ButtonEventSystem>().OnClick(entity, (ent) =>
{
    Debug.Log($"Click: {ent}");
});

World.GetOrCreateSystem<InputFieldSystem>().OnSubmit(entity, (ent) =>
{
    Debug.Log(TextData.ToManagedString(EntityManager.GetBuffer<TextData>(ent)));
});
I can still manually filter “click” events in other systems. All mouse and keyboard events are cleared at the end of the frame (via a concurrent EntityCommandBuffer).
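The delegate mapping itself is only a few lines. Here is a sketch in plain C#, where an int stands in for Entity and Dispatch is a hypothetical stand-in for the OnUpdate query over “click” components:

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch of the delegate-based event system.
public class ButtonEventSystemSketch
{
    readonly Dictionary<int, Action<int>> handlers = new Dictionary<int, Action<int>>();

    // Register (or replace) the click handler for an entity.
    public void OnClick(int entity, Action<int> handler) => handlers[entity] = handler;

    // In the real system this runs once per frame for every entity that
    // currently carries the "click" flag component. Returns whether a
    // handler was actually invoked.
    public bool Dispatch(int clickedEntity)
    {
        if (handlers.TryGetValue(clickedEntity, out var h))
        {
            h(clickedEntity);
            return true;
        }
        return false;
    }
}
```

The trade-off is visible even in the sketch: the dictionary and delegates are managed state living outside ECS, which is exactly why this part cannot be jobified or Burst-compiled.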
Supported controls
Right now I support:
- Sprites (with 9-slice)
- Texts
- Rect masks
- Input fields (very primitive implementation, keyboard only, no selection, no shortcuts, no mobile keyboard)
- Buttons
Results
Rebuilding a complex layout is very efficient with the job system; in some cases, my system is about 20x faster than UnityEngine.UI. I’m sure performance will drop as I add features, but there is still room for more optimization. I’ll compare only layout rebuilds, because rendering performance is almost the same (UnityEngine.UI batches better in some cases, but I’ll improve that soon). Here is a comparison of similar UI layouts:
Complex panel hierarchy + text (one canvas)
300 rect transforms + heavy text (4496 words, 30314 characters)
UnityEngine.UI:
DOTS UI (you can easily distinguish it by the missing italic font style):
DOTS vs UGUI comparison:
Zoomed DOTS:
The results are clear. A single canvas has poor CPU utilization, but it still beats UGUI.
Complex panel hierarchy + text (multiple canvases)
Same canvas as above, but duplicated 7 times
DOTS vs UGUI rebuild comparison:
Zoomed DOTS:
This screenshot is very interesting. While layout rebuilding greatly benefits from multithreading, updating the Unity Mesh class and CommandBuffer is still single-threaded and… slow. There is probably nothing I can do about it, but I’m very happy with the results anyway. 500k vertices batched in about 100ms is still very promising. I’ll test the new Mesh API once it’s available in 2019.3a and post the results.
Runtime instantiation (20000 sprites)
This profiler snapshot shows the instantiation of a new canvas with 20k very small sprites (all in the viewport):
It’s fast. Instantiation took about 30ms, the parent system less than 50ms, and the render system another 120ms. The whole player loop took less than 200ms. This level of performance is unachievable in the MonoBehaviour world. Even with pooling, activation and layout rebuild would take a few seconds.
Conclusion
I started this project as an ECS rookie. I had years of experience with the Unity engine, but absolutely no background in data-oriented design. After a few days of playing with the ECS examples, I felt very confident with the new API and design. I have to admit: Unity did a great job with DOTS. Despite poor documentation, the API is clean, simple, and very powerful.
This UI system was written in 5 weeks. I think that’s not bad, considering it was my first attempt at pure ECS.
Is ECS suitable for a UI system? Yes. Actually, I think it fits better than object-oriented design. UI controls are usually a set of components: add a sprite and it’s an image; add a selectable component and it’s an interactive control; add an event listener and it’s a button. The only problem I see right now is event handling (as I mentioned in the input system description).
What’s next
Currently, I’m finishing the core and cleaning up the code. Once I’m done, I’ll release the source code on GitHub. I’m also working on a WYSIWYG editor for this UI (with code generation and easier event handling).
===========================================================
My DOTS feedback
This post was quite long :). I dived deep into details, and now it’s time to share my thoughts.
Burst
I love the Burst Inspector. As someone with a decent assembly background, I can easily check the results and find bottlenecks in the code (usually my own mistakes). Actually, I was surprised how good Burst is at its current stage: I didn’t find a single case where Burst missed an optimization opportunity. And vectorization works out of the box. I’ve been working on a path tracer in C++, so I know how hard it is to write SIMD code; being able to do it in C# is incredibly easy. Typical job execution is about 20-100x faster with Burst compared to managed C#. I was thinking about further improvements to Burst, and here are my thoughts:
1. NativeList<> performance. Right now Burst can compile direct pointer access for NativeArray, but struggles with NativeList, and the same goes for any user-written native container. Burst should compile similar code for lists, arrays, and all other containers with a linear memory layout.
2. Non-jobified code support. This is a must-have feature; Burst is just too powerful. I have a lot of main-thread code without managed types that could benefit from Burst. Even without vectorization, Burst-compiled code is at least 2-10x faster than managed C#.
3. Static read-only/const arrays. They are great candidates for Burst, because:
- They are immutable (thread safe)
- They can be easily optimized to constant values at compile time (no memory access at all)
- We don’t care about their allocation/deallocation, they just exist without initialization from code
Example use cases are the MD5 and LZ4 algorithms. I already made jobified implementations of these, and they are way faster than the managed C# versions. However, I was forced to hack around const arrays (persistent NativeArrays with [ReadOnly] attributes).
4. Burst inspector readability. It’s a great tool, but it could be easily improved:
- Instructions coloring. I already made a modification to burst inspector code, just to test things, and it’s way easier to read:
- Ability to filter/gray out engine code. Since the source file and line are known, it could be useful to focus on the actual job code. Currently, there is a lot of code from chunk management, component data access, and native containers, which makes it harder to inspect the user-written code.
- Filter functions. Currently, all compiled functions are combined in one plain text wall. There could be a combo box with compiled functions to select.
- Make lines selectable, with Ctrl+C support.
- Clickable jumps. Usually jumps lead to labels within a function. It shouldn’t be hard to do.
- Open Burst’s internal types for community development. We could implement all of these features in a few days, but we don’t have access to the internal types from Burst.Runtime. Some Unity packages are explicitly allowed to see Burst’s internal types, so why not make them public? I think we’re missing an opportunity for good community-built open source tooling.
API
- [DeallocateOnJobCompletion] for NativeHashmap (and other custom containers).
- SetSharedComponentData(NativeArray, T). Since setting shared component data affects chunks layout, it seems like possible performance improvement to set one shared component for many entities (especially for the same archetype).
Other improvements
- Better memory layout visualization. I’d like to see where my entities, components, and dynamic buffers live, much like in the new Memory Profiler package. Something similar for entities could be very useful (chunks as zoomable bricks with components, sizes, and empty spaces).
- Transform-independent hierarchy system. A parent-child dependency is not always tied to a matrix representation, yet the current Parent System requires LocalToWorld and LocalToParent components. For this reason, I had to write my own hierarchy system, which is basically a copy-paste of Unity’s implementation with my UI components. I’d like to see a simple and fast hierarchy with just a Parent component and a Child buffer (plus all the necessary “hidden” components like PreviousParent).
- NativeArray/NativeSlice APIs in UnityEngine (Mesh, CommandBuffers etc.). I know this is being worked on as we speak (2019.3). I just wanted to say how badly we need that feature.
- Value-type textures, meshes, etc. They are wrappers over native IDs and buffers anyway, so it could be possible to keep IDs in ComponentData instead of references in SCD.
- Visualize jobs dependencies. Sometimes my jobs are waiting for each other, and there is no easy way to find which resource is blocking them from running in parallel. I’d like to see some kind of “debug mode” where all dependencies are tracked and displayed in the editor.
- Ability to free all leaked allocations. Sometimes my TempJob allocations are lost (for example, when an exception is thrown after allocation but before the job is scheduled) and the editor starts throwing warnings. The only way to clear these warnings is to restart the editor.
- Make adding/removing empty components cheaper. In the current design, adding or removing an empty component requires relocating the entity to another chunk. Example:
The only difference between these two chunks is RectMask, an empty component. Can anyone explain to me why they have to be stored in different chunks? I don’t expect a simple out-of-the-box solution; I can accept more boilerplate code just to make these operations faster.
That’s all. Thanks for reading. I hope you enjoyed it :).








