Optimization Data

I’ve started looking into optimization, and the advice I keep seeing is to optimize for the target platform.

Two things: this leaves me with no idea what good data for what I’m doing might look like, and it suggests there is a time-to-benefit ratio, i.e. that it doesn’t matter how inefficient a project is as long as it works well for the minimum planned requirements.

Even so, I still need a frame of reference to recognize problems in a data set, because just looking at the tallest graph lines near the end of a production cycle probably isn’t a swell practice.

I’ve started building and running tasks on poorly optimized code.

I have 2 builds running different terminal-window solutions, each intentionally left poorly optimized. Each build has 18 chat boxes, each updating its text from a looping string array of size 32, once per FixedUpdate and once per regular Update. Both windows are built about the same, with a functional scroll event to iterate through the array.

Solution 1:
Each frame adds 10 strings from the array together using a for loop and outputs the result in 1 text box on the UI canvas.

CPU PlayerUpdate took 16.5 ms
Garbage created: 36 KB per frame

Solution 2:
Rotates out 15 vertically stacked text boxes in the hierarchy as their texts are updated, producing the same effect as Solution 1.

CPU PlayerUpdate took 12.5 ms
Garbage created: 1.4 KB per frame

A blank new project startup scene scored 7 ms in the CPU section.

I’m not advocating that one method is better than the other. But the current trade-off is that Solution 2 has 250 more UI draws than Solution 1 each frame, and still performed much better.

I don’t know how close 18 updating chat boxes gets to simulating the UI in a completed game, but based on those few numbers, how would they stack up? Light, heavy? Is 36 KB/s pretty high? Or is it all just not mature enough to say?

Are those realistic use cases in your project/game?
Doing “premature optimization” is bad enough. What you do feels like artificially creating something to prematurely optimize :wink:

Are you creating some massive text processing app? Because in 99% of the cases, your FPS will rather be capped by rendering or naturally heavy CPU stuff like AI and physics. Or of course by trivial mistakes like overuse of FindObject/Component methods etc. which the Profiler discovers quickly.

The only time such strings would perhaps become an issue is a massively multiplayer game with a chat feature.

That said, 36 KB/s does feel fairly high. It should be avoidable if your use case does not have such a mass of strings to process every second. Here are a few tips:

  • Try not to create game objects or components all the time. Use pooling (i.e. reuse objects).
  • Whenever a UI element in a canvas is altered, the canvas is “dirtied” and will be redrawn. That means all its elements are redrawn! Therefore, move things that change very frequently to a separate canvas to avoid redrawing too many other things that didn’t change.
  • There’s one way to avoid the memory allocations of string manipulation: use the SetCharArray method of TMP and thus your own mutable strings. However, this is something I would only do for said massively multiplayer use case.
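As a rough illustration of the “own mutable strings” idea in the last tip, here is a minimal sketch in plain C#. `CharBuffer` and its members are hypothetical names, not part of Unity or TextMesh Pro; the point is only that text can be written into a reused `char[]` without creating new string objects.

```csharp
using System;

// Hypothetical helper: a fixed-size character buffer that is refilled in place,
// so building the text does not allocate new strings.
public sealed class CharBuffer
{
    private readonly char[] _chars;

    public int Length { get; private set; }
    public char[] Chars => _chars;

    public CharBuffer(int capacity) { _chars = new char[capacity]; }

    public void Clear() { Length = 0; }

    // Copies characters one by one; anything past the capacity is dropped.
    public void Append(string text)
    {
        for (int i = 0; i < text.Length && Length < _chars.Length; i++)
            _chars[Length++] = text[i];
    }

    // Writes a non-negative integer as digits without calling ToString().
    public void Append(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));

        int digits = 1;
        for (int v = value; v >= 10; v /= 10) digits++;
        if (Length + digits > _chars.Length) return; // no room: skip the number

        // Fill the digits from the least significant end backwards.
        for (int i = Length + digits - 1; i >= Length; i--)
        {
            _chars[i] = (char)('0' + value % 10);
            value /= 10;
        }
        Length += digits;
    }
}
```

In Unity, the filled range would then be handed to the TMP text component, along the lines of `SetCharArray(buffer.Chars, 0, buffer.Length)`; check the overload against the TextMesh Pro version you are using.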

Edit:

Hmm, what system are you running? That sounds like you barely get 150 fps in a blank project? That is very low.

GE76 Raider laptop.

It’s the school laptop, I feel like something is off with it as well. I remember benchmarking it when I first got it, I’ll run it again when I get a chance.

I think 36 KB a second should be high, because I left things broken on purpose so I could see how each correction impacts performance. The 18 chat boxes were mainly an attempt to amplify problems I would never see or notice with 1 chat box.

As far as project intentions go, I plan on the game also being expressed through text and having a fully customizable UI to help manage all the cached game data, or to sort through it and express it in different forms. But I think I got a way to isolate types of data into new groups on demand which might cause spikes that equal these 18 chat boxes.

But these use cases are just speculation. Would anyone care if the controls were made?

The thing is, does your text chat need to be real time, i.e. updated every frame?

Most people perceive response times under 100 ms as near instantaneous, so at 60 fps you could spread the load of the text chat over multiple frames (about 6 frames at 60 Hz).
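A minimal sketch of that idea in plain C#, assuming a per-frame budget: messages wait in a queue and only a capped number are displayed on each tick. `MessageDispatcher`, `maxPerFrame` and the `display` callback are illustrative names, not Unity APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher: drains at most maxPerFrame messages per tick,
// so a burst of messages is spread over several frames.
public sealed class MessageDispatcher
{
    private readonly Queue<string> _pending = new Queue<string>();
    private readonly int _maxPerFrame;
    private readonly Action<string> _display;

    public MessageDispatcher(int maxPerFrame, Action<string> display)
    {
        _maxPerFrame = maxPerFrame;
        _display = display;
    }

    public int PendingCount => _pending.Count;

    public void Enqueue(string message) => _pending.Enqueue(message);

    // Call once per frame (for example from Update). Returns how many were shown.
    public int Tick()
    {
        int shown = 0;
        while (shown < _maxPerFrame && _pending.Count > 0)
        {
            _display(_pending.Dequeue());
            shown++;
        }
        return shown;
    }
}
```

With a budget of 7 per frame, say 40 queued messages would be fully displayed after 6 ticks, which at 60 Hz is roughly the 100 ms window mentioned above.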


With regard to text, if you’re combining it then StringBuilder tends to be much more memory efficient than just stringA = stringB + stringC.
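For illustration, a small sketch of joining several lines from a looping array with one reused StringBuilder instead of repeated `+`. `ChatText.JoinLines` is a made-up helper name, and the wrap-around indexing is an assumption based on the looping array described earlier in the thread.

```csharp
using System.Text;

public static class ChatText
{
    // One builder reused across calls; Clear() resets the length
    // but keeps the internal buffer, so it is not reallocated each time.
    private static readonly StringBuilder Builder = new StringBuilder(1024);

    // Joins `count` lines starting at `start`, wrapping around the array.
    public static string JoinLines(string[] lines, int start, int count)
    {
        Builder.Clear();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) Builder.Append('\n');
            Builder.Append(lines[(start + i) % lines.Length]);
        }
        return Builder.ToString(); // one string allocation for the final result
    }
}
```

Concatenating with `+` inside the loop would instead create a new intermediate string on every iteration. TextMesh Pro also appears to offer a `SetText` overload that accepts a StringBuilder, which may make the final `ToString()` unnecessary; that is worth checking against your TMP version.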

The chat box doesn’t have its own update. It alters its contents only when its methods are called. I see no reason to worry about the frequency of its calls right now. Calling the methods in Update is suitable for testing.

With that said, it does have a slider bar to scroll through the cached messages stored in the array. Wildly swinging that slider around does cause the chat box to update its content multiple times per frame. I did add a coroutine to slow down the rate at which that slider can update the chat box contents. That slider bar is probably the most realistic and intense thing the chat box will have to do.

Yes, I got a slider bar to simulate scroll bar functionality because it seemed the scroll bar wouldn’t work for my setup. I’m quite proud of it.

I’m aware of the inefficiency of strings and of StringBuilder. But I kind of got sidetracked getting behaviors right and adding functionality. Resizable chat windows are the next item. I want the resizable window to be decoupled from the chat box window so I can reuse it for other GUI displays. The learning curve is keeping progress slow.

Why? There’s no point updating the content until it’s about to be displayed.

The general model here is that input events flag that the visuals need updating, and change whatever properties are relevant. Then next time the GUI is rendered that’s when the updated properties are used to change what’s shown. This way the expensive bit is done only when it matters, rather than once per input event (or whatever).
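A minimal sketch of that model in plain C#, assuming a single expensive rebuild step: input handlers only record state and set a flag, and the rebuild runs at most once per frame. `ChatView`, `OnScroll` and `Refresh` are illustrative names.

```csharp
using System;

// Hypothetical view: input only marks it dirty; the expensive rebuild
// happens once, when Refresh() is called before rendering.
public sealed class ChatView
{
    private readonly Action<int> _rebuild; // the expensive part, e.g. refilling text boxes
    private int _scrollOffset;
    private bool _dirty;

    public ChatView(Action<int> rebuild) { _rebuild = rebuild; }

    // May be called many times per frame (e.g. while dragging a slider).
    public void OnScroll(int newOffset)
    {
        if (newOffset == _scrollOffset) return;
        _scrollOffset = newOffset;
        _dirty = true;
    }

    // Call once per frame, after input has been processed.
    public void Refresh()
    {
        if (!_dirty) return;
        _dirty = false;
        _rebuild(_scrollOffset);
    }
}
```

However wildly the slider moves within a frame, only the last offset is used and the rebuild runs once. In Unity, `Refresh` could plausibly be called from `LateUpdate`, though that placement is an assumption.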


Would a monospaced grid of letter sprites be more efficient than TMP or Unity Text?

A single draw call could display the grid if the meshes are combined and UV offsets to a font texture would provide the content.

Updating would just be converting from an array of characters to UV coords via a lookup table.

The letter index to UV might even be movable to the shader so you only pass a byte array to the GPU for changes.
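To make the lookup concrete, here is a small sketch of mapping a character to a cell in a monospaced font atlas. The layout is entirely an assumption: printable ASCII starting at the space character, filled left to right and top to bottom, with the UV origin at the bottom-left and at least 32 cells so the fallback glyph exists. `GlyphAtlas` and `UvRect` are made-up names.

```csharp
// Hypothetical UV rectangle for one glyph cell (all values in 0..1).
public readonly struct UvRect
{
    public readonly float U, V, Width, Height;

    public UvRect(float u, float v, float width, float height)
    {
        U = u; V = v; Width = width; Height = height;
    }
}

public static class GlyphAtlas
{
    // Assumed layout: cell 0 is ' ' (ASCII 32), cells run left to right,
    // rows run top to bottom, and UV (0,0) is the bottom-left of the texture.
    public static UvRect Lookup(char c, int columns, int rows)
    {
        int index = c - ' ';
        if (index < 0 || index >= columns * rows)
            index = '?' - ' '; // fall back to '?' for characters outside the atlas

        int col = index % columns;
        int row = index / columns;
        float w = 1f / columns;
        float h = 1f / rows;

        // Flip the row because rows are counted from the top, V from the bottom.
        return new UvRect(col * w, 1f - (row + 1) * h, w, h);
    }
}
```

With a 16 by 6 grid, 'A' is cell 33, so column 1 and row 2 counted from the top.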

No, because that’s already pretty much how they work, except that they’re more optimised. For instance, TMP’s Signed Distance Field approach gets better shapes and more effectively scales to different display resolutions while using less texture data, while the arithmetic is essentially free (as far as I’m aware).

Also, font sprite sheets aren’t typically monospaced as that makes them less efficient, for a variety of reasons. Wasted space, letters are in fact different shapes, you need to account for kerning and other spatial considerations anyway, stuff like that.


Few UI optimization tips for this case:

  • Make sure to utilize UI draw call batching.

TextMesh Pro loves to break batching when it’s scattered around the hierarchy.
This is due to the sort order interfering with it.

Try to place the text in a single hierarchy; this will ensure it batches together.
Double-check with the Frame Debugger.
It’s possible to batch ALL text into a single call if you’re creative with the layout and the font materials match.

  • Fewer memory allocations is better. For the strings, as mentioned, use StringBuilder. FixedString from the Collections package and char manipulation are also worth looking into.

  • Coroutines aren’t free. StartCoroutine produces an allocation, as do yield instructions. Make sure there’s no “new WaitForGarbage”. If there is, cache it. Avoid starting coroutines often. Ideally, for a zero-GC policy and your future sanity, avoid coroutines completely. Use a single “system” to update multiple objects.

  • Unity’s default UI implementation of the scroll view is painfully slow when it has lots of items to display.
    Use something like EnhancedScroller, or some other asset for presenting data that supports pooling.

  • Use Profiler to see what’s actually taking lots of CPU time.
    Either Deep Profile the build, or use Profiler.BeginSample / Profiler.EndSample to track the relevant code if deep profiling takes up too many resources.
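As one possible reading of the “single system” tip above, here is a sketch in plain C# where a throttled action is driven by a shared tick loop rather than a coroutine with a wait instruction. `ITickable`, `TickSystem` and `ThrottledAction` are hypothetical names, and the elapsed-time timer is only one way to replace a timed wait.

```csharp
using System;
using System.Collections.Generic;

public interface ITickable
{
    void Tick(float deltaTime);
}

// One system that updates every registered object from a single call site.
public sealed class TickSystem
{
    private readonly List<ITickable> _items = new List<ITickable>();

    public void Register(ITickable item) => _items.Add(item);

    // In Unity this would be called from one MonoBehaviour's Update,
    // passing Time.deltaTime (an assumption about how it is wired up).
    public void TickAll(float deltaTime)
    {
        for (int i = 0; i < _items.Count; i++)
            _items[i].Tick(deltaTime);
    }
}

// Runs an action at most once per interval, and only if it was requested.
public sealed class ThrottledAction : ITickable
{
    private readonly float _interval;
    private readonly Action _action;
    private float _elapsed;
    private bool _pending;

    public ThrottledAction(float intervalSeconds, Action action)
    {
        _interval = intervalSeconds;
        _action = action;
    }

    // Cheap: can be called many times per frame, e.g. from a slider callback.
    public void Request() { _pending = true; }

    public void Tick(float deltaTime)
    {
        _elapsed += deltaTime;
        if (!_pending || _elapsed < _interval) return;
        _elapsed = 0f;
        _pending = false;
        _action();
    }
}
```

Once constructed, ticking allocates nothing per frame in this sketch; whether that matters in practice is something the Profiler should confirm.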

Also how much text are you using in what conditions where this would be anything other than a microoptimization even if it did work?


Video Link

I thought I would just make a quick demo so everyone can see what I did. I’m beginning to feel like communication is breaking down.

As you can see in the video, the chat box creates all the TMP texts required to fill up the chat box, on the assumption that each will take up one line. Multi-line text will extend into the scroll view; single-line text will not, or only very little.
The prefab I’m making copies of is just a child text of the viewport, set to inactive.

Those cloned TMP texts are reused throughout the life of the program. Resizing will recalculate the number of texts needed and instantiate or trim the cloned children in the content. I realize I’m creating garbage when resizing. I might be wrong, but I don’t currently foresee the player being able or willing to resize the chat box fast enough to generate a concerning amount of garbage. Maybe if the player chose font size 1 and created 1000 text boxes …

The scroll bar is a slider bar. The reason it triggers multiple updates is that it has to update the text in every single text box in the content window, or the text will be displayed out of order. If the chat box can display 10 texts, it will display array indices 0 to 9, or 25 to 34, etc. You can basically jump across gaps and display 10 contiguous items from any point in the array.

Normal operation will add a message to an internal array and increment the main iterator. It’s not visible in the hierarchy, but I update the oldest text box in the content window and move the position of the child. Only 1 text in 1 text box is updated.

The scroll wheel works the same as normal operation and will increment and decrement an offset iterator by 1 and the methods will determine if the chat box should display from the offset iterator instead of the main iterator.

I understand that there will be a trade off in the way I have done things. I put all my optimization concerns into memory management. I don’t know how heavy moving children around in the hierarchy actually is, and it might be a really bad idea. That is something I hope to track, capture and understand better with the profiler.

That is all it is:

  • An array of message objects.
  • Controlling and managing array iterators.
  • Mapping text objects in the hierarchy to the array using iterators.
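For readers following along, the array-plus-iterators scheme could look roughly like the sketch below. It is a guess at the structure, not the actual code from the project: `MessageRing`, `Add` and `CopyWindow` are made-up names, and the offset is counted in messages back from the newest one.

```csharp
using System;

// Hypothetical fixed-size message store with a "main iterator" (_next)
// and a scroll offset supplied by the caller.
public sealed class MessageRing
{
    private readonly string[] _messages;
    private int _next;  // slot the next message will be written to
    private int _count; // number of valid messages, at most the capacity

    public MessageRing(int capacity) { _messages = new string[capacity]; }

    public int Count => _count;

    // Overwrites the oldest message once the array is full.
    public void Add(string message)
    {
        _messages[_next] = message;
        _next = (_next + 1) % _messages.Length;
        if (_count < _messages.Length) _count++;
    }

    // Copies a contiguous window of messages into `visible`, oldest first,
    // ending `offset` messages before the newest. Returns how many were copied.
    public int CopyWindow(int offset, string[] visible)
    {
        if (offset < 0) offset = 0;
        int available = _count - offset;
        if (available <= 0) return 0;

        int n = Math.Min(visible.Length, available);
        int oldest = (_next - _count + _messages.Length) % _messages.Length;
        int first = oldest + (available - n);

        for (int i = 0; i < n; i++)
            visible[i] = _messages[(first + i) % _messages.Length];
        return n;
    }
}
```

The `visible` array plays the role of the text boxes: scrolling changes only `offset`, and the stored messages never move.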

Got the last big bug fixed for window resizing. I can start profiling again. I ran a scene with 9 chat boxes at font size 10, with 92 children in each chat box. It gets about the same framerate running 1 chat box with font size 1 and 900 children. Why, you ask? Oh, No reason.

I ran a Deep Profile for the first time; it’s really nice. I see your point about coroutines. Starting the coroutine creates garbage, and the yield return creates garbage even without newing a WaitForSomething. Coroutines in my test accounted for 3.5 KB of GC allocations. It’s a substantial amount of garbage, almost half of the total.

All that’s left on the GC side is the way I make strings. I’ll see what the string builder does, but right now the string helps me debug so I won’t mess with it much.

I’m looking into batching now. It’s new to me. I guess this is where I start looking at frame rate, though the profiler seems to crush it more than anything. But the main goal is to notice a difference when I change something.

The Resizable Window adds 1 batch; the Chat Box adds 9 batches in total. 7 come from the base Chat Box with the script disabled, 1 is added by the script when the children are made, and 1 is added when the slider bar is set to active. With font size 1 and 902 text prefabs, though, it adds two batches.

I basically get 8 batches just on an unpopulated chat display. That seems high. I bet I could knock a few off it.
Title adds 2: both come from the Text (TMP).
Output Container adds 4.
Input Box adds 2: both come from the InputField (TMP). (I bet I can delete the placeholder.)

Output Container: flexible height on the Layout element removes 3 batches. That’s interesting.
Scroll View adds 1
View Port Image and Mask each add 1: for a total of 2
Slider adds 1 batch

Content populated with prefabs adds 1, which I guess will be unavoidable because it comes from the script.

It seems I can delete some things to reduce batches, but also seems I can add things to reduce batches.
I know you said to make my texts siblings if I can. I don’t see how I can do that here, but I can see why I would want to.

I’ve got some research to do, because I suspect there are some nice tricks for building UI elements the right way.


The thing is once you have updated and drawn a chat box like this it could just be a static image and that could be rendered in a single draw call.

So, you could use a chat box camera to capture the chat box as a rendered image and show that image in its place until it needs to update.

It would take some work, but in a system with lots of chat boxes it would massively reduce the graphics calls, to only the updating* or selected/interacting chat boxes.

Maybe something as simple as a screen grab of the chat box layer could work, with any chat box being interacted with or updated drawn in front of the screen grab.

*And as I have mentioned previously you can spread updates over a number of frames to reduce load.

Overdraw is going to be a way bigger performance impact than almost anything text related.

I read a big Unity Learn article about UI.
I think I understand what you’re getting at now. Updating the text and changing the sibling order the way I am will still cause all the other siblings to be recalculated. I think what you were saying, and what I’ll want to do, is cache a group of messages over a time frame before I start drawing.

The aim is to regulate the number of draw calls.

I think if I add messages to a list first (the irony), I can create a method that empties its contents into the array in 1 frame every 100 ms. I inadvertently had to write a method that could populate the text boxes without moving siblings around, to solve a problem that came from changing the number of children on resize. I may be able to use that method again.

During normal operation, while the newest messages are automatically being displayed, I could resort to using that method without reordering children. I don’t think doing this will be difficult. Handling the iterators was the finicky part.

I think this will be more like a real scenario where I might receive a packet of data, then I got 40 messages to dispatch to chat boxes.

What about using a single chat box string and updating its content using a StringBuilder? TextMesh Pro allows wrapping and scrolling of multi-line text.

I have a separate script and chat box for that solution. I was planning on developing both solutions side by side, but that one got neglected the past week.

I would still like to get it up to speed and run a profiler on it to compare it.

I rebuilt the Chat box down to 3 batches.
Test criteria: 9 chat boxes, 90 text boxes in each window, updating at a fixed rate. I had to create a non-development build to test the frame rate because Deep Profile was pretty heavy.

I wrote code to regulate draw calls, updating every text box in the display every 100 ms. It is kind of wasteful, because in that time only 5 elements get updated in the main array, yet it loops through 90 elements to update 90 text boxes. It basically just caused CPU spikes down to 16 frames a second every 100 ms, for 5 messages.
I tried a list first, but it generated loads of garbage by constantly resizing itself. I must have handled it wrong; I should have treated it like an array with an iterator and handled the condition that controls when it resizes. I ended up using an array in the final test.
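If the garbage really did come from the list regrowing its backing array (an assumption, since the code isn’t shown), one common way around it is to create the list once with a capacity and reuse it. A minimal sketch, with `MessageBatch` as a made-up name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical staging list that is created once and reused.
public sealed class MessageBatch
{
    private readonly List<string> _items;

    // Pre-sizing avoids the repeated grow-and-copy steps while the list fills.
    public MessageBatch(int expectedMax)
    {
        _items = new List<string>(expectedMax);
    }

    public int Count => _items.Count;
    public int Capacity => _items.Capacity;

    public void Add(string message) => _items.Add(message);

    // Hands every staged message to `sink`, then empties the list.
    // Clear() resets Count but keeps the backing array for the next batch.
    public void Flush(Action<string> sink)
    {
        for (int i = 0; i < _items.Count; i++)
            sink(_items[i]);
        _items.Clear();
    }
}
```

Creating a new `List<string>` for every batch, or calling `TrimExcess`, would throw that backing array away each time, which is one way a list can end up producing constant garbage.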

The original solution, which changed the order of the siblings 1 time per fixed update, ran at 72 frames per second.
Looping the method 10 times per fixed update ran at 72 frames per second, occasionally dropping to 42.
Looping the method 90 times per fixed update, which is the maximum number of texts that can fit in the window, ran at 16 to 18 frames per second. The same as the other.

It could be that I haven’t created fair test conditions, because this is the first time I’ve ever dug this deep into performance. But I think I’m off to a good start.

I think the correct solution is this: if the number of incoming messages is less than what can be displayed, I can just rotate the children.
If the number of messages is greater than what can be displayed, then I can update the storage and just print what can be displayed. But I’m not sure collecting them to release every 100 ms is better, because it seems I’m just saving them up to spike CPU usage in that 1 frame, like I’ve seen in the profiler.

Get the messages and dispatch them as soon as possible; try to spread the load over many frames rather than saving it up for 1.
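The rule described above can be written down as a tiny sketch. `ChatUpdatePlan` and its members are illustrative names only; the idea is that the number of text boxes touched in one pass never needs to exceed the number of visible slots.

```csharp
using System;

public static class ChatUpdatePlan
{
    public enum Strategy
    {
        RotateChildren, // few new messages: reuse the oldest boxes one by one
        RewriteAll      // more messages than slots: store all, redraw only the visible window
    }

    public static Strategy Choose(int incoming, int visibleSlots)
    {
        return incoming < visibleSlots ? Strategy.RotateChildren : Strategy.RewriteAll;
    }

    // Text boxes that actually need new text this pass.
    public static int BoxesToTouch(int incoming, int visibleSlots)
    {
        return Math.Min(incoming, visibleSlots);
    }
}
```

With 90 visible slots, 5 incoming messages mean 5 rotated boxes, while 120 incoming messages still mean only 90 rewritten boxes.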

Also, 800 active text objects is way more than a user could fit on the screen for a few reasons.

Originally I thought it would be a cool user control to allow the player to change font sizes. But because fonts are added to an atlas, I would fill up the atlas, causing performance problems. This means I can assume 1 font size, which is at least double the size I tested. I can also hide chat boxes in tabs, which means the canvas doesn’t need updating for hidden chat boxes.

I might not know how many hundreds of messages I’ll be handling in the arrays, but I can safely assume a worst case scenario where the canvas rebuilds around 100 text boxes a frame.