Looking for performance tips on updating text every frame

I am looking for performance tips on updating a TextMesh Pro UGUI text field every frame. Right now it eats about 1 ms of my precious 16 ms frame budget. It’s a time label for a racing game, which is why it is updated every frame.

Here is what I have tried so far:

  1. no auto size, no rich text etc.
  2. moved the label onto its own canvas to avoid unnecessary re-batching of the rest of the UI

Ideas:

  • Use another TMPro shader / Material (maybe the pixel texture shader with just the digits in it)
  • As a last resort I was thinking about updating it only every 2nd or 3rd frame, which is the approach I may end up taking anyway.

I have compiled all the info and the profiler output in a screenshot for reference (attached).

Using Unity 2019.4.1f1, TMPro 2.0.1, Target: Android 10

Thanks to anyone looking at this :)

It is important to understand that Deep Profiling adds significant performance overhead. The 1 ms reported here will likely be 0.1 ms or less in a release build.

Deep Profiling is great to identify the relative performance overhead of functions / systems to help identify the areas where optimization would potentially yield the most benefits.

In terms of potential optimizations, how are you changing and setting this text every frame? Are you using the .text property? Are you applying any formatting to these values?

Thank you for your reply.

I generate the string using a StringBuilder with a fixed capacity to reduce memory allocations. Then I assign it to the “.text” property of the “TMPro.TextMeshProUGUI” textfield.
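For readers wondering what an allocation-free variant of this might look like, here is a minimal sketch. It is not the code used in this thread: `LapTimeFormatter` and its `Format` method are hypothetical names, and the `mm:ss.fff` layout is an assumption. The idea is to write the digits into a reused `char[]` so no new string is created per frame.

```csharp
using System;

// Hypothetical helper, not part of TextMesh Pro or Unity.
public static class LapTimeFormatter
{
    // Writes totalMilliseconds as "mm:ss.fff" into buffer and returns the
    // number of characters written. Nothing is allocated on the heap.
    // The buffer should hold at least 16 characters.
    public static int Format(int totalMilliseconds, char[] buffer)
    {
        if (totalMilliseconds < 0) totalMilliseconds = 0;
        int minutes = totalMilliseconds / 60000;
        int seconds = (totalMilliseconds / 1000) % 60;
        int millis = totalMilliseconds % 1000;

        int i = 0;
        i = WriteInt(minutes, 2, buffer, i);
        buffer[i++] = ':';
        i = WriteInt(seconds, 2, buffer, i);
        buffer[i++] = '.';
        i = WriteInt(millis, 3, buffer, i);
        return i;
    }

    // Writes value zero-padded to at least minDigits and returns the next index.
    private static int WriteInt(int value, int minDigits, char[] buffer, int index)
    {
        int digits = 1;
        for (int v = value; v >= 10; v /= 10) digits++;
        if (digits < minDigits) digits = minDigits;
        for (int d = digits - 1; d >= 0; d--)
        {
            buffer[index + d] = (char)('0' + value % 10);
            value /= 10;
        }
        return index + digits;
    }
}
```

In Unity, the filled buffer could then be handed to the label with something like `label.SetCharArray(buffer, 0, length)` instead of assigning `.text`, assuming the TMP version in use exposes that overload. Whether this makes a measurable difference is best checked on the target device.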

I know profiling generates some overhead, but the text rendering on this particular text field stuck out in comparison to other things. So it was more about the relative share of a frame’s budget it took than the absolute milliseconds. I assumed these ratios stay roughly equal, even in release builds. Maybe I am wrong there (update: my assumption was probably wrong, I should have turned deep profiling off, just as you suggested).

If it’s 0.1 ms in release, then I’ll not bother digging any further. I have now resolved to only update the label every 5 frames, which brings it down to a level I no longer worry about, even in profiling builds.

I guess my real question is: Can the “text mesh” generation be sped up by using another shader (thinking of bitmap)? And if yes, is it worth it in your experience?

The processing of the text (parsing and layout) is CPU related and identical regardless of whether you are using an SDF or Bitmap font asset.

The choice between Signed Distance Field and Bitmap affects rendering performance on the GPU, but when rendering plain white text the SDF shader and the Bitmap shader perform virtually the same. As you add Outline, Underlay, Glow, etc., the shader has to do more work per pixel, but again the performance differences are marginal.

In terms of performance, take a look at the Benchmark (Floating Text) example scene. This scene reflects the performance of both the text processing and rendering.

By default, 250 static text objects and 250 dynamic text objects are instantiated (500 in total), and the 250 dynamic text objects are updated each frame. This scene includes a frame counter to display the frame rate.

You can change the number of text objects in the test scene to check the performance differences between Development Build, Development Build + Auto Connect Profiler, Development Build + Auto Connect Profiler + Script debugging, and Release build. You will observe performance differences between all of these, with Development Build + Profiler (Recording) + Script debugging being the slowest.

For instance, on my Galaxy Tab A (SM-T510) using this test scene with 1000 static text objects + 1000 text objects being updated each frame, I get the following results. Note that the rendering / GPU impact on the fps is the same between these. We are only comparing CPU performance overhead in the text update process.

Development Build + Profiler (Recording) + Script debugging ~7 fps

On that frame 745 text objects were updated, which took 95.51 ms / 745 or 0.12 ms per text object. Disabling Record on the profiler increased the frame rate to 16.7 fps.

Development Build + Profiler (Recording) ~14 fps

On that frame 583 text objects were updated, which took 41.67 ms / 583 or 0.07 ms per text object. Disabling Record on the profiler increased the frame rate to 28 fps.

Development Build ~35 fps
There are no images to post here since we are not profiling. Again, note that a big chunk of the fps is determined by the number of objects we are displaying, which is constant across all 3 build modes tested here.

We could measure the performance overhead of the text processing with the C# Stopwatch class by recording elapsed times, but we would likely just confirm that processing an individual text object takes 0.1 ms or less.
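A minimal sketch of such a Stopwatch measurement could look like the following. The `Timing` class and `AverageMilliseconds` method are hypothetical names introduced for illustration. The helper simply averages the elapsed time over many calls of whatever delegate is passed in.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration, not part of Unity or TextMesh Pro.
public static class Timing
{
    // Returns the average time in milliseconds of one call to action,
    // measured over the given number of iterations.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        action(); // warm-up call so one-time costs are not included in the average

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

When timing a TMP label this way, keep in mind that assigning the text normally only marks the object as dirty and the mesh is regenerated later in the frame, so the measured delegate would probably need to call `ForceMeshUpdate()` after setting the text to capture the parsing and layout cost. Treat the numbers as a rough guide and measure in a build on the target device rather than in the editor.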

In terms of the profiler, here is an example with just 1 static text object and 1 dynamic where I compare Development + Profiling (Recording) vs. same with Script Debugging. Notice the reported times are ~2x slower with script debugging.

With Script Debugging

Without Script Debugging

As shown above, profiling itself adds overhead, and profiling with script debugging enabled adds even more. The number of objects being tracked likely impacts these results as well. For instance, in the first example we had an average of 0.12 ms per text object, whereas tracking just one took 0.19 ms.

The above tests were done using Mono as the scripting backend. Using IL2CPP in a release build produced 37.8 fps.

Like I said before, the profiler is great for getting relative performance, but those results are not indicative of the actual performance. I typically use both the profiler and the Stopwatch to test specific functions, but I also test on the actual target platform / device to check the fps (where applicable) and the overall responsiveness / performance of whatever I am testing.

Note: The number of characters contained in a text object and the use of rich text tags do impact parsing and layout. The differences are marginal for small changes in character count but become more significant between 12 characters, 100, 1000 or more.

Updating a few text objects every single frame, even on slow devices, should not be an issue. Hopefully the above information proves useful.

As usual you have gone above and beyond with explaining and helping, thank you :)

I have been using TMPro since the Unity 4/5 days, which is why I was surprised to see a TMPro-related method show up in the profiler like that. Usually it just has stellar performance out of the box.

Don’t really know why I thought that switching shaders would speed up mesh generation, really no connection there, thanks for the reminder.

I am still a little confused that the proportions between how long each part takes are seemingly inconsistent between “deep profiling”, “profiling”, “release” etc. (at least in my test case). I expected some deviation, but not enough to make deep profiling results misleading (showing TMPro as slow when in fact it is not). Maybe I see it all wrong, but I can’t really explain how one text field can cost 1 ms for me when in your tests it’s more like 0.12 ms. Of course this comes down to how powerful the hardware is (comparing your absolute values to mine), but still, that’s roughly a 10x factor. I don’t think your Galaxy Tab A (SM-T510) is 10 times as powerful as my Nokia 7 Plus (TA-1046). Something feels off to me here.

I’ll put this down to me not being very well acquainted with the “deep profile” feature. I should have tested more with deep profiling off, just as you suggested.

Try the example scene I referenced which is included in the TMP Examples & Extras.

This will enable you to compare the results for those same build options for your device.

Yes, I have done so now.

I get 35 fps with Mono and 38 fps with IL2CPP. With deep profiling it almost freezes (expected with 1000 text fields). Without deep profiling it runs at ~29 fps. I have attached screenshots of the profiles. With deep profiling, 95% is shown as coming from TMPro (680 out of 700 ms); without deep profiling it’s more hidden. All within expectations, I would say.

I have then isolated my text field setup from the game and profiled it as well, to check whether I see the ~1 ms in deep profiling again and how it shows up without deep profiling.

Deep Profile (1000 TF test):


Shallow Profile (1000 TF test):

Custom Setup Test Scenario:


Custom Test Deep Profile:

Custom Test Shallow Profile:

So in total this confirms that a method showing up as 1 ms in a deep profile can be ignored (what a surprise ^^).

Still, it does not tell me why this is reported as such a big part (1 ms) of one frame (17 ms) in my game when deep profiling. I assume deep profiling simply adds more overhead to whatever TMPro is doing than to other parts. Imho this implies that the proportions are totally off and therefore misleading during deep profiling. But that’s an issue which is definitely not TMPro related, and the isolated test case has proven that too.

Thank you again. I presume this means “case closed”.

Update: on second thought, it may be that during deep profiling the VSync part just shrinks down (nothing to profile there) and all the interesting parts (script etc.) will eat up that time. This could explain what’s happening in my case. Anyhow, it’s more a profiler thing than anything else.

The best optimisation would be to update your text at a slower frequency (20 times per second, for example), which gives a visually identical impression.
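As a rough sketch of that idea, a small time-based throttle could decide when the label is refreshed. `UpdateThrottle` is a hypothetical class, not a Unity or TMP type, and the rates used are only examples.

```csharp
// Hypothetical helper: reports when enough time has passed for the next refresh.
public sealed class UpdateThrottle
{
    private readonly float interval; // seconds between refreshes
    private float accumulated;       // time gathered since the last refresh

    public UpdateThrottle(float updatesPerSecond)
    {
        interval = 1f / updatesPerSecond;
        accumulated = interval; // makes the very first Tick return true
    }

    // Call once per frame with the frame's delta time in seconds.
    // Returns true when the text should be refreshed on this frame.
    public bool Tick(float deltaTime)
    {
        accumulated += deltaTime;
        if (accumulated < interval) return false;

        accumulated -= interval;
        // After a long hitch, drop the backlog instead of refreshing several frames in a row.
        if (accumulated > interval) accumulated = 0f;
        return true;
    }
}
```

In a MonoBehaviour this would presumably be used as `if (throttle.Tick(Time.deltaTime)) { /* rebuild and assign the time string */ }`. Basing the throttle on elapsed time rather than a frame count keeps the refresh rate stable when the frame rate fluctuates.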

Just gold. They call me an optimization freak because I care about micro details in code and what happens behind the scenes, and this insight is awesome.

I hope this benchmark becomes part of the Unity documentation, because I believe it would be super useful for everyone and easier to find than this thread.

Thank you @Stephan_B