I would like to discuss a question that has been on my mind regarding string handling and garbage collection in Unity. To provide some context, let me quote an excerpt from Unity’s documentation:
Now, here’s my question: If strings in C# seem to hold the value of a string, why is it not possible to make them behave or act as any other value type? By allowing this, we could potentially have more control over garbage collection issues.
I apologize if this question seems dumb or noobie, but I would appreciate any insights or explanations from the community. Thank you in advance for your help!
In exchange, multiple variables that refer to the same string don’t take much memory: each one costs only the size of a reference, not the size of the string.
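To make that concrete, here is a minimal sketch in plain C# (the class and method names are made up for illustration). It only shows that assigning or storing a string copies the reference, so every slot points at the same object:

```csharp
using System;

public static class SharedStringDemo
{
    // Built at run time, so this is not a compile-time literal.
    public static string BuildName(int id) => "Player" + id.ToString();

    public static bool SharesOneObject()
    {
        string original = BuildName(7);

        // Assignment copies the reference, not the characters.
        string alias = original;

        // Three array slots, still only one string object on the heap.
        string[] many = { original, alias, original };

        return ReferenceEquals(original, alias)
            && ReferenceEquals(many[0], many[2]);
    }
}
```
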
Also there are FixedString types in Unity which more or less behave like value types.
C# was made so you don’t have to care about garbage. The fact that we use it for “high performance” game development is not what Microsoft made it for. So their choice of not caring about the garbage immutable strings cause was made from that point of view.
The StringBuilder version took twice as long to compute and generated almost twice as much garbage as the regular concat/format version, so I figured either I’m using it wrong or it doesn’t work for the specific use case I had in mind.
Also, I would like to ask how your suggested “FixedString types” could play a role in, for instance, the use case I showed.
I’ve also never seen issues with strings being a gc problem… I theorize it might be due to C# string interning?
Either way, make sure it’s worth making your code all ugly with StringBuilder noise… it probably isn’t gonna make a big difference and I think the API is awful.
The key words are “seems to”. They are reference types pointing to data allocated on the heap, not the stack. You don’t have to define a string, it can be returned by a method, another class, a service, etc. If you want pointers you can use C or C++ and have maximum control. Most use cases don’t require that level of control.
As a thought experiment… you’re given control over the string allocation and destruction that you don’t presently have… what changes are you going to make to your code?
From what I understand of String Builder (which admittedly is very little since I’ve never bothered using it) you’re just not performing enough operations and/or working with large enough strings.
Same, and I think part of the problem is that while the advice that working with strings directly isn’t ideal is constantly brought up, what’s not brought up is how much garbage collection has improved.
Also while they haven’t landed in Unity yet there are optimizations in C# 10 for interpolated strings.
@GuirieSanchez You are calling ToString() on the StringBuilder twice, which is probably causing it to do double the work. Also try creating a new StringBuilder instead of clearing. And all the Append calls can be replaced by a string literal. StringBuilder shines when it runs in a loop that puts a string together piece by piece, which your example isn’t doing.
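As a rough sketch of the "loop that puts a string together piece by piece" case (the names here are hypothetical, not from the original benchmark): the concatenating version allocates a new, longer string on every pass, while the StringBuilder version appends into its buffer and allocates the final string once.

```csharp
using System.Text;

public static class JoinDemo
{
    // Each += creates a brand new string holding everything so far.
    public static string WithConcat(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // Appends copy characters into the builder's buffer;
    // the only string allocated is the one returned by ToString().
    public static string WithBuilder(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

Both return the same text; the difference only starts to matter as the number of pieces grows.
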
I just had a simplistic thought: being able to manipulate strings without having to worry about garbage allocations. I know it’s not possible, I was just curious about why.
Unity just provides some tips if your project heavily relies on strings:
I find working with strings a little frustrating, knowing that almost every operation can potentially generate garbage. It often feels like the only solution is to constantly be vigilant about minimizing string manipulations.
Consider how inconvenient it would be if value types were also allocated on the heap, requiring us to be cautious about manipulating and creating variables at all times.
I just really don’t understand this mindset… I mean I appreciate that you feel this way but there is simply no concrete evidence you should even spend any brain cycles on these concerns at all, in at least 99.99% of games.
I’ll give you a really egregious example: my KurtMaster2D game contains MS-DOS and native mode C games running under Unity. I didn’t want 57,000 different entrypoints so I made one and Unity communicates to the native code back and forth… entirely with strings passed in and out one of four functions:
[DllImport ("__Internal")]
private static extern System.IntPtr dispatcher1_entrypoint1( int opcode1, int arg1);
[DllImport ("__Internal")]
private static extern System.IntPtr dispatcher1_entrypoint2( int opcode2, string arg2);
[DllImport ("__Internal")]
private static extern System.IntPtr dispatcher1_getkpworkbuf();
[DllImport ("__Internal")]
private static extern void dispatcher1_rendercolor( System.IntPtr pixels, int span, int flags);
Yes, every frame there are about 8 to 10 transactions performed: update input bits, update mouse touch, select game, gain focus, pump one frame, play sound, emulate particular system instruction, render one frame, etc., and every one of those things sends back an itemized string that I parse in C#. The string might look like:
Every frame I chop up hundreds of those strings and make business decisions in Unity as to what to do next, then I have it all blast over the graphics, essentially one big string, which I cram into a Texture2D on Unity and present it.
EVERY FRAME. 30fps to 60fps… every frame makes that much string noise.
The important point is that the size of value types is fixed and known at compile time. If you allocate an array of structs or if the runtime allocates a stack frame, it can calculate exactly how much memory is needed ahead of time.
Strings, however, are variable size. If you have a string variable or an array of strings, .Net cannot know ahead of time how much memory will be required. Therefore, each string needs to be allocated individually on the heap and GC-managed.
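A small sketch of that difference, using a made-up `Vec3` struct: the size of a value type is a fixed number, while the size of a string depends on its contents.

```csharp
using System.Runtime.InteropServices;

// Hypothetical struct for illustration: three 4-byte floats.
public struct Vec3
{
    public float X, Y, Z;
}

public static class SizeDemo
{
    // Fixed and known up front: an int is always 4 bytes.
    public static int IntSize() => sizeof(int);

    // Also fixed: 3 floats * 4 bytes each.
    public static int Vec3Size() => Marshal.SizeOf<Vec3>();

    // A string's payload depends on how many chars it holds (2 bytes per char),
    // so there is no single size for "a string".
    public static int PayloadBytes(string s) => s.Length * sizeof(char);
}
```
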
@ mentioned FixedString, which uses a fixed amount of memory for a string. In that case it can be used as a value type, with the downside that you might waste memory for short strings or have a string that is too long to fit in the memory you’ve allocated.
There’s also string interning, where some strings are allocated in a special table and not garbage collected, complicating things further.
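A minimal sketch of interning in plain C# (class and method names are made up): identical literals share one pooled object, a string built at run time does not, and `string.Intern` hands back the pooled instance.

```csharp
using System;

public static class InternDemo
{
    // Two identical literals resolve to the same pooled object.
    public static bool LiteralsShareOneInstance()
    {
        string a = "hello";
        string b = "hello";
        return ReferenceEquals(a, b);
    }

    // A string built at run time has equal contents but is a separate object.
    public static bool RuntimeStringIsSeparate()
    {
        string literal = "hello";
        string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        return literal == built && !ReferenceEquals(literal, built);
    }

    // string.Intern returns the pooled instance with the same contents.
    public static bool InternReturnsPooledInstance()
    {
        string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        return ReferenceEquals(string.Intern(built), "hello");
    }
}
```
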
@Kurt-Dekker I get you. At first, I got a bit concerned when I saw around +14 KB of garbage being generated in just a single “DateTime.TryParse” operation and some +2KB of garbage generated in very frequent string concat operations that I do in almost every frame. Although to be honest, my concerns are mainly driven by curiosity and love for optimization (and also because I’m a bit of a performance freak) :). It is more about my personal enjoyment rather than thinking that any modern device would actually suffer from these concerns.
I see, that makes sense.
So, to sum up, no matter how big, say, an int you’re operating with is, it will fit within the int32, which .NET knows in advance and allocates a fixed amount of memory to store it on the stack (unless it’s too large, for which you’d use long or any other suitable alternative).
On the other hand, since a string length range can vary from a single word or letter to a full text, the only reasonable choice is to determine the memory required to store it after creating it and then allocate space on the heap accordingly.
Just a quick follow-up question: when we modify an existing string, this sort of dynamic allocation of memory always goes on the heap. Would it be infeasible to allocate it on the stack?
Nothing on the stack survives the return from the function containing it. The return statement literally frees / destroys it simply by stepping the Stack Pointer over all the data. As per ABI convention (eg, how the CPU works!), everything below / beyond the stack pointer is unallocated free-to-be-used RAM, so nothing could ever survive in it.
If something does need to outlive the function (a local captured by a lambda, say, or a boxed value), AFAIK it isn’t copied off the stack at return time; the compiler arranges for it to live on the heap from the start, and a plain returned value type is just copied to the caller.
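As far as I understand it, the usual way a C# local ends up outliving its method is capture by a lambda. A minimal sketch (names are hypothetical): the compiler stores the captured local in a heap object, which is why it is still there after the method has returned.

```csharp
using System;

public static class CaptureDemo
{
    public static Func<int> MakeCounter()
    {
        // Because the lambda below captures 'count', the compiler keeps it
        // in a heap-allocated closure object instead of the stack frame.
        int count = 0;
        return () => ++count;
    }
}
```

Calling the returned delegate twice yields 1 and then 2, even though `MakeCounter` returned long ago.
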
Others may have made similar points while I was typing this up, but I’m just going to add my bit to the conversation.
Why do strings create garbage?
Manipulating strings allocates new memory for later garbage collection because strings are immutable reference types. The string of characters (which are value types) in memory that the string object points to cannot be manipulated without allocating new memory for a new string of characters. You can think of a string as essentially a char[]. The new string of characters probably has a different length than the previous string. You wouldn’t want your strings to take up more space in memory than the actual characters that make up the text. If you kept the same length and just changed which characters are in the string, theoretically you wouldn’t need to allocate new memory, just change the value of the characters. However, that only applies to that one, less common case, and they wanted string objects to feel like value types when they were designed. So, the one standard behavior that covers every case is to make them immutable reference types.
On a quick side note, delegate objects, and hence events, are also immutable reference types. Every time you += or -= an event a brand new object is created behind the scenes just like with string manipulation. The MulticastDelegate class handles equality comparison even though they are technically different objects. It’s a way of making things convenient for the user, but yes it comes with the new memory allocation caveat.
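A quick sketch of the delegate point (names are made up): after `+=`, the variable refers to a new delegate object, and the old one is left untouched.

```csharp
using System;

public static class DelegateDemo
{
    public static bool CombineCreatesNewObject()
    {
        Action first = () => { };
        Action second = () => { };

        Action handlers = first;
        Action before = handlers;

        // += calls Delegate.Combine, which returns a new delegate instance.
        handlers += second;

        return !ReferenceEquals(before, handlers)
            && before.GetInvocationList().Length == 1
            && handlers.GetInvocationList().Length == 2;
    }
}
```
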
I’m not sure if this was directly relevant in their design decisions, but immutable reference types may be relevant in multithreaded situations. Remember that strings are not unique to Unity. Swapping a reference is an atomic operation. There could still be race conditions where an unexpected thread has the final assignment of the value of a particular string, but you’re not going to end up with a garbled string object where two threads were fighting over which particular characters need to be in it. If you had a StringBuilder that two threads were trying to manipulate (which you shouldn’t ever do), then you could end up with a very mixed up string of characters, or just throwing exceptions.
The amount of memory they need to occupy is not known at compile time, unless the strings are interned string literals, in which case they don’t contribute to garbage memory allocation anyway. One thing to keep in mind here is that a lot of your value types are also stored on the heap. Only certain things like local variables that get declared within the scope of a method live on the stack. The stack is small, typically around 1 MB per thread by default, though the exact size depends on the platform. Most of your other value types at run time are probably members of a reference type that is on the heap at run time. They are already trying to make strings resemble a value type. If you wanted to allocate a string of characters locally on the stack you can actually do that like so:
using System;

public class Foo
{
    public void Bar()
    {
        // baz is a value type and it lives on the stack. I hope I don't want to change the length.
        int length = 12;
        Span<char> baz = stackalloc char[length];
        // But now I have to do this:
        baz[0] = 'H';
        baz[1] = 'e';
        baz[2] = 'l';
        baz[3] = 'l';
        baz[4] = 'o';
        baz[5] = ' ';
        baz[6] = 'w';
        baz[7] = 'o';
        baz[8] = 'r';
        baz[9] = 'l';
        baz[10] = 'd';
        baz[11] = '!';
        // And many APIs don't accept Span<char>. Now I have to do this to use it with the method I want.
        string message = new string(baz);
    }
}
I have a few thoughts about your benchmark. Firstly, the profiler, or any other way of measuring how long a bit of code takes, is going to have a maximum resolution of 100ns. So, if that one bit of code is taking less than 100ns, you’re not going to get accurate results. It’s better to test in BenchmarkDotNet, or if you need to compare assigning to a text field of a Unity component, as you are doing here, then you should make sure each test is as minimal and comparable as possible. Only do one thing at a time if you can, and run each test in a big loop that will guarantee the total time is well over 100ns. Then divide the total by the number of iterations to get the average time, and do that over and over and over again to get a more accurate result.
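If you want to do the "big loop, then divide" approach by hand, something like the following sketch would work. `MicroBench` and `AverageNanoseconds` are names I made up, and this is no replacement for a proper benchmarking tool; it only shows the averaging idea.

```csharp
using System;
using System.Diagnostics;

public static class MicroBench
{
    // Runs the action many times and returns the average time per call in nanoseconds.
    public static double AverageNanoseconds(Action action, int iterations)
    {
        // Warm-up call so one-time costs (such as JIT compilation) are not timed.
        action();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        // Total elapsed time divided by the number of iterations.
        return stopwatch.Elapsed.TotalMilliseconds * 1e6 / iterations;
    }
}
```

You would call it with a large iteration count, for example `MicroBench.AverageNanoseconds(() => sb.Clear(), 1000000)`, and repeat the whole measurement several times to see how stable the number is.
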
Also, I know you were trying to do a comparison to the two string interpolations in the other test, which is why you call .ToString() twice, but calling .ToString() will definitely allocate a brand new string object every time it is called. I’m not sure if there should be a big difference though between the two tests at first glance. String interpolation and StringBuilder in that scenario aren’t doing much that is different. Breaking down the specific memory allocations is another issue that I feel like I would have to look at myself to see exactly where it is coming from. That being said, be aware that StringBuilder is maintaining its own array or multiple arrays of characters behind the scenes. That’s in addition to the new strings that are created when you call .ToString(). Also, depending on which version of Unity and .NET you are using, StringBuilder could also be allocating a brand new string when you append value types. You are appending string variables in your example, though; I just wanted to point out that older versions of StringBuilder allocate just from appending value types. The .NET Standard 2.1 version should not, but I haven’t benchmarked the speed of the new append methods for comparison.
As far as the StringBuilder API is concerned. I don’t intend to argue in favor of the API or anything, but I do want to point out that it is designed to enable a fluent interface. So your code could be changed to this:
Allocating a new StringBuilder would also contribute to garbage memory allocation. It’s essentially like a List and you wouldn’t want to be creating new instances all the time because it would somewhat defeat the purpose. The separate Append calls in the example cannot be reduced because the name values, while they are strings, are variable and not known what they will be at compile time.
Oh, but it is possible, you just have to use alternate means. Here is an open source plugin for working with strings without allocating memory that should be compatible even with assigning text to text components without ever allocating any new memory along the way: https://github.com/Cysharp/ZString
I know exactly how you feel. However, this is where I’m actually going to agree with others that if you haven’t profiled and found that garbage collection is actually having a definite and noticeable impact on performance, then you probably don’t need to worry about it. Different applications use different garbage collectors, and determining if the time you save in garbage collection is actually greater than the time you spend working around allocating memory is VERY complex. The people who work on garbage collection have spent a lot of time to make sure it’s good at what it does. You need to create really good test cases that run for a long time, not just little unit tests, because you are comparing an asynchronous scenario to a synchronous blocking scenario. It’s apples and oranges. You have to compare the overall performance of the entire test scenario, profiling the actual garbage collection with PerfView, and see if one overall scenario is faster than the other. It’s not simple at all to know if you’re even saving anything, or actually making things worse.
I like optimization as well.
But I’d like to think of it differently. If you can remove many bolts from a car, it becomes lighter. Being lighter means less gas consumption. If the total weight loss means you have to spend 5% less gas on the car without consequences then yeah sure. If it means that the integrity of the car goes to shit then don’t. It’s not worth it.
The string optimization depends on how often you run the code, how large the strings are and how many appends you do. If you’re running it every frame with large amounts of strings then sure the string builder reduces some garbage.
If it is only every now and then, don’t bother and use string interpolation. The garbage collector will take care of it.
If you really want to spend time optimizing then focus on the “hot paths”.
Heavy calculations that run every frame spending multiple ms.
Or you know, look at the profiler and see what is taking most of the time in a build. (Editor profiling is not accurate)
Here’s a little experiment setup to test the garbage generation.
I’ve profiled it in a development build (IL2CPP).
I don’t know what the ex / mod were so I replaced them with GameObjects.
Used Script:
using System.Collections;
using System.Text;
using TMPro;
using UnityEngine;
using UnityEngine.Profiling;

public class GarbageTest : MonoBehaviour
{
    [SerializeField]
    private TextMeshProUGUI InterpolatedPanel;
    [SerializeField]
    private TextMeshProUGUI StringBuilderPanel;

    public GameObject Ex;
    public GameObject Mod;

    // Reused across all tests so its internal buffer is only grown once.
    private readonly StringBuilder stringBuilder = new();

    private IEnumerator Start()
    {
        // Yield .5 second before starting the test
        yield return new WaitForSeconds(.5f);

        // Perform the test 10 times with 2 frames in between
        for (var i = 0; i < 10; i++)
        {
            Test();
            yield return null;
            yield return null;
        }

        OtherTest();
        yield return null;
        Application.Quit();
    }

    private void OtherTest()
    {
        stringBuilder.Clear();

        Profiler.BeginSample("Sample StringBuilder append");
        stringBuilder.Append("<size=65%><color=#FFFFFF>").Append(Ex.name).Append("\n(").Append(Mod.name).Append(")</color></size>\n<color=#FFE081>Sets</color>");
        Profiler.EndSample();
    }

    private void Test()
    {
        Profiler.BeginSample("Sample Interpolation");
        var interpolateString = $"<size=65%><color=#FFFFFF>{Ex.name}\n({Mod.name})</color></size>\n<color=#FFE081>Sets</color>";
        Profiler.EndSample();

        Profiler.BeginSample("Set PanelText Interpolated string");
        InterpolatedPanel.text = interpolateString;
        Profiler.EndSample();

        Profiler.BeginSample("Sample StringBuilder");
        stringBuilder.Clear();
        stringBuilder.AppendFormat("<size=65%><color=#FFFFFF>{0}\n({1})</color></size>\n<color=#FFE081>Sets</color>", Ex.name, Mod.name);
        Profiler.EndSample();

        Profiler.BeginSample("Sample StringBuilder.ToString()");
        var stringBuilderString = stringBuilder.ToString();
        Profiler.EndSample();

        Profiler.BeginSample("Set PanelText StringBuilder");
        StringBuilderPanel.text = stringBuilderString;
        Profiler.EndSample();
    }
}
First time AppendFormat allocated 0.5 KB.
Second time it allocated only 334 B.
Third time and the times after that, it allocated only 82 B
That, together with the 210 B from the .ToString(), results in 292 B as opposed to string interpolation, which is 364 B.
Using the StringBuilder does help reduce the garbage generated by string concatenation.
But only after a few times of running.
I only glimpsed over the other comments so far so maybe someone has already answered it but the general answer has to do with how stacks and heaps work. If you fully understand what those two things are and how they work then you pretty much have everything you need to know to understand why strings are the way they are. Strings are objects that essentially wrap a variable sized amount of memory. Because of this they needed to be designed in the most general way possible, where many different, potentially unforeseen, use cases could occur, so making them heap-based objects was the only sensible way to go.
As a person that learned to program in C using books written in the 80s (even though it was the early 2000s at the time lol) and still programs with a ‘C accent’ to this day, I can say that I will gladly give up that small bit of control and potential performance for the sake of not having to deal with character arrays and null-terminating bytes. It’s true they can be a bit of a headache in very large-scale industrial-sized applications where you’re dealing with hundreds-of-thousands if not millions of requests that all require tons of strings but thankfully this just doesn’t come up much in video games. Usually you only need to update a couple of strings once a frame or so in most cases and the rest of the time internal data can actually just be indexed ints or hashed values.
As for your test with StringBuilder, you are doing a LOT of string allocating in that test which is essentially defeating the purpose of using StringBuilder in the first place.
You only really need to worry about string garbage if you’re doing thousands of string operations every frame, or operating on a large amount of strings at once.
Also, literal strings are different: they don’t become garbage since they are part of the assembly. So when you write this:
public string Speak()
{
    return "hello world!";
}
This will not allocate a new string on every call. Instead, it always returns a reference to the same string object. This is the main upside from C#'s immutable strings: since that string can’t be modified, it can be reused and passed around by reference safely.
This, however, will allocate a new dynamic string every time:
public string SpeakTheNumber(int value)
{
    return value.ToString();
}
Converting something to a string always requires the creation of a new string. Getting the name or tag of any Unity object also allocates a fresh new string every time you access those properties.
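You can check both claims with reference comparisons. This is only a sketch wrapping the two methods above in a made-up static class; the value 123456 is chosen deliberately because some runtimes cache the strings for very small numbers.

```csharp
using System;

public static class SpeakDemo
{
    public static string Speak() => "hello world!";

    public static string SpeakTheNumber(int value) => value.ToString();

    // The literal is one shared object, so every call returns the same reference.
    public static bool LiteralIsReused() =>
        ReferenceEquals(Speak(), Speak());

    // Each ToString() call builds a new string: equal contents, different objects.
    public static bool NumberIsReallocated()
    {
        string a = SpeakTheNumber(123456);
        string b = SpeakTheNumber(123456);
        return a == b && !ReferenceEquals(a, b);
    }
}
```
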
About StringBuilder and how it works: it uses a resizable list of characters and each .Append() call just adds the characters from the input strings to the list. Calling .ToString() builds a new string that contains all the characters from the list. The list will create garbage whenever it needs to be resized, but if you reuse the same string builder it will recycle the largest reached capacity.
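A small sketch of the reuse pattern, with a hypothetical `LabelFormatter` class: the builder is created once with enough capacity, cleared before each use, and only the final string from ToString() is allocated per call.

```csharp
using System.Text;

public sealed class LabelFormatter
{
    // One builder for the lifetime of the formatter, pre-sized so short
    // labels never force the internal buffer to grow.
    private readonly StringBuilder _builder = new StringBuilder(256);

    public int Capacity => _builder.Capacity;

    public string Format(string name, int score)
    {
        // Clear resets the length; the buffer stays available for reuse.
        _builder.Clear();
        _builder.Append(name).Append(": ").Append(score);
        return _builder.ToString();
    }
}
```

Note that the returned string is still a fresh allocation each time, so this reduces the garbage rather than removing it.
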
Well about that string immutability… it’s kind of a social construct.
I mean, ok, a language design ideal, but it’s not like you can’t mutate the content of a C# string, or even its length.
It’s good to keep in mind the ideas of why it is designed as immutable, but if you do, and have a concrete situation where these considerations don’t apply, and when in a pinch and where this is necessary for performance critical improvements, I’ve gotten good results by breaking with that concept. Especially because StringBuilder doesn’t actually help to avoid that many allocations when you do something other than concatenation of strings and chars as all number conversion and all formatting still needlessly allocates garbage.
string GetNonInternedNonLiteralString(char functionParamToForceNonLiteral = 'a')
{
    var s = $"My mutable string{functionParamToForceNonLiteral} ";
    // string.IsInterned returns the pooled instance (or null), so make sure s is not that instance.
    Debug.Assert(!ReferenceEquals(string.IsInterned(s), s), "If you modify interned strings, you're gonna have a bad time");
    unsafe
    {
        fixed (char* c = s)
        {
            // Feel free to modify those chars here
            c[2] = '\t';
            // Trim those trailing white spaces by setting the length
            // The int value of a string's length is directly in front of its char array.
            var newLength = s.IndexOf(' ') + 1;
            *(((int*)c) - 1) = newLength;
            // Set a trailing null terminator for APIs that ignore the length
            c[newLength] = '\0';
        }
    }
    Debug.Log($"string length is {s.Length}"); // it is 11
    return s;
}
So if you know that no other thread is doing anything with that string, that it isn’t a literal and you checked it is not interned, you can mutate it around as you like. You can even set your uGUI/TextMeshPro labels to use it. You might just need to force them to realize that it is not the same string that they already built a mesh for (afaik they do some reference equals checks to avoid rebuilding the same text and, for some reason [/sarcasm], don’t expect string contents to change).
But say you have a timer or an FPS counter: just allocate a new string('0', 10) and set those digits yourself. (I know there’s char array APIs on most places these days, buuut sometimes there isn’t).
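For the cases where a char array API is available, the same "set those digits yourself" idea works without touching a string's internals. A minimal sketch (the `DigitWriter` helper is made up): it writes a non-negative number, zero-padded, into a buffer you allocate once and reuse.

```csharp
public static class DigitWriter
{
    // Fills the whole buffer with the decimal digits of value, right-aligned
    // and padded with '0'. Returns false if value is negative or does not fit.
    public static bool WriteZeroPadded(int value, char[] buffer)
    {
        if (value < 0)
        {
            return false;
        }

        // Write digits from the last position backwards.
        for (int i = buffer.Length - 1; i >= 0; i--)
        {
            buffer[i] = (char)('0' + value % 10);
            value /= 10;
        }

        // Any remaining digits mean the buffer was too small.
        return value == 0;
    }
}
```

With a four-char buffer, writing 60 leaves the buffer holding `0060`, and no new string is created unless you explicitly build one from the buffer.
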
Or maybe the OS gave you a file path and you know ToLower wouldn’t do anything weird with that charset like making it longer and you want to do it yourself char per char, or you just want to shave off the file ending, why waste a perfectly good, freshly minted and guaranteed not to be interned or literal string?
And if you really need this, you can write an API that treats strings as mutable and does some nice number formatting so you can use this optimization all over the place. It’s pretty reasonable everywhere where you don’t really care about weird Locale or Unicode shenanigans.