Which is more performant in Unity C#: array, class, or struct for storing two Color values in a dictionary that is readonly and persists through the app’s lifecycle?
I’m working on a Unity project where I need to store a readonly dictionary with string keys, and the values will contain two Color values. The dictionary will remain present throughout the entire app lifecycle and will never be set to null. I’m trying to decide which approach would be better in terms of performance:
Using an array with 2 Color elements.
Creating a class with two Color fields.
Creating a struct with two Color fields.
I’m particularly concerned about memory usage, garbage collection overhead, and overall performance since the dictionary will be accessed frequently. Any advice on which would be the most optimal approach, or if there’s a better alternative I should consider?
There is no generic answer that covers every use case. Implement all three and benchmark them in your own code, on your target hardware, with an input size comparable to what you expect in your final project.
These will be Color structs, and they will mostly be set once, whenever the user enters a screen.
This will be reusable code, so there could be a few dozen or a few hundred entries depending on the project. So I want to use something that works everywhere.
As has been pointed out there isn’t really enough info to form an opinion but… it sounds a bit odd unless your app is about generating a light show or something.
If these are declared at startup is there any reason to use a dictionary and look colors up by a key? If you define element 3 as having the “ball color” could you just reference array element 3 (for instance) via a const value?
I will always sacrifice a few milliseconds at runtime for readability and maintainability so your mileage may vary.
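The const-index idea above could be sketched roughly like this. It is only an illustration: Rgba is a made-up stand-in so the snippet compiles outside Unity (in a real project you would use UnityEngine.Color), and ColorIds, Palette, and the specific colors are hypothetical names.

```csharp
using System;

// Stand-in for UnityEngine.Color so the sketch compiles outside Unity (assumption).
public readonly struct Rgba
{
    public readonly float R, G, B, A;
    public Rgba(float r, float g, float b, float a = 1f) { R = r; G = g; B = b; A = a; }
}

// Hypothetical named slots; these constants replace the string keys.
public static class ColorIds
{
    public const int Ball = 0;
    public const int Paddle = 1;
    public const int Background = 2;
    public const int Count = 3;
}

public static class Palette
{
    // One array allocated once at startup; a lookup is a plain index, no hashing.
    private static readonly Rgba[] Colors = new Rgba[ColorIds.Count];

    static Palette()
    {
        Colors[ColorIds.Ball] = new Rgba(1f, 0f, 0f);
        Colors[ColorIds.Paddle] = new Rgba(0f, 1f, 0f);
        Colors[ColorIds.Background] = new Rgba(0f, 0f, 0f);
    }

    public static Rgba Get(int id) => Colors[id];
}
```

Usage would be something like Palette.Get(ColorIds.Ball). A typo in the constant name fails at compile time, which a string key cannot do.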
So your concern is memory usage, garbage collection overhead, and performance… and your question isn’t about the dictionary or string, but about array vs class vs struct.
I would argue you’re looking at this through the wrong lens.
With that said… I’ll go into it with you, then I’ll have a follow up about what I’d really be thinking about in regards to this.
# Class & Struct
In C#, classes and structs differ in one major way, and that is how they’re treated in regards to memory management. You may hear people say “structs are on the stack and classes are on the heap”… but that’s not 100% correct. It’s just that structs can and will be allocated on the stack if it’s available to them.
Really what it is, is that a class is allocated on the heap. Any variable that references the object will always point at its location on the heap (variables in C# are a little more than that since it’s managed memory, but for ease of understanding, it’s a pointer to where it is in the heap). The fact that 2 variables can point to the same object is why we call it a “reference” type. The variable itself only holds the information needed to look up the object (you can think of this as an integer that represents an address). So for any given reference there exist 2 pieces of memory… the object itself on the heap, and the reference to it wherever the variable is defined.
A struct on the other hand is stored by value, “in place”. So if you have a variable in a function, the stack frame allocated for the function has enough space in it for each of its variables, and if those variables are struct types, the space for that variable is the size of the struct. Whereas if the variable is a class level field (a variable defined as a member of the class), then when an instance of the class is allocated on the heap there is enough space for each of its fields, so the struct is technically on the heap as part of the class instance.
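A minimal sketch of the difference described above, using two made-up types (PointStruct and PointClass are hypothetical, chosen only for illustration): assigning a struct copies the value, assigning a class copies the reference.

```csharp
using System;

public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class CopySemantics
{
    // Returns the X of the two ORIGINAL variables after their copies were mutated.
    public static (int structX, int classX) Run()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;   // copies the whole value
        s2.X = 99;     // s1 is unaffected

        var c1 = new PointClass { X = 1 };
        var c2 = c1;   // copies only the reference
        c2.X = 99;     // c1 sees the change: both point at the same heap object

        return (s1.X, c1.X); // (1, 99)
    }
}
```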
# Arrays
Arrays are just contiguous blocks of memory allocated on the heap (a Span<T> over stackalloc memory can give you a contiguous block “in place”, like a struct). Each slot of the array is the size necessary to store what is in the array. If it’s a struct that is 8 bytes in size, then each slot is 8 bytes in size (note, there can be some fuzziness related to how memory is partitioned, we’ll get into that later). The index is effectively saying “the value is the size bytes starting at address HEAD + index * size”.
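To make the “HEAD + index * size” idea concrete, here is a small sketch that reads an array element by computing its byte offset by hand, plus a stack-allocated span for comparison. ArrayLayout and its method names are hypothetical, and this assumes a runtime where Span<T> and MemoryMarshal are available.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ArrayLayout
{
    // Reads element `index` of a float array via its byte offset.
    public static float ReadByOffset(float[] values, int index)
    {
        Span<byte> raw = MemoryMarshal.AsBytes(values.AsSpan());
        int offset = index * sizeof(float);   // HEAD + index * size
        return MemoryMarshal.Read<float>(raw.Slice(offset, sizeof(float)));
    }

    // Same contiguous layout, but on the stack: no heap array is allocated here.
    public static int SumOnStack()
    {
        Span<int> scratch = stackalloc int[4];
        for (int i = 0; i < scratch.Length; i++) scratch[i] = i + 1;

        int sum = 0;
        foreach (int v in scratch) sum += v;
        return sum; // 1 + 2 + 3 + 4 = 10
    }
}
```

In normal code you would just write values[index]; the manual offset is only there to show what the indexer amounts to.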
# Memory Usage
So for the most part a class and a struct are laid out internally rather similarly. It’s just a contiguous block of memory split up into parts big enough to fit each variable. If the variable is a reference, the size needed for it is the size of a reference pointer. If it’s a struct, the size of that variable is the size of the struct.
So for example a Vector3 is just 3 float values. This means that it is laid out as 12 contiguous bytes, the first 4 bytes being x, the next 4 y, and the next 4 z.
Note that the variables may “pack” weirdly depending on the size of each one. This is called “padding” where say you have a 1 byte variable and a 4 byte variable that packs into 8 bytes due to padding rather than the 5 you’d expect. This is what I was referencing before about how memory may be partitioned.
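The 1 byte + 4 byte example can be checked with something like the following sketch. Padded and Packed are hypothetical types, and note that Marshal.SizeOf reports the marshaled size, which for simple structs like these matches the layout being described.

```csharp
using System;
using System.Runtime.InteropServices;

// Default sequential layout: the int is aligned to 4 bytes, so 3 padding bytes follow the byte.
[StructLayout(LayoutKind.Sequential)]
public struct Padded { public byte Flag; public int Value; }

// Pack = 1 removes the alignment padding (at the cost of unaligned access).
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Packed { public byte Flag; public int Value; }

public static class PaddingDemo
{
    public static int PaddedSize() => Marshal.SizeOf<Padded>(); // 8
    public static int PackedSize() => Marshal.SizeOf<Packed>(); // 5
}
```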
So in the end a class and a struct will generally end up using really similar amounts of space. Though technically a class will take up slightly more space since 1) you need to have a reference to it, which also takes up space, and 2) class instances have a little bit of header info on them.
The big difference is that the class being a ref type is in managed memory and therefore will need to be cleaned out via GC.
An array… well it’s a bit like a class in that they’re stored on the heap so have all the costs that come with that. And its layout is just a contiguous block of memory that is indexed. So effectively an array of length 2 and a class with 2 variables of the same type in the end… are effectively about the same cost memory wise.
So in the end… memory usage is MOSTLY the same between class, struct, array. The minor differences are so minor that it’s not what I’d base my decision on for the types themselves…
# Memory Usage in the Dictionary
Now here is actually where the memory usage could technically be a concern. A dictionary is technically just big ol arrays of keys and values and a hashing algorithm that links keys to indices in the values.
But herein lies the thing… classes and arrays exist on the heap and are referenced. This means the arrays in the dictionary aren’t storing the values, they’re storing references to the values.
So let’s just say that the reference is 4 bytes and your values are 8 bytes. An array of length 100, if it holds structs, is basically 800 bytes (give or take some header info). But an array of length 100 that holds class instances or array[2]s is 800 bytes for the values scattered across the heap, plus 400 bytes for the array that points at that scattering of values, totaling 1200 bytes (give or take header info).
Not only are you storing more info, it’s also scattered. You don’t know where those 100 instances of your class are in the heap… so accessing members of the array[100] could result in cache misses on the CPU.
(Note: in a dictionary the arrays may actually be larger than the count of entries. It has to do with how hash tables work, so there is a memory hit there as well. Also there is an array for the keys, which takes up space and follows this same logic.)
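The back-of-the-envelope numbers above can be written out as a tiny calculation. This is a rough sketch that ignores object headers, dictionary buckets, and keys; DictionaryFootprint and its parameters are hypothetical names.

```csharp
public static class DictionaryFootprint
{
    // Values stored inline (structs): just entries * valueSize.
    public static long InlineBytes(int entries, int valueSize)
        => (long)entries * valueSize;

    // Values stored behind references (classes, arrays): each entry also pays for its reference.
    public static long ReferencedBytes(int entries, int valueSize, int referenceSize)
        => (long)entries * (valueSize + referenceSize);
}
```

With the illustrative sizes from the text, InlineBytes(100, 8) is 800 and ReferencedBytes(100, 8, 4) is 1200. For the actual case in the question, a Unity Color is four floats (16 bytes), so a pair is 32 bytes; assuming 8-byte references on a 64-bit build, that is 3200 vs 4000 bytes for 100 entries, still before headers.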
## DOES ANY OF THIS MEMORY MATTER
Does this matter though?
Those 100 entries came to a total of ~1KB of data. Even accounting for the overheads of the dictionary we’re talking ~2KB? (not accounting for strings, string memory management is wild and is a topic all its own called string interning). Even if you had 100,000 entries we’re talking 800K vs 1200K of ram. We’re barely at a megabyte!
How big is this dictionary going to be that the GIGS worth of work memory is a problem on your system?
# Garbage Collection VS Copying
This answer is obvious… classes and arrays need to be cleaned up/GC’d, structs don’t… if you want to avoid GC the struct is a no brainer. But if you’re not going to be throwing out these values often and creating new ones… this doesn’t really matter. The dictionary will need to be cleaned up in the end, and if you clean up the values inside at the same time the GC is happening regardless. Arguably it’s faster to flag an entire contiguous block of memory for an array as unused than it is to flag N individual instances… but the GC is going to happen either way.
Though keep in mind… updating a dictionary of structs and updating a dictionary of classes/arrays is different. A struct is by value so a COPY of the value is returned when you read from the dict, if you update it, you have to set it back to the dict. But a class/array is a reference… the dict’s value collection and the returned value point to the same object on the heap… so you can edit its members in place and not have to set it back.
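// struct: the indexer returns a copy, so the change has to be written back
var value = dict[key];
value.x += 1;
dict[key] = value;
// class: the indexer returns a reference to the same heap object
var obj = dict[key];
obj.x += 1;
// class, more succinctly
dict[key].x += 1;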
And there is a thing about structs. You’re copying every time you move them around. For large structs this can actually become a performance issue… but from the sounds of it your situation isn’t that large of a struct.
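For the read-mostly case in the original question, a small readonly struct as the dictionary value sidesteps the copy-and-write-back dance entirely, since nothing mutates it after construction. A minimal sketch, assuming a stand-in Rgba type so it compiles outside Unity (use UnityEngine.Color in a real project); ColorPair, ThemeColors, and the keys are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for UnityEngine.Color (assumption, for a self-contained example).
public readonly struct Rgba
{
    public readonly float R, G, B, A;
    public Rgba(float r, float g, float b, float a = 1f) { R = r; G = g; B = b; A = a; }
}

// Two colors stored inline in the dictionary's entries: no per-value heap object.
public readonly struct ColorPair
{
    public readonly Rgba Primary;
    public readonly Rgba Secondary;
    public ColorPair(Rgba primary, Rgba secondary) { Primary = primary; Secondary = secondary; }
}

public static class ThemeColors
{
    // Built once and kept for the app's lifetime.
    private static readonly Dictionary<string, ColorPair> Pairs = new Dictionary<string, ColorPair>
    {
        ["button"] = new ColorPair(new Rgba(1f, 1f, 1f), new Rgba(0f, 0f, 0f)),
        ["header"] = new ColorPair(new Rgba(0f, 0f, 1f), new Rgba(1f, 1f, 0f)),
    };

    // TryGetValue copies the 32-byte pair out; a missing key returns false instead of throwing.
    public static bool TryGet(string key, out ColorPair pair) => Pairs.TryGetValue(key, out pair);
}
```

Copying 32 bytes per lookup is cheap at this size; it is the kind of struct where the copy cost mentioned above is unlikely to matter, though benchmarking in your own project is the only way to be sure.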
…
# Where are my actual concerns
Why is it a dictionary? Why are the keys strings? How big is this? What is this all for?
Cause I’ll say this… and this isn’t to knock dictionaries… I use them.
But I’ve been programming a fairly long time at this point. Professionally for 17… 18? dear god 20 years is coming soon. And I’ve been programming for funsies since the early to mid 90s. I have seen a LOT of stuff… hell I have DONE a lot of it.
And overuse of Dictionary (hashmap or equivalent) is a thing I see a lot of. Hell I’m working on a project right now where another dev on the team implemented logic with a lot of dictionaries that could have just been variables for raisins. They’re comfortable with dictionaries or something.
But why?
Ask yourself what is this dictionary doing for you?
Because you want to talk about performance??? It’s not if your 2 color values are in a struct vs a class. It’s the fact you’re looking them up in a hash table via a string as a key. That’s NOT a free action. The string needs to be hashed (which is not trivial), then the index is resolved, then collisions are resolved, and then the value is returned. (I actually have a thread about this from last night)
This is the thing I ask myself/team-mates when I see myself writing or a team-mate writing:
Dictionary<string, T...> dict...
OK… where is the string coming from? Why is it here? Sure now we can look up the value of T…, but how are we looking up the string?
Who knows… maybe the source of the string is reasonable. Maybe you’re writing a json messaging system where your data is coming to you in string form anyways and you need to quickly map some string id to some value really quick.
But is that the situation going on?
Just think of my team-mate who implemented their logic as a Dictionary<string, T…> and then was doing this:
var value = stats["Strength"];
??? Why not just do this?
var value = stats.Strength;
I’ll tell you what: that second one is MUCH faster since it’s a direct memory access rather than a hash lookup. And it’s also less error prone!!!
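A rough sketch of the two styles side by side. Stats, its Agility field, and StatsLookup are hypothetical names for illustration, not the team-mate's actual code.

```csharp
using System;
using System.Collections.Generic;

public sealed class Stats
{
    public int Strength;
    public int Agility;
}

public static class StatsLookup
{
    // Field access: a fixed offset into the object. A misspelled name will not compile.
    public static int FromField(Stats stats) => stats.Strength;

    // Dictionary access: hash the string, find the bucket, compare keys.
    // A misspelled key compiles fine and only fails at runtime.
    public static bool TryFromDictionary(Dictionary<string, int> stats, string key, out int value)
        => stats.TryGetValue(key, out value);
}
```

For example, TryFromDictionary(dict, "Strenght", out var v) quietly returns false for a dictionary that only has "Strength", while stats.Strenght is caught by the compiler.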