Hi
Generally does it matter for the performance if I use
Array of Structs
OR
Struct of Arrays
?
/Thomas
I wouldn’t say the two are directly interchangeable.
I’ll say this though… Arrays are ref types, and Structs are value types. It is usually suggested that a value type should NOT contain ref types as members (string being an exception since strings are immutable).
I’ll also say a struct of arrays means that each array is yet another object on the heap. This technically will make the garbage collection of those arrays, when the time comes, more costly. Whether you’ll notice it… well, that’s a completely different matter.
What are you attempting to do? Because from the sounds of this approach, you’re most likely attempting a very naive design that could probably benefit from properly structured code rather than debating the efficiency differences between those two options.
Well I come from writing C in small embedded systems, and there most of the time it’s more efficient to use struct of arrays or simply arrays (they get compiled to the same code) than an array of structs.
This code I’m writing is handling a lot of objects, called very frequently. And will very seldom get released.
I agree that the code gets cleaner with an array of structs, but the number of reference levels increases ( tArray[i].structMember vs. structMember[i] ). At the same time, copying an entry ( tArray[i] = tArray[j] ) gets cleaner and simpler than keeping track of all the individual parts, and it is also easier to add a new member to the struct.
OK. You are probably right.
I’ll start out with nicely written code, and if I find the need to speed it up I’ll try to see if it makes any difference.
Thanks!
/Thomas
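For reference, the two layouts being compared could look roughly like this in C#. This is only a sketch: `Particle`, `ParticlesAoS` and `ParticlesSoA` are made-up names, and the containers are written as classes here purely for convenience.

```csharp
using System;

// Hypothetical entry type, used only to illustrate the two layouts.
public struct Particle
{
    public float X;
    public float Speed;
}

// Array of structs: one array, each element holds all fields of one entry.
public class ParticlesAoS
{
    public Particle[] Items;

    public ParticlesAoS(int count) { Items = new Particle[count]; }

    // Copying an entry is a single assignment of the whole struct.
    public void Copy(int from, int to) { Items[to] = Items[from]; }
}

// Struct of arrays: one array per field, entries are tied together by index.
public class ParticlesSoA
{
    public float[] X;
    public float[] Speed;

    public ParticlesSoA(int count)
    {
        X = new float[count];
        Speed = new float[count];
    }

    // Every field is copied by hand; adding a field means another line here.
    public void Copy(int from, int to)
    {
        X[to] = X[from];
        Speed[to] = Speed[from];
    }
}
```

Access then reads `aos.Items[i].Speed` in the first layout and `soa.Speed[i]` in the second, which is the extra reference level mentioned above.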
This topic is kind of interesting to me. Hope you don’t mind if I pick your brains a bit.
I’ve never heard this before:
Could you elaborate on why? Or just point me to where that suggestion is made?
I thought Arrays were always on the heap. If you need the array, then you need to deal with the heap allocation. Am I missing something?
Agree 100% but it is still an interesting thought exercise.
I would suspect that the efficiency of array struct vs struct array is largely an implementation detail. Which hardware you are using and what you are doing with the data, etc…
That’s basically the sum total of every piece of advice I’ve read on this forum.
Sure.
First a link to the suggested practices for structs:
Note the point about it being “immutable”.
So, to expand on why it’s suggested to be immutable.
Because structs are value types, and classes are reference types. You get an issue when you have a struct of classes.
var objA = new MyClass();
var objB = objA; //objB references the same object as objA
objA.Value = 5;
Debug.Log(objB.Value); //will show 5, because objA and objB are the same object
//... elsewhere
var a = new MyStruct();
var b = a; //set b to the same value as a
a.Value = 5;
Debug.Log(b.Value); //won't show 5, because a and b are different values
In the case of a ref type, modifying the value will cause the value to change on all references to that object.
In the case of the value type, modifying the value on one only updates the value on that one.
This idea that the struct’s value is unique to that instance is expected behaviour of a struct, because it’s a value type.
So… what if you have the following scenario:
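public struct MyStruct
{
    public int Value;   //fields are private by default, so mark them public
    public MyClass obj; //a reference: copying the struct copies the reference, not the object
}
public class MyClass
{
    public int AnotherValue;
}
var a = new MyStruct() { obj = new MyClass() };
var b = a; //b gets its own Value, but the same MyClass instance
a.Value = 5;
a.obj.AnotherValue = 10;
Debug.Log(b.Value); //won't be 5
Debug.Log(b.obj.AnotherValue); //will be 10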
The state of b is half tied to a. Value is independent in each copy, but AnotherValue is shared between them.
Now, of course, this isn’t to say you can’t do it. You most certainly can. It’s just this mutability of the value comes with implications that aren’t so easily noticed at face value. You NEED to know the intricacies of the struct before using it.
Which is fine.
I have places in my code that I do so as to keep down the need for allocating on the heap. But it’s usually in the scope of a private struct inside of a class that never gets used outside of that scope. It’s encapsulated into the class in question. This way only the code that is using it needs to know this about the struct.
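As a rough illustration of the "immutable" suggestion, here is a minimal sketch of a struct whose state cannot change after construction. It assumes a compiler that supports `readonly struct` (C# 7.2 or later, which older Unity versions may not have), and `Health` and `Damaged` are made-up names for the example. It holds only value-type fields, so a copy shares nothing with the original.

```csharp
using System;

// Hypothetical immutable value type: all state is fixed at construction.
public readonly struct Health
{
    public int Current { get; }
    public int Max { get; }

    public Health(int current, int max)
    {
        Current = current;
        Max = max;
    }

    // "Changing" the value produces a new value; existing copies are untouched.
    public Health Damaged(int amount)
    {
        return new Health(Math.Max(0, Current - amount), Max);
    }
}
```

With this shape, `var b = a; a = a.Damaged(3);` leaves `b` exactly as it was, which is the behaviour people expect from a value.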
Yes, arrays are always on the heap.
My statement, “a struct of arrays means that each array is yet another object on the heap”, is in line with that.
What I’m saying is that every array in the struct will be a new entry on the heap. Relative to OP’s 2 scenarios, one called for only one array of many structs. And the other was a struct of many arrays. This means a scenario of 1 object on the heap, or a scenario of many objects on the heap.
This is a good example. The mixing of behavior would not be obvious to a new programmer which is reason enough for me.
I’ve done this before too, though without thinking about why it was a good practice. Another good point.
Okay, now I get this better. Another question though: Which is more expensive to clean up or are they equal? My guess is many small arrays would be more expensive than a single large array though I wonder by how much.
This is one that I’m not 100% sure on the specifics about. Primarily because it depends on the garbage collector, the implementation of which can be very different. For example the old version of Mono that Unity runs on has a very different garbage collection implementation than newer versions. The newer versions are A LOT more efficient at cleaning up large numbers of small objects… it’s why a lot of programmers familiar with Mono really wish Xamarin and Unity would hammer out their issues and get Unity updated (probably not going to happen either… from what I understand Unity is trying to go a completely different route instead…).
So stripping away the gc implementations. What we can gather from the information on hand.
1 array can be deallocated very quickly obviously, it’s just a single contiguous block of memory. Just declare it unused. BUT, a large array is going to be a large contiguous block of memory. When creating that array it’s going to be difficult to find a contiguous block of memory that can be used. Furthermore, while it exists, it’s a large chunk of memory getting in the way when allocating new chunks of memory for other objects.
Many arrays on the other hand will take more time to deallocate because you have to do it for every array. It’s the same task as 1 array, just many times. Obviously more work. BUT when instantiating each array there’s less work because locating a chunk of memory to stick it in is relatively easy. And working around the item is relatively simple.
I like to think about it like in the game Diablo. Your inventory is a grid, and items have a tile size. How you organize the items affects how much and what you can fit in the grid. Large items can stop you from fitting other items, even though their combined area is smaller than the total area of the grid. It’s all because their massive shapes get in each other’s way. But if you have lots of small items (like potions) you can just toss them in your inventory and they fit in nicely amongst each other.
Just linearly in memory.
Note, this is all opposed to structs.
Structs allocate differently: as local variables and parameters they usually live on the stack.
The stack is called stack for the very reason that it’s just one long contiguous block of memory where the values just pile up one after the other as they’re scoped into existence. As they’re scoped out of existence, they just pop back off the stack.
A function is called, it takes some parameters… those parameters are pushed onto the stack. It then declares some variables, those too, pushed onto the stack. An if statement compares some values, and runs some code that declares some more variables, onto the stack. If statement closes… variables from in that if statement are popped off the stack. Function finishes up, its variables then pop, then the parameters passed in pop off.
This is why the article I linked way back says to keep struct sizes relatively small. The stack is not infinitely large. So large structs will fill it up faster. Which brings you closer to a stack overflow that much faster.
Furthermore, a large struct that is a member of a class (an array for instance) makes that object that much larger. So an array of a large struct will take up HUGE amounts of memory on the heap, making working around that array (think the Diablo grid again) that much more difficult.
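To illustrate the point about structs living inside the array itself, here is a small sketch (`Cell` and `InlineStorageDemo` are made-up names). Because the elements are stored inline as values, reading one into a local variable copies it, while indexing the array directly writes into the element.

```csharp
using System;

// Hypothetical element type for the example.
public struct Cell
{
    public int Value;
}

public static class InlineStorageDemo
{
    // Returns the element value after editing a local copy of it.
    public static int EditCopy(Cell[] cells)
    {
        Cell copy = cells[0];   // copies the struct out of the array
        copy.Value = 5;         // changes only the local copy
        return cells[0].Value;  // the array element is unchanged
    }

    // Returns the element value after editing it through the array index.
    public static int EditInPlace(Cell[] cells)
    {
        cells[0].Value = 5;     // array indexing writes into the element itself
        return cells[0].Value;
    }
}
```

The larger the struct, the more each of those copies costs, which is another reason the size suggestion exists.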
Wait a minute. Are we talking about using a struct as an array? I thought the OP was wondering about:
var theStuff = new MyArrays();
struct MyArrays
{
int[] array1; // each one allocated as new int[100]
int[] array2;
// etc..
}
VS
var theStuff = new MyData[100];
// ...
struct MyData
{
int thing1;
int thing2;
// etc..
}
Are we talking about using a struct in place of an array?
What you’re quoting was completely independent of the OP’s question.
I was referring to a scenario where you had a large struct, like say:
[StructLayout(LayoutKind.Sequential)]
public struct MyStruct
{
public long Id; //8 bytes
public string Name; //strings are immutable refs, so 4 bytes for the pointer (8 on a 64-bit runtime)
public Vector3 Position; //12 bytes
public Quaternion Rotation; //16 bytes
public Vector3 Scale; //12 bytes
//... imagine more and more values
}
So in this, that alone is a 52 byte struct (counting the string reference as 4 bytes and ignoring any padding), and imagine more fields in it, it could grow very large.
Then an array of MyStruct is going to require Array.Length * 52 bytes worth of memory allocated for it (or more if you added more fields).
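If you want a rough number rather than adding the fields up by hand, something like the following sketch could work. `Vec3` and `Quat` are stand-ins I made up for Unity's Vector3 and Quaternion so it runs outside Unity, and `FatStruct` and `SizeEstimate` are hypothetical names. Note that `Marshal.SizeOf` reports the marshalled (unmanaged) size, so treat the result as an estimate that depends on pointer size and padding.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-ins for Unity's Vector3 and Quaternion so the sketch runs outside Unity.
[StructLayout(LayoutKind.Sequential)]
public struct Vec3 { public float X, Y, Z; }

[StructLayout(LayoutKind.Sequential)]
public struct Quat { public float X, Y, Z, W; }

[StructLayout(LayoutKind.Sequential)]
public struct FatStruct
{
    public long Id;
    public string Name;     // a reference: pointer-sized
    public Vec3 Position;
    public Quat Rotation;
    public Vec3 Scale;
}

public static class SizeEstimate
{
    // Approximate bytes needed for the elements of a FatStruct[length].
    // Marshal.SizeOf gives the unmanaged size, which only approximates
    // the managed in-memory size.
    public static long ApproxArrayBytes(int length)
    {
        return (long)Marshal.SizeOf<FatStruct>() * length;
    }
}
```

For example, `SizeEstimate.ApproxArrayBytes(10000)` gives a ballpark figure for how big a contiguous block a 10,000 element array would need.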
I was explaining why MSDN suggests keeping structs under 16 bytes.
To relate it to OP, you could say that if they intend to have 1 array of structs, and that struct is fat… this would be an argument against doing that.
That is to say, it’s an argument against. Not a rule against doing it. The 16 byte suggestion is just that, a suggestion. Sometimes you need more than that. A 4x4 Matrix is a struct with 16 floats, that’s pretty dang fat… but it really should be treated as a struct, it’s a value, not an object.
Okay I understand. I was still approaching the question from the original context.
As @TEBZ_22 mentioned, there is a bit of a trade off in terms of performance because of the additional dereference. However, as you pointed out, a struct can quickly become very large. I doubt the cost of one additional jump is as bad as the cost of finding space for an array of massive structs.