# How to make array object allocation faster?

``````System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
stopwatch.Start();

for(int i = 0; i < 312536; i++)
{
OctreeNode[] faceNodes = new OctreeNode[2];
}

stopwatch.Stop();
Debug.Log("Time taken: " + (stopwatch.Elapsed.Milliseconds));
``````

So this test takes around 40 ms, what is a lot. Is there any faster way to allocate such array objects? `OctreeNode` is just and class with variables.

Each Array is a separate object instantiated on the heap.

Could you get away with a single giant array instead?

`OctreeNode[ ] masterArray = new OctreeNode[2 * 312536];`

1 Like

Nope. Not really. The real code in a recursion and total size is varying. But good tip. I wonder if there are other solutions.

By the way - neither of these will actually create the nodes themselves

List performs worse

``````for(int i = 0; i < 312536; i++)
{
List<OctreeNode> xx = new List<OctreeNode>();
}
``````
``````private static OctreeNode[] faceNodesX = new OctreeNode[2];

public static void xxx(OctreeNode node, ...)
{
for(int i = 0; i < 312536; i++)
{
//OctreeNode[] faceNodes = new OctreeNode[2];
faceNodesX[0] = node.children[0];
faceNodesX[1] = node.children[0];
}
}
``````

Still 12ms. Wtf.

No I meant creating one list and just adding OctreeNodes to it.

Doesnâ€™t seem terribly surprising. Just the loop index with no body probably takes a significant fraction of that time already

``````for(int i = 0; i < 312536; i++) {}
``````

Fermi estimation: A modern CPU has a clock rate on the order of gigahertz. A gigahertz is a billion cycles per second, which is a million cycles per millisecond. Just the loop index is doing most of a million steps (one increment and one comparison per loop iteration times 312536 iterations), and those â€śstepsâ€ť might take more than one clock cycle apiece. So youâ€™d expect the execution time to be on the order of milliseconds.

312536 is a lot of iterations. Doing anything 312536 times is going to be at least somewhat expensive.

You might consider breaking that up and doing it in pieces across multiple frames, or possibly doing it all on a separate thread that runs in the background.

If the sublists arenâ€™t of consistent length, then there might not be any way to convert the 2-dimensional logical index into an equivalent 1-dimensional index in order to access the correct element.

Though if the sub-arrays only vary slightly in size, it might be worth allocating the maximum possible space for all of them just so that you can allocate them all as a single block. Youâ€™d waste some memory, but might save some time.

You also might want to consider using a struct instead of a class. Classes are reference types, which means an array of some class is really an array of references to a bunch of individual objects that are separately allocated on the heap. An array of structs is a single contiguous block of memory that contains all of the data from all of the structs in one place.

(Though your current code isnâ€™t actually instantiating any of the nodes yet, so this definitely wonâ€™t speed up the part youâ€™re currently measuring.)

But itâ€™s some have annoying. For performance critical parts (like lots of iterations) itâ€™s seems to be better creating a lot of variables like x1, x2, x3 etc, rather then a small array to pass to another function. Code looks uglier, but performs so much better.

``````for(int i = 0; i < 312536; i++)
{
OctreeNode x = node.children[0];
OctreeNode y = node.children[0];
OctreeNode f = node.children[0];

something(x, y, f);
}
``````

The above gives me 0 ms.

Where

``````for(int i = 0; i < 312536; i++)
{
OctreeNode[] faceNodes = new OctreeNode[3];
faceNodes[0] = node.children[0];
faceNodes[1] = node.children[0];
faceNodes[2] = node.children[0];

something(faceNodes);
}
``````

30+ ms. And for the static re-used array 12+.

``````private static theStruct[] faceNodesX = new theStruct[2];
public static void xxx(OctreeNode node, ...)
{
for(int i = 0; i < 312536; i++)
{
faceNodesX[0] = new theStruct { int1 = 1 };
faceNodesX[1] = new theStruct { int1 = 2 };
}
}
``````

Seems to perform well. 0 ms. Not sure I can migrate to struct.

It occurs to me that you also might want to think about loop unrolling.

Iâ€™d speculate that in your first example your local variables are being optimized out, whereas even with a re-used array the computer has to actually perform the variable assignments because they donâ€™t immediately go out of scope.

For now I try to stick with structs and re-used struct arrays. I am ok also with passing 2-3 more variables and work with a bot uglier code, but sometimes its 8 or 16 and thatâ€™s to ugly

My real code doesnâ€™t include a huge single loop, but a deep recursion. The effect (timing) is very similar, so I simplified the example for this post.

Games are smoke and mirrors, you try fake that the world is changing while only a small part of the game world is in fact changing. Using quad trees, oclussion culling etc you mininize data that needs to be iterated for a single frame

Can you tell us more about the actual use case?
What are you going to use it for? Do you actually need to initialize the entire tree from the beginning or is lazy initialization something that could help you here?

Can you add the important part of the actual code?
If it takes 30ms to 40ms, whatâ€™s so critical about it? When / how often does the initialization occur that it makes you feel like it definitely needs to be optimized?