Native Container Allocators

When Unity says:

Is the performance cost only on allocation, or is using a Persistent container also less performant than using a TempJob container in a job?

For example, I could technically create a single Persistent container in OnCreate and dispose of it in OnDestroy, use it for the whole lifetime of my application, pass it to my job, then clear it and reuse it once the job completes. Would that be less performant than creating a new TempJob container and disposing of it every time the job completes?
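
A minimal sketch of the two patterns being compared. It assumes a MonoBehaviour host (Awake/OnDestroy/Update) rather than a system with OnCreate/OnDestroy, and FillJob, AllocatorComparison, and the buffer length are hypothetical names picked for illustration. It only shows the shape of each approach and says nothing about which one is faster.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job used only for illustration: writes the index into each slot.
public struct FillJob : IJob
{
    public NativeArray<float> Buffer;

    public void Execute()
    {
        for (int i = 0; i < Buffer.Length; i++)
            Buffer[i] = i;
    }
}

// Hypothetical host component; the name and the length are assumptions.
public class AllocatorComparison : MonoBehaviour
{
    const int Length = 1024;
    NativeArray<float> persistentBuffer;

    void Awake()
    {
        // Pattern A: allocate once and keep the container for the whole lifetime.
        persistentBuffer = new NativeArray<float>(Length, Allocator.Persistent);
    }

    void Update()
    {
        // Pattern A: reuse the same Persistent container every frame.
        new FillJob { Buffer = persistentBuffer }.Schedule().Complete();

        // Pattern B: allocate a TempJob container, use it, dispose it right after.
        var tempJobBuffer = new NativeArray<float>(Length, Allocator.TempJob);
        new FillJob { Buffer = tempJobBuffer }.Schedule().Complete();
        tempJobBuffer.Dispose();
    }

    void OnDestroy()
    {
        // Persistent memory is never reclaimed automatically, so dispose it here.
        if (persistentBuffer.IsCreated)
            persistentBuffer.Dispose();
    }
}
```

Both jobs are completed immediately here to keep the sketch short; in a real project the handle would normally be completed later in the frame.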

Only allocation, but that also covers any internal allocations the container makes, such as a NativeList resize.

Is the actual difference that Persistent goes to the heap and Temp goes to the stack? In that case, could I assume that the stack block would be reused more often and therefore Temp is also faster to use and not just to allocate? (I don’t know why that should be, but intuitively there should be some optimization for repeated use of the same memory area.)

Stack memory isn’t faster to access than heap memory on modern platforms, as both are backed by the same memory hardware. The last time I saw a dedicated stack memory chip was on the Nintendo DS, and some microcontrollers still have them.

But also no, that’s not quite how it works.

Persistent most likely allocates through a system allocation call (similar to “new” in C++), whereas Temp and TempJob use segments of memory that Unity has preallocated.

Would it then be safe to say that as long as the memory usage is “fixed”, like an array, it’s better to allocate it once through Persistent and reuse that instead of making a TempJob array every frame? I’m starting to think it’s one of those things that really doesn’t matter much.

You should settle this with the Performance Testing package on a real device. As far as I know, nobody has actually tested this before; people have only thrown theories and assumptions around, so no one can come and confirm.

Profiling is technically the correct answer. But a little tip from experience:

Unless you are creating thousands of NativeContainers every frame, don’t spend the time worrying about it. That time is better spent optimizing your jobs. Just because your jobs use Burst doesn’t mean they are as optimized as they can be. I typically squeeze out a 2x - 10x improvement just from looking at the Burst inspector and making changes to the C# code. That’s going to make a lot bigger difference.

The stack is faster for allocation and deallocation. A stack allocation simply returns the current stack pointer and moves the stack pointer to the end of the allocated memory; deallocation moves the stack pointer back to its previous position. The default maximum stack size in C# (Windows) is 1 MB for 32-bit processes and 4 MB for 64-bit processes (I don’t know whether Unity has increased the default stack size).
I don’t think Temp allocations are on the stack, but they may come from a custom stack allocator (one that works on the heap in a similar way to the system stack).
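
To make the "custom stack allocator on the heap" idea concrete, here is a minimal sketch of a bump (linear) allocator in plain C#. BumpAllocator and all of its members are hypothetical names, and this is not Unity's actual Temp or TempJob implementation; it only illustrates the pointer-bump mechanics described above over one preallocated block.

```csharp
using System;

// Hypothetical bump ("linear") allocator over one preallocated block.
// This is an illustration only, not Unity's allocator.
public sealed class BumpAllocator
{
    readonly byte[] block; // memory reserved once, up front
    int top;               // plays the role of the stack pointer

    public BumpAllocator(int capacity)
    {
        block = new byte[capacity];
    }

    // Number of bytes currently handed out.
    public int Used => top;

    // Allocation: hand out the slice starting at the current top, then advance top.
    public Span<byte> Allocate(int size)
    {
        if (size < 0 || size > block.Length - top)
            throw new InvalidOperationException("Out of preallocated memory.");
        var slice = new Span<byte>(block, top, size);
        top += size;
        return slice;
    }

    // Remember the current position so later allocations can be released together.
    public int Mark() => top;

    // Deallocation: move top back to a previously recorded position (LIFO order).
    public void Rewind(int mark)
    {
        if (mark < 0 || mark > top)
            throw new ArgumentOutOfRangeException(nameof(mark));
        top = mark;
    }

    // Release everything at once, for example at the end of a frame.
    public void Reset()
    {
        top = 0;
    }
}
```

Allocate and Rewind each cost a bounds check and an integer update, which is why this style of allocator is cheap compared with a general-purpose heap allocation.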

The Unity allocator cannot be using the C stack. That is not to be confused with a stack-like (LIFO) structure, which, if I remember correctly, they posted it is based on.

The C stack is function-local and static, known at compile time, so the C compiler actually generates static code for all of the deallocations.

In C# you can: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/stackalloc

No, that follows the same rules as the C stack.

You can use stackalloc in Bursted jobs, and the allocation is faster than Temp for small fixed-size arrays. It is useful if you need to set up a small temporary fixed-size array in a static method that gets used by jobs.
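
A minimal sketch of that use case: a static helper with a small fixed-size scratch array created with stackalloc. SmallScratch and MaxSquare are hypothetical names. The sketch uses the Span<int> form so that it compiles in a safe context; whether Burst accepts that form depends on the Unity and Burst versions, and the pointer form (int* inside an unsafe context) is the long-standing alternative.

```csharp
using System;

public static class SmallScratch
{
    // Hypothetical helper: returns the largest of four squared values,
    // using a fixed-size scratch buffer that lives on the stack.
    public static int MaxSquare(int a, int b, int c, int d)
    {
        // The buffer is released automatically when the method returns;
        // no allocator label and no Dispose call are involved.
        Span<int> scratch = stackalloc int[4];
        scratch[0] = a * a;
        scratch[1] = b * b;
        scratch[2] = c * c;
        scratch[3] = d * d;

        int max = scratch[0];
        for (int i = 1; i < scratch.Length; i++)
        {
            if (scratch[i] > max)
                max = scratch[i];
        }
        return max;
    }
}
```

Because the buffer lives in the method's stack frame, it must stay small and must not be returned or stored anywhere that outlives the call.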

My original point was that there is no difference in performance when iterating through an array allocated on the stack or an array allocated on the heap. That iteration is often far more expensive than the allocation.

As long as the data is laid out contiguously in memory, I also think the iteration performance should be the same.

Allocation performance depends on frequency: with only one allocation per frame the difference is negligible, but if you allocate many variables per frame, not only does the allocation take more time, but with the managed heap the GC also kicks in.
In .NET Core, most of the performance gains come from using Span and never living on the heap.

I came back to add relevant articles from Jackson Dunstan:

Native Memory Allocators: More Than Just a Lifetime

How Long Does a Temp Allocation Last?

What Does Deallocating Temp Memory Do?

Are Temp Allocations Always Fast?

Allocating Memory Within a Job

Allocating Memory Within a Job: Part 2

Temp Memory Reuse
