So I’m receiving some feedback because I use NativeArray(Allocator.Temp) for temporary storage inside a job: I get some items, use them for the calculations, and then the array gets disposed automatically after the job.
I was told to use
stackalloc int [ x ]
instead. I’ve never used it before and never had big performance issues.
But it makes all my calls unsafe. Not sure if that’s really faster and in what context and if it’s really worth it.
I suspect that unless you’re doing these allocations in a hot loop (which in general might not be a good idea from a performance standpoint), it won’t make much of a difference. stackalloc might be slightly faster. But as always, I can only recommend profiling and measuring the difference. There might be some complex compilation interactions that give non-obvious results, so it’s always worth measuring if performance needs to be improved in that area.
stackalloc can be used without an unsafe context if you use a Span.
var a = stackalloc int[x]; // var becomes int*
Span<int> a = stackalloc int[x];
Although allocating on the stack is generally faster than the heap, only use stackalloc if you know it will be a relatively small array. Otherwise you could risk causing a stack overflow.
There’s an example of how to limit how much memory is allocated:
const int MaxStackLimit = 1024;
Span<byte> buffer = inputLength <= MaxStackLimit
? stackalloc byte[MaxStackLimit]
: new byte[inputLength];
I’d change it to something like this:
const int MaxStackLimit = 1024;
Span<byte> buffer = inputLength * sizeof(byte) <= MaxStackLimit
? stackalloc byte[MaxStackLimit]
: new NativeArray<byte>(inputLength, Allocator.Temp).AsSpan();
Now it calculates the actual memory size of the array instead of the number of elements, and it uses a NativeArray as a fallback instead of a managed array.
sizeof() only works outside an unsafe context for built-in types, so it might not be usable for user-made structs, but you can use UnsafeUtility.SizeOf<T>() from Unity.Collections.LowLevel.Unsafe for those. Despite the name, it doesn’t require the unsafe context.
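To make the byte-size check concrete for a user-made struct, here is a minimal sketch. The Sample struct, the ScratchBufferExample class, and the 1024-byte budget are all made up for illustration. It assumes a Unity version where NativeArray<T>.AsSpan() is available, and that UnsafeUtility.SizeOf<T>() comes from the Unity.Collections.LowLevel.Unsafe namespace.

```csharp
using System;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

// Hypothetical user-made struct, used only for illustration.
public struct Sample
{
    public int Id;
    public float Weight;
}

public static class ScratchBufferExample
{
    // Assumed stack budget in bytes; tune it for your own case.
    const int MaxStackBytes = 1024;

    public static float TotalWeight(int count)
    {
        if (count <= 0)
            return 0f;

        // Size of the whole buffer in bytes, not the element count.
        int byteSize = count * UnsafeUtility.SizeOf<Sample>();

        // Small buffers go on the stack, larger ones fall back to a Temp NativeArray.
        Span<Sample> scratch = byteSize <= MaxStackBytes
            ? stackalloc Sample[count]
            : new NativeArray<Sample>(count, Allocator.Temp).AsSpan();

        // Write every element before reading it; stackalloc memory is not
        // guaranteed to be zeroed.
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = new Sample { Id = i, Weight = 0.5f };

        float total = 0f;
        for (int i = 0; i < scratch.Length; i++)
            total += scratch[i].Weight;

        return total;
    }
}
```

The Span never leaves the method, which is what keeps the stackalloc branch legal without an unsafe context. Unlike the snippet above, this allocates exactly count elements on the stack rather than the full limit.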
Are you allocating on every chunk Execute? The best optimization here is to allocate once per thread. The cost reduction is quite big, unless this is single-scheduled, in which case it depends on your inner loop. If you only have one allocation, there’s no difference.
So you can do it in OnWorkerBegin and even get rid of the IsCreated check.
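For anyone unfamiliar with the IsCreated check mentioned here, this is a rough sketch of the lazy "allocate once, reuse for later chunks" pattern. It is a guess at what that code looks like, not the poster's actual job. ScratchPerWorkerJob and ScratchCapacity are made-up names, and it assumes the Entities 1.x IJobChunk signature and that the job struct instance is reused across the chunks a worker processes.

```csharp
using Unity.Burst;
using Unity.Burst.Intrinsics;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;

[BurstCompile]
public struct ScratchPerWorkerJob : IJobChunk
{
    // Hypothetical capacity, assumed large enough for one chunk's entities.
    const int ScratchCapacity = 128;

    // Left unassigned at schedule time; the attribute stops the safety
    // system from rejecting the uninitialized container.
    [NativeDisableContainerSafetyRestriction]
    NativeArray<int> scratch;

    public void Execute(in ArchetypeChunk chunk, int unfilteredChunkIndex,
                        bool useEnabledMask, in v128 chunkEnabledMask)
    {
        // Allocate on the first chunk only; later calls on the same job
        // copy reuse the existing buffer instead of allocating again.
        if (!scratch.IsCreated)
            scratch = new NativeArray<int>(ScratchCapacity, Allocator.Temp);

        int count = chunk.Count < ScratchCapacity ? chunk.Count : ScratchCapacity;
        for (int i = 0; i < count; i++)
            scratch[i] = i; // placeholder for the real per-entity work
    }
}
```

Moving the allocation into a per-worker begin callback, as suggested, would remove the branch from Execute entirely. As with the rest of this thread, profile before and after to see whether it matters.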
I didn’t find any performance difference between Temp-allocated arrays and stackalloc, even though stackalloc sounds much better in theory. The downside of stackalloc is that it’s much more restrictive.
For both cases, the allocation is the expensive part.