Should I deallocate a big NativeArray that I use to store chunk data?

So I have a NativeArray of size 3000*3000 (fixed, always the same size; I just change its contents), allocated with Allocator.Persistent.

I don’t want to recreate the NativeArray at runtime every time I need one, since that causes lag spikes. I use the container to store data for my map generation every time I generate a chunk.

So my question is:

Can I reuse the one I created at the beginning for the duration of the application, or do I always have to dispose it? Or is there another way to handle this?

The way I write to the same NativeArray is by using [NativeDisableContainerSafetyRestriction], since the indexes never overlap. It doesn’t seem to cause a problem, but I’m not a programmer, I just learn as I go, so I don’t know if this is causing a leak or something else.

Allocations using Allocator.Persistent should be disposed on domain reload or app shutdown, otherwise you’ll have a leak and trigger leak warnings. Unity 6 has Allocator.Domain which doesn’t need to be manually disposed, which is convenient. Re-using an allocation for different things, even between Play Mode sessions w/o a domain reload is fine, as long as it makes sense for your use case and you clean up the allocation on domain reload / app exit or use Allocator.Domain.
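A minimal sketch of the "allocate once, reuse, dispose at the end" pattern described above. The component name `ChunkBufferOwner` and the `Buffer` property are hypothetical; only the `NativeArray` and `Allocator` calls are Unity API.

```csharp
using Unity.Collections;
using UnityEngine;

// Hypothetical owner component: allocates the buffer once and disposes it once.
public class ChunkBufferOwner : MonoBehaviour
{
    const int Size = 3000; // side length taken from the question

    NativeArray<Color32> buffer;

    // Other scripts (hypothetical) overwrite the contents for every generated chunk.
    public NativeArray<Color32> Buffer => buffer;

    void Awake()
    {
        // One persistent allocation for the lifetime of this object.
        buffer = new NativeArray<Color32>(Size * Size, Allocator.Persistent);
    }

    void OnDestroy()
    {
        // Dispose exactly once when the owner is destroyed
        // (for example on scene unload or when leaving Play Mode).
        if (buffer.IsCreated)
            buffer.Dispose();
    }
}
```

Any job that still uses the buffer has to be completed before `Dispose()` is called.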

NativeDisableContainerSafetyRestrictionAttribute would be appropriate for having completely separate writer jobs, but you would then need to be very careful that you chain / complete the jobs properly. For a single parallelized job, NativeDisableParallelForRestrictionAttribute lets you bypass the index restriction on IJobFor/IJobParallelFor.
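As a rough illustration of the single parallel job case: a sketch in which each `Execute` call fills one row of a chunk inside the big map. The job name and its fields (`mapSize`, `chunkSize`, `originX`, `originZ`) are made up for this example, and it assumes the chunk lies fully inside the map so rows never overlap.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job: one Execute call per chunk row.
public struct FillChunkRowsJob : IJobParallelFor
{
    public int mapSize;    // side length of the whole map, e.g. 3000
    public int chunkSize;  // side length of the chunk, e.g. 256
    public int originX;    // chunk origin inside the map
    public int originZ;
    public Color32 value;

    // Without this attribute the safety system only allows writes to map[row].
    [NativeDisableParallelForRestriction]
    public NativeArray<Color32> map;

    public void Execute(int row)
    {
        // Each row writes its own contiguous range, so ranges never overlap.
        int start = (originZ + row) * mapSize + originX;
        for (int x = 0; x < chunkSize; x++)
            map[start + x] = value;
    }
}

public static class FillChunkRowsExample
{
    // Schedules one work item per row and waits for the result.
    public static void Run(NativeArray<Color32> map, int mapSize, int chunkSize)
    {
        var job = new FillChunkRowsJob
        {
            mapSize = mapSize,
            chunkSize = chunkSize,
            originX = 0,
            originZ = 0,
            value = new Color32(0, 0, 0, 255),
            map = map
        };
        job.Schedule(chunkSize, 8).Complete();
    }
}
```

The attribute only switches off the index check; it is still up to you to make sure no two rows write to the same element.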


Hi, thanks for the info, I’ll check that out!

For a single parallelized job, NativeDisableParallelForRestrictionAttribute lets you bypass the index restriction on IJobFor/IJobParallelFor.

Yeah, I saw that mentioned in a recent post, but I didn’t quite understand the idea at first. It makes sense now. I’ll have to think about it though, thanks.

Did you calculate what this amounts to in terms of memory usage?

If it’s a byte array, it would be about 9 MB (9,000,000 bytes, roughly 8.6 MiB). If you store a decently sized struct (assuming 128 bytes), this would allocate over 1 GiB of memory!

For a quick check:
sizeof(TypeThatsInTheArray) * 3000 * 3000 = ? ??? ??? ??? bytes

I mention this because it’s not uncommon for devs to be totally oblivious to that sort of thing. :wink:
A transform equivalent (2x Vector3, Quaternion) for instance is already 40 bytes (2 * 12 + 16; actual size may depend on padding).
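The quick check can also be done in code. This is only a sketch: the helper class `ArrayMemoryEstimate` is hypothetical, while `UnsafeUtility.SizeOf<T>()` is the Unity call that reports the size of an unmanaged struct.

```csharp
using Unity.Collections.LowLevel.Unsafe;
using UnityEngine;

// Hypothetical helper for estimating the size of a width * height NativeArray<T>.
public static class ArrayMemoryEstimate
{
    public static long Bytes<T>(int width, int height) where T : struct
    {
        // Cast to long first so large arrays do not overflow int.
        return (long)UnsafeUtility.SizeOf<T>() * width * height;
    }

    public static void LogExample()
    {
        // Quaternion is four floats (16 bytes): 16 * 3000 * 3000 = 144,000,000 bytes.
        long bytes = Bytes<Quaternion>(3000, 3000);
        Debug.Log($"{bytes} bytes = {bytes / (1024.0 * 1024.0):F1} MiB");
    }
}
```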

Also something to consider is how you iterate over that array. If you iterate over it from start to end (parallel or not) then that’ll be fast. But if you often process a rectangular area within that array, then performance will not be optimal, because each row being processed is in a different location in memory. Every time you switch rows, the CPU may not be able to predict the jump and has to wait for that memory to be loaded into the cache. Although, to put this in perspective, it will still be tons faster than single-threaded managed C# code.


Hi, you are right, it went over my head to do that calculation, even though I was aware of it.

The NativeArray is of type Color32, so I’m assuming 4 bytes per element(?).

But if you often process a rectangular area within that array then performance will not be optimal because each row being processed is in a different location in memory, and every time you switch rows the CPU can’t predict that lookahead and needs to wait for that memory to be loaded into the cache.

That is interesting. I’m not very familiar with memory management (data-oriented programming, tightly packing stuff, etc.), although I knew that doing things by row is better for some CPU-related reason.

because each row being processed is in a different location in memory

If it’s a 1D array of Color32, shouldn’t it be like one contiguous row? (Just to be clear, I have no idea, I’m just assuming based on what I’ve seen and read on the internet xD)

So if I have something like this:

for (int z = 0; z < 256; z++)
{
    for (int x = 0; x < 256; x++)
    {
        nativeArrayOfColor32[z * size + x] = new Color32();
    }
}

That would access a chunk of size 256 in a 1k by 1k array of Color32, for example. Is there still a cache miss somewhere, or a failed prediction?

My guess is that the reason it would fail to predict is that after element 255 I’m not looking at element 256 in the array, but at some other element that’s a bit further along in the array, even though the for loop says what the next element to be processed is(?)

Anyways thanks!

Must be 4 bytes total. Color is 4 floats (16 bytes) but Color32 is four individual bytes.

It depends on the layout of the array. If X is your major (row) direction, then within each row you access 256 contiguous elements, but you may get a cache miss every time z advances, so up to 256 of them for the whole chunk. This is because z+1 moves the row start by 3000*4 = 12,000 bytes to a new memory location. The CPU may or may not be able to predict that jump. Since it happens at a regular stride, it might just predict it correctly.

However, if you were to iterate by jumping through rows, e.g. you increment z in the inner loop and x in the outer loop, you’d have a big memory jump on every iteration.

One more thing to keep in mind: cache lines are typically 64 bytes long (16 Color32 values). The CPU always loads an entire cache line into the cache. So if you process those 64 bytes entirely this will be fast. If the CPU loads 64 bytes but you only read four, then the memory load operations will slow the process down.

In theory … modern CPUs and compilers can be quite clever in optimizing such code so there’s never a guarantee.
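To make the two loop orders concrete, here is a sketch of both. The class, method, and parameter names are hypothetical, and it assumes the map is stored row by row with x as the fast-moving index (index = z * mapSize + x).

```csharp
using Unity.Collections;
using UnityEngine;

// Hypothetical helpers that fill a chunkSize * chunkSize area of a mapSize * mapSize map.
public static class ChunkIteration
{
    // x in the inner loop: consecutive iterations touch adjacent elements,
    // so every 64-byte cache line (16 Color32 values) is used completely.
    public static void FillRowByRow(NativeArray<Color32> map, int mapSize,
        int originX, int originZ, int chunkSize, Color32 value)
    {
        for (int z = 0; z < chunkSize; z++)
        {
            int rowStart = (originZ + z) * mapSize + originX;
            for (int x = 0; x < chunkSize; x++)
                map[rowStart + x] = value;
        }
    }

    // z in the inner loop: every iteration jumps mapSize elements
    // (mapSize * 4 bytes) ahead, so only 4 bytes of each loaded line are used.
    public static void FillColumnByColumn(NativeArray<Color32> map, int mapSize,
        int originX, int originZ, int chunkSize, Color32 value)
    {
        for (int x = 0; x < chunkSize; x++)
        {
            for (int z = 0; z < chunkSize; z++)
                map[(originZ + z) * mapSize + originX + x] = value;
        }
    }
}
```

Both methods write exactly the same elements; only the order of the memory accesses differs, which is what the profiler would show if you compare them.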


This is because z+1 jumps by 3000*4 bytes to a new memory location. The CPU may or may not be able to predict that jump - since this happens regularly it might just predict it correctly.

However, if you were to iterate jumping through rows eg you increment z in the inner and x in the outer loop you’d have a big memory jump on every iteration.

Ohh, that makes more sense now.

One more thing to keep in mind: cache lines are 64 bytes long. The CPU always loads an entire cache line into the cache. So if you process those 64 bytes entirely this will be fast. If the CPU loads 64 bytes but you only read four, then the memory load operations will slow the process down.

By loading, do you mean this part:

nativeArrayOfColor32[z * size + x] = new Color32();

?
That is interesting.

That would load only 4 bytes, I assume (theoretically), but how do you even load 64 bytes instead of the Color32? I guess with a struct maybe, but well, I’ll have to research this, thanks!

Compiler and CPU try to be clever. They likely understand your loop well enough to assume that you will be assigning more values to the array, and thus the array index + 64 bytes is written into the cache (and eventually written back to main memory). Since new Color32() is equivalent to assigning (int)0, the compiler may even optimize the loop into a series of memcpy or memclear operations, i.e. writing the same value to a long sequential area in memory. This is ultra-fast.

Actually, perhaps you don’t even need to clear these values? The NativeArray constructor takes a NativeArrayOptions argument that controls whether the memory is cleared or left uninitialized. Unless you specify NativeArrayOptions.UninitializedMemory, a newly allocated array will already be “cleared” with zeros (the default is NativeArrayOptions.ClearMemory), aka “new Color32() all the way”.
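For reference, a small sketch of the two constructor variants. The wrapper class and method names are hypothetical; `NativeArrayOptions` is the Unity enum.

```csharp
using Unity.Collections;
using UnityEngine;

// Hypothetical factory methods showing both initialization options.
public static class BufferAllocationExample
{
    public static NativeArray<Color32> AllocateCleared(int count)
    {
        // Default option is NativeArrayOptions.ClearMemory:
        // every element starts as zero, i.e. new Color32().
        return new NativeArray<Color32>(count, Allocator.Persistent);
    }

    public static NativeArray<Color32> AllocateUninitialized(int count)
    {
        // Skips the clear; contents are undefined until you write them.
        return new NativeArray<Color32>(count, Allocator.Persistent,
            NativeArrayOptions.UninitializedMemory);
    }
}
```

Note that the clear only happens at allocation. A buffer that is reused between chunks still holds whatever was written to it last.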

I overlooked that this loop writes, not reads. If it were to read, the array index + 64 bytes would be read into the cache under the assumption that the most likely course of operations is more read operations at index+1, index+2 and so on.

That’s a very handy allocator when working with native containers in Editor code, thanks! However, is there actually any way to dispose of containers before domain reload? I can’t find any events that take place before the domain reload - only after it.

AssemblyReloadEvents.beforeAssemblyReload is a good place to put these kinds of calls in the Editor. You can also try the pattern used by Entities, in which a MonoBehaviour is programmatically added to a DontDestroyOnLoad GameObject. The MonoBehaviour has an OnDestroy method, which will get called when the object is destroyed as part of unloading the scripting domain.
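A rough sketch of the first suggestion for Editor code. The class name `EditorChunkBuffer` and the buffer size are hypothetical; the file is assumed to live in an Editor folder or be wrapped in `#if UNITY_EDITOR` as shown.

```csharp
#if UNITY_EDITOR
using Unity.Collections;
using UnityEditor;
using UnityEngine;

// Hypothetical Editor-only holder for a persistent buffer.
[InitializeOnLoad]
public static class EditorChunkBuffer
{
    static NativeArray<Color32> buffer;

    // Runs after every domain reload because of [InitializeOnLoad].
    static EditorChunkBuffer()
    {
        buffer = new NativeArray<Color32>(1024 * 1024, Allocator.Persistent);

        // Called right before the scripting domain is unloaded.
        AssemblyReloadEvents.beforeAssemblyReload += DisposeBuffer;
        // Also clean up when the Editor itself closes.
        EditorApplication.quitting += DisposeBuffer;
    }

    static void DisposeBuffer()
    {
        if (buffer.IsCreated)
            buffer.Dispose();
    }
}
#endif
```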
