Is there really no form of array usable in a struct inside a Native container?

DragonCoder · January 31, 2022, 10:18pm

Hello Community,

After having spent about an hour trying all container types known to man, I am stuck with a weird workaround to achieve something that looks trivial:

NativeArray<NativeArray<int>> nested_array;

I have resorted to using a struct like this in place of the inner NativeArray:

public struct ArrayIntX26
{
    int v0, v1, v2, v3, v4 ...; // all the way to v25 (26 fields)
    public void Set(int index, int value)
    {
        switch (index)
        {
            case 0: v0 = value; break;
            case 1: v1 = value; break;
            case 2: v2 = value; break;
            case 3: v3 = value; break;
            case 4: v4 = value; break;
            ...
            ...
        }
    }
    public int Get(int index)
    {
        switch (index)
        {
            case 0: return v0;
            case 1: return v1;
            case 2: return v2;
            case 3: return v3;
            case 4: return v4;
            ...
            ...
        }
        return -1;
    }
}

which feels very inefficient (or isn’t it?).

It feels quite necessary for a somewhat larger architecture to have some form of nested arrays.
The fact that my workaround technically works means there isn’t an inherent limitation though, is there? A compiler could easily generate such code when the user writes something like:

int[26] the_array;

Is it a deliberate decision to exclude this use case?
Or am I overlooking a better solution after all?

MaxPfeil · February 1, 2022, 7:33am

If you know the per-array length, you could use a single combined 1D-array.
For variable per-array length, NativeMultiHashMap might be an option.

What data would you like to pass as nested arrays to your job?
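
The single combined 1D-array suggestion can be sketched in plain C# as follows. This is only an illustration under assumptions: `FlatRows`, `Stride`, `Get` and `Set` are hypothetical names, and a managed `int[]` stands in for what would be a `NativeArray<int>` inside a job.

```csharp
using System;

// Hypothetical helper: stores a number of fixed-length rows in one flat array.
public sealed class FlatRows
{
    public const int Stride = 26;      // per-row length, known up front
    private readonly int[] data;       // stand-in for a NativeArray<int>

    public FlatRows(int rowCount)
    {
        data = new int[rowCount * Stride];
    }

    public int RowCount => data.Length / Stride;

    // Element `slot` of row `row` lives at row * Stride + slot.
    public int Get(int row, int slot) => data[Index(row, slot)];

    public void Set(int row, int slot, int value) => data[Index(row, slot)] = value;

    private static int Index(int row, int slot)
    {
        // Reject slots that would silently spill into the next row.
        if ((uint)slot >= Stride) throw new ArgumentOutOfRangeException(nameof(slot));
        return row * Stride + slot;
    }
}
```

With this layout, `rows.Get(i, k)` reads like nested indexing while the memory stays one contiguous block.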

sheredom · February 1, 2022, 8:20am

The problem is that native containers contain a single managed object, the DisposeSentinel (that we use to ensure the code is safe). We can work around this managed object when it sits at the base of a job struct, but not anywhere else.

Good news is that there is an effort (ongoing, I don’t know the ETA) to replace this managed object with some unmanaged system, which will let you nest native containers within others (but there will be a performance cost to pay, most likely all vectorization would no longer be possible for instance).

As @MaxPfeil says - if each sub array is the same size, then mushing them down into a single 1D array would be your best bet.

Baggers · February 1, 2022, 10:50pm

Two other options if you always have a fixed length of 26:

26 * sizeof(int) = 104 bytes, so maybe you could use FixedListInt128 (or FixedList128)
If you are fine with unsafe code, then you could try a fixed-size buffer in a struct, e.g.:

unsafe struct FixedBufferExample
{
    public fixed int Data[26]; // This is a fixed buffer.
}

I’ve not tried these in Burst, but these are where I’d start if I really wanted to avoid just using the NativeArray approach suggested by others.

Carpet_Head · February 2, 2022, 5:45pm

This would be absolutely huge - though the vectorisation point is interesting. Would we still be able to manually use things like [NoAlias] to get that performance back?

sheredom · February 3, 2022, 8:46am

Not really, or at least it becomes much, much harder. Modern CPU hardware doesn’t really have support for sparse loads / stores, and so you’d have to have an insanely high amount of ALU operations to justify the cost of stitching together vectors from disjoint memory locations, doing the ALU work, then ripping the vectors apart for storing. It can be done, but it’s generally not worth it and our compiler wouldn’t do it - even if you had [NoAlias] in all the correct places.

Really this is one of these cases where the default we have now, while I acknowledge can be annoying when you want to have containers-in-containers, is a sort of forcing function to ensure we can have lovely vector code. Burst will still provide pretty optimal code in the containers-in-containers case, but I can all but guarantee that you’ll lose vectorization (and potentially the big multiplier that that gives your performance!).

Carpet_Head · February 3, 2022, 5:36pm

sheredom:

Not really, or at least it becomes much, much harder. Modern CPU hardware doesn’t really have support for sparse loads / stores, and so you’d have to have an insanely high amount of ALU operations to justify the cost of stitching together vectors from disjoint memory locations, doing the ALU work, then ripping the vectors apart for storing. It can be done, but it’s generally not worth it and our compiler wouldn’t do it - even if you had [NoAlias] in all the correct places.

Really this is one of these cases where the default we have now, while I acknowledge can be annoying when you want to have containers-in-containers, is a sort of forcing function to ensure we can have lovely vector code. Burst will still provide pretty optimal code in the containers-in-containers case, but I can all but guarantee that you’ll lose vectorization (and potentially the big multiplier that that gives your performance!).

Maybe the use case you are thinking of is a bit different to ours - for example, we have a struct containing a few native containers that performs some mathematical operations on those arrays individually, let’s say looping over all of them, which should be vectorised. Right now, we have 1000 of these structs, and we have to schedule 1000 identical jobs because we cannot make an IJobParallelFor, because we cannot put those structs into another native container.

The thousand scheduled jobs have a lot of overhead - and a parallel for job would greatly reduce that overhead. If we could add all of those structs to a single NativeArray of inputs, we could use a parallel for job. I don’t see why this would break performance in this case? It’s basically the same code - it’s just how the job is scheduled, and how we set up the inputs and outputs to be a parallel for job.

MaxPfeil · February 3, 2022, 6:20pm

It might depend on your concrete use case, but I don’t see a particular reason why you couldn’t split this struct containing arrays into its base components and allocate one large NativeArray for all instances per component.

For parallel scheduling, as long as you make sure your jobs don’t write outside their intended range, you can give the respective NativeArray the [NativeDisableParallelForRestriction] attribute.

One great example of vertical component usage (and beautiful SIMD generation):
Intrinsics: Low-level engine development with Burst - Unite Copenhagen

The example starting at 28:15 - splitting the Door struct into its base components (posX, posY, size etc.) - might help you.
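
A rough sketch of that kind of component split, in plain C#. The field names follow the posX, posY and size components mentioned above; the `DoorData` class, its `CountInRange` method and the use of managed `float[]` arrays (in place of one `NativeArray<float>` per component) are assumptions made purely for illustration and are not taken from the talk.

```csharp
// Hypothetical struct-of-arrays container: one flat array per component,
// all of the same length, instead of one array of Door structs.
public sealed class DoorData
{
    public readonly float[] PosX;
    public readonly float[] PosY;
    public readonly float[] Size;

    public DoorData(int count)
    {
        PosX = new float[count];
        PosY = new float[count];
        Size = new float[count];
    }

    public int Count => PosX.Length;

    // Counts doors whose centre lies within their own Size of the point (x, y).
    // Each component is read linearly from its own contiguous array.
    public int CountInRange(float x, float y)
    {
        int hits = 0;
        for (int i = 0; i < Count; i++)
        {
            float dx = PosX[i] - x;
            float dy = PosY[i] - y;
            if (dx * dx + dy * dy <= Size[i] * Size[i]) hits++;
        }
        return hits;
    }
}
```

Because every instance occupies the same index in each component array, a parallel for job could process index ranges of these arrays without needing containers inside containers.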

DragonCoder · February 3, 2022, 8:21pm

Huh, this thread became quite active!

@Baggers: Wow, this works like a charm! Thank you. I wonder why those structs feel hidden in some way. I have looked through some Burst-related tutorials and did not stumble across them at all. Indeed it is what I had hoped for, and it’s relatively close to how I would reserve memory as part of a struct if this were C++.

What I want to store in those 26 slots already are indices of another, larger read-only array which would exist anyways because its entries are shared.

For context: It’s a procedural 3D network of points on a large 3D grid. Every point can be connected to each neighbor (thus max 26 connections) and I want to get rid of the 3D grid itself after generating new points (as it’d become immense otherwise) and thus only store the points that actually exist and which other points they connect to.

Therefore having my struct store the index within an array which again contains indices to that other array feels rather messy. Main point here was readability. After all, object-oriented programming has become the norm so that we do not have to manage massive chunks of memory via ‘pointers’ anymore… It helps with maintainability too and it’d be quite strange to have a more maintainable solution possible in C++ than in C#, haha.

Therefore I like the Burst and Job approach of Unity a lot. It forms a nice middle ground, and while there is no full OOP with inheritance, there is solid (partially enforced) encapsulation.
In any case, it has already increased performance by a factor of 6-8 compared to a pure single-threaded C# prototype.
Great to see that it provides a way to achieve such memory structures after all. It was the only roadblock I had encountered. 4096 bytes feels generous enough too.

Maybe the compile errors complaining about managed types in the structs could give a hint towards those, à la “Try using fixed types from Unity.Collections.”?

That is very reasonable of course. Just coming from C++ I expected there would be some way to have unmanaged data since that’s where the power of cache-aware programming lies.
That surely is a thing with Burst too, isn’t it? Like accessing the contents of a struct that’s looped in order, should be faster than effectively randomly accessing contents of a separately stored, huge array which doesn’t fit into L1 and probably L2 caches anymore.
With effort you can partially achieve the same if you ‘manually’ ensure that both the list of structs and list of separate data always grow simultaneously, but even then, the CPU has to cache two separate chunks of memory.

That does sound neat!
Does the restriction regarding vectorization apply to the FixedList types as well by the way?
It doesn’t matter in my case since there’s no arithmetic applied to those indices, but I am curious.

@Carpet_Head: Are your native containers too large even for FixedList4096Bytes structs?
What kind of game component is repeatedly handling this much non-texture data, if I may be so curious? Only one of those for each of your 1000 structs and you already occupy 4MB.

Per-Morten · February 4, 2022, 4:37pm

Carpet_Head:

Maybe the use case you are thinking of is a bit different to ours - for example, we have a struct containing a few native containers that performs some mathematical operations on those arrays individually, let’s say looping over all of them, which should be vectorised. Right now, we have 1000 of these structs, and we have to schedule 1000 identical jobs because we cannot make an IJobParallelFor, because we cannot put those structs into another native container.

The thousand scheduled jobs have a lot of overhead - and a parallel for job would greatly reduce that overhead. If we could add all of those structs to a single NativeArray of inputs, we could use a parallel for job. I don’t see why this would break performance in this case? It’s basically the same code - it’s just how the job is scheduled, and how we set up the inputs and outputs to be a parallel for job.

We had a similar problem to this. We had a bunch of work we wanted to do on inherently 2D array data (including sorting, appending to lists, etc.) and we wanted jobs to work on all that data in parallel. Scheduling one parallel job per ‘outer array’ had massive overhead, so it wasn’t really an option. In our particular case we knew exactly what the maximum sizes of the 2D data were, so we put all our data in one huge array and then created another array of RangeInt describing which parts of the huge array belonged to which ‘outer array’/batch. This allowed us to schedule parallel for jobs on an ‘outer array’/batch level when needed, and to schedule parallel for jobs on the ‘inner array’ level when knowledge of the ‘outer array’/batch wasn’t needed.
For the ‘inner array’ level jobs we got perfect vectorization and per entry parallelization, while for the ‘outer array’/batch level jobs we got parallelization on a per ‘outer array’/batch level but perfect vectorization on the ‘inner array’ data.

Essentially we had these two job types:

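using Unity.Collections;
using Unity.Jobs;
using UnityEngine; // RangeInt

struct MyInnerArrayElementJob : IJobParallelFor
{
    public NativeArray<float> InnerArrayData;

    // Runs once per element of the flat array.
    public void Execute(int valueIdx)
    {
        var value = InnerArrayData[valueIdx];
        // Work on value
    }
}

struct MyOuterArrayElementJob : IJobParallelFor
{
    // Each batch touches indices other than batchIdx, so the default
    // parallel-for index restriction is lifted here; batches must not overlap.
    [NativeDisableParallelForRestriction]
    public NativeArray<float> InnerArrayData;

    [ReadOnly]
    public NativeArray<RangeInt> Batch;

    // Runs once per batch ('outer array').
    public void Execute(int batchIdx)
    {
        var batch = Batch[batchIdx];
        for (int i = batch.start; i < batch.end; i++)
        {
            var value = InnerArrayData[i];
            // Work on value
        }
    }
}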
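
How the flat array and the range array might be built is not shown above, so the following is a guess at one way to do it in plain C#. `Batch`, `BatchedData`, `Pack` and `SumBatch` are hypothetical names, `Batch` mimics the start/length/end shape of RangeInt, and managed arrays stand in for the native containers.

```csharp
using System;

// Hypothetical stand-in for RangeInt: a start index plus a length.
public readonly struct Batch
{
    public readonly int Start;
    public readonly int Length;

    public Batch(int start, int length)
    {
        Start = start;
        Length = length;
    }

    public int End => Start + Length; // one past the last index of the batch
}

public static class BatchedData
{
    // Packs jagged rows into one flat array plus one Batch per row.
    public static (float[] Flat, Batch[] Batches) Pack(float[][] rows)
    {
        var batches = new Batch[rows.Length];
        int total = 0;
        for (int r = 0; r < rows.Length; r++)
        {
            batches[r] = new Batch(total, rows[r].Length);
            total += rows[r].Length;
        }

        var flat = new float[total];
        for (int r = 0; r < rows.Length; r++)
        {
            Array.Copy(rows[r], 0, flat, batches[r].Start, rows[r].Length);
        }
        return (flat, batches);
    }

    // What the body of an 'outer array' job could do with one batch.
    public static float SumBatch(float[] flat, Batch batch)
    {
        float sum = 0f;
        for (int i = batch.Start; i < batch.End; i++)
        {
            sum += flat[i];
        }
        return sum;
    }
}
```

An 'inner array' job would simply iterate over `Flat` and ignore `Batches`, which is what keeps the per-element path contiguous.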
