I did read more on that, and changed it to (sizeof(int)), but:

On a test array of 24 ints, numbers now has a length of 24 when it needs to be 6 x v128.

Assuming a NativeArray simply holds a pointer and a length, how do I change the element size from 4 bytes to 16 and the length from 24 to 6? Easily done with void pointers, but I'm not sure how in Unity. I'm trying to do things the safe way!

Ok, I got further, but here's what I don't understand. Without the commented-out line that reads the result back out of the loop, it takes 300 ticks. With that line running so I can get the sum out… it takes 5,000,000 ticks…

I'm assuming 0+1 is much faster than 1000000+1, as that can be the only reason, but why? More bits to add up?

Or is it some crazy safety check based on the size? I thought 0+1 was no different from 1000000+1 in speed (well, actually these output -1, so it would be -1000000+1).

When you use line 26 instead of line 27, each iteration of your loop completely replaces the value of tally without using anything from the previous iteration. This means only the last iteration does any actual work and the rest is useless.

The massive difference in timings suggests the Burst compiler noticed this too and optimized the loop out of existence: it only needs to perform the last iteration to obtain the same result (with idx = 0). The actual work is done only n times for an input array of size n.

With line 27 instead of 26, the amount of work for n items is n², a quadratic increase as your items list grows in size.

Ok, that makes sense. Well, I've no idea how intrinsics can make things faster here; it seems pretty slow unless I'm missing something (safety checks off). I'm also realising now how inputs/outputs are backwards. Intrinsics are one thing, thinking backwards is another! But at least I made a load of backwards consts for shuffling (after realising they need a backwards control). Maybe I can reuse that part for re-arranging.