Burst-efficient way of storing bytes or shorts in floats?

Hello Community,

I am working on some Burst-accelerated smart particle magic where I need to store small amounts of data alongside each particle. The “custom data” system seems predestined for that, but 8 values in total are a bit tight.
I would rather not introduce a parallel data structure for this data, so that the system stays self-contained.

What would be a good way to store 4 bytes or 2 shorts in every float of a Vector4 (or rather in each of its float components) when the goal is for the code to be Burst-compiled efficiently?
There’s “BitConverter” (BitConverter.GetBytes, documented on Microsoft Learn), but that does not seem like a very Unity-friendly solution.
Are there low-level bit operations for this?

Huge thanks for any input!

The Unity.Mathematics package has bitwise reinterpretations between float and uint. You can easily pack bytes and shorts into uints using bitwise operations and shifts.
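
To make that concrete, here is a minimal sketch of the packing using math.asuint and math.asfloat from Unity.Mathematics. The class and method names (FloatPacking, PackShorts and so on) are made up for illustration and are not part of any package. It assumes the packed float is treated as an opaque bit container: the bit pattern may well be a NaN or a denormal, so never do arithmetic on it, and it is worth verifying that the particle system copies the value back unchanged.

```csharp
using Unity.Mathematics;

// Hypothetical helper class, not part of Unity.Mathematics.
public static class FloatPacking
{
    // Two 16-bit values share one 32-bit float: low in bits 0-15, high in bits 16-31.
    public static float PackShorts(ushort low, ushort high)
        => math.asfloat((uint)low | ((uint)high << 16));

    public static void UnpackShorts(float packed, out ushort low, out ushort high)
    {
        uint bits = math.asuint(packed); // reinterpret the bits, no numeric conversion
        low = (ushort)(bits & 0xFFFFu);
        high = (ushort)(bits >> 16);
    }

    // Four 8-bit values: b0 in the lowest byte, b3 in the highest.
    public static float PackBytes(byte b0, byte b1, byte b2, byte b3)
        => math.asfloat((uint)b0 | ((uint)b1 << 8) | ((uint)b2 << 16) | ((uint)b3 << 24));

    // index 0..3 selects which byte to read back.
    public static byte UnpackByte(float packed, int index)
        => (byte)(math.asuint(packed) >> (index * 8));
}
```

For signed shorts the same layout should work if you cast through ushort first, e.g. (ushort)myShort when packing and (short)low when unpacking.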

Why “in floats”?
Do you mean “float vectors” and thus actually SIMD registers?

Well that’s my assumption anyways.
2 Approaches I like:
1: Use my SIMD math library (see my signature). It has vectors up to (s)byte32, as well as (u)short and (u)long vectors (and matrices), and it uses assembly instructions that Burst exposes as compiler intrinsics.
2: Just use unsafe code.

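unsafe
{
    byte[] myArray = GetItFromSomewhere(); // needs at least 32 bytes (two float4s)
    fixed (byte* ptr = myArray)
    {
        // A fixed pointer is read-only, so index through a typed copy instead of reassigning it.
        float4* vectors = (float4*)ptr;
        float4 firstFloat4 = vectors[0];
        float4 scndFloat4 = vectors[1];
    }
}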

Oh, math.asint / math.asfloat! I see, thank you, I think that will work.

No, they are Vector4, although maybe the Burst compiler will use SIMD registers in the background(?).
I am working with particles and Burst jobs through the IJobParticleSystemParallelFor interface. Right now the Execute function looks like this:

public void Execute(ParticleSystemJobData particle_data, int i)
{
      float a = particle_data.customData1.x[i];
      // now read two shorts out of that float
      // ...
      // and write two different shorts to it
      particle_data.customData1.x[i] = new_value;
}
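
As a rough sketch of how the elided part could look with the math.asuint / math.asfloat suggestion: the job name PackedShortsJob and the "+1 / +2" update are placeholders invented for illustration, and the layout (one short in the lower 16 bits, one in the upper 16 bits) is just one possible choice.

```csharp
using Unity.Burst;
using Unity.Mathematics;
using UnityEngine.ParticleSystemJobs;

// Hypothetical job; only the bit handling is the point here.
[BurstCompile]
public struct PackedShortsJob : IJobParticleSystemParallelFor
{
    public void Execute(ParticleSystemJobData particle_data, int i)
    {
        var slot = particle_data.customData1.x; // per-particle float array

        // Read two shorts out of the float by reinterpreting its bits.
        uint bits = math.asuint(slot[i]);
        ushort low = (ushort)(bits & 0xFFFFu);
        ushort high = (ushort)(bits >> 16);

        // Placeholder update; replace with the real per-particle logic.
        low = (ushort)(low + 1);
        high = (ushort)(high + 2);

        // Write two different shorts back into the same float.
        slot[i] = math.asfloat((uint)low | ((uint)high << 16));
    }
}
```

The shifts and masks are constant, so this should compile down to a handful of integer instructions per particle.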

I do not need interaction between different particles, so I do not loop over a single array in one go; instead, Execute is called separately for each particle (managed by the job scheduler). Therefore I am not sure how I can make use of your suggestions.

Your library certainly looks impressive though for true low-level programming in Unity. Amazing what’s possible.

EDIT: I guess I can use your library if I convert the Vector4 into a float4 (and convert back to Vector4 when writing). Do you know whether those conversions come at a cost that would negate any advantage of a SIMD library over following DreamingImLatios’s suggestion?
It is a bit unfortunate that Unity’s particle system does not use the datatypes from Mathematics yet.

@DragonCoder Hmm, unless your code has any branches or variable bit shifts in it (the // ... part of your code :wink: ), Latios’ approach will be just as fast as using, for example, a ushort8 instead. I tend to like the latter approach more, though, since it results in fewer bitwise operations (which will probably be compiled away anyway, depending on implementation details - performance will most likely be optimal either way) and it expresses the intent of writing shorts into a block of memory more clearly. PLUS, with my lib, you can use 256-bit vector types (float8 and friends), which either use AVX SIMD vectors (they handle twice as much data) or exploit instruction-level parallelism by handling two 128-bit vectors at once. Burst doesn’t do that automatically, since it considers vectorized code not to be “vectorizable” any further. I’m getting lost in micro-optimizations again…

But who knows… Maybe the specific job code will result in much faster machine code when using a short vector. I’ve seen it happen before, but I cannot predict the result without looking at the details of your code.

Vector4 <-> float4 conversion is completely free (0 instructions) in Burst code for sure, and I strongly suspect that it is free in Mono JIT code as well. No worries there. But always stick to the “rule of 128”: the vector you use should ALWAYS be a multiple of 128 bits (16 bytes) wide. It is very important for performance reasons.
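
For reference, a small sketch of what that conversion looks like when all four floats of a Vector4 are unpacked at once with plain Unity.Mathematics (not my library). The class and method names are hypothetical; the casts rely on the implicit Vector4/float4 conversions that Unity.Mathematics provides, and on math.asuint / math.asfloat working component-wise on 4-wide vectors.

```csharp
using Unity.Mathematics;
using UnityEngine;

// Hypothetical helper; each lane holds the shorts of one float component.
public static class Vector4Packing
{
    public static void UnpackShorts(Vector4 packed, out uint4 low, out uint4 high)
    {
        uint4 bits = math.asuint((float4)packed); // 128 bits reinterpreted in one go
        low = bits & 0xFFFFu;                     // lower 16 bits of every lane
        high = bits >> 16;                        // upper 16 bits of every lane
    }

    public static Vector4 PackShorts(uint4 low, uint4 high)
    {
        uint4 bits = (low & 0xFFFFu) | (high << 16);
        return math.asfloat(bits);                // float4 converts implicitly to Vector4
    }
}
```

This keeps every operation on a full 128-bit vector, which fits the “rule of 128” above.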
