Why doesn't this loop get vectorized?

I started experimenting with Burst's ExpectVectorized intrinsic and found that a very simple case doesn't work as expected.
The code:

public void Execute(ArchetypeChunk batchInChunk, int batchIndex, int indexOfFirstEntityInQuery)
{
    NativeArray<TargetInternalOptimized2>.ReadOnly targets =
        batchInChunk.GetNativeArray(this.tHandle).AsReadOnly();

    for (int index = batchInChunk.Count - 1; index >= 0; index--)
    {
        Unity.Burst.CompilerServices.Loop.ExpectVectorized();
        var target = targets[index];
    }
}

The component that breaks vectorization:

public struct TargetInternalOptimized2 : IComponentData
{
    //xyz position, w entityQueryIndex
    public float4 positionAndQueryIndex;
}

It's odd: it's just a component with a float4 inside it.

This doesn't perform any work, so I'd expect the compiler to optimize it away entirely. There is nothing to vectorize.
Try assigning to another array of the same type and length.

Thanks for the reply. I changed the code to this, and Burst still reports that the loop can't be vectorized.

[BurstCompile]
public struct InitChunksFromTargets2 : IJobChunk
{
    [ReadOnly] public ComponentTypeHandle<TargetInternalOptimized2> tHandle;
    [WriteOnly] public ComponentTypeHandle<TargetInternalOptimized2> tHandle2;
    internal int4 key;
    internal float4 positionAndHalfChunkSize;
    private TargetChunk targetChunkToAdd;

    public void Execute(ArchetypeChunk batchInChunk, int batchIndex, int indexOfFirstEntityInQuery)
    {
        NativeArray<TargetInternalOptimized2>.ReadOnly targets = batchInChunk.GetNativeArray(this.tHandle).AsReadOnly();
        NativeArray<TargetInternalOptimized2> targets2 = batchInChunk.GetNativeArray(this.tHandle2);
       
        for (int index = 0; index < batchInChunk.Count; index++)
        {
            Unity.Burst.CompilerServices.Loop.ExpectVectorized();
            targets2[index] = targets[index];
        }
    }
}

[Screenshot: upload_2022-6-25_13-31-13.png]

Someone else will have to explain in more detail why that happens.
I could not get float4 to vectorize; plain float, on the other hand, works. Maybe the 512-bit registers don't have auto-vectorization support?

On the other hand, getting the pointers and doing a memcpy would be much faster.
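Here is a minimal sketch of the pointer-plus-memcpy approach. It assumes the assembly has "Allow 'unsafe' code" enabled. The TargetCopy class and the CopyWithMemCpy method are hypothetical names. UnsafeUtility.MemCpy, GetUnsafePtr and GetUnsafeReadOnlyPtr are the Unity.Collections APIs it relies on. This only fits a straight copy, because nothing is done to the data along the way.

```csharp
using System;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

// Hypothetical helper: copies the first `count` elements of src into dst
// with one bulk memcpy instead of an element-by-element loop.
public static class TargetCopy
{
    public static unsafe void CopyWithMemCpy<T>(NativeArray<T> src, NativeArray<T> dst, int count)
        where T : unmanaged
    {
        if (count < 0 || count > src.Length || count > dst.Length)
            throw new ArgumentOutOfRangeException(nameof(count));

        // MemCpy assumes the two buffers do not overlap.
        UnsafeUtility.MemCpy(
            dst.GetUnsafePtr(),            // writable pointer into dst
            src.GetUnsafeReadOnlyPtr(),    // read-only pointer into src
            (long)count * UnsafeUtility.SizeOf<T>());
    }
}
```

Inside an IJobChunk, src and dst would come from batchInChunk.GetNativeArray(...), and count would come from batchInChunk.Count. NativeArray<T>.Copy(src, dst, length) does the same bulk copy without unsafe code.

One caveat about the job above: tHandle and tHandle2 are both ComponentTypeHandle<TargetInternalOptimized2>. On a given chunk they therefore resolve to the same component array, so a real copy needs two distinct component types.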


And the Unity.Mathematics package says we should use float4 just to be safe.

Yeah, it's faster, but does it help if you need to modify the data in between?

Not really, it’s only a solution when we are talking about straight up copying.

Yeah, thanks for replying.
I tried to strip the code down to the minimum to find the issue, but no matter what I do, it won't get vectorized.

Some further tips I've learned, not necessarily for this case: Burst doesn't know what to do with structs, so casting to simple-type pointers like float4* or reinterpreting the array (NativeArray.Reinterpret) helps.
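Below is a sketch of both tips. It assumes a plain IJob over two NativeArrays rather than the chunk job. PackedTarget is a hypothetical stand-in for TargetInternalOptimized2, and CopyAsFloat4Job is a made-up name. The sketch only shows the mechanics. It does not establish whether either form makes ExpectVectorized() pass.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical stand-in for TargetInternalOptimized2: a single float4 (16 bytes).
public struct PackedTarget
{
    public float4 positionAndQueryIndex;
}

[BurstCompile]
public struct CopyAsFloat4Job : IJob
{
    [ReadOnly] public NativeArray<PackedTarget> src;
    [WriteOnly] public NativeArray<PackedTarget> dst;

    public void Execute()
    {
        // Reinterpret<U>() views the same memory as float4 elements (no copy);
        // it throws if sizeof(PackedTarget) != sizeof(float4).
        NativeArray<float4> s = src.Reinterpret<float4>();
        NativeArray<float4> d = dst.Reinterpret<float4>();
        int n = math.min(s.Length, d.Length);
        for (int i = 0; i < n; i++)
            d[i] = s[i];
    }

    // Alternative: cast the raw buffers to float4* and index the pointers directly.
    public static unsafe void CopyViaPointers(NativeArray<PackedTarget> src, NativeArray<PackedTarget> dst)
    {
        var s = (float4*)src.GetUnsafeReadOnlyPtr();
        var d = (float4*)dst.GetUnsafePtr();
        int n = math.min(src.Length, dst.Length);
        for (int i = 0; i < n; i++)
            d[i] = s[i];
    }
}
```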


Burst has two modes of vectorization. The first mode is loop-vectorization. The second is instruction vectorization. The ExpectVectorized() intrinsic only checks the first, but using the math types like float4 causes Burst to switch to the second mode.
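To illustrate the two modes, here is a sketch of the same operation written over float and over float4. The job names are hypothetical. The scalar loop handles one float per iteration, so the loop vectorizer can widen it across iterations; this matches the report earlier in the thread that plain float works.

In the float4 loop, each iteration is already a 4-wide SIMD operation on its own. According to the explanation above, that is why ExpectVectorized() does not see it as loop-vectorized. When the check fails, Burst reports an error, so the sketch leaves the check out of the float4 job. Whether the scalar check passes may also depend on the Burst version, the target and the safety-check settings.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Scalar loop: a candidate for loop vectorization (several iterations per SIMD instruction).
[BurstCompile]
public struct ScalarScaleJob : IJob
{
    [ReadOnly] public NativeArray<float> src;
    [WriteOnly] public NativeArray<float> dst;
    public float scale;

    public void Execute()
    {
        for (int i = 0; i < src.Length; i++)
        {
#if UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS
            Unity.Burst.CompilerServices.Loop.ExpectVectorized();
#endif
            dst[i] = src[i] * scale;
        }
    }
}

// float4 loop: each iteration is already one 4-wide operation, so the SIMD work
// happens inside the iteration rather than by widening the loop itself.
// ExpectVectorized() is deliberately not called here.
[BurstCompile]
public struct Float4ScaleJob : IJob
{
    [ReadOnly] public NativeArray<float4> src;
    [WriteOnly] public NativeArray<float4> dst;
    public float scale;

    public void Execute()
    {
        for (int i = 0; i < src.Length; i++)
            dst[i] = src[i] * scale;
    }
}
```

Both jobs compute the same values. The only difference is which kind of vectorization Burst applies, which you can compare in the Burst Inspector.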


Also the missing StructLayout attribute on the struct could make a difference.

I was actually experimenting with this yesterday:

    Check.Assume(schema.BaseValue.Length % 4 == 0);
    Check.Assume(schema.BaseValue.Length == modifiers.Length);
    Check.Assume(schema.BaseValue.Length == result.Length);

    var min = schema.Min.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var max = schema.Max.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var baseValue = schema.BaseValue.Reinterpret<int4>(UnsafeUtility.SizeOf<int>());

    var added = modifiers.Added.Reinterpret<int4>(UnsafeUtility.SizeOf<int>());
    var increased = modifiers.Increased.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var reduced = modifiers.Reduced.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var more = modifiers.More.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var less = modifiers.Less.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());

    var stats = result.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());

    for (var index = 0; index < baseValue.Length; index++)
    {
        // #if UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS
        //     Unity.Burst.CompilerServices.Loop.ExpectVectorized();
        // #endif

        var addedResult = baseValue[index] + added[index];
        var additiveResult = 1 + increased[index] - reduced[index];
        var multiplicativeResult = more[index] * less[index];

        stats[index] = math.clamp(addedResult * additiveResult * multiplicativeResult, min[index], max[index]);
    }

The generated code looks near perfect, yet it still fails the check.

So yeah, I think it is as Dreaming says. However, I did manage to make Burst generate exactly the same code without the Reinterpret (after adding some attributes), and it still failed.

Hi, I'm still lost with the vectorization stuff.
Could you share the code for schema, modifiers and result?
It would help me understand the struct layouts and how they are reinterpreted.

Exactly what you see. They're just NativeArrays of int/float, which I'm reinterpreting as int4/float4.

OK, so schema.Min is one NativeArray and schema.Max is another.
Do we have to split them, or is there a way to have a NativeArray of a range struct (with a float min and a float max) and somehow reinterpret the min floats as float4 for vectorization?
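Here is a sketch of what Reinterpret does with such a struct, assuming a hypothetical MinMaxRange { float min; float max; } element type. Reinterpret cannot pick out just the mins, because it only regroups the existing bytes in memory order. Getting four mins per float4 requires the mins to be stored contiguously (struct-of-arrays), which is what separate arrays like schema.Min and schema.Max provide. RangeLayout and its methods are illustrative names.

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Mathematics;

// Hypothetical array-of-structs element from the question (8 bytes).
public struct MinMaxRange
{
    public float min;
    public float max;
}

public static class RangeLayout
{
    // Reinterpreting MinMaxRange[] as float4[] keeps memory order, so the fields
    // interleave: element 0 is (min0, max0, min1, max1), not four mins.
    public static NativeArray<float4> ViewAsFloat4(NativeArray<MinMaxRange> ranges)
    {
        // expectedTypeSize is the size of the *current* element type (MinMaxRange).
        return ranges.Reinterpret<float4>(UnsafeUtility.SizeOf<MinMaxRange>());
    }

    // Split into struct-of-arrays form so each array can be reinterpreted
    // as float4 holding four mins (or four maxes) per element.
    public static void SplitToSoA(NativeArray<MinMaxRange> ranges, NativeArray<float> mins, NativeArray<float> maxs)
    {
        for (int i = 0; i < ranges.Length; i++)
        {
            mins[i] = ranges[i].min;
            maxs[i] = ranges[i].max;
        }
    }
}
```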

After spending a while figuring out the Burst Inspector, my code looks similar, with no Reinterpret in use.
Also, in your screenshot the loop part is not vectorized, same as mine. If Burst only checks the loop part and reports that it isn't vectorized, without caring whether the rest is, then it makes sense.

I wonder if there is any way to get the loop vectorized.

OK, interesting. Just for laughs, I converted the loop to use int4.