Unity.Burst.CompilerServices.Loop.ExpectVectorized() doesn't work.

Hello, I recently discovered the CompilerServices namespace and have been experimenting with the intrinsics it offers. Unfortunately, I cannot get Loop.ExpectVectorized() or Loop.ExpectNotVectorized() to work at all. The Burst compiler reports this error every time I recompile:

Burst error BC1321: The loop is not vectorized where it was expected that it is vectorized.

I have reduced the code to the bare minimum below, which only fills a NativeArray with zeroes:

using Unity.Burst;
using Unity.Burst.CompilerServices;
using Unity.Collections;

    [BurstCompile]
    public struct OneRoomMapGenerationAlgorithm : IMapGenerationAlgorithm
    {
        [BurstCompile, SkipLocalsInit]
        public NativeArray<int> Generate(int length)
        {
            NativeArray<int> tilesIDs = new(length, Allocator.Temp);

            for (int x = 0; x < length; x++)
            {
                Loop.ExpectVectorized();
                tilesIDs[x] = 0;
            }

            return tilesIDs;
        }
    }

I have no idea how to fix this error. Any guidance would be appreciated. Thank you for your time.

Check the Burst Inspector. It likely failed to vectorize this loop as it says.
The culprits often aren’t obvious, but I think what’s going on here is that the compiler cannot know what the “length” value will be at runtime; it could be any number. Therefore it cannot aggressively vectorize the loop because it needs to account for length being an odd number like 1 or 13.

You can likely force it to vectorize if you increment x by 4 (assuming 128-bit vectors, e.g. 4 * 4 bytes = 16 bytes).
Then make four assignments manually if you know that the length will always be a multiple of four. Perhaps there’s also a Burst intrinsic that lets you tell the compiler this “multiple of four” fact.

If length can be any value, you can still use the +4 approach but you’d have to have separate code afterwards that assigns the remainder, if any. Don’t add this to the actual loop or else the vectorization fails again due to the conditional.
But then again, I vaguely remember that Burst actually does this itself and generates code for the remaining assignments. However, that may be because I was using float3 and the padding was implicit with that odd type. Like I said, it’s complicated, and it helps to be able to interpret (or to learn to interpret) the Burst assembly output.
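A minimal sketch of that manual "step by 4, then handle the remainder" idea. It uses a plain managed int[] instead of a NativeArray so that it runs outside Unity, and the class and method names are made up for illustration. Whether Burst would actually vectorize the equivalent NativeArray loop still has to be checked in the Burst Inspector.

```csharp
using System;

// Hypothetical helper, not part of Burst or Unity.
public static class ManualUnrollSketch
{
    // Fills every element of `tiles` with `value`, four elements per iteration.
    public static void Fill(int[] tiles, int value)
    {
        int length = tiles.Length;

        // Largest multiple of 4 that is <= length.
        int unrolledEnd = length - (length % 4);

        // Main loop: four stores per iteration and no conditional in the body.
        for (int x = 0; x < unrolledEnd; x += 4)
        {
            tiles[x] = value;
            tiles[x + 1] = value;
            tiles[x + 2] = value;
            tiles[x + 3] = value;
        }

        // Remainder loop: at most three leftover elements, kept outside the main loop.
        for (int x = unrolledEnd; x < length; x++)
        {
            tiles[x] = value;
        }
    }
}
```

Calling ManualUnrollSketch.Fill(new int[13], 0) runs the main loop for indices 0 to 11 and the remainder loop for index 12.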

And this is just with my barely scratching the surface experience with vectorization, so: grain of salt, and it may be more or less complicated. :wink:

You may want to use the Burst Inspector in cases like this to check what the code generation actually produces.

If the code seems to be vectorized, could I get you to create a bug for it? :slight_smile:

I suspect what is happening is that the loop is optimized into a memset call, which isn’t seen as vectorization per se, leading to the error you see.

The compiler not being able to know the value of length shouldn’t be the culprit here (if I’m not mistaken), as it hopefully would create code (this is very hand-wavy) with two loops: 1) a vectorized loop running for as long as possible and 2) a scalar loop taking up the slack if the length is not divisible by the vector size.

I tried reproducing this in the newest Burst and got a similar error (though it was a bit nicer to me, saying the loop was optimized away). Looking at the generated code, the compiler had been nice and optimized the loop into a memset call.

For this case I would simply remove the Loop.ExpectVectorized() call from the code.

Thanks for clearing this up. After I wrote this I thought: yeah, the compiler should be able to handle that. :slight_smile:

Hello, and thank you for your time. I didn’t use the Burst Inspector so far, as I have no idea how to interpret its content.

Removing the Loop.ExpectVectorized() compiles the code as intended and displays the assembly variant in the Burst Inspector, but leaving that method inside the code only reprints the error message instead of any assembly code.

I will file a bug report for you in the meantime, and come back to you once it’s done.

Before you send a bug report, make your loop actually do something besides assigning a constant value to every index (which is equivalent to a memset).
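As a rough sketch of what such a test loop could look like: the job below stores a value that depends on the index, so the loop cannot collapse into a single memset. The job name and field name are hypothetical, and this is not a confirmed repro; whether Burst vectorizes it (and whether safety checks get in the way) still needs to be verified in the Burst Inspector. As far as I know, the Loop intrinsics are experimental and are only active when the UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS define is set.

```csharp
using Unity.Burst;
using Unity.Burst.CompilerServices;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical test job, for illustration only.
[BurstCompile]
public struct IndexDependentFillJob : IJob
{
    public NativeArray<int> TileIDs;

    public void Execute()
    {
        for (int x = 0; x < TileIDs.Length; x++)
        {
            // Asks Burst to raise a compile error if this loop is not vectorized.
            Loop.ExpectVectorized();

            // The stored value changes with x, so this is not a constant fill.
            TileIDs[x] = x * 2;
        }
    }
}
```

If this version compiles without BC1321 while the constant-fill version does not, that would support the memset explanation given earlier in the thread.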

In a previous version of my code, the loop also incremented an external counter that was used by later for loops. I don’t know if that’s enough, but even back then the loop wasn’t vectorized. I’ve sent a bug report for this issue.