So I started experimenting with Burst's ExpectVectorized intrinsic and found that a very simple case doesn't work as expected.
The code:
public void Execute(ArchetypeChunk batchInChunk, int batchIndex, int indexOfFirstEntityInQuery)
{
    NativeArray<TargetInternalOptimized2>.ReadOnly targets =
        batchInChunk.GetNativeArray(this.tHandle).AsReadOnly();
    for (int index = batchInChunk.Count - 1; index >= 0; index--)
    {
        Unity.Burst.CompilerServices.Loop.ExpectVectorized();
        var target = targets[index];
    }
}
The component that breaks vectorization:
public struct TargetInternalOptimized2 : IComponentData
{
    // xyz position, w entityQueryIndex
    public float4 positionAndQueryIndex;
}
This doesn't perform any work; I'd expect the compiler to optimize it away entirely. There is nothing to vectorize.
Try assigning to another array of the same type and length.
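Something like this sketch, where `outputs` is a hypothetical second NativeArray of the same component type and length, gives the loop observable work so it can't be optimized away:

```csharp
public void Execute(ArchetypeChunk batchInChunk, int batchIndex, int indexOfFirstEntityInQuery)
{
    NativeArray<TargetInternalOptimized2>.ReadOnly targets =
        batchInChunk.GetNativeArray(this.tHandle).AsReadOnly();

    // 'outputs' is a hypothetical NativeArray<TargetInternalOptimized2>
    // field on the job, sized to match the chunk.
    for (int index = batchInChunk.Count - 1; index >= 0; index--)
    {
        Unity.Burst.CompilerServices.Loop.ExpectVectorized();
        this.outputs[index] = targets[index]; // the copy is the loop's "work"
    }
}
```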
Someone else will have to go into more detail about why that happens.
I could not get float4 to vectorize; float, on the other hand, works. Maybe the 512-bit registers don't have auto-vectorization support?
On the other hand, getting the pointers and doing a memcpy would be much faster.
Some further tips, not necessarily for this case, but things I've learned: Burst doesn't always know what to do with structs, so casting to simple type pointers like float4*, or calling Reinterpret on an array, helps.
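As a rough sketch of both approaches (here `MyStruct` is a hypothetical struct wrapping a single float4, and `myArray` a NativeArray of it):

```csharp
// Reinterpret the array so Burst sees plain float4 elements instead of the struct:
NativeArray<float4> asFloat4 = myArray.Reinterpret<float4>(UnsafeUtility.SizeOf<MyStruct>());

// Or, in unsafe code, work through a raw float4 pointer:
unsafe
{
    float4* ptr = (float4*)myArray.GetUnsafeReadOnlyPtr();
    for (int i = 0; i < myArray.Length; i++)
    {
        float4 value = ptr[i];
        // ... use value ...
    }
}
```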
Burst has two modes of vectorization: loop vectorization and instruction vectorization. The ExpectVectorized() intrinsic only checks the first, but using math types like float4 causes Burst to switch to the second mode.
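A minimal sketch of the distinction (hypothetical `input`/`output` NativeArray<float> and `input4`/`output4` NativeArray<float4> fields assumed):

```csharp
// A plain float loop is a candidate for Burst's loop-vectorizer,
// which is what ExpectVectorized() checks:
for (int i = 0; i < input.Length; i++)
{
    Unity.Burst.CompilerServices.Loop.ExpectVectorized();
    output[i] = input[i] * 2f;
}

// With float4, each operation already maps onto SIMD instructions
// (instruction vectorization), so the loop-vectorizer may not kick in
// and ExpectVectorized() can fail even though the generated code is SIMD:
for (int i = 0; i < input4.Length; i++)
{
    output4[i] = input4[i] * 2f;
}
```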
Check.Assume(schema.BaseValue.Length % 4 == 0);
Check.Assume(schema.BaseValue.Length == modifiers.Length);
Check.Assume(schema.BaseValue.Length == result.Length);
var min = schema.Min.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
var max = schema.Max.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
var baseValue = schema.BaseValue.Reinterpret<int4>(UnsafeUtility.SizeOf<int>());
var added = modifiers.Added.Reinterpret<int4>(UnsafeUtility.SizeOf<int>());
var increased = modifiers.Increased.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
var reduced = modifiers.Reduced.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
var more = modifiers.More.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
var less = modifiers.Less.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
var stats = result.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
for (var index = 0; index < baseValue.Length; index++)
{
    // #if UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS
    // Unity.Burst.CompilerServices.Loop.ExpectVectorized();
    // #endif
    var addedResult = baseValue[index] + added[index];
    var additiveResult = 1 + increased[index] - reduced[index];
    var multiplicativeResult = more[index] * less[index];
    stats[index] = math.clamp(addedResult * additiveResult * multiplicativeResult, min[index], max[index]);
}
The generated code looks near perfect, yet it still fails the check.
So yeah, I think it is as Dreaming says. However, I did manage to make Burst generate exactly the same code without the Reinterpret (after adding some attributes), and it still failed.
Hi, I'm still lost with the vectorization stuff.
Could you share the code for schema, modifiers, and result with us?
It would help me understand the layout of the structs and how they are reinterpreted.
OK, so schema.Min is one native array and schema.Max is another.
Do we have to split them, or is there a way to have a native array of a range struct with a min float and a max float, and somehow reinterpret the min floats as float4 for vectorization?
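A sketch of why the split matters, assuming a hypothetical interleaved `Range` struct: reinterpreting it as float4 would mix mins and maxes in each lane, whereas separate arrays (structure-of-arrays) reinterpret cleanly.

```csharp
// Hypothetical AoS layout: min and max interleaved per element.
public struct Range
{
    public float Min;
    public float Max;
}

// Reinterpreting NativeArray<Range> as float4 would yield lanes like
// (min0, max0, min1, max1) — mins and maxes mixed together, which is
// not what the vectorized clamp wants.

// Keeping Min and Max in separate NativeArray<float>s (SoA) lets each
// be reinterpreted as contiguous float4s, as in the code above:
var minVec = schema.Min.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
var maxVec = schema.Max.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
```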
Now, after a while spent figuring out the Burst Inspector, my code seems similar. No Reinterpret in use.
Also, in your screenshot the loop part is not vectorized, same as mine. If Burst only checks the loop part and reports that it's not vectorized, without caring whether the rest is, then that makes sense.
I wonder if there is any way the loop itself can be vectorized.