This is a very weird, very specific Burst performance issue.
I was benchmarking a fixed-point library against float. When I turned off a bunch of checks, the fixed-point library suddenly performed much worse, even though it was executing less code.
I simplified this down to a basic performance test, and it pretty much comes down to this.
This
var product = (long)this.Input[i] * this.Input[j];
var result = product >> 12;
this.Output[i] = (int)result;
executes 8x slower than
var product = (long)this.Input[i] * this.Input[j];
var result = product >> 12;
// 1 extra line of code which speeds up job 8x
result += (product & IntSignBit) >> (12 - 1);
this.Output[i] = (int)result;
Which makes no sense to me, as the code is exactly the same except that it has extra operations (add, sub, shift, and). I’d expect it to run slightly slower, not nearly an order of magnitude faster.
I have repeated this dozens of times: tweaked things, run the tests in different orders, upgraded Burst (this is now running on the latest package, with the same result), and forced recompiles. The results are always the same.
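To make clear what the extra line actually computes, here is a minimal sketch in plain C# (no Burst, no NativeArray). The class and method names are made up for illustration, and reading the values as Q12 fixed point (12 fractional bits) is an assumption based on the shift amount. It only shows that the second snippet rounds where the first truncates; it does not explain the timing difference.

```csharp
using System;

// Hypothetical helper class, not part of the original benchmark.
public static class FixedPointShiftDemo
{
    // Same value as IntSignBit in Test2Job: bit 11, the highest bit discarded by >> 12.
    private const int IntSignBit = 1 << (12 - 1);

    // First snippet: the arithmetic shift simply drops the low 12 bits.
    public static int MultiplyTruncate(int a, int b)
    {
        long product = (long)a * b;
        long result = product >> 12;
        return (int)result;
    }

    // Second snippet: adds 1 when bit 11 of the product is set (round half up).
    public static int MultiplyRound(int a, int b)
    {
        long product = (long)a * b;
        long result = product >> 12;
        result += (product & IntSignBit) >> (12 - 1);
        return (int)result;
    }

    public static void Main()
    {
        // 6144 * 6144 has no low bits set, so both versions agree.
        Console.WriteLine(MultiplyTruncate(6144, 6144)); // 9216
        Console.WriteLine(MultiplyRound(6144, 6144));    // 9216

        // 3 * 1024 = 3072 has bit 11 set, so the results differ.
        Console.WriteLine(MultiplyTruncate(3, 1024)); // 0
        Console.WriteLine(MultiplyRound(3, 1024));    // 1
    }
}
```

So the two jobs are not computing the same output; the faster one does strictly more arithmetic per iteration.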
Results:
Full Source:
The burst jobs
[BurstCompile]
private struct Test1Job : IJob
{
    public NativeArray<int> Input;
    public NativeArray<int> Output;

    public void Execute()
    {
        for (var i = 0; i < Count; i++)
        {
            for (var j = 0; j < Count; j++)
            {
                var product = (long)this.Input[i] * this.Input[j];
                var result = product >> 12;
                this.Output[i] = (int)result;
            }
        }
    }
}
[BurstCompile]
private struct Test2Job : IJob
{
    public NativeArray<int> Input;
    public NativeArray<int> Output;

    private const int IntSignBit = 1 << (12 - 1);

    public void Execute()
    {
        for (var i = 0; i < Count; i++)
        {
            for (var j = 0; j < Count; j++)
            {
                var product = (long)this.Input[i] * this.Input[j];
                var result = product >> 12;
                // 1 extra line of code which speeds up job 8x
                result += (product & IntSignBit) >> (12 - 1);
                this.Output[i] = (int)result;
            }
        }
    }
}
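One property of both loop nests that may be worth keeping in mind when reading the timings: the inner loop overwrites Output[i] on every iteration, so only the store for the last j survives. The sketch below shows this on plain arrays (no Burst, no NativeArray); the class and method names are hypothetical. Whether Burst treats the two jobs differently because of this is not established here; inspecting the generated assembly would be the way to check.

```csharp
using System;

// Hypothetical illustration, not part of the original benchmark.
public static class LastWriteWinsDemo
{
    // Mirrors the loop nest in Test1Job on a plain array.
    public static int[] NestedLoop(int[] input)
    {
        var output = new int[input.Length];
        for (var i = 0; i < input.Length; i++)
        {
            for (var j = 0; j < input.Length; j++)
            {
                var product = (long)input[i] * input[j];
                output[i] = (int)(product >> 12); // overwritten on every j
            }
        }
        return output;
    }

    // Computes only the term for the last j, the only store that survives above.
    public static int[] LastTermOnly(int[] input)
    {
        var output = new int[input.Length];
        if (input.Length == 0)
        {
            return output;
        }

        var last = input[input.Length - 1];
        for (var i = 0; i < input.Length; i++)
        {
            output[i] = (int)(((long)input[i] * last) >> 12);
        }
        return output;
    }

    public static void Main()
    {
        var input = new[] { 4096, 8192, -6144, 12345 };
        // Both lines print the same sequence.
        Console.WriteLine(string.Join(",", NestedLoop(input)));
        Console.WriteLine(string.Join(",", LastTermOnly(input)));
    }
}
```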
The tests
[Test]
[Performance]
public void Test1()
{
    NativeArray<int> input = default;
    NativeArray<int> output = default;
    input = new NativeArray<int>(Count, Allocator.TempJob);
    output = new NativeArray<int>(Count, Allocator.TempJob);

    for (var i = 0; i < Count; i++)
    {
        input[i] = Random.Range((int)MultiMin, (int)MultiMax);
    }

    Measure.Method(() =>
        {
            new Test1Job
                {
                    Input = input,
                    Output = output,
                }
                .Schedule().Complete();
        })
        .Run();

    input.Dispose();
    output.Dispose();
}
[Test]
[Performance]
public void Test2()
{
    NativeArray<int> input = default;
    NativeArray<int> output = default;
    input = new NativeArray<int>(Count, Allocator.TempJob);
    output = new NativeArray<int>(Count, Allocator.TempJob);

    for (var i = 0; i < Count; i++)
    {
        input[i] = Random.Range((int)MultiMin, (int)MultiMax);
    }

    Measure.Method(() =>
        {
            new Test2Job
                {
                    Input = input,
                    Output = output,
                }
                .Schedule().Complete();
        })
        .Run();

    input.Dispose();
    output.Dispose();
}
Please tell me I’ve done something stupid. I can’t see it though.
-edit-
safety system off
leak detection off
job debugger off
Burst compilation on
synchronous compilation on