Why are Jobs slower in this case?

I was wondering whether it is better to have two jobs that do the same small tasks one after another (one of them with an if statement) or to combine them into one. In my test, two jobs turned out to be slightly faster even though I go through all the elements twice. But just for comparison, I recreated the same logic in a standard for loop, and the outcome surprised me: the single-threaded code was 5 times faster. Here is my code:

using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class TEST : MonoBehaviour
{
    public int Count = 5000000;
    public bool one = true;
    public bool normal = false;
    private NativeArray<float> _floats;
    private float[] _array;

    private void Start()
    {
        _floats = new NativeArray<float>(Count, Allocator.Persistent);

        var job = new InitJob()
        {
            counters = _floats,
            init = 3
        }.Schedule(Count, 64);
        job.Complete();
        _array = _floats.ToArray();
    }

    private struct InitJob : IJobParallelFor
    {
        public NativeArray<float> counters;
        public float init;

        public void Execute(int index)
        {
            counters[index] = init;
        }
    }

    private struct DecreaseJob : IJobParallelFor
    {
        public NativeArray<float> counters;
        public float deltaTime;

        public void Execute(int index)
        {
            counters[index] -= deltaTime;
        }
    }

    private struct DecreaseAndReset : IJobParallelFor
    {
        public NativeArray<float> counters;
        public float deltaTime;
        public float init;

        public void Execute(int index)
        {
            counters[index] -= deltaTime;
            if (counters[index] < 0)
                counters[index] = init;
        }
    }

    private struct ResetJob : IJobParallelFor
    {
        public NativeArray<float> counters;
        public float init;

        public void Execute(int index)
        {
            if (counters[index] < 0)
                counters[index] = init;
        }
    }

    private void Update()
    {
        if (normal)
        {
            var deltaTime = Time.deltaTime;
            for (int i = _array.Length - 1; i >= 0; i--)
            {
                _array[i] -= deltaTime;
                if (_array[i] < 0)
                    _array[i] = 3;
            }
        }
        else
        if (one)
            One();
        else
            Two();
    }

    public void Two()
    {
        var jobhandle = new DecreaseJob()
        {
            counters = _floats,
            deltaTime = Time.deltaTime
        }.Schedule(Count, 64);

        var second = new ResetJob()
        {
            counters = _floats,
            init = 3
        }.Schedule(Count, 64, jobhandle);

        second.Complete();
    }

    public void One()
    {
        var jobhandle = new DecreaseAndReset()
        {
            counters = _floats,
            deltaTime = Time.deltaTime,
            init = 3
        }.Schedule(Count, 64);

        jobhandle.Complete();
    }

    private void OnDestroy()
    {
        _floats.Dispose();
    }
}

Am I doing something wrong in my jobs?

I am using Unity 2018.2.3f1.

edit:
I have added the Burst compiler and now things have changed.
The normal way is now 5 times slower; two jobs vs. one are comparable.
But still, why is it that much slower without Burst? It uses 8 cores, so that alone should give a nice boost.

Are you profiling in the editor or on the device? In the editor there is a heavy safety-check cost every time you access an ECS data element or a NativeArray. Your main-thread code uses a plain managed array, so there is no safety check there.

Edit : Might be related https://gametorrahod.com/unity-ecs-native-containers-performance-test-aca8964ba80c


I tested it in the editor, but without Burst, so I could not disable the safety checks. It is not ECS; it is a MonoBehaviour with jobs. I will disable Burst, profile in a build, and see what the result is.

You are completing the jobs immediately after scheduling, which just forces them to run on the main thread.

You should have a JobHandle field, and in Update you Complete() it and then schedule the job again.

You still get Burst because Burst is tied to a job context, not a thread.
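For reference, a minimal sketch of the "keep a JobHandle field" pattern described above. It is not the original poster's code: the class name `DeferredCompleteExample` is made up for illustration, and the call to `JobHandle.ScheduleBatchedJobs()` is my own addition. Note that other replies in this thread point out that an `IJobParallelFor` still spreads across worker threads even when completed immediately, so the main gain here would be that the main thread is free to do other work while the job runs.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical example class; only the job logic mirrors the code in the question.
public class DeferredCompleteExample : MonoBehaviour
{
    public int Count = 5000000;
    private NativeArray<float> _floats;
    private JobHandle _handle; // a default JobHandle completes immediately

    private struct DecreaseAndReset : IJobParallelFor
    {
        public NativeArray<float> counters;
        public float deltaTime;
        public float init;

        public void Execute(int index)
        {
            // One read and one write per element.
            var current = counters[index] - deltaTime;
            counters[index] = current < 0 ? init : current;
        }
    }

    private void Start()
    {
        _floats = new NativeArray<float>(Count, Allocator.Persistent);
    }

    private void Update()
    {
        // Finish the job scheduled on the previous frame before touching the array.
        _handle.Complete();

        // Schedule this frame's work and return without waiting for it.
        _handle = new DecreaseAndReset
        {
            counters = _floats,
            deltaTime = Time.deltaTime,
            init = 3
        }.Schedule(Count, 64);

        // Assumption: ask the scheduler to start queued jobs right away.
        JobHandle.ScheduleBatchedJobs();
    }

    private void OnDestroy()
    {
        // The array must not be disposed while a job is still using it.
        _handle.Complete();
        _floats.Dispose();
    }
}
```

With this layout the results of a frame's job are only safe to read after the next `Complete()` call, so it trades one frame of latency for not stalling the main thread.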

You should use IJobParallelForBatch for this. But this doesn’t change a lot (~5-10%), which makes me think @5argon is right.
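A rough sketch of what a batch version could look like. Assumptions on my side: `IJobParallelForBatch` and its `ScheduleBatch` extension come from the `com.unity.jobs` package rather than the built-in engine job types, and the struct name `DecreaseAndResetBatch` is invented for this example.

```csharp
using Unity.Collections;
using Unity.Jobs;

// Hypothetical batch variant of the DecreaseAndReset job from the question.
public struct DecreaseAndResetBatch : IJobParallelForBatch
{
    public NativeArray<float> counters;
    public float deltaTime;
    public float init;

    // Called once per batch instead of once per element.
    public void Execute(int startIndex, int count)
    {
        int end = startIndex + count;
        for (int i = startIndex; i < end; i++)
        {
            var current = counters[i] - deltaTime;
            counters[i] = current < 0 ? init : current;
        }
    }
}

public static class DecreaseAndResetBatchUsage
{
    // Schedules the batch job over the whole array and returns its handle.
    public static JobHandle Run(NativeArray<float> floats, float deltaTime)
    {
        return new DecreaseAndResetBatch
        {
            counters = floats,
            deltaTime = deltaTime,
            init = 3
        }.ScheduleBatch(floats.Length, 64);
    }
}
```

The idea is that the inner loop lives inside the job, so the per-element `Execute` call disappears and the compiler sees a plain loop over a contiguous range.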

Replacing your Execute code in DecreaseAndReset with this:

var current = counters[index];
current -= deltaTime;
if (current < 0)
    counters[index] = init;
else
    counters[index] = current;

and removing one index access makes it ~30% faster. The cost seems to be nearly 100% safety checks on NativeArray access in the job.

Some Tests:

Editor:
Normal: 25ms
One and Two: somewhere around 100ms

Editor Burst:
Normal: 25ms
One: 2.7ms
Two: 4.0ms

Debug Build Mono:
Normal: 25ms
One: 19ms
Two: 22ms

Release Build Mono:
Normal: 25ms
One: 19ms
Two: 22ms

ReleaseBuild Mono + Burst:
Normal: 25ms
One: 1.4ms
Two: 2.4ms

Release Build IL2CPP:
Normal: 9.5ms
One: 1.44ms
Two: 2.5ms

ReleaseBuild IL2CPP + Burst:
Normal: 9.5ms
One: 1.4ms
Two: 2.3ms

Conclusion: Mono Release is still slower than expected, probably because of function call overhead. IL2CPP gets pretty much what you would expect (4 cores + HT in this case), probably due to inlining and much more optimized native code. Burst doesn’t change that much for IL2CPP in this particular case but fixes the bad code generated by Mono.
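To put numbers on that conclusion, here is a small plain C# sketch (no Unity needed) that turns the timings listed above into speedup factors of the single job ("One") over the main-thread loop ("Normal") in the same build. The class and method names are invented for this example; only the millisecond values come from the measurements.

```csharp
using System;
using System.Globalization;

public static class SpeedupTable
{
    // Speedup of the job over the main-thread loop: baseline time / job time.
    public static double Speedup(double baselineMs, double jobMs)
    {
        return baselineMs / jobMs;
    }

    public static void Main()
    {
        // (configuration, "Normal" ms, "One" ms) copied from the test results.
        var rows = new (string Label, double NormalMs, double OneMs)[]
        {
            ("Release Build Mono", 25.0, 19.0),
            ("Release Build Mono + Burst", 25.0, 1.4),
            ("Release Build IL2CPP", 9.5, 1.44),
            ("Release Build IL2CPP + Burst", 9.5, 1.4),
        };

        foreach (var row in rows)
        {
            double factor = Speedup(row.NormalMs, row.OneMs);
            Console.WriteLine(string.Format(
                CultureInfo.InvariantCulture, "{0}: {1:F1}x", row.Label, factor));
        }
    }
}
```

The IL2CPP rows come out between 6x and 7x, which is in line with a 4 core + HT machine, while plain Mono stays close to 1.3x.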


He uses IJobParallelFor. The main thread will wait until everything is done but it still uses all CPU cores.

Exactly, I expected that this alone would give a good boost.
But as always, never jumping to conclusions when profiling in the editor is good advice :)

I did not use IJobParallelForBatch because I do not have access to it. Is it in the ECS package?
I was just testing jobs performance, and the main point of these tests was to check whether two smaller jobs (one without an if statement) would be better than one in this case.

Thx for your help :)

Never actually tested that but it makes sense.

It is distributed across all cores; this can be seen in the profiler. I used Complete in Update to make the comparison fairer.

I think the safety check is on the NativeArray struct, so it does not matter whether you use an ECS system or a MonoBehaviour.

Also, like @julian-moschuering said, when building to a device, if you are using Android + Mono (for faster builds) you still have something more to gain by using IL2CPP, because it enables the “fast path” ( https://discussions.unity.com/t/704160/10 )

The “Jobs > Leak Detection and/or Enable Burst Safety Check” options seem to be designed to disable this and let you profile truthfully from the editor, but I remember turning them on and off and seeing no difference. Not sure whether they work in the current version or not.

It only affects the bursted code. So if you have a lot of loops on the main thread accessing NativeArray there is no way to make that fast in the editor.

Of course in reality why would you use NativeArray and then write main thread code?


There is no answer to this question as a blanket rule. It totally depends on the data you’re working with. As a rule of thumb, the most straightforward approach is to start with simpler split jobs and merge them as you discover the data access can be shared.

If you want to micro-benchmark a specific case like this you’ll want to make sure the code is Burst-compiled unless you’re specifically looking at the performance of mono (or il2cpp).

You can also create a PerformanceTest, which is very much like a UnitTest which you can run from the editor.

e.g. A similar test to yours:

using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Mathematics;
using Unity.PerformanceTesting;

namespace Unity.Entities.PerformanceTests
{
    public class NativeArrayIterationPerformanceTests
    {
        [BurstCompile(CompileSynchronously = true)]
        struct AddDeltaAndReset : IJobParallelFor
        {
            public NativeArray<int> Source;
            public int Delta;
            public int ResetThreshold;

            public void Execute(int index)
            {
                var projectedValue = Source[index] + Delta;
                Source[index] = math.@select(0, projectedValue, projectedValue < ResetThreshold);
            }
        }
       
        [BurstCompile(CompileSynchronously = true)]
        unsafe struct AddDeltaAndResetPtr : IJobParallelFor
        {
            [NativeDisableUnsafePtrRestriction]
            public int* Source;
            public int Delta;
            public int ResetThreshold;

            public void Execute(int index)
            {
                var projectedValue = Source[index] + Delta;
                Source[index] = math.@select(0, projectedValue, projectedValue < ResetThreshold);
            }
        }
       
        [BurstCompile(CompileSynchronously = true)]
        struct AddDelta : IJobParallelFor
        {
            public NativeArray<int> Source;
            public int Delta;

            public void Execute(int index)
            {
                var projectedValue = Source[index] + Delta;
                Source[index] = projectedValue;
            }
        }
       
        [BurstCompile(CompileSynchronously = true)]
        struct Reset : IJobParallelFor
        {
            public NativeArray<int> Source;
            public int ResetThreshold;

            public void Execute(int index)
            {
                var value = Source[index];
                Source[index] = math.@select(0, value, value < ResetThreshold);
            }
        }
           
        void SingleIterationWork(NativeArray<int> source, int delta, int resetThreshold)
        {
            var addDeltaAndResetJob = new AddDeltaAndReset
            {
                Source = source,
                Delta = delta,
                ResetThreshold = resetThreshold
            };
            var addDeltaAndResetJobHandle = addDeltaAndResetJob.Schedule(source.Length, 1024);
            addDeltaAndResetJobHandle.Complete();
        }
       
        unsafe void SingleIterationWorkPtr(NativeArray<int> source, int delta, int resetThreshold)
        {
            var sourcePtr = (int*)source.GetUnsafePtr();
            var addDeltaAndResetJob = new AddDeltaAndResetPtr
            {
                Source = sourcePtr,
                Delta = delta,
                ResetThreshold = resetThreshold
            };
            var addDeltaAndResetJobHandle = addDeltaAndResetJob.Schedule(source.Length, 1024);
            addDeltaAndResetJobHandle.Complete();
        }
       
        void SplitIterationWork(NativeArray<int> source, int delta, int resetThreshold)
        {
            var addDeltaJob = new AddDelta
            {
                Source = source,
                Delta = delta
            };
            var addDeltaJobHandle = addDeltaJob.Schedule(source.Length, 1024);
            var resetJob = new Reset
            {
                Source = source,
                ResetThreshold = resetThreshold
            };
            var resetJobHandle = resetJob.Schedule(source.Length, 1024, addDeltaJobHandle);
            resetJobHandle.Complete();
        }

        [PerformanceTest]
        public void SingleVsSplitIterationJob()
        {
            var count = 10 * 1024 * 1024;
            var source = new NativeArray<int>(count, Allocator.TempJob);
            var delta = 1;
            var resetThreshold = 1;

            // Make sure Burst is compiled.
            SingleIterationWork(source, delta, resetThreshold);
            SingleIterationWorkPtr(source, delta, resetThreshold);
            SplitIterationWork(source, delta, resetThreshold);
           
            var sampleSingle = new SampleGroupDefinition("SingleIteration");
            var sampleSinglePtr = new SampleGroupDefinition("SingleIterationPtr");
            var sampleSplit = new SampleGroupDefinition("SplitIteration");

            using (Measure.Scope(sampleSingle))
            {
                SingleIterationWork(source, delta, resetThreshold);
            }
           
            using (Measure.Scope(sampleSinglePtr))
            {
                SingleIterationWorkPtr(source, delta, resetThreshold);
            }
           
            using (Measure.Scope(sampleSplit))
            {
                SplitIterationWork(source, delta, resetThreshold);
            }
               
            source.Dispose();
        }
    }
}

The asmdef in this case includes the following references:

    "references": [
        "Unity.PerformanceTesting",
        "Unity.Entities",
        "Unity.Mathematics",
        "Unity.Jobs",
        "Unity.Burst",
        "Unity.Collections"
    ],

When you run the test, you’ll see the output timing. On my particular machine, the above looks like:

SingleIteration 3.07 Millisecond
SingleIterationPtr 3.15 Millisecond
SplitIteration 6.54 Millisecond

And the win, as expected in this particular case, goes to the single iteration. (Also tested here is a comparison of NativeArray versus raw pointer, which are, also as expected, basically the same.)


I did not know there was a performance test tool; good to know for the future :).
I have added Unity.PerformanceTesting to the asmdef file but I get an error:
Assembly has reference to non-existent assembly ‘Unity.PerformanceTesting’ (Assets/Tests/Tests.asmdef)

Do I have to add some package? I have Burst 0.2.4.-preview.25 and Entities 0.0.12-preview.8.

You need the performance testing package. This is the one we use internally for ECS:

    "com.unity.test-framework.performance": "0.1.31-preview",

Including it in the packages manifest.json should fix the issue. Afterwards you will need to manually edit the assembly definition files to include the Unity.PerformanceTesting assembly.

You can find some documentation in the package readme.

I have added the package to the manifest and Unity.PerformanceTesting to the asmdef file, but the Unity.PerformanceTesting namespace is not available.

Add perf package to testables:

      "testables": [
            "com.unity.test-framework.performance"
      ],

If it still gives errors, add these modules to the manifest.json dependencies:

        "com.unity.modules.jsonserialize": "1.0.0",
        "com.unity.modules.unitywebrequestwww": "1.0.0",
        "com.unity.modules.unitywebrequest": "1.0.0",
        "com.unity.modules.vr": "1.0.0"

Ok, I managed to make it work :) thx.
I like the fact that I can just wrap all sample groups in a loop and get things like median, min, max :)
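For anyone landing here later, a minimal sketch of what "wrapping a sample group in a loop" might look like. It only uses the calls already shown in the test above (`[PerformanceTest]`, `SampleGroupDefinition`, `Measure.Scope`) from the 0.1.31-preview package; the class name, the iteration count and the summing workload are made up for illustration, and later package versions may expose a different API.

```csharp
using Unity.PerformanceTesting;

// Hypothetical test class; the workload is a placeholder for the code under test.
public class LoopedSampleExample
{
    private long _sink; // stores the result so the loop is not trivially dead code

    [PerformanceTest]
    public void RepeatedSamples()
    {
        const int iterations = 20; // arbitrary choice for this sketch
        var data = new int[1024 * 1024];
        var sample = new SampleGroupDefinition("SumArray");

        for (int run = 0; run < iterations; run++)
        {
            // Each pass through the scope records one more sample in the same group.
            using (Measure.Scope(sample))
            {
                long sum = 0;
                for (int i = 0; i < data.Length; i++)
                    sum += data[i];
                _sink = sum;
            }
        }
    }
}
```

With several samples in one group, the test output can report aggregate values such as median, min and max instead of a single timing.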