JobSystem: When to use which way?

Hello,

I have been trying to figure out how the Unity Job System works and when to use it. I made a basic performance test that does one million calculations using four different methods: TestMe1A(), TestMe1B(), TestMe2() and TestMe3(). They take roughly 1300 ms, 2000 ms, 4 ms and 17 ms respectively and produce the same output. The time taken by the last method (TestMe3()), which uses no job at all, is remarkable.

Because the timings vary so much, my basic questions are whether there is anything wrong with my code and how to know which approach to choose, particularly among the three job-based methods.

Here is the code:

using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using UnityEngine;
using Unity.Jobs;
using Unity.Collections;
using Unity.Burst;

public class StartGame : MonoBehaviour
{
    private Stopwatch stopwatch;

    // Start is called before the first frame update
    void Start()
    {
        stopwatch = new Stopwatch();

        TestMe1A();
        TestMe1B();
        TestMe2();
        TestMe3();
    }

    int num = 1000000;

    // IJob
    public void TestMe1A()
    {
        stopwatch.Start();
        NativeArray<float> _results = new NativeArray<float>(num, Allocator.Temp);
        JobHandle jH = new JobHandle();

        for (int i = 0; i < num; i++)
        {
            float _a = i;
            float _b = i + 1;
            float _c = 0;

            TestJob theJob = new TestJob
            {
                results = _results,
                a = _a,
                b = _b,
                c = _c,
                i = i,
            };

            jH = theJob.Schedule();
        }

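        // jH is overwritten on every iteration, so this waits on the handle
        // of the most recently scheduled job.
        jH.Complete();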

        stopwatch.Stop();
        UnityEngine.Debug.LogFormat("IJob completed after {0} ms. The results: {1}", stopwatch.ElapsedMilliseconds, string.Join(", ", _results)); // 1300+ ms
        stopwatch.Reset();
        _results.Dispose();
    }


    // IJob Array
    public void TestMe1B()
    {
        stopwatch.Start();
        NativeArray<float> _results = new NativeArray<float>(num, Allocator.Temp);

        NativeArray<JobHandle> jobHandleArray = new NativeArray<JobHandle>(num, Allocator.Temp);

        for (int i = 0; i < num; i++)
        {
            float _a = i;
            float _b = i + 1;
            float _c = 0;

            TestJob theJob = new TestJob
            {
                results = _results,
                a = _a,
                b = _b,
                c = _c,
                i = i,
            };
            jobHandleArray[i] = theJob.Schedule();
        }
      
        JobHandle.CompleteAll(jobHandleArray);

        stopwatch.Stop();
        UnityEngine.Debug.LogFormat("IJob Array completed after {0} ms. The results: {1}", stopwatch.ElapsedMilliseconds, string.Join(", ", _results)); // 2000+ ms
        stopwatch.Reset();
        jobHandleArray.Dispose();
        _results.Dispose();
    }

    // IJobParallelFor
    public void TestMe2()
    {
        stopwatch.Start();
        NativeArray<float> _results_ = new NativeArray<float>(num, Allocator.Temp);
        JobHandle jobHandle = new JobHandle();

        TestJob2 theJob2 = new TestJob2
        {
            results = _results_,
        };
        // Process num indices, handing them to worker threads in batches of 64.
        jobHandle = theJob2.Schedule(num, 64);

        jobHandle.Complete();
        stopwatch.Stop();
        UnityEngine.Debug.LogFormat("IJobParallelFor completed after {0} ms. The results: {1}", stopwatch.ElapsedMilliseconds, string.Join(", ", _results_)); // 4+ ms
        stopwatch.Reset();
        _results_.Dispose();
    }

    // No Job
    public void TestMe3()
    {
        stopwatch.Start();
        float[] theResults = new float[num];

        for (int i = 0; i < num; i++)
        {
            float _a = i;
            float _b = i + 1;
            float _c = _a / _b;
            theResults[i] = _c;

        }
        stopwatch.Stop();
        UnityEngine.Debug.LogFormat("NoJob completed after {0} ms. The results: {1}", stopwatch.ElapsedMilliseconds, string.Join(", ", theResults)); // 17+ ms
        stopwatch.Reset();
    }

    [BurstCompile]
    public struct TestJob : IJob
    {
        public NativeArray<float> results;
        public float a;
        public float b;
        public float c;
        public int i;

        public void Execute()
        {
            c = a / b;
            results[i] = c;
        }
    }

    [BurstCompile]
    public struct TestJob2 : IJobParallelFor
    {
        public NativeArray<float> results;

        public void Execute(int i)
        {
            float a = i;
            float b = i + 1;
            float c;

            c = a / b;
            results[i] = c;
        }
    }
}

Thanks for your feedback!

In your first two attempts you create 1 million jobs that each do one trivial thing. Even though the job system is pretty efficient, managing those jobs alone makes up the overwhelming majority of the work here.

Your test without multithreading took 17ms, your test using the appropriate IJobParallelFor took 4ms. That's about the time difference I would expect between using multithreading and not using it.

I feel like your test does not make it easy to understand the difference between the job types and when to use which. As a rule of thumb, use IJobParallelFor whenever you can. For example, when you have an array of some length and need to apply a calculation to each stored value individually, similar to what you are doing here.
But now imagine you want to add up the values in the array such that each element stores the value of [i] + [i-1]. You can't (meaningfully) do this in parallel, since it requires knowledge of previously finished iterations: you can't calculate [i] + [i-1] without knowing the value of [i-1], which requires [i-2] to be calculated first, and so on. So you may as well use a normal IJob for that and execute it in the old-fashioned sequential way.
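
The running-sum case described above could be sketched roughly as follows. This is only an illustration: the job name RunningSumJob, its field and the helper class are hypothetical and do not come from the original code.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: turns "values" into a running total in place.
[BurstCompile]
public struct RunningSumJob : IJob
{
    public NativeArray<float> values;

    public void Execute()
    {
        // Each step reads the result of the previous step, so the
        // iterations cannot be handed out to threads independently.
        for (int i = 1; i < values.Length; i++)
        {
            values[i] = values[i] + values[i - 1];
        }
    }
}

public static class RunningSumExample
{
    public static void Run()
    {
        // TempJob is used because the array is handed to a job.
        var values = new NativeArray<float>(1000, Allocator.TempJob);
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = 1f;
        }

        JobHandle handle = new RunningSumJob { values = values }.Schedule();
        // ... the main thread is free to do other work here ...
        handle.Complete();

        values.Dispose();
    }
}
```

The job still runs off the main thread, but it is a single unit of work, so it gains nothing from having several worker threads available.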

What's the difference? For one, if you can parallelize the calculations over the entire array, then, assuming your CPU has nothing else to do, an IJobParallelFor will be faster by roughly a factor equal to the number of worker threads deployed by Unity (which is based on your core count). An IJob runs sequentially, with the only advantage being that it runs on some free thread. A lot of these simple IJobs could still run in parallel with each other, but a single one would not be as fast as an IJobParallelFor with the same workload. I hope these examples make it a bit clearer what the difference between the job types is and when to use them.

Edit: If you are just confused about why the IJob in your example is that much slower: as I mentioned, it's because of the number of jobs. If you instead create 4 jobs which each handle 1/4 of the calculations, you will see a huge difference.
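
A minimal sketch of that "few larger jobs" idea is shown below. The names ChunkJob and ChunkedExample are hypothetical. Giving every job its own output array is an assumption made here to keep the jobs fully independent of each other, and num is assumed to be divisible by jobCount.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: computes a / b for one contiguous chunk of indices.
[BurstCompile]
public struct ChunkJob : IJob
{
    public NativeArray<float> results; // this job's own output chunk
    public int offset;                 // global index that results[0] stands for

    public void Execute()
    {
        for (int k = 0; k < results.Length; k++)
        {
            float a = offset + k;
            float b = offset + k + 1;
            results[k] = a / b;
        }
    }
}

public static class ChunkedExample
{
    public static void Run(int num, int jobCount)
    {
        int chunkSize = num / jobCount; // assumes num % jobCount == 0
        var chunks = new NativeArray<float>[jobCount];
        // The handle array is only used on the main thread, so Temp is enough.
        var handles = new NativeArray<JobHandle>(jobCount, Allocator.Temp);

        for (int j = 0; j < jobCount; j++)
        {
            chunks[j] = new NativeArray<float>(chunkSize, Allocator.TempJob);
            handles[j] = new ChunkJob
            {
                results = chunks[j],
                offset = j * chunkSize,
            }.Schedule();
        }

        JobHandle.CompleteAll(handles);
        handles.Dispose();

        // ... read chunks[j][k] here; it holds the value for index j * chunkSize + k ...

        for (int j = 0; j < jobCount; j++)
        {
            chunks[j].Dispose();
        }
    }
}
```

Calling ChunkedExample.Run(1000000, 4) schedules four jobs instead of one million, so the scheduling cost is paid four times rather than once per element.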

Thank you very much for your detailed answer. One last thing I'd like to figure out is when to use TestMe1A() over TestMe1B(). In other words: when should I use a single JobHandle with myJob.Schedule(); and maybe myJobHandle.Complete(); instead of a NativeArray<JobHandle> myJobHandleArray with JobHandle.CompleteAll(myJobHandleArray);?

It's been some time since I last used DOTS, but if my memory serves me well, these are just convenient ways to call Complete on multiple job handles. Imagine you schedule a couple of jobs per frame and want to wait for them all to finish at the end. You would then have to write job1.Complete(), job2.Complete(), …, jobN.Complete() - or you could just put the handles in an array and write JobHandle.CompleteAll(allJobs).
Maybe the documentation has some additional information, but if it's still as bad as when I worked with DOTS, it's basically undocumented anyway :smile:
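
The two ways of waiting described above can be put side by side as in this sketch. The class and method names are hypothetical; only JobHandle.Complete and JobHandle.CompleteAll are Unity calls.

```csharp
using Unity.Collections;
using Unity.Jobs;

public static class WaitForHandles
{
    // Waits on every handle in turn, like writing
    // job1.Complete(), job2.Complete(), ... by hand.
    public static void CompleteOneByOne(NativeArray<JobHandle> handles)
    {
        for (int i = 0; i < handles.Length; i++)
        {
            handles[i].Complete();
        }
    }

    // Same intent expressed in a single call.
    public static void CompleteTogether(NativeArray<JobHandle> handles)
    {
        JobHandle.CompleteAll(handles);
    }
}
```

In both variants every handle that should be waited on has to be kept. A handle that gets overwritten by a later Schedule() call is no longer waited on by either method.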

I wouldn't have to, because in my first example I do this:

public void TestMe1A()
    {
        NativeArray<float> _results = new NativeArray<float>(num, Allocator.Temp);
        JobHandle jH = new JobHandle();
        for (int i = 0; i < num; i++)
        {
            // initialize multiple job structs of count = num here
            // ...
            TestJob theJob = new TestJob { /* ... */ };

            jH = theJob.Schedule();   // schedule each one after another
        }
        jH.Complete();   // complete them all at once

        _results.Dispose();
    }

In this case, I initialize the job structs in a for-loop and schedule them within that loop, one after another.
I complete them all at once by calling jH.Complete();

In the second example, all these individual job handles are put into a NativeArray, and JobHandle.CompleteAll(nativeArray); is called to complete the whole array of jobs:

public void TestMe1B()
    {
        NativeArray<float> _results = new NativeArray<float>(num, Allocator.Temp);
        NativeArray<JobHandle> jobHandleArray = new NativeArray<JobHandle>(num, Allocator.Temp);
        for (int i = 0; i < num; i++)
        {
           
            // initialize multiple job structs of count = num here
            // ...
       
            jobHandleArray[i] = theJob.Schedule();
        }
   
        JobHandle.CompleteAll(jobHandleArray);
       
        _results.Dispose();
        jobHandleArray.Dispose();
    }

Why do these different methods for doing the same thing exist? I'm just wondering, because otherwise I would have to find out which one is faster every time I use jobs.

You are scheduling a job and then calling on it to complete almost immediately - what performance improvements could you reasonably expect to see? You’re essentially making the main thread wait on job completion which is arguably no different to doing it on the main thread.

You should schedule jobs ASAP and complete as late as possible.

This.

Sometimes scheduling a job and returning the result is actually more expensive than running the work on the main thread (example: asking a job to calculate 2 + 2).

How you schedule jobs, complete jobs, and utilize that data can have a noticeable impact on performance.

Take this example. Same work - different schedule and completion cycle.

For example, where possible, temp jobs should be completed in LateUpdate.

Jobs that work on Allocator.TempJob data should, where possible, only be completed after 3 or 4 frames (a TempJob allocation may live for up to 4 frames).
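
As a rough sketch of "schedule early, complete late" within a single frame, the following component schedules in Update and completes in LateUpdate. The names ScheduleEarlyCompleteLate and RatioJob are hypothetical, and the job body simply mirrors the calculation from the original test.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class ScheduleEarlyCompleteLate : MonoBehaviour
{
    [BurstCompile]
    struct RatioJob : IJobParallelFor
    {
        public NativeArray<float> results;

        public void Execute(int i)
        {
            float a = i;
            float b = i + 1;
            results[i] = a / b;
        }
    }

    const int Count = 1000000;

    NativeArray<float> results;
    JobHandle handle;

    void Update()
    {
        // Schedule as early as possible. The allocation is handed to a job
        // and disposed later in the same frame, so TempJob is used.
        results = new NativeArray<float>(Count, Allocator.TempJob);
        handle = new RatioJob { results = results }.Schedule(Count, 64);
        // The main thread now continues with the rest of the frame.
    }

    void LateUpdate()
    {
        // Nothing was scheduled this frame (for example, the component
        // was enabled after Update had already run).
        if (!results.IsCreated)
        {
            return;
        }

        // Complete as late as possible, then consume and release the data.
        handle.Complete();
        // ... use results here ...
        results.Dispose();
    }
}
```

The worker threads get the time between Update and LateUpdate to finish, so the Complete() call ideally has little or nothing left to wait for.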

IJobParallelFor should be used when you know the job data can be split into different slices without dependencies.

Consider a list of 10,000 integers where you want to go through each one and add 1 to it.

IJobParallelFor would split those 10,000 tasks evenly between the available threads (a thread that finishes its work early can also steal work from another thread).

If your data cannot be split between threads, then a standard job should suffice.
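
The 10,000-integer example might look like the sketch below. AddOneJob and AddOneExample are hypothetical names, and the batch size of 64 is an arbitrary choice taken over from the original test.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: adds 1 to every element. No element depends on
// any other, so the indices can be spread across worker threads.
[BurstCompile]
public struct AddOneJob : IJobParallelFor
{
    public NativeArray<int> numbers;

    public void Execute(int index)
    {
        numbers[index] = numbers[index] + 1;
    }
}

public static class AddOneExample
{
    public static void Run()
    {
        var numbers = new NativeArray<int>(10000, Allocator.TempJob);

        // Indices are handed to worker threads in batches of 64.
        JobHandle handle = new AddOneJob { numbers = numbers }
            .Schedule(numbers.Length, 64);
        handle.Complete();

        numbers.Dispose();
    }
}
```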


I meant to write JobHandles instead of just jobs in my last post.
In reality you often have jobs that rely on the completion of other jobs before they can run. You may have a few of those JobHandles created per frame and could add them to a NativeArray to make managing them easier. Having different JobHandles in a NativeArray also allows for other actions, such as merging their dependencies.
https://docs.unity3d.com/2020.1/Documentation/Manual/JobSystemJobDependencies.html
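
A small sketch of jobs depending on other jobs, including merged dependencies, is given below. FillJob, AddArraysJob and DependencyExample are hypothetical names used purely for illustration.

```csharp
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: writes the same value into every element.
public struct FillJob : IJob
{
    public NativeArray<float> values;
    public float value;

    public void Execute()
    {
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = value;
        }
    }
}

// Hypothetical job: needs both inputs to be filled before it runs.
public struct AddArraysJob : IJob
{
    [ReadOnly] public NativeArray<float> left;
    [ReadOnly] public NativeArray<float> right;
    public NativeArray<float> sum;

    public void Execute()
    {
        for (int i = 0; i < sum.Length; i++)
        {
            sum[i] = left[i] + right[i];
        }
    }
}

public static class DependencyExample
{
    public static float Run()
    {
        var left = new NativeArray<float>(4, Allocator.TempJob);
        var right = new NativeArray<float>(4, Allocator.TempJob);
        var sum = new NativeArray<float>(4, Allocator.TempJob);

        // The two fill jobs are independent and may run in parallel.
        JobHandle fillLeft = new FillJob { values = left, value = 1f }.Schedule();
        JobHandle fillRight = new FillJob { values = right, value = 2f }.Schedule();

        // Merge both handles so the add job waits for both fills.
        JobHandle bothFilled = JobHandle.CombineDependencies(fillLeft, fillRight);
        JobHandle addHandle = new AddArraysJob
        {
            left = left,
            right = right,
            sum = sum,
        }.Schedule(bothFilled);

        // Completing the last job also completes everything it depends on.
        addHandle.Complete();
        float first = sum[0]; // 1f + 2f

        left.Dispose();
        right.Dispose();
        sum.Dispose();
        return first;
    }
}
```

JobHandle.CombineDependencies also has an overload that takes a NativeArray<JobHandle>, which is where keeping the handles in an array becomes handy.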

If your only concern here is performance… there won't be a noticeable difference. The entire DOTS architecture is very efficient, and as long as you don't actively bottleneck it, everything else you can do for performance optimization pales in comparison. If anything, make sure the code inside your jobs is efficient. After all, that's the "expensive workload" you are trying to make run faster.

Imagine you have a factory producing some kind of goods. There are multiple buildings, a lot of workers, and different tasks, some of which rely on each other. Those would be your jobs in this example. Any performance difference that may or may not exist between scheduling the jobs with one method or the other would be equal to the manager of that factory taking 1 vs. 2 seconds to press the big red "start" button in the morning. In the grand scheme of things, this won't affect the production of the factory at all.
