Job is slower than standard code. Am I doing something wrong?

I am digging into the job system and I am having trouble getting the performance gains I expected. I don’t know if I am approaching this wrong or if the task I am trying to jobify isn’t appropriate. Here is what I have so far.

This is the standard code.

                foreach (Cell cell in connectingCellWorkingList)
                {
                    float cost = Vector3.Distance(position, cell.Coordinate);
                    cost += Vector3.Distance(cell.Coordinate, connectedSection.EuclideanCenter);
               
                    if(cost >= lowestCost)
                        continue;

                    bestCell = cell;
                    lowestCost = cost;
                }

This is the jobified code.

public Cell GetBestConnectingCell(Vector3 startPoint, Vector3 endPoint, List<Cell> potentialCells)
        {
            //Setup data for the jobs to process
            float3 origin = startPoint.ConvertToFloat3();
            float3 terminus = endPoint.ConvertToFloat3();
            NativeArray<float3> input = new (potentialCells.Count, Allocator.TempJob, NativeArrayOptions.UninitializedMemory);
            NativeArray<int> output = new(1, Allocator.TempJob, NativeArrayOptions.UninitializedMemory);
           
            for (var i = 0; i < potentialCells.Count; i++)
            {
                input[i] = potentialCells[i].Coordinate.ConvertToFloat3();
            }
           
            //Build the job
            var evaluateConnectionOriginJob = new EvaluateConnectionOriginsJob
            {
                StartPoint = origin,
                EndPoint = terminus,
                Input = input,
                Output = output
            };

            //Schedule the job
            JobHandle jobHandle = evaluateConnectionOriginJob.Schedule();
            jobHandle.Complete();

            //Get result
            int cellIndex = output[0];
           
            //Cleanup native arrays 
            input.Dispose();
            output.Dispose();
           
            //Return result
            return potentialCells[cellIndex];
        }

    [BurstCompile(CompileSynchronously = true)]
    public struct EvaluateConnectionOriginsJob : IJob
    {
        [ReadOnly] public float3 StartPoint;
        [ReadOnly] public float3 EndPoint;
        [ReadOnly] public NativeArray<float3> Input;
        [WriteOnly] public NativeArray<int> Output;

        public void Execute()
        {
            var lowestCost = float.MaxValue;
            var lowestIndex = int.MaxValue;
           
            for (var i = 0; i < Input.Length; i++)
            {
                float cost = math.distance(StartPoint, Input[i]) + math.distance(Input[i], EndPoint);
                if(cost >= lowestCost)
                    continue;

                lowestCost = cost;
                lowestIndex = i;
            }

            Output[0] = lowestIndex;
        }
    }

I tried using an IJobParallelFor, but I was getting worse results than with this single job.

Standard code executes in 20-35 ticks.
Jobified code executes in 300-500 ticks.

First, what is a “tick”?
Second, are you sure that copying the data into native containers and then running the job is faster than doing the work on the main thread? That’s a fairly light O(n) workload.
Third, it seems like you are including allocating the arrays as part of the job timing. Is this true?
Fourth, are you scheduling or running the job? Jobs sacrifice latency to gain concurrency. If you are immediately completing the job after scheduling, use Run() instead.
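
To illustrate the fourth point, here is a minimal sketch of the two ways to execute an IJob. It assumes a hypothetical SumJob that stands in for the real work; neither it nor the helper names come from the original post. Run() executes the job immediately on the calling thread, while Schedule() hands it to a worker thread and Complete() then blocks until it has finished, which adds scheduling latency without any gain when nothing else happens in between.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical stand-in job: sums Input into Output[0].
[BurstCompile]
public struct SumJob : IJob
{
    [ReadOnly] public NativeArray<float> Input;
    public NativeArray<float> Output;

    public void Execute()
    {
        float sum = 0f;
        for (int i = 0; i < Input.Length; i++)
            sum += Input[i];
        Output[0] = sum;
    }
}

public static class RunVersusSchedule
{
    // Executes the job right away on the calling thread.
    public static float SumWithRun(NativeArray<float> input)
    {
        using var output = new NativeArray<float>(1, Allocator.TempJob);
        new SumJob { Input = input, Output = output }.Run();
        return output[0];
    }

    // Queues the job for a worker thread, then blocks until it is done.
    public static float SumWithSchedule(NativeArray<float> input)
    {
        using var output = new NativeArray<float>(1, Allocator.TempJob);
        JobHandle handle = new SumJob { Input = input, Output = output }.Schedule();
        handle.Complete();
        return output[0];
    }
}
```

Both helpers return the same value; only where and when the work runs differs. Scheduling tends to pay off when the main thread has other work to do between Schedule() and Complete().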

You’re doing a lot of ConvertToFloat3() on the main thread. If potentialCells is a Vector3 array you could simply use ReinterpretCast.
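
A minimal sketch of that idea, assuming the coordinates are already available as a Vector3[] (the original post holds them inside Cell objects, so this would need a separate array). It uses the NativeArray constructor that copies a managed array plus NativeArray.Reinterpret; the helper name ClosestToOrigin is hypothetical and only gives the reinterpreted view something to do.

```csharp
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine;

public static class CoordinateUpload
{
    // Hypothetical helper: returns the coordinate closest to the origin.
    // Assumes 'coordinates' contains at least one element.
    public static float3 ClosestToOrigin(Vector3[] coordinates)
    {
        // One bulk copy of the managed array into native memory.
        using var native = new NativeArray<Vector3>(coordinates, Allocator.TempJob);

        // View the same memory as float3. No per-element conversion happens;
        // this works because Vector3 and float3 are both three floats (12 bytes).
        NativeArray<float3> asFloat3 = native.Reinterpret<float3>();

        float3 best = asFloat3[0];
        for (int i = 1; i < asFloat3.Length; i++)
        {
            if (math.lengthsq(asFloat3[i]) < math.lengthsq(best))
                best = asFloat3[i];
        }
        return best;
    }
}
```

The reinterpreted array is only a view, so the sketch disposes the original NativeArray<Vector3> (through 'using') rather than the view.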

Check that Burst Compilation is enabled.
Remove the if (cost >= lowestCost) check from the for loop and write each cost into a costs array instead. Then do a second loop over that array that does the compare and update. This is likely to be faster because, without the branch, Burst can vectorize the first loop.
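
The two-loop shape described above can be sketched as follows. This is plain C# using System.Numerics so that it runs outside Unity; inside a Burst job, float3, math.distance and a NativeArray<float> would take the place of Vector3, Vector3.Distance and float[]. The class and method names are hypothetical, and the sketch only shows the structure. It does not demonstrate that Burst actually vectorizes the first loop.

```csharp
using System;
using System.Numerics;

public static class TwoPassLowestCost
{
    // Returns the index of the point with the lowest combined distance
    // to 'start' and 'end', or -1 if 'points' is empty.
    public static int FindLowestCostIndex(Vector3 start, Vector3 end, Vector3[] points)
    {
        var costs = new float[points.Length];

        // Pass 1: pure arithmetic with no data-dependent branch.
        for (int i = 0; i < points.Length; i++)
            costs[i] = Vector3.Distance(start, points[i]) + Vector3.Distance(points[i], end);

        // Pass 2: compare and update over the precomputed costs.
        int lowestIndex = -1;
        float lowestCost = float.MaxValue;
        for (int i = 0; i < costs.Length; i++)
        {
            if (costs[i] >= lowestCost)
                continue;

            lowestCost = costs[i];
            lowestIndex = i;
        }

        return lowestIndex;
    }
}
```

The trade-off is an extra scratch array of one float per item, which is more memory traffic in exchange for a branch-free first loop.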

  • A tick is 100 nanoseconds; see here for more info.
  • About half the processing time is the setup, so just doing the work on the main thread is likely the best bet here.
  • Yes, I am.
  • I am having the job complete immediately. I will try using Run and compare them.
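
Since the unit matters when comparing these numbers, here is a small sketch of the two kinds of "tick" in .NET. TimeSpan ticks (for example Stopwatch.Elapsed.Ticks) are always 100 ns, whereas the raw Stopwatch.ElapsedTicks counter is in units of 1 / Stopwatch.Frequency seconds, which may or may not be 100 ns on a given machine. Which of the two the original measurement used is not stated, and the helper names below are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class TickUnits
{
    // Converts a raw Stopwatch.ElapsedTicks reading to microseconds.
    // 'frequency' is normally Stopwatch.Frequency (ticks per second).
    public static double StopwatchTicksToMicroseconds(long stopwatchTicks, long frequency)
    {
        return stopwatchTicks * 1_000_000.0 / frequency;
    }

    // Converts 100 ns TimeSpan ticks to microseconds (10 ticks per microsecond).
    public static double TimeSpanTicksToMicroseconds(long timeSpanTicks)
    {
        return timeSpanTicks / 10.0;
    }

    // Times 'work' once and returns the elapsed time in microseconds.
    public static double MeasureMicroseconds(Action work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return StopwatchTicksToMicroseconds(sw.ElapsedTicks, Stopwatch.Frequency);
    }
}
```

With 100 ns ticks, the 20-35 and 300-500 tick figures quoted earlier correspond to roughly 2-3.5 µs and 30-50 µs.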

Thanks for responding.

  • I have never used ReinterpretCast before. I will give that a shot.
  • Burst compilation is enabled.
  • I will try this and compare.

Thanks for responding.

using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;
using Debug = UnityEngine.Debug;


public class BurstSmallestDistance : MonoBehaviour
{
    public int
        cellSize = 1,
        iterations = 64,
        batchSize = 32;

    private void Start()
    {
        var test = new BurstDistance();
        BurstDistance.batchSize = batchSize;

        List<BurstDistance.PotentialCell> somePotentialCells = new(cellSize);
        BurstDistance.PotentialCell[] somePotentialCellsArr = new BurstDistance.PotentialCell[cellSize];

        for (int i = 0; i < cellSize; i++)
        {
            var rand = UnityEngine.Random.insideUnitSphere * 100f;
            somePotentialCells.Add(new BurstDistance.PotentialCell(rand));
            somePotentialCellsArr[i] = new BurstDistance.PotentialCell(rand);
        }

        Debug.Log("\nCells: " + cellSize);

        BurstDistance.PotentialCell cell;
        double us;

        test.GetBestConnectingCell(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
        test.GetBestConnectingCellSThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
        test.GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);


        Stopwatch sw = new Stopwatch();
        sw.Restart();
        sw.Stop();

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            test.GetBestConnectingCell(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
        sw.Stop();

        us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
        Debug.Log("List\nCells ForEach [Avg µs]: " + us / iterations);


        sw.Restart();
        for (int i = 0; i < iterations; i++)
            test.GetBestConnectingCellSThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
        sw.Stop();

        us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
        Debug.Log("\nCells SThread [Avg µs]: " + us / iterations);


        sw.Restart();
        for (int i = 0; i < iterations; i++)
            test.GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
        sw.Stop();

        us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
        Debug.Log("\nCells MThread [Avg µs]: " + us / iterations);



        sw.Restart();
        for (int i = 0; i < iterations; i++)
            test.GetBestConnectingCell(float3.zero, math.float3(50, 50, 50), somePotentialCellsArr, out cell);
        sw.Stop();

        us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
        Debug.Log("Array\nCells ForEach [Avg µs]: " + us / iterations);


        sw.Restart();
        for (int i = 0; i < iterations; i++)
            test.GetBestConnectingCellSThread(float3.zero, math.float3(50, 50, 50), somePotentialCellsArr, out cell);
        sw.Stop();

        us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
        Debug.Log("\nCells SThread [Avg µs]: " + us / iterations);


        sw.Restart();
        for (int i = 0; i < iterations; i++)
            test.GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), somePotentialCellsArr, out cell);
        sw.Stop();

        us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
        Debug.Log("\nCells MThread [Avg µs]: " + us / iterations);

    }
}

public class BurstDistance
{
    public static int batchSize = 8;

    public struct PotentialCell
    {
        public float3 coordinate;

        public PotentialCell(float3 coord)
        {
            coordinate = coord;
        }
    }

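    // Note: Burst compiles job structs and static methods that avoid managed types.
    // On this and the other instance methods of this class the attribute has no effect.
    [BurstCompile(CompileSynchronously = true)]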
    public void GetBestConnectingCell(float3 origin, float3 terminus, in List<PotentialCell> potentialCells, out PotentialCell cell)
    {
        cell = potentialCells[0];
        var lowestCost = float.MaxValue;

        foreach (PotentialCell item in potentialCells)
        {
            float cost = math.distance(origin, item.coordinate) + math.distance(terminus, item.coordinate);

            if (cost < lowestCost)
            {
                cell = item;
                lowestCost = cost;
            }
        }
    }

    [BurstCompile(CompileSynchronously = true)]
    public void GetBestConnectingCell(float3 origin, float3 terminus, in PotentialCell[] potentialCells, out PotentialCell cell)
    {
        cell = potentialCells[0];
        var lowestCost = float.MaxValue;

        for (int i = 0; i < potentialCells.Length; i++)
        {
            float cost = math.distance(origin, potentialCells[i].coordinate) + math.distance(terminus, potentialCells[i].coordinate);

            if (cost < lowestCost)
            {
                cell = potentialCells[i];
                lowestCost = cost;
            }

        }
    }


    [BurstCompile(CompileSynchronously = true)]
    public void GetBestConnectingCellSThread(float3 origin, float3 terminus, in List<PotentialCell> potentialCells, out PotentialCell cell)
    {
        //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'

        using var input = new NativeArray<PotentialCell>(potentialCells.ToArray(), Allocator.TempJob);
        using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
        using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);
     
        //Build the job
        var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
        {
            StartPoint = origin,
            EndPoint = terminus,
            Input = input,
            lowestCost = lowestCost,
            lowestIndex = lowestIndex
        };

        //Schedule the job single threaded
        evaluateConnectionOriginJob.Run(potentialCells.Count);

        //Return result
        cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
    }


    [BurstCompile(CompileSynchronously = true)]
    public void GetBestConnectingCellSThread(float3 origin, float3 terminus, in PotentialCell[] potentialCells, out PotentialCell cell)
    {
        //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'

        using var input = new NativeArray<PotentialCell>(potentialCells, Allocator.TempJob);
        using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
        using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);

        //Build the job
        var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
        {
            StartPoint = origin,
            EndPoint = terminus,
            Input = input,
            lowestCost = lowestCost,
            lowestIndex = lowestIndex
        };

        //Schedule the job single threaded
        evaluateConnectionOriginJob.Run(potentialCells.Length);

        //Return result
        cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
    }





    [BurstCompile(CompileSynchronously = true)]
    public void GetBestConnectingCellMThread(float3 origin, float3 terminus, in List<PotentialCell> potentialCells, out PotentialCell cell)
    {
        //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'

        using var input = new NativeArray<PotentialCell>(potentialCells.ToArray(), Allocator.TempJob);
        using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
        using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);
     
        //Build the job
        var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
        {
            StartPoint = origin,
            EndPoint = terminus,
            Input = input,
            lowestCost = lowestCost,
            lowestIndex = lowestIndex
        };
     
        // Schedule the job multi-threaded
        evaluateConnectionOriginJob.Schedule(potentialCells.Count, batchSize).Complete();

        //Return result
        cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
    }



    [BurstCompile(CompileSynchronously = true)]
    public void GetBestConnectingCellMThread(float3 origin, float3 terminus, in PotentialCell[] potentialCells, out PotentialCell cell)
    {
        //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'

        using var input = new NativeArray<PotentialCell>(potentialCells, Allocator.TempJob);
        using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
        using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);

        //Build the job
        var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
        {
            StartPoint = origin,
            EndPoint = terminus,
            Input = input,
            lowestCost = lowestCost,
            lowestIndex = lowestIndex
        };

        // Schedule the job multi-threaded
        evaluateConnectionOriginJob.Schedule(potentialCells.Length, batchSize).Complete();

        //Return result
        cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
    }



    [BurstCompile(CompileSynchronously = true)]
    public struct EvaluateConnectionOriginJob : IJobParallelFor
    {
        [ReadOnly] public float3 StartPoint;
        [ReadOnly] public float3 EndPoint;
        [ReadOnly] public NativeArray<PotentialCell> Input;

        [NativeDisableParallelForRestriction]
        public NativeReference<float> lowestCost;

        [NativeDisableParallelForRestriction]
        public NativeReference<int> lowestIndex;

        public void Execute(int i)
        {
            var coord = Input[i].coordinate;

            float cost = math.distance(StartPoint, coord) + math.distance(EndPoint, coord);

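            // Caution: when this job is scheduled in parallel, several worker threads
            // read and write lowestCost/lowestIndex at the same time. This is a data race
            // (the safety check is disabled above), so the result can be wrong.
            if (cost < lowestCost.Value)
            {
                lowestCost.Value = cost;
                lowestIndex.Value = i;
            }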
        }
    }
}
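
One note on the parallel path above: with [NativeDisableParallelForRestriction], every worker thread reads and writes the same lowestCost and lowestIndex, so the parallel result is not guaranteed to be correct. Below is a minimal sketch of one race-free layout, assuming the input is already a NativeArray<float3>. All type and method names are hypothetical and this has not been benchmarked against the code above. Each parallel iteration writes only its own slot of a costs array, and a second single-threaded job, chained through a job dependency, scans for the minimum.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Stage 1: each index writes only Costs[i], so running in parallel is safe.
[BurstCompile]
public struct ComputeCostsJob : IJobParallelFor
{
    public float3 StartPoint;
    public float3 EndPoint;
    [ReadOnly] public NativeArray<float3> Input;
    [WriteOnly] public NativeArray<float> Costs;

    public void Execute(int i)
    {
        Costs[i] = math.distance(StartPoint, Input[i]) + math.distance(Input[i], EndPoint);
    }
}

// Stage 2: a single-threaded scan over the precomputed costs.
[BurstCompile]
public struct FindLowestCostJob : IJob
{
    [ReadOnly] public NativeArray<float> Costs;
    public NativeReference<int> LowestIndex;

    public void Execute()
    {
        float lowestCost = float.MaxValue;
        int lowestIndex = -1; // stays -1 if Costs is empty

        for (int i = 0; i < Costs.Length; i++)
        {
            if (Costs[i] < lowestCost)
            {
                lowestCost = Costs[i];
                lowestIndex = i;
            }
        }

        LowestIndex.Value = lowestIndex;
    }
}

public static class LowestCostSearch
{
    public static int FindLowestCostIndex(float3 start, float3 end, NativeArray<float3> input, int batchSize)
    {
        using var costs = new NativeArray<float>(input.Length, Allocator.TempJob);
        using var lowestIndex = new NativeReference<int>(-1, Allocator.TempJob);

        var costsJob = new ComputeCostsJob { StartPoint = start, EndPoint = end, Input = input, Costs = costs };
        JobHandle costsHandle = costsJob.Schedule(input.Length, batchSize);

        // The scan depends on the cost pass, so it only starts once all costs are written.
        var findJob = new FindLowestCostJob { Costs = costs, LowestIndex = lowestIndex };
        findJob.Schedule(costsHandle).Complete();

        return lowestIndex.Value;
    }
}
```

The scan is still O(n) on one thread, so this layout only helps when the per-item cost calculation dominates; for the light workload in this thread a single Burst job may well remain faster.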

For small inputs the job seems pointless: anything below 128 items is not worth it, and multithreaded only seems better than single-threaded above 8192 items. I should delete the List overloads, which convert to an array inside the method, but they do show how expensive starting from a List is if you have to convert it to an array to put it in a job. At 65536 items (I like using powers of two), the job is about 25x faster.


If, however, you were going to reference the main array multiple times in a frame (as you would in normal use), you could move the ‘using’ statement outside of the method and have the method take a NativeArray as input:

using (var ptr = new NativeArray<PotentialCell>(somePotentialCellsArr, Allocator.TempJob))
    for (int i = 0; i < iterations; i++)
        GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), ptr, out cell);

public static void GetBestConnectingCellMThread(float3 origin, float3 terminus, in NativeArray<PotentialCell> potentialCells, out PotentialCell cell)

And get even more speed. It all depends on allocating memory and how often you access it before it’s disposed. (‘using’ automatically disposes after the enclosed statement, so no need for Dispose).

Getting the performance even higher, unwrapping the distance into a method:

public static void GetDistanceADD(in float IN, in float3 A, in float3 B, out float OUT)
{
    OUT = IN + math.sqrt((A.x - B.x) * (A.x - B.x) + (A.y - B.y) * (A.y - B.y) + (A.z - B.z) * (A.z - B.z));
}

and using it like:

float cost = 0;
GetDistanceADD(in cost, A, Input[i].coordinate, out cost);
GetDistanceADD(in cost, B, Input[i].coordinate, out cost);

This makes the normal method 2x faster and the job 4x faster on a large loop, so the job ends up about 46x faster than the version without a job. Anyway, I like messing with code to make it faster, and Burst seems extreme!

Forget about the in/out part; I was just testing. There is no difference in performance compared with:

public static float GetDistance(in float3 A, in float3 B)
{
    return math.sqrt((A.x - B.x) * (A.x - B.x) + (A.y - B.y) * (A.y - B.y) + (A.z - B.z) * (A.z - B.z));
}

float cost = GetDistance(A, Input[i].coordinate) + GetDistance(B, Input[i].coordinate);