Unity job system is not working as expected

I’m learning Unity DOTS and tried to create a small project that instantiates some prefabs and moves them to a final target. The project is simple, but my issue is performance: my code runs faster in the old mono implementation than in the new DOTS implementation. How is this possible? I know that in some situations mono may run faster, but I don’t think that should apply in my case, and I can’t see what I’m doing wrong.
I have attached my code and some screenshots. Could someone please help me find what I’m doing wrong?

using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Jobs;
using Unity.Collections;
using Unity.Jobs;
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;

public class CubeMovementJob : MonoBehaviour
{
    public Transform gameObjectToBeInstatiated;
    public int count = 2000;
    public float speed = 20;
    public int spawnRange = 50;
    private Transform[] transforms;
    private float3[] targets;
    private List<Transform> cubes;
    public bool useJob;

    // Start is called before the first frame update
    void Start()
    {
        transforms = new Transform[count];
        cubes = new List<Transform>();

        for (int i = 0; i < count; i++)
        {
            Transform objTransform = Instantiate(gameObjectToBeInstatiated, Vector3.zero, Quaternion.identity);
            cubes.Add(objTransform);
            transforms[i] = objTransform;
        }

        targets = new float3[transforms.Length];
        for (int i = 0; i < targets.Length; i++)
        {
            targets[i] = new float3(UnityEngine.Random.Range(-spawnRange, spawnRange), UnityEngine.Random.Range(-spawnRange, spawnRange), UnityEngine.Random.Range(-spawnRange, spawnRange));
        }
    }

    // Update is called once per frame
    void Update()
    {
        if (useJob == true)
        {
            NativeArray<float3> nativeTargets = new NativeArray<float3>(targets, Allocator.TempJob);
            TransformAccessArray transAccArr = new TransformAccessArray(transforms);

            MovementJob job = new MovementJob
            {
                deltaTime = Time.deltaTime,
                Targets = nativeTargets,
                Speed = speed
            };

            JobHandle newJobHandle = job.Schedule(transAccArr);

            newJobHandle.Complete();
            transAccArr.Dispose();
            nativeTargets.Dispose();
        }
        else
        {
            for (int i = 0; i < targets.Length; i++)
            {
                cubes[i].position = Vector3.Lerp(cubes[i].position, targets[i], Time.deltaTime / speed);
            }

            // This is just to simply give some extra task in order to check the performance difference.
            float value = 0f;
            for (int i = 0; i < count; i++)
            {
                value = math.exp10(math.sqrt(value));
            }
        }
    }
}

[BurstCompile]
public struct MovementJob : IJobParallelForTransform
{
    public float deltaTime;
    public float Speed;
    public NativeArray<float3> Targets;
    public void Execute(int index, TransformAccess transform)
    {
        transform.position = Vector3.Lerp(transform.position, Targets[index], deltaTime / Speed);

        // This is just to simply give some extra task in order to check the performance difference.
        float value = 0f;
        for (int i = 0; i < Targets.Length; i++)
        {
            value = math.exp10(math.sqrt(value));
        }
    }
}



Above Images: Framerate without job system (avg 135fps).



Above Image: Framerate with job system implementation (avg 108fps).

Even though my MovementJob runs in parallel, why am I not getting any extra performance, or at least more than the old MonoBehaviour implementation?

You should profile in a build to get the real numbers, not inside the Editor.

Try deactivating safety checks and the Burst debugger.

And the code is not the same in the two scenarios: in mono you run the “extra cost” loop once, but in the job you run it for each transform!


Yes, but even in the editor it should not show this much of a performance difference, and I was expecting a massive performance change.
Also, I just tested on mobile and the result is almost the same as in the editor: the job system implementation still lags behind the older mono implementation.

I tried that, but it didn’t affect anything.

In your profile the actual work seems to be really fast, at around 0.08ms. I guess the setup code is slower.

  1. Add profiling sections
  2. Try using the TransformAccessArray constructor that receives a Transform[] instead of using Add
  3. Reuse your TransformAccessArray. (TransformAccessArray.SetTransforms)
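For the first suggestion, a minimal sketch of what profiling sections could look like, assuming Unity's `Profiler.BeginSample` / `Profiler.EndSample` API. The class name `ProfiledSectionsExample` and the sample labels are made up for illustration; the placeholder comment marks where the real job scheduling from the original post would go.

```csharp
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine;
using UnityEngine.Profiling;

// Hypothetical component (not from the original post) that splits Update
// into named profiler samples so setup, work and cleanup show up separately.
public class ProfiledSectionsExample : MonoBehaviour
{
    public int count = 2000;
    private float3[] targets;

    void Start()
    {
        targets = new float3[count];
    }

    void Update()
    {
        // Section 1: per-frame allocation and copy into native memory.
        Profiler.BeginSample("Example.Setup");
        var nativeTargets = new NativeArray<float3>(targets, Allocator.TempJob);
        Profiler.EndSample();

        // Section 2: the actual work.
        Profiler.BeginSample("Example.Work");
        // Placeholder: schedule and complete the job here.
        Profiler.EndSample();

        // Section 3: releasing the native containers.
        Profiler.BeginSample("Example.Dispose");
        nativeTargets.Dispose();
        Profiler.EndSample();
    }
}
```

Each label then appears as its own row under the script's Update entry in the Profiler hierarchy, which makes it easier to see whether the time goes into setup or into the job itself.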

The mono code should look like this:

for (int i = 0; i < targets.Length; i++)
{
    cubes[i].position = Vector3.Lerp(cubes[i].position, targets[i], Time.deltaTime / speed);

    // This is just to simply give some extra task in order to check the performance difference.
    // The inner loop uses its own counter (j) so it does not clash with the outer loop's i.
    float value = 0f;
    for (int j = 0; j < count; j++)
    {
        value = math.exp10(math.sqrt(value));
    }
}

using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Jobs;
using Unity.Collections;
using Unity.Jobs;
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;

public class CubeMovementJob : MonoBehaviour
{
    public Transform gameObjectToBeInstatiated;
    public int count = 2000;
    public float speed = 20;
    public int spawnRange = 50;
    private Transform[] transforms;
    private float3[] targets;
    private List<Transform> cubes;
    public bool useJob;
    TransformAccessArray transAccArr;
    // Start is called before the first frame update
    void Start()
    {
        transforms = new Transform[count];
        cubes = new List<Transform>();

        for (int i = 0; i < count; i++)
        {
            Transform objTransform = Instantiate(gameObjectToBeInstatiated, Vector3.zero, Quaternion.identity);
            cubes.Add(objTransform);
            transforms[i] = objTransform;
        }
        transAccArr = new TransformAccessArray(transforms);
        targets = new float3[transforms.Length];
        for (int i = 0; i < targets.Length; i++)
        {
            targets[i] = new float3(UnityEngine.Random.Range(-spawnRange, spawnRange), UnityEngine.Random.Range(-spawnRange, spawnRange), UnityEngine.Random.Range(-spawnRange, spawnRange));
        }
    }

    // Update is called once per frame
    void Update()
    {
        if (useJob == true)
        {
            NativeArray<float3> nativeTargets = new NativeArray<float3>(targets, Allocator.TempJob);

            MovementJob job = new MovementJob
            {
                deltaTime = Time.deltaTime,
                Targets = nativeTargets,
                Speed = speed
            };

            JobHandle newJobHandle = job.Schedule(transAccArr);

            newJobHandle.Complete();
            nativeTargets.Dispose();
        }
        else
        {
            for (int i = 0; i < targets.Length; i++)
            {
                cubes[i].position = Vector3.Lerp(cubes[i].position, targets[i], Time.deltaTime / speed);
            }

            // This is just to simply give some extra task in order to check the performance difference.
            float value = 0f;
            for (int i = 0; i < count; i++)
            {
                value = math.exp10(math.sqrt(value));
            }
        }
    }

    private void OnDisable()
    {
        transAccArr.Dispose();
    }
}

[BurstCompile]
public struct MovementJob : IJobParallelForTransform
{
    public float deltaTime;
    public float Speed;
    public NativeArray<float3> Targets;
    public void Execute(int index, TransformAccess transform)
    {
        transform.position = Vector3.Lerp(transform.position, Targets[index], deltaTime / Speed);

        // This is just to simply give some extra task in order to check the performance difference.
        float value = 0f;
        for (int i = 0; i < Targets.Length; i++)
        {
            value = math.exp10(math.sqrt(value));
        }
    }
}

Yes, reusing the TransformAccessArray finally boosted my performance: I’m now getting avg 150fps using the job system, where before it was avg 108fps. Thank you for the advice.

But still, I was hoping for some extra performance, since my jobs are running in parallel across multiple cores, and there isn’t much of a performance gap between my old mono implementation and my job system implementation. Is this the final performance gain for this code, or is there anything I can still optimize to get the boost I expected? In some videos I saw code that took around 80ms in the old mono implementation suddenly take around 1ms for the same work after being moved into a job, so I was expecting that kind of performance boost here.
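One further thing that could be tried, sketched here as an assumption rather than a measured improvement: the targets never change after `Start`, so the `NativeArray<float3>` can be allocated once with `Allocator.Persistent` instead of being copied and disposed every frame. The names `CubeMovementPersistent`, `MoveToTargetJob`, `prefab` and `Step` are hypothetical; the rest mirrors the code posted above.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical variant of the posted component: both native containers are
// created once in Start and released once in OnDestroy.
public class CubeMovementPersistent : MonoBehaviour
{
    public Transform prefab;
    public int count = 2000;
    public float speed = 20f;
    public int spawnRange = 50;

    private TransformAccessArray transAccArr;
    private NativeArray<float3> nativeTargets;

    void Start()
    {
        var transforms = new Transform[count];
        nativeTargets = new NativeArray<float3>(count, Allocator.Persistent);

        for (int i = 0; i < count; i++)
        {
            transforms[i] = Instantiate(prefab, Vector3.zero, Quaternion.identity);
            nativeTargets[i] = new float3(
                Random.Range(-spawnRange, spawnRange),
                Random.Range(-spawnRange, spawnRange),
                Random.Range(-spawnRange, spawnRange));
        }

        transAccArr = new TransformAccessArray(transforms);
    }

    void Update()
    {
        var job = new MoveToTargetJob
        {
            // Clamped to [0, 1] to match what Vector3.Lerp does internally.
            Step = math.saturate(Time.deltaTime / speed),
            Targets = nativeTargets
        };

        JobHandle handle = job.Schedule(transAccArr);
        handle.Complete();
    }

    void OnDestroy()
    {
        if (transAccArr.isCreated) transAccArr.Dispose();
        if (nativeTargets.IsCreated) nativeTargets.Dispose();
    }
}

[BurstCompile]
public struct MoveToTargetJob : IJobParallelForTransform
{
    public float Step;
    [ReadOnly] public NativeArray<float3> Targets;

    public void Execute(int index, TransformAccess transform)
    {
        float3 current = transform.position;
        transform.position = math.lerp(current, Targets[index], Step);
    }
}
```

This only removes per-frame allocation and copying; the cost of writing back to GameObject transforms and rendering them is unchanged, so any gain is likely to be modest.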

If you are looking for the best possible performance you should take a look into the complete DOTS workflow (ECS + Jobs + Burst).

As I mentioned already, your mono version does less work than the job version, so they are not comparable.

You could also try increasing the number of objects.

How? I didn’t get that, because in both the job and mono versions the same code is executed. In mono the two for loops run sequentially, one after another, while the job runs in parallel, so why is my job still not getting that big performance boost?

There is not a lot to gain; the majority of the frame time is taken up by other stuff:
You had a frame time of 7.5 ms and the profiler shows 1.3ms in your C# implementation. Getting that to 0 would give you 160fps. These times are much too low to be measured reliably in the editor.
Increase your object count to 20k-200k and use objects without renderers if you want to have a more accurate performance comparison.
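As a rough check of those numbers, assuming the frame rate is simply the inverse of the total frame time and that nothing else in the frame changes:

```latex
\text{fps} = \frac{1000}{t_{\text{frame}}\,[\text{ms}]}
\qquad
\frac{1000}{7.5} \approx 133\ \text{fps}
\qquad
\frac{1000}{7.5 - 1.3} = \frac{1000}{6.2} \approx 161\ \text{fps}
```

So even removing the script cost entirely would only move the result from about 133fps to about 161fps, because the remaining 6.2 ms is spent elsewhere in the frame.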

@runner78 is right, the work isn’t the same, as your ‘extra task’ is actually executed 2000 times in the jobified version vs. once in the C# version. But the bursted code is so fast that this isn’t really important here. The parallel-for jobs are actually done with all the work after only 0.08ms.

Yes, I get that, but please don’t misunderstand me: I’m still not satisfied with the job+burst performance of my code. I think I’m doing something wrong. Otherwise, can someone give the full technical explanation of why my job performance is similar to the mono performance even though the job is running in parallel?

Okay, I got your point: my second for loop runs in all 2000 job iterations, but in mono it runs only once per update. I have removed that as well, and now the code looks like this:

using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Jobs;
using Unity.Collections;
using Unity.Jobs;
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;

public class CubeMovementJob : MonoBehaviour
{
    public Transform gameObjectToBeInstatiated;
    public int count = 2000;
    public float speed = 20;
    public int spawnRange = 50;
    private Transform[] transforms;
    private float3[] targets;
    private List<Transform> cubes;
    public bool useJob;
    TransformAccessArray transAccArr;
    // Start is called before the first frame update
    void Start()
    {
        transforms = new Transform[count];
        cubes = new List<Transform>();

        for (int i = 0; i < count; i++)
        {
            Transform objTransform = Instantiate(gameObjectToBeInstatiated, Vector3.zero, Quaternion.identity);
            cubes.Add(objTransform);
            transforms[i] = objTransform;
        }

        transAccArr = new TransformAccessArray(transforms);
        targets = new float3[transforms.Length];

        for (int i = 0; i < targets.Length; i++)
        {
            targets[i] = new float3(UnityEngine.Random.Range(-spawnRange, spawnRange), UnityEngine.Random.Range(-spawnRange, spawnRange), UnityEngine.Random.Range(-spawnRange, spawnRange));
        }
    }

    // Update is called once per frame
    void Update()
    {
        if (useJob == true)
        {
            NativeArray<float3> nativeTargets = new NativeArray<float3>(targets, Allocator.TempJob);

            MovementJob job = new MovementJob
            {
                deltaTime = Time.deltaTime,
                Targets = nativeTargets,
                Speed = speed
            };

            JobHandle newJobHandle = job.Schedule(transAccArr);

            newJobHandle.Complete();
            nativeTargets.Dispose();
        }
        else
        {
            for (int i = 0; i < targets.Length; i++)
            {
                cubes[i].position = Vector3.Lerp(cubes[i].position, targets[i], Time.deltaTime / speed);
            }
        }
    }

    private void OnDisable()
    {
        transAccArr.Dispose();
    }
}

[BurstCompile]
public struct MovementJob : IJobParallelForTransform
{
    public float deltaTime;
    public float Speed;
    public NativeArray<float3> Targets;
    public void Execute(int index, TransformAccess transform)
    {
        transform.position = Vector3.Lerp(transform.position, Targets[index], deltaTime / Speed);
    }
}

But still, my mono performance and my job+burst performance are almost the same; job+burst is leading by just 10-15fps on average.

Ignore your fps. Open the profiler and compare the time spent in CubeMovementJob.Update. That is what you are currently interested in.

If you reduce the number of objects, you will get to a point where mono is faster than jobs.
If you increase the number of objects, you will see the real performance boost from Burst.

On top of what other people have said: as long as you continue to use GameObjects, you won’t see significant improvements compared to ditching them and only manipulating entities. Being able to use GameObjects with DOTS is more of a compatibility and ease-of-use thing than a performance goal for Unity. If you got an 80x speedup just from using jobs to access GameObjects, there would be a lot less impetus for Unity to roll out the DOTS-native ecosystem. That video you saw most likely compares pure entities to GameObjects, although if it doesn’t, I’d surely like to see it.

If you aren’t ready for DOTS but want to take advantage of the jobs/Burst system, just create some NativeArrays to hold position, rotation, scale and Matrix4x4 (you need a job to output to the Matrix4x4), and skip GameObjects and transforms. Then you can draw those with DrawMeshInstanced on the main thread and manipulate the position/rotation in another job. You will see ridiculous performance (possibly ludicrous speed). I’ve got a simulation that outputs a bunch of AIs with collision, pathing, target finding and animation, running with 100,000 actors at 50-60FPS on a 4 core machine, without using DOTS.
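A minimal sketch of that approach, under several assumptions: positions only (no rotation or scale), a material with GPU instancing enabled, and batches of at most 1023 instances per `Graphics.DrawMeshInstanced` call. The names `InstancedCubesSketch` and `MoveJob` are hypothetical, and this is not the poster's actual simulation code.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

// Hypothetical component: no GameObjects per cube, only native arrays
// that a Burst job updates and the main thread draws with instancing.
public class InstancedCubesSketch : MonoBehaviour
{
    public Mesh mesh;          // assign in the inspector
    public Material material;  // must have GPU instancing enabled
    public int count = 10000;
    public float speed = 20f;
    public float spawnRange = 50f;

    private const int BatchSize = 1023; // per-call limit of DrawMeshInstanced

    private NativeArray<float3> positions;
    private NativeArray<float3> targets;
    private NativeArray<Matrix4x4> matrices;
    private readonly Matrix4x4[] batch = new Matrix4x4[BatchSize];

    void Start()
    {
        positions = new NativeArray<float3>(count, Allocator.Persistent);
        targets = new NativeArray<float3>(count, Allocator.Persistent);
        matrices = new NativeArray<Matrix4x4>(count, Allocator.Persistent);

        for (int i = 0; i < count; i++)
        {
            targets[i] = new float3(
                Random.Range(-spawnRange, spawnRange),
                Random.Range(-spawnRange, spawnRange),
                Random.Range(-spawnRange, spawnRange));
        }
    }

    void Update()
    {
        var job = new MoveJob
        {
            Step = math.saturate(Time.deltaTime / speed),
            Targets = targets,
            Positions = positions,
            Matrices = matrices
        };
        job.Schedule(count, 64).Complete();

        // Copy the matrices out in chunks and draw each chunk.
        for (int start = 0; start < count; start += BatchSize)
        {
            int n = math.min(BatchSize, count - start);
            NativeArray<Matrix4x4>.Copy(matrices, start, batch, 0, n);
            Graphics.DrawMeshInstanced(mesh, 0, material, batch, n);
        }
    }

    void OnDestroy()
    {
        if (positions.IsCreated) positions.Dispose();
        if (targets.IsCreated) targets.Dispose();
        if (matrices.IsCreated) matrices.Dispose();
    }

    [BurstCompile]
    private struct MoveJob : IJobParallelFor
    {
        public float Step;
        [ReadOnly] public NativeArray<float3> Targets;
        public NativeArray<float3> Positions;
        public NativeArray<Matrix4x4> Matrices;

        public void Execute(int i)
        {
            float3 p = math.lerp(Positions[i], Targets[i], Step);
            Positions[i] = p;

            // Translation-only matrix built column by column.
            Matrices[i] = new Matrix4x4(
                new Vector4(1f, 0f, 0f, 0f),
                new Vector4(0f, 1f, 0f, 0f),
                new Vector4(0f, 0f, 1f, 0f),
                new Vector4(p.x, p.y, p.z, 1f));
        }
    }
}
```

Completing the job immediately keeps the sketch short; scheduling early in the frame and completing just before drawing would leave more room for the worker threads.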

If you need to debug, just have a GameObject with a script that outputs or draws data/lines in the scene view to show positions in real time. Obviously that will be a performance hit, but it’s something you would only turn on to debug a couple of thousand objects; you wouldn’t want that output on more than 1-2 thousand at a time.


Above image: performance with job+burst.
Above image: performance in old mono implementation.

I got your point, but fps is directly related to the profiler performance stats, right?
I can see in the profiler that my job+burst implementation takes less time to execute the script than the mono one. But what else is taking the extra time? As you can see, executing the script takes around 0.40ms in the job+burst implementation but 1ms in mono, so job+burst saves almost 0.60ms. Why am I still getting almost the same fps as the mono implementation? Are the stats wrong? (I’m excluding the editor loop here.)

My object is just a Unity primitive cube with a small texture added, nothing else.

With the same implementation, I tried instantiating 10000 objects, but I got the same fps on my PC in both the mono and the burst+job implementations: around 25fps with or without the job.
Then I tried 100000 objects, and there I also got the same result: around 3fps with or without the job.