Jobs performance slower than single thread

Hello.
I created a 3D scene with 1000 spheres randomly walking back and forth. The scene contains only the ground and the spheres, nothing else. There are 2 versions of the walking script: one moves the transform directly, the other moves it through jobs.
I assumed that jobs would obviously be faster, but I did something wrong and I need help figuring out the mistake.
For now the version without jobs performs far better than the one with jobs. The screenshots were taken without the Burst compiler. Burst works and improves ms and fps, but the single-threaded version is still faster.

SingleThread Code

void Update()
{
    // Start picking a new destination unless a pick is already waiting out its delay.
    if (!Coroutine_delay)
    {
        StartCoroutine(NewDestination());
    }

    // Keep walking until within 1 unit of the destination.
    if (1f < Vector3.Distance(myTransform.position, destionationPoint3D))
    {
        transform.position = Vector3.MoveTowards(myTransform.position, destionationPoint3D, speed * Time.deltaTime);
    }
}

IEnumerator NewDestination()
{
    Coroutine_delay = true;

    // Random destination within +/-20 units of the current position on X and Z.
    lowerLeftAngle = new Vector3(myTransform.position.x + 20, myTransform.position.y, myTransform.position.z + 20);
    upperRightAngle = new Vector3(myTransform.position.x - 20, myTransform.position.y, myTransform.position.z - 20);

    destionationPoint3D = new Vector3(UnityEngine.Random.Range(lowerLeftAngle.x, upperRightAngle.x), myTransform.position.y, UnityEngine.Random.Range(lowerLeftAngle.z, upperRightAngle.z));

    yield return delayC;
    Coroutine_delay = false;
}

Code with Jobs

public class Spawner : MonoBehaviour
{
    [SerializeField]GameObject unit;
    public int amount;
    public bool jobber;
    GameObject go;

    private List<GameObject> listok=new List<GameObject>();
    
    void Start()
    {
        for (int i = 0; i < amount; i++)
        {
            go=Instantiate(unit, new Vector3(UnityEngine.Random.Range(-100f, 100f), unit.transform.position.y, UnityEngine.Random.Range(-100f, 100f)),Quaternion.identity);

            listok.Add(go);
            //Debug.Log(listok[0]);
        }
    }

    void Update()
    {
        if (jobber)
        {

            NativeArray<float3> startPoint = new NativeArray<float3>(listok.Count, Allocator.TempJob);
            NativeArray<float3> destinationPoint = new NativeArray<float3>(listok.Count, Allocator.TempJob);
            NativeArray<float> speed = new NativeArray<float>(listok.Count, Allocator.TempJob);
            NativeArray<float3> endPoint = new NativeArray<float3>(listok.Count, Allocator.TempJob);
            TransformAccessArray transformAccessArray = new TransformAccessArray(listok.Count);

            for (int i = 0; i < listok.Count; i++)
            {
                startPoint[i] = listok[i].transform.position;
                destinationPoint[i] = listok[i].gameObject.GetComponent().destionationPoint3D;
                speed[i] = listok[i].gameObject.GetComponent().speed;
                transformAccessArray.Add(listok[i].transform);
            }

            UnitMoveJobParallel unitMoveJobParallel = new UnitMoveJobParallel
            {
                deltaTime = Time.deltaTime,
                speed = speed,
                startPoint = startPoint,
                destinationPoint = destinationPoint,
                endPoint = endPoint
            };

            JobHandle jobHandle = unitMoveJobParallel.Schedule(transformAccessArray);
            jobHandle.Complete();

            for (int i = 0; i < listok.Count; i++)
            {
                listok[i].transform.position = endPoint[i];
            }

            startPoint.Dispose();
            destinationPoint.Dispose();
            speed.Dispose();
            endPoint.Dispose();
            transformAccessArray.Dispose();
        }
    }
}
[BurstCompile]
public struct UnitMoveJobParallel : IJobParallelForTransform
{
    public NativeArray<float3> startPoint;
    public NativeArray<float3> destinationPoint;
    public NativeArray<float> speed;
    public NativeArray<float3> endPoint;
    public float deltaTime;

    public void Execute(int index, TransformAccess transform)
    {
        endPoint[index] = Vector3.MoveTowards(startPoint[index], destinationPoint[index], speed[index] * deltaTime);
    }
}
The only thing I found is that before the jobs start, the timeline shows a lot of work on the main thread (screenshot 2). I suppose it has something to do with allocating 5 containers of 1000 elements every frame…
So, my questions are:
1) Is there a way to pre-cache the native containers to avoid creating and populating them every frame, given that the number of units stays the same? Declaring them outside Update or at class level throws an error, either about not being allowed to be called from a MonoBehaviour or about problems disposing them.
----------
2) Suppose native collections are meant to work like that (created every frame). Then where did I make mistakes? Why does the multithreaded version run slower than the single-threaded one?
----------
3) Is there a guide or documentation to learn from? Unity's documentation on jobs is a bit short.
----------
4) On the second screenshot the tooltip mentions 10k and 17k instances on the threads, but my scene has only ~1000 GameObjects. What are those? Is there a clear way to inspect them?
----------
Any help is greatly appreciated.
(Screenshot 1: jobs.png)
(Screenshot 2: screenshot-15.jpg)

I found the solution.
Preallocate the arrays with the Allocator.Persistent allocator. It turns out the TransformAccessArray doesn't need to be refilled manually; it keeps its transform references and stays up to date automatically.
So in the for loop in the Update method I only kept filling destinationPointArray. That could be optimised further with events, but it is unnecessary: at 25k objects, script execution is cheap compared to rendering.

After removing the allocations from Update() it finally worked as expected: the jobs version of the script performs slightly faster.
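The preallocation fix described above might look roughly like the sketch below. It is not the poster's actual code. `UnitWalker` is a hypothetical name standing in for the per-unit walking script from the first listing (its real name was lost from the post); it is assumed to expose public `speed` (float) and `destionationPoint3D` (Vector3) fields, and its own Update is assumed to only pick destinations, leaving the movement to the job. The containers are created once with Allocator.Persistent in Start, disposed in OnDestroy, and the job writes positions directly through TransformAccess instead of copying results back on the main thread.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;
using UnityEngine.Jobs;

public class Spawner : MonoBehaviour
{
    [SerializeField] GameObject unit;
    public int amount;

    // Hypothetical: the per-unit walking script from the first listing.
    // Its Update should only pick destinations; the job does the moving.
    UnitWalker[] walkers;

    // Allocated once in Start, reused every frame, disposed in OnDestroy.
    NativeArray<float3> destinationPoint;
    NativeArray<float> speed;
    TransformAccessArray transformAccessArray;

    void Start()
    {
        walkers = new UnitWalker[amount];
        destinationPoint = new NativeArray<float3>(amount, Allocator.Persistent);
        speed = new NativeArray<float>(amount, Allocator.Persistent);
        transformAccessArray = new TransformAccessArray(amount);

        for (int i = 0; i < amount; i++)
        {
            // UnityEngine.Random is spelled out: Unity.Mathematics also defines a Random type.
            Vector3 pos = new Vector3(UnityEngine.Random.Range(-100f, 100f), unit.transform.position.y, UnityEngine.Random.Range(-100f, 100f));
            GameObject go = Instantiate(unit, pos, Quaternion.identity);
            walkers[i] = go.GetComponent<UnitWalker>();
            speed[i] = walkers[i].speed;            // assumes speed does not change at runtime
            transformAccessArray.Add(go.transform); // the array keeps the reference; no per-frame refill
        }
    }

    void Update()
    {
        // Destinations are the only per-frame input copied on the main thread.
        for (int i = 0; i < walkers.Length; i++)
            destinationPoint[i] = walkers[i].destionationPoint3D;

        var job = new UnitMoveJobParallel
        {
            deltaTime = Time.deltaTime,
            speed = speed,
            destinationPoint = destinationPoint
        };
        job.Schedule(transformAccessArray).Complete();
    }

    void OnDestroy()
    {
        if (destinationPoint.IsCreated) destinationPoint.Dispose();
        if (speed.IsCreated) speed.Dispose();
        if (transformAccessArray.isCreated) transformAccessArray.Dispose();
    }
}

[BurstCompile]
public struct UnitMoveJobParallel : IJobParallelForTransform
{
    [ReadOnly] public NativeArray<float3> destinationPoint;
    [ReadOnly] public NativeArray<float> speed;
    public float deltaTime;

    public void Execute(int index, TransformAccess transform)
    {
        // Write through TransformAccess, so no copy-back loop is needed afterwards.
        transform.position = Vector3.MoveTowards(transform.position, destinationPoint[index], speed[index] * deltaTime);
    }
}
```

Completing the handle right after scheduling keeps the sketch simple; storing the JobHandle and completing it later (as the answer below does with its static Dependency) would let the main thread do other work while the workers run.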
I wonder how jobs would work with a changing number of GameObjects. You'd have to create a new NativeArray every time the amount changes, which would nullify all the benefits of jobs. Or am I missing something again? I'll figure it out… somehow.

Your vanilla code takes 2.6 [ms] on an unspecified CPU.

And that job-ified code contains many mistakes, some very costly (the TransformAccess transform parameter is totally ignored, so every result is copied back to the Transforms on the main thread).

I've got 0.3 [ms] in total, spread across worker threads on an i3-4170 (0.1 [ms] to complete).

MyUnitComponent.cs

using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Jobs;

using Unity.Mathematics;
using Unity.Collections;
using Unity.Jobs;
using Unity.Entities;

using BurstCompile = Unity.Burst.BurstCompileAttribute;

public class MyUnitComponent : MonoBehaviour
{
	public static List<MyUnitComponent> Instances = new List<MyUnitComponent>();
	public static TransformAccessArray Transforms;
	public static NativeList<float> Speed;
	public static NativeList<float3> Destionation;
	public static JobHandle Dependency;
	public int Index { get; set; } = -1;
	[SerializeField] float _speed = 10f;
	[SerializeField] float3 _testDestination = new float3( 100 , 100 , 100 );// delete line later on
	public const int k_max_instances = 10000;
	
	void OnEnable ()
	{
		if( !Transforms.isCreated )
		{
			Transforms = new TransformAccessArray( k_max_instances );
			Speed = new NativeList<float>( Allocator.Persistent );
			Destionation = new NativeList<float3>( Allocator.Persistent );
		}

		Dependency.Complete();// immediate data access
		Index = Instances.Count;
		Instances.Add( this );
		Speed.Add( _speed );
		Destionation.Add( _testDestination );// delete line later on
		// Destionation.Add( transform.position );// uncomment later on
		Transforms.Add( transform );
	}

	void OnDisable ()
	{
		Dependency.Complete();// immediate data access
		Instances.RemoveAtSwapBack( Index );
		Speed.RemoveAtSwapBack( Index );
		Destionation.RemoveAtSwapBack( Index );
		Transforms.RemoveAtSwapBack( Index );

		if( Instances.Count!=0 )
		{
			// fix Index after RemoveAtSwapBack:
			if( Index>=0 && Index<Instances.Count )
			    Instances[ Index ].Index = Index;
		}
		else
		{
			if( Speed.IsCreated ) Speed.Dispose();
			if( Destionation.IsCreated ) Destionation.Dispose();
			if( Transforms.isCreated ) Transforms.Dispose();
		}
	}

	#if UNITY_EDITOR
	void OnValidate ()
	{
		if( Index!=-1 )
		{
			Dependency.Complete();// immediate data access
			Speed[Index] = _speed;
			Destionation[Index] = _testDestination;
		}
	}
	void OnDrawGizmosSelected ()
	{
		if( Index!=-1 )
		{
			Gizmos.color = Color.yellow;
			var pos = transform.position;
			Gizmos.DrawLine( pos , Destionation[Index] );
			Gizmos.DrawSphere( pos , 0.1f );
		}
	}
	#endif

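	// complete the running job first, it may still be reading these lists
	public void SetSpeed ( float value ) { Dependency.Complete(); Speed[Index] = value; }
	public void SetDestination ( float3 value ) { Dependency.Complete(); Destionation[Index] = value; }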
}

public class MyUnitMoveSystem : SystemBase
{
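	protected override void OnUpdate ()
	{
		// nothing to move until the first unit has created the shared containers
		if( !MyUnitComponent.Transforms.isCreated ) return;

		var unitMoveJob = new UnitMoveJob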
		{
			Destination		= MyUnitComponent.Destionation.AsArray() ,
			Speed			= MyUnitComponent.Speed.AsArray() ,
			DeltaTime		= Time.DeltaTime ,
		};
		Dependency = unitMoveJob.Schedule( MyUnitComponent.Transforms , Dependency );
		MyUnitComponent.Dependency = Dependency;
	}
	[BurstCompile] public struct UnitMoveJob : IJobParallelForTransform
	{
		[ReadOnly] public NativeSlice<float3> Destination;
		[ReadOnly] public NativeSlice<float> Speed;
		public float DeltaTime;
		void IJobParallelForTransform.Execute ( int index , TransformAccess transform )
		{
			transform.position = MoveTowards( transform.position , Destination[index] , Speed[index] * DeltaTime );
		}
		// Rewritten for Burst, src: https://github.com/Unity-Technologies/UnityCsReference/blob/master/Runtime/Export/Math/Vector3.cs#L59-L77
		float3 MoveTowards ( float3 src , float3 dst , float maxDistanceDelta )
		{
			float3 dir = dst - src;
			float distSq = math.lengthsq(dir);
			if( distSq==0 || ( maxDistanceDelta>=0 && distSq<=maxDistanceDelta*maxDistanceDelta ) ) return dst;
			return src + dir / math.sqrt(distSq) * maxDistanceDelta;
		}
	}
}

MyUnitSpawner.cs

using UnityEngine;
public class MyUnitSpawner : MonoBehaviour
{
	[SerializeField] GameObject _prefab = null;
	[SerializeField][Range(0,MyUnitComponent.k_max_instances)] int _amount = 100;
	void OnEnable ()
	{
		float y = _prefab.transform.position.y;
		for( int i=0 ; i<_amount ; i++ )
		{
			Vector3 pos = new Vector3( Random.Range(-100f,100f) , y , Random.Range(-100f,100f) );
			Instantiate( _prefab , pos , Quaternion.identity );
		}
	}
}