A question about pathfinding, the Job System, and performance.

Hello.

I have been looking into implementing HPA* pathfinding. Basically, I have a graph made up of clusters of cells. When a unit is told to go somewhere, it does the following:

  1. A* a path of clusters, based on cached entry points between them.

  2. A* a path of cells from the current unit’s position to the entry point of the first cluster in the path.

  3. Follow the cached paths between the entry points of the clusters along the path.

  4. A* a path of cells from the last entry point to the target position.
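
For concreteness, here is a rough sketch of how those four steps compose. Every type and method name in it (Cluster, FindClusterPath, FindCellPath, CachedPath) is hypothetical, purely to illustrate the flow:

    // All names below are hypothetical; this only illustrates the four steps.
    List<float2> FindHierarchicalPath( float2 start , float2 goal )
    {
        List<float2> waypoints = new List<float2>();

        // 1. High-level A* over clusters, using the cached entry points.
        List<Cluster> clusterPath = FindClusterPath( start , goal );

        // 2. Low-level A* from the unit's position to the first entry point.
        waypoints.AddRange( FindCellPath( start , clusterPath[ 0 ].entryPoint ) );

        // 3. Stitch in the cached paths between consecutive entry points.
        for ( int i = 0; i < clusterPath.Count - 1; i++ )
            waypoints.AddRange( CachedPath( clusterPath[ i ] , clusterPath[ i + 1 ] ) );

        // 4. Low-level A* from the last entry point to the target position.
        waypoints.AddRange( FindCellPath( waypoints[ waypoints.Count - 1 ] , goal ) );

        return waypoints;
    }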

Currently I have A* implemented using a Burst job. It’s very fast. Now I am asking how I should structure the algorithm above with the job system, keeping pure performance in mind.

Should the entire thing be one job, with both the high-level cluster pathfinding and the low-level pathfinding inside each cluster calculated up front, and the complete path handed to the entity?

Or should I make one job to calculate the path of clusters, and then, each time the entity enters a new cluster along the path, run another job to find the path through that cluster?

I am asking because I know there is some overhead to scheduling jobs, and in addition, to run the jobs I need to pass in quite a lot of data, including converting and/or copying the graph data for the job. My thinking is that if it is all in one job, I can avoid constantly re-copying this data and scheduling a job every time the entity enters a cluster.

Any other tips or suggestions keeping performance in mind would really help.

Build your graph into a job-friendly format from the get-go and you don’t have to copy anything but pointers (as that’s what NativeContainer structs really are) to the jobs. Scheduling jobs has overhead, but only a couple dozen microseconds of it.
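
For example, a minimal sketch of that idea, assuming a placeholder PathNode layout: the graph lives in a persistently allocated NativeArray, and assigning it to a job field copies only the container struct (essentially a pointer plus safety metadata), never the node data.

    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;

    public struct PathNode
    {
        public int index;
        public byte walkable; // byte instead of bool: container elements must be blittable
        // ... costs, neighbour links, etc.
    }

    [BurstCompile]
    public struct FindPathJob : IJob
    {
        // Assigning this field copies a small struct wrapping a pointer,
        // not the node data itself.
        [ReadOnly] public NativeArray<PathNode> nodes;

        public void Execute()
        {
            // ... run A* over nodes ...
        }
    }

    // At startup:   nodes = new NativeArray<PathNode>( nodeCount , Allocator.Persistent );
    // Per request:  new FindPathJob { nodes = nodes }.Schedule();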


Thanks for replying. If you don’t mind, I have another question.

I am getting quite a high cost from scheduling multiple pathfinding jobs per frame. Each job itself only takes about 0.002 seconds, but the system as a whole is taking around 20ms. This is with 6 jobs per frame. Also, if I find all the paths within one job, it goes up to 200ms. I am not sure why; again, the jobs themselves are very small, but it seems that scheduling many of them causes a large performance hit.

I’d have to see the timeline view and your code to give you any more insight.

Here it is.

I’d like to add that this is being done on an AMD A10 8400 CPU (laptop); it isn’t the best CPU.


Yeah. I definitely think you are doing something wrong in your code. There’s a huge chunk of time being spent on setup. And it seems you are forcing completion of the jobs inside the system.

I am forcing completion in my code because I need to write to the buffer elements, and this can’t be done in parallel.

This is the setup. Grid.GetPathNodeArray() just copies an array.

            List<FindPathJob> findPathJobList = new List<FindPathJob>();
            NativeList<JobHandle> jobHandleList = new NativeList<JobHandle>( Allocator.Temp );
            ComponentDataFromEntity<Path_Index> pathIndices = GetComponentDataFromEntity<Path_Index>();
            BufferFromEntity<Path_Position> buffer = GetBufferFromEntity<Path_Position>();

            NativeArray<PathNode> pathNodeArray = grid.GetPathNodeArray();

            Entities.ForEach( ( Entity entity , ref PathFinding_Orders pathFindingOrders , ref Regiment_Data regimentData ) =>
            {
                if ( pathFindingOrders.hasNewOrders )
                {
                    //pathFindingOrders.hasNewOrders = false;

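                    // This NativeArray constructor copies every element of pathNodeArray,
                    // on the main thread, once per job scheduled here.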
                    NativeArray<PathNode> tmpPathNodeArray = new NativeArray<PathNode>( pathNodeArray , Allocator.TempJob );

                    FindPathJob job = new FindPathJob
                    {
                        gridSize = gridSize ,
                        cellSize = grid.cellSize ,
                        pathNodeArray = tmpPathNodeArray ,
                        startWorldPosition = regimentData.position2D ,
                        endWorldPosition = pathFindingOrders.targetPosition ,
                        entity = entity ,
                        pathIndexComponentDataFromEntity = pathIndices ,
                        finalWaypoints = new NativeList<float2>( Allocator.TempJob )
                    };

                    findPathJobList.Add( job );
                    jobHandleList.Add( job.Schedule() );
                }
            } );

            JobHandle.CompleteAll( jobHandleList );

If I remove the CompleteAll, the time is cut by about half, but I still think 4-4.5ms is a lot?

  1. Allocate your managed List outside of OnUpdate and clear it at the beginning of OnUpdate. You are generating GC garbage every frame.
  2. Does grid.GetPathNodeArray deep copy the array or shallow copy it? I think you only need a shallow copy here.
  3. tmpPathNodeArray is performing a deep copy. Unless your job is modifying this array, you do not want this. If it is modifying the array, I would pass a [ReadOnly] version to the job and create the temporary inside of it. This deep copy is happening on the main thread, which is slowing things down.
  4. I just realized you are using ComponentSystem and not SystemBase with codegen lambdas. That means more GC garbage.
  5. Why are you keeping the jobs in a list? Can’t you just create and schedule them and only store the returned JobHandles? Also, you should be feeding JobHandles into the Schedule method so that everything can run asynchronously.
  6. You don’t need to complete all the JobHandles. You just need to schedule another IJob with BufferFromEntity and the singular entity to copy the data over (see the sketch after this list).
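
A rough sketch of what point 6 could look like, reusing the Path_Position and Path_Index names from this thread and assuming the pathfinding job writes its waypoints into a NativeList; everything else here is illustrative:

    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    public struct CopyWaypointsJob : IJob
    {
        public Entity entity;
        [ReadOnly] public NativeList<float2> waypoints;
        public BufferFromEntity<Path_Position> pathPositionBuffer;
        public ComponentDataFromEntity<Path_Index> pathIndices;

        public void Execute()
        {
            DynamicBuffer<Path_Position> buffer = pathPositionBuffer[ entity ];
            buffer.Clear();
            for ( int i = 0; i < waypoints.Length; i++ )
                buffer.Add( new Path_Position { position = waypoints[ i ] } );
            pathIndices[ entity ] = new Path_Index { index = buffer.Length - 1 };
        }
    }

    // Chain it after the pathfinding job instead of calling Complete:
    // JobHandle copyHandle = copyJob.Schedule( findPathHandle );

Note that if you schedule several of these, each one also needs to depend on the previous copy job, since they all write through the same BufferFromEntity.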
  1. I think it was doing a deep copy, thanks.

  2. I am not sure what SystemBase is.

  3. I am keeping my jobs in a list because I do this after the jobs complete:

            for ( int i = 0; i < findPathJobList.Count; i++ )
            {
                Entity entity = findPathJobList[ i ].entity;
                buffer[ entity ].Clear();

                for ( int j = 0; j < findPathJobList[ i ].finalWaypoints.Length; j++ )
                {
                    buffer[ entity ].Add( new Path_Position { position = findPathJobList[ i ].finalWaypoints[ j ] } );
                }

                pathIndices[ entity ] = new Path_Index { index = buffer[ entity ].Length - 1 };
                findPathJobList[ i ].finalWaypoints.Dispose();
            }

            pathNodeArray.Dispose();
            jobHandleList.Dispose();
  4. Don’t I need to get all the path lists done first before I can write them to the buffers?

I am new to this, sorry for obvious mistakes.

I must be fundamentally misunderstanding something, because when I try to pass the pathNodeArray to the job without deep-copying it, I get job errors saying it has already been deallocated.

Instead of subclassing ComponentSystem, you subclass SystemBase. It works like JobComponentSystem, except that instead of passing in and returning inputDeps, the dependency is stored in a property named Dependency, and lambda jobs automatically update it if you don’t specify otherwise.
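
A minimal skeleton, with a hypothetical system name, just to show the shape:

    using Unity.Entities;

    public class PathOrderSystem : SystemBase // hypothetical name
    {
        protected override void OnUpdate()
        {
            // Lambda jobs scheduled here implicitly chain off the system's
            // Dependency property and update it afterwards.
            Entities.ForEach( ( ref PathFinding_Orders orders ) =>
            {
                // ...
            } ).Schedule();
        }
    }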

Oh, you are writing to individual buffers per entity? In that case you don’t need to schedule a bunch of individual jobs. Instead you can build a NativeList of the entities that need pathfinding and then convert it to a NativeArray. Then you can schedule an IJobParallelFor that does the pathfinding for each entity, using ComponentDataFromEntity and BufferFromEntity to fetch all the per-entity info.

Most importantly, as long as each index only touches its own entity’s data, you can write to BufferFromEntity using [NativeDisableParallelForRestriction].
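
Sketched out, the job could look something like this (the field names are borrowed from this thread; the A* body itself is elided):

    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    public struct FindPathJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<Entity> entities;
        [ReadOnly] public NativeArray<float2> startWorldPositions;
        [ReadOnly] public NativeArray<float2> endWorldPositions;

        // Safe because each index only writes to its own entity's data.
        [NativeDisableParallelForRestriction] public BufferFromEntity<Path_Position> pathPositionBuffer;
        [NativeDisableParallelForRestriction] public ComponentDataFromEntity<Path_Index> pathIndexComponentData;

        public void Execute( int index )
        {
            Entity entity = entities[ index ];
            DynamicBuffer<Path_Position> buffer = pathPositionBuffer[ entity ];
            buffer.Clear();

            // ... run A* from startWorldPositions[ index ] to
            //     endWorldPositions[ index ], appending waypoints to buffer ...

            pathIndexComponentData[ entity ] = new Path_Index { index = buffer.Length - 1 };
        }
    }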

That’s what the inputDeps parameter when scheduling jobs is for. You are basically saying “only run this job after these other jobs are finished”. A lot of us call this “job chaining”.


Wow, thanks so much, I learned a lot. I’ll try applying all of this!

Fixing the deep copies helped quite a lot. The algorithm now seems to scale properly, taking about 1-2ms.

Any chance you could post your updated code for this? It would help us newbs.

I am able to get around 15-20ms for 50 units finding paths on an open 100x100 grid, with post-process path smoothing (keeping only the nodes whose connecting lines are not blocked by anything), all in a single frame, and I haven’t even optimized my A* yet.

Also keep in mind this is on a laptop with a 4-core, 4-thread AMD A10 CPU; modern CPUs are at least 2-3 times faster, plus better RAM speeds, etc.

I am using an IJobParallelFor here. The batchSize is just the number of entities I am doing work on, divided by the number of cores on my CPU.

    protected override void OnUpdate()
    {
        // TempJob allocation: these containers are read by the job scheduled below,
        // and Temp-allocated containers cannot be used in jobs.
        NativeList<Entity> entities = new NativeList<Entity>( Allocator.TempJob );
        NativeList<float2> startPositions = new NativeList<float2>( Allocator.TempJob );
        NativeList<float2> endPositions = new NativeList<float2>( Allocator.TempJob );

        Entities.ForEach( ( Entity entity , ref PathFinding_Orders pathFindingOrders , ref Regiment_Data data ) =>
        {
            if ( pathFindingOrders.hasNewOrders )
            {
                entities.Add( entity );
                startPositions.Add( data.position2D );
                endPositions.Add( pathFindingOrders.targetPosition );

                pathFindingOrders.hasNewOrders = false;
            }
        } );

        FindPathJob job = new FindPathJob
        {
            gridSize = gridSize ,
            cellSize = grid.cellSize ,
            pathNodeArray = grid.pathNodeArray ,
            startWorldPositions = startPositions.AsArray() ,
            endWorldPositions = endPositions.AsArray() ,
            entities = entities.AsArray() ,
            pathPositionBuffer = GetBufferFromEntity<Path_Position>() ,
            pathIndexComponentData = GetComponentDataFromEntity<Path_Index>()
        };

        // Avoid a zero batch size when fewer than four entities need paths.
        int batchSize = math.max( 1 , entities.Length / 4 );
        JobHandle jobHandle = job.Schedule( entities.Length , batchSize );
        jobHandle.Complete();

        entities.Dispose();
        startPositions.Dispose();
        endPositions.Dispose();
    }

My next move is to implement a NativePriorityQueue using unsafe pointers for the open set of the A*, and then to switch to a matrix to represent the open and closed sets of the A*.

I think the priority queue will improve the performance by at least 2-3x, and the matrix by at least an order of magnitude, if not more.

I will post when I finish it.
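
In the meantime, here is roughly the shape I have in mind for the open set: a plain binary min-heap stored in a NativeList so it can live inside a Burst job. This is only a sketch, not the final unsafe-pointer version, and Entry/fCost are illustrative names.

    using Unity.Collections;

    public struct MinHeap
    {
        public struct Entry { public int nodeIndex; public float fCost; }

        private NativeList<Entry> items;

        public MinHeap( int capacity , Allocator allocator )
        {
            items = new NativeList<Entry>( capacity , allocator );
        }

        public int Count => items.Length;

        public void Push( int nodeIndex , float fCost )
        {
            items.Add( new Entry { nodeIndex = nodeIndex , fCost = fCost } );

            // Sift the new entry up until its parent is no larger.
            int i = items.Length - 1;
            while ( i > 0 )
            {
                int parent = ( i - 1 ) / 2;
                if ( items[ parent ].fCost <= items[ i ].fCost ) break;
                Entry tmp = items[ parent ]; items[ parent ] = items[ i ]; items[ i ] = tmp;
                i = parent;
            }
        }

        // Removes and returns the node index with the lowest fCost.
        public int Pop()
        {
            int best = items[ 0 ].nodeIndex;

            // Move the last entry to the root and shrink the list by one.
            items[ 0 ] = items[ items.Length - 1 ];
            items.ResizeUninitialized( items.Length - 1 );

            // Sift the new root down until the heap property holds again.
            int i = 0;
            while ( true )
            {
                int l = 2 * i + 1;
                int r = 2 * i + 2;
                int smallest = i;
                if ( l < items.Length && items[ l ].fCost < items[ smallest ].fCost ) smallest = l;
                if ( r < items.Length && items[ r ].fCost < items[ smallest ].fCost ) smallest = r;
                if ( smallest == i ) break;
                Entry tmp = items[ smallest ]; items[ smallest ] = items[ i ]; items[ i ] = tmp;
                i = smallest;
            }

            return best;
        }

        public void Dispose() => items.Dispose();
    }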


Great thanks!

https://github.com/Tree37/A-Pathfinding-Jobs/blob/master/ParallelPathfindingSystemTester.cs