I have been looking into implementing HPA* pathfinding. Basically you have a graph made up of clusters of cells. When a unit is told to go somewhere, it will do the following (a rough code sketch follows the list):
A* a path of clusters, based on cached entry points between them.
A* a path of cells from the unit's current position to the entry point of the first cluster in the path.
Follow cached paths between the entry points of the clusters along the path.
A* the path of cells from the last entry point to the target position.
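In rough code form it would look something like this (all the helper names here are hypothetical placeholders, not my actual implementation):

// Rough sketch of the query steps above; FindClusterPath, AppendCellPath and
// AppendCachedPath are hypothetical helpers, not real code from my project.
NativeList<int2> FindHierarchicalPath( int2 start, int2 goal )
{
    var result = new NativeList<int2>( Allocator.Temp );

    // 1. A* over the cluster graph using the cached entry points.
    NativeList<EntryPoint> clusterPath = FindClusterPath( start, goal );

    // 2. Low-level A* from the unit's position to the first entry point.
    AppendCellPath( result, start, clusterPath[ 0 ].position );

    // 3. Cached paths between consecutive entry points along the cluster path.
    for ( int i = 0; i < clusterPath.Length - 1; i++ )
        AppendCachedPath( result, clusterPath[ i ], clusterPath[ i + 1 ] );

    // 4. Low-level A* from the last entry point to the target position.
    AppendCellPath( result, clusterPath[ clusterPath.Length - 1 ].position, goal );

    return result;
}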
Currently I have A* implemented using a Burst job. It's very fast. Now I am asking how I should structure the algorithm above with the job system, keeping pure performance in mind.
Should the entire thing be one job, with both the high-level cluster pathfinding and the low-level pathfinding inside each cluster calculated once, and the full path handed to the entity?
Or should I make one job to calculate the path of clusters, and then each time the entity enters a new cluster along the path I run another job to find the path through that cluster?
I am asking because I know there is some overhead to scheduling jobs, and in order to run the jobs I need to pass in quite a bit of data, including converting and/or copying the graph data for the job. My thinking is that if it is all in one job I can avoid having to re-copy this data and schedule a job every time the entity enters a cluster.
Any other tips or suggestions keeping performance in mind would really help.
Build your graph into a job-friendly format from the get-go and you don't have to copy anything but pointers (which is what NativeContainer structs really are) to the jobs. Scheduling jobs has overhead, but only a couple dozen microseconds of it.
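Something like this is what I mean (just a sketch; PathNode and FindPathJob are placeholder names, not from your code):

// Sketch: keep the grid in a persistent NativeArray so scheduling a job only
// copies the container header (a pointer plus length/safety data), not the nodes.
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

public struct PathNode
{
    public int x;
    public int y;
    public bool walkable;
    public int moveCost;
}

[BurstCompile]
public struct FindPathJob : IJob
{
    [ReadOnly] public NativeArray<PathNode> grid; // shared, read-only view of the graph
    public NativeList<int> resultPath;            // per-request output, indices into the grid

    public void Execute()
    {
        // ... run A* over 'grid' and write node indices into 'resultPath' ...
    }
}

// Allocate once (e.g. in OnCreate) and reuse every frame:
// pathNodeArray = new NativeArray<PathNode>( width * height, Allocator.Persistent );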
Thanks for replying. If you don't mind, I have another question.
I am getting quite high ms for scheduling multiple pathfinding jobs per frame. Each job itself only takes about 0.002 seconds, but the system itself is taking around 20 ms. This is with 6 jobs per frame. Also, if I find all paths within one job, it goes up to 200 ms. I am not sure why; again, the jobs themselves are very small, but it seems that scheduling many causes a large performance hit.
Yeah. I definitely think you are doing something wrong in your code. There’s a huge chunk of time being spent on setup. And it seems you are forcing completion of the jobs inside the system.
Allocate your managed List outside of OnUpdate and clear it at the beginning of OnUpdate. Right now you are generating GC garbage every frame.
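For example (a sketch, assuming your list is a List<FindPathJob>):

// Sketch: allocate the list once and reuse it, instead of newing it up every frame.
List<FindPathJob> findPathJobList;

protected override void OnCreate()
{
    findPathJobList = new List<FindPathJob>();
}

protected override void OnUpdate()
{
    findPathJobList.Clear(); // reuses the same backing storage, no per-frame garbage
    // ...
}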
Does grid.GetPathNodeArray deep copy the array or shallow copy? I think you only need a shallow copy here.
tmpPathNodeArray is performing a deep copy. Unless your job is modifying this array, you do not want this. If it is modifying the array, I would pass a [ReadOnly] version to the job and create the temporary copy inside of it. This deep copy is happening on the main thread, which is slowing things down.
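Building on the earlier sketch, something along these lines:

// Sketch: pass the shared grid as [ReadOnly] and make the scratch copy inside
// the job, so the deep copy runs on a worker thread instead of the main thread.
[BurstCompile]
public struct FindPathJob : IJob
{
    [ReadOnly] public NativeArray<PathNode> grid; // never written by the job

    public void Execute()
    {
        var working = new NativeArray<PathNode>( grid.Length, Allocator.Temp );
        working.CopyFrom( grid );
        // ... run A* against 'working', scribbling on costs as needed ...
        working.Dispose();
    }
}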
I just realized you are using ComponentSystem and not SystemBase with codegen lambdas. That means more GC garbage.
Why are you keeping the jobs in a list? Can’t you just create them and schedule them and only store their returned JobHandles? Also you should be feeding the Schedule method JobHandles so that everything can run asynchronously.
You don’t need to complete all the JobHandles. You just need to schedule another IJob job with BufferFromEntity and the singular entity to copy the data to.
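Roughly like this (a sketch; the field and type names are just illustrative, based on your snippet, and I'm assuming your waypoints are float3):

// Sketch: a small chained job that copies one finished path into that entity's
// buffer, scheduled with the pathfinding job's handle as its dependency.
[BurstCompile]
public struct WritePathToBufferJob : IJob
{
    public Entity entity;
    [ReadOnly] public NativeList<float3> finalWaypoints;
    public BufferFromEntity<Path_Position> pathBufferFromEntity;
    public ComponentDataFromEntity<Path_Index> pathIndexFromEntity;

    public void Execute()
    {
        DynamicBuffer<Path_Position> buffer = pathBufferFromEntity[ entity ];
        buffer.Clear();
        for ( int i = 0; i < finalWaypoints.Length; i++ )
            buffer.Add( new Path_Position { position = finalWaypoints[ i ] } );

        pathIndexFromEntity[ entity ] = new Path_Index { index = buffer.Length - 1 };
    }
}

// Chained so it only runs after that entity's pathfinding job:
// JobHandle writeHandle = new WritePathToBufferJob { /* ... */ }.Schedule( findPathHandle );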
I am keeping my jobs in a list because I do this after the job:
for ( int i = 0; i < findPathJobList.Count; i++ )
{
    Entity entity = findPathJobList[ i ].entity;

    // Copy this job's waypoints into the entity's dynamic buffer.
    buffer[ entity ].Clear();
    for ( int j = 0; j < findPathJobList[ i ].finalWaypoints.Length; j++ )
    {
        buffer[ entity ].Add( new Path_Position { position = findPathJobList[ i ].finalWaypoints[ j ] } );
    }

    // Point the path index at the last waypoint and release the job's output.
    pathIndices[ entity ] = new Path_Index { index = buffer[ entity ].Length - 1 };
    findPathJobList[ i ].finalWaypoints.Dispose();
}
pathNodeArray.Dispose();
jobHandleList.Dispose();
Don't I need to get all the path lists done first before I can write them to the buffers?
I am new to this, sorry for obvious mistakes.
I must be fundamentally misunderstanding something, because when I try to pass the pathNodeArray to the job without deep-copying it, I get job errors saying it has already been deallocated.
Instead of subclassing ComponentSystem, you subclass SystemBase. It works like JobComponentSystem, except instead of passing in and returning inputDeps, the dependency is now stored as a property named Dependency, and lambda jobs automatically update it if you don't specify otherwise.
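A minimal skeleton (illustrative names; Path_Request is a made-up request component, and on newer Entities versions the class also needs to be declared partial):

using Unity.Entities;

public class PathfindingSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Lambda jobs scheduled here automatically chain off of, and update, Dependency.
        Entities
            .ForEach( ( ref Path_Index pathIndex, in Path_Request request ) =>
            {
                // ... per-entity work ...
            } )
            .ScheduleParallel();

        // Manually scheduled jobs can keep the chain going:
        // Dependency = new FindPathJob { /* ... */ }.Schedule( Dependency );
    }
}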
Oh. You are writing to individual buffers per entity? Well in that case you don’t need to schedule a bunch of individual jobs. Instead you can build a NativeList of entities that need pathfinding and then convert that to a NativeArray. Then you can schedule an IJobParallelFor to do pathfinding over each Entity. You use ComponentDataFromEntity and BufferFromEntity to fetch all the per-entity info.
Most importantly, as long as it is per entity, you can write to BufferFromEntity using [NativeDisableParallelForRestriction].
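Something like this (a sketch; again Path_Request is a hypothetical request component):

// Sketch: one parallel job handles every entity that requested a path this frame.
[BurstCompile]
public struct FindPathsJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<Entity> entities;
    [ReadOnly] public NativeArray<PathNode> grid;
    [ReadOnly] public ComponentDataFromEntity<Path_Request> requestFromEntity;

    // Safe to disable the restriction because each index only touches its own entity's buffer.
    [NativeDisableParallelForRestriction] public BufferFromEntity<Path_Position> pathBufferFromEntity;

    public void Execute( int index )
    {
        Entity entity = entities[ index ];
        Path_Request request = requestFromEntity[ entity ];

        DynamicBuffer<Path_Position> buffer = pathBufferFromEntity[ entity ];
        buffer.Clear();
        // ... run A* over 'grid' from request.start to request.end,
        //     appending waypoints into 'buffer' ...
    }
}

// Scheduling, inside a SystemBase.OnUpdate:
// Dependency = new FindPathsJob { /* ... */ }.Schedule( entityArray.Length, 8, Dependency );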
That's what the inputDeps argument when scheduling jobs is for. You are basically saying "only run this job after these other jobs are finished". A lot of us call this "job-chaining".
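In code, roughly (reusing the sketch names from above):

// Job-chaining: pass the earlier handle as the dependency so the second job
// only starts once the first has finished, without blocking the main thread.
JobHandle findHandle = new FindPathJob { /* ... */ }.Schedule( Dependency );
JobHandle writeHandle = new WritePathToBufferJob { /* ... */ }.Schedule( findHandle );
Dependency = writeHandle;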
I am able to get around 15-20 ms for 50 units each finding a path on an open 100x100 grid, with post-process path smoothing (finding the nodes with unobstructed lines between them), in a single frame, and I haven't even optimized my A* yet.
Also keep in mind this is on a laptop with a 4-core, 4-thread AMD A10 CPU; modern CPUs are at least 2-3 times faster, plus better RAM speeds, etc.
I am using an IJobParallelFor here. The batchSize is just the number of entities I am doing work on divided by the number of cores on my CPU.
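Roughly (sketch, variable names just for illustration):

// Sketch of the scheduling call described above:
int batchSize = math.max( 1, entityArray.Length / SystemInfo.processorCount );
Dependency = findPathsJob.Schedule( entityArray.Length, batchSize, Dependency );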
My next move is to implement a NativePriorityQueue using unsafe pointers for the open set of the A*, and then to switch to a matrix to represent the open and closed sets of the A*.
I think the priority queue will improve performance by at least 2-3x, and the matrix by at least an order of magnitude, if not more.
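For the "matrix" part, what I have in mind is something like this (just a sketch of the idea):

// Sketch: node-indexed flat arrays instead of set lookups, so membership checks
// and cost reads become O(1) array indexing.
var gCost    = new NativeArray<int>( nodeCount, Allocator.Temp );
var inClosed = new NativeArray<bool>( nodeCount, Allocator.Temp );
var cameFrom = new NativeArray<int>( nodeCount, Allocator.Temp );

// e.g. instead of closedSet.Contains( node ):
// if ( !inClosed[ nodeIndex ] ) { ... }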