Job is slower than Main Thread

Hi everyone!

I’ve recently started learning and experimenting with the Unity job system and i was trying to implement the A* Pathfinding using it.

I was thinking that using multithreading would greatly improve the performance by splitting paths calculations amoung different threads but when i tested it i noticed that i wasn’t getting any performance boost.

8515685--1135220--Multithread.png

That’s the code i’m using to test it(it’s called in the update function on a MonoBehavior):

private void Pathfind(bool multithread)
        {
            PathfindingNode start = new PathfindingNode((Vector2)t_start.transform.position, grid);
            PathfindingNode end = new PathfindingNode((Vector2)t_end.transform.position, grid);
            NativeArray<JobHandle> handlers = new NativeArray<JobHandle>(loops, Allocator.TempJob);
            AStar[] jobs = new AStar[loops];
            float time = Time.realtimeSinceStartup;
            if (multithread)
            {
                for (int i = 0; i < loops; i++)
                {
                    AStar aStar = new AStar(start, end, navPoints, grid);
                    jobs[i] = aStar;
                    handlers[i] = aStar.Schedule();
                }
                JobHandle.CompleteAll(handlers);
            }
            else
            {
                AStar aStar = new AStar(start, end, navPoints, grid);
                for (int i = 0; i < loops; i++)
                    aStar.Run();
            }
            Debug.Log(Time.realtimeSinceStartup - time);
            handlers.Dispose();

        }

and here’s the job(note that it’s not been optimized, it’s a raw implementation for now):

public struct AStar : IJob
    {
        [ReadOnly] public NativeArray<float2> navPoints;
        //public NativeList<float2> path;
       
        private PathfindingNode start;
        private PathfindingNode end;
        private CustomGrid grid;
  
        public AStar(PathfindingNode start, PathfindingNode end, NativeArray<float2> navPoints, CustomGrid grid)
        {
            this.start = start;
            this.end = end;
            this.navPoints = navPoints;
            this.grid = grid;
            //path = new NativeList<float2>(Allocator.TempJob);
        }
       
        public void Execute()
        {
            FindPath(start, end);
        }

        ///<summary>
        /// Find the path between start point and end point.
        ///</summary>
        public void FindPath(PathfindingNode start, PathfindingNode end)
        {
            // Compute distance from start node and target node.
            PathfindingNode current = start;
            current.g = 0;
            current.h = start.GetDistance(end);
            // Initialize search list.
            NativeList<PathfindingNode> toSearch = new NativeList<PathfindingNode>(Allocator.Temp) {current};
            NativeArray<PathfindingNode> processed = new NativeArray<PathfindingNode>(grid.width * grid.height, Allocator.Temp);

            while (toSearch.Length > 0)
            {
                current = toSearch[0];
                // Choose best node.
                foreach (PathfindingNode node in toSearch)
                    if (node.f < current.f || (node.f == current.f && node.h < current.h))
                        current = node;
               
                // If we've reached the end goal trace back and build path,
                // otherwise keep adding processed path
                if (current.Equals(end))
                {
                    // Trace back the path
                    CalculatePath(current, processed);
                    break;
                }

                for(int i = 0; i< toSearch.Length; i++)
                    if(toSearch[i].index == current.index)
                    {
                        toSearch.RemoveAtSwapBack(i);
                        break;
                    }
                processed[current.index] = current;
               
                NativeArray<PathfindingNode> neighbours = current.GetNeighbors();
                for (int i = 0; i < neighbours.Length; i++)
                {
                    PathfindingNode neighbour = neighbours[i];
                    bool inSearch = toSearch.AsArray().Contains(neighbour);
                    // Check if this neighbour node has already been processed, if yes discard it,
                    // otherwise keep computing it.
                    if (!processed.Contains(neighbour) && navPoints.Contains(neighbour.nodeCell.center))
                    {
                        if (!inSearch)
                        {
                            neighbour.connection = current.index;
                            neighbour.g = current.g + current.GetDistance(neighbour);
                            neighbour.CalculateH(end);
                            toSearch.Add(neighbour);
                        }
                    }
                }
            }
        }

        // Trace back calculated path and build pathfinding result.
        public void CalculatePath(PathfindingNode start, NativeArray<PathfindingNode> pathList)
        {
            int iter = 0;
            PathfindingNode currentPathNode = start;
            NativeList<float2> path = new NativeList<float2>(Allocator.Temp);
            while (currentPathNode.connection != -1 && iter < pathList.Length)
            {
                PathfindingNode cameFrom = pathList[currentPathNode.connection];
                path.Add(cameFrom.nodeCell.center);
                currentPathNode = cameFrom;
                iter++;
            }
            pathList.Dispose();
            path.Dispose();
        }
    }

I’m still new to the job system and i honestly don’t know what i might be doing wrong/breaking, so i would really appreciate if you let me know.

Thanks.

Pretty common actually. It’s still the same CPU running the code, and since everything (generally) has to be marshaled back through a main thread hook (if it interacts with the Unity API at all), multithreading only gives benefits in the narrowest of specific cases, and even those cases often require intention re-engineering to truly realize full benefit.

1 Like

Note that blocking on the main-thread can mean that the scheduler might also run some of those jobs on the main-thread because it’s not doing anything but waiting so that reduces the potential of it. Best to schedule, do work then come back and ask for results.

But, as above, whilst you’ll get a performance increase from doing things in parallel, there’s an overhead of scheduling and if you approach what you do in a job in an object-orientated way rather than a data-centric way, you’ll also loose the possibility of making it faster. Also, adding Burst won’t do much for you here so you’ll not get vectorisation either. Always worth adding it though!

You’ll get the most performance out of avoiding CPU cache misses by having your data laid out in a way that means the jobs can simply blast through it and also do vectorisation (Burst) etc, preferably using the mathematics library.

Things like allocating. You should avoid allocating inside a job if possible. If you are going to do temp work then use the Allocator.TempJob.

You should try to post this stuff on the DOTS forum, that’s where all the real experts might be. I can, of course, simply move your post for you if you wish.

2 Likes

Yes it would be great, thanks!

Moved.

First off, it isn’t obvious to me skimming through the code how running in parallel is actually parallelizing the calculations. It looks like each job is calculating the same path over and over again. For A*, stick to a single-threaded job per agent and run agents in parallel.

Second, Burst does so much more than vectorization. Vectorization can only give you at max a 4 or 8 times speedup, usually less. Yet Burst will frequently make code an order of magnitude faster or more. Make sure you add that BurstCompile attribute.

Also, there’s nothing wrong with using Allocator.Temp inside a job, especially if you are going to be using that memory for a lot of work (Allocating temporary lists and hashmaps for an entire A* solve is an excellent use case).

If you want really good performance, focus on a single agent and a single threaded job and try to optimize that as much as possible. If you really want to parallelize for a single agent, you will likely be better served by a different technique such as a flow graph instead.

3 Likes

Thank you so much to everyone which tried to help me.

By following your advices i learned a lot and was able to massively improve performance formy A* Pathfinding algorithm.
One important note is that even though i knew following a data-oriented approach was more efficient the results quite shocked me, i guess sometimes you just have to try yourself.

Also Burst is EXTREMELY good, shocked about that.

2 Likes

would you share your updated code as learning for us how you made it faster?