Random huge spikes on JobHandle.CompleteAll for a Pathfinding Job System

Hello,

First of all, sorry if it is confusing. I'm a beginner in DOTS, especially in writing code of Jobs so I'll probably say some stupid things (yet I've a degree in computer science and studied parallelization, I should have some basics).

I have a Pathfinding Job System based on the implementation of Code Monkey (link to the video, the project repo)
It is working well most of the time, except that it spikes at 1400ms/frame randomly every 30s or so.

I have no idea what to do to fix it, I've tried many changes without success. Here is the source code of the System

From what I've came to understand, the problem seems to come from the Pathfinding.OnUpdate function.
The main thread is waiting for the findPathJobs to complete at the CompleteAll call because the SetBufferPathJob needs to access some shared datas right after it. So it may just be a bad implementation but theses jobs are running on the main thread and are not parallelized.

I've came close to a parallelized solution but failed to succeed because the SetBufferPathJob needs to write to a ComponentDataFromEntity after the findPathJob has been executed.
Using the [NativeDisableParallelForRestrictionAttribute] didn't help, although there is no risk of race condition.

I hope someone more experimented can spot the problem easily with theses informations, and if needed I can provide the other components source code.

Thanks to any help :)

If you're calling JobHandle.Complete or JobHandle.CompleteAll, then you're creating sync points in the middle of a given frame. You're effectively saying, then, that other jobs have to complete before one particular job can run. That's something you need to avoid as much as possible. Not saying it's always preventable, but I believe in this case that it is. I make zero(!) calls to those functions in my navigation package. I think it sets a decent example for how DOTS navigation state should be handled at a high level, and encourage you to skim the code (FYI, I'm massively overhauling the demos and improving the nav code in a big feature branch right now).

The video you linked appears to be pretty dated at this point, as you can define jobs sequentially in a SystemBase for automagic dependency management, using Entities.ForEach for almost everything. On top of that, you can mess with system update order. Then rather than saying, "all these jobs have to complete so I can run these other jobs," you just make the latter jobs dependent on the former jobs. Without seeing your code, I would venture to guess you need to take full advantage of job dependencies. You don't have to complete all processing of jobs at some arbitrary time because you can, instead, tell others they must deterministically follow them.

1 Like

That’s not correct. JobHandle.Complete and JobHandle.CompleteAll does not create sync point (which is a specific thing described in the link to docs you posted) it only completes passed dependency chain, if you have two or more independent chains and complete only handle from one of them it will complete only jobs in this chain but not other and will freeze main thread (as it will trigger to use the main thread to help worker threads involved in dependency chain), and it’s a huge difference with sync point which completes all jobs in an application, real sync point triggered only through DependencyManager->CompleteAllJobsAndInvalidateArrays(); which called from BeforeStructuralChange in some methods of EntityManager, like Create\Destroy\AddComponent\RemoveComponent\SetSharedComponent etc.(and same methods from EntityCommandBuffer.Playback) or EntityManager.CompleteAllJobs()

3 Likes

Waiting for all dependencies to complete in a given chain is effectively a local sync point, which is still something to be avoided, especially if you have a large dependency chain.

Sounds like this may be OP’s issue. Valid distinction to make @eizenhorn , but my point still holds fine. Another way of putting your rebuttal in a way that’s useful to OP might be to alternatively separate dependency chains to retain the CompleteAll call.

Edit: Here’s a post from a certain Unity employee explaining how using said functions introduce sync points.

Thanks for your reply !

Completely agreed with your point @reeseschultz , as I said I knew this was probably the issue but these findPathJobs has to be completed before the SetBufferPathJobs could access the shared ComponentDataFromEntity.
I tried to chain the jobs passing the first as a dependency in the JobHandle call of the second, but always ended up with errors (concurrent access or obscur ones, couldn't fix them).
I will look at SystemBase eventually if no obvious answer can be found, but as I said I'm a beginner in DOTS it's pretty overwhelming...

Your package looks great but the reason I use a custom grid based Pathfinding is because of moving obstacles (they update the grid "walls" in real time so the agents can avoid them) and this feature was not possible with NavMeshes (well, rebaking the mesh every frame was too much a workload for mobile device). Since your package seems to be based on NavMeshSurfaces, does it support moving obstacles ?

Btw fixed the source code link, my bad !

Hey @Qhuhuit_1 , it should so long as you set the obstacles to carve. Really my code is hybrid DOTS. The full monorepo has demos (again, which I’m currently overhauling so they’re not so ugly and arbitrary). There are obstacles in the demos, although I don’t have any explicitly for moving ones. It shouldn’t be too bad to modify one of the demos to test that out. Basically, surfaces and obstacles still get injected as GameObjects, but the NavAgent entities are still capable of interacting with them.

I opted for that design because most of the performance cost I wanted to curtail was agent navigation itself. I don’t care if there are still some GameObjects around. Plus, it allows for “backward-compatibility” via the NavAgentHybrid component, which I introduced recently. That particular component can be used for leveraging the existing animation system in Unity while still getting a major performance benefit for many navigating agents.

FYI, I’m always open to PRs. If someone wants to write a navigation backend, I’d be happy to either add it as a package dependency to my nav code, or incorporate it directly. Right now the backend is the experimental AI API.

If my code doesn’t work for you, or you’re just curious about writing low-level nav code for whatever reason, you might take a look at Jeff Vella’s DOTS A* implementation. It may be a little stale, but it looks pretty solid to me still. Of particular interest is his code in Assets/Scripts and Assets/Plugins. He has a NativePriorityQueue you may want to look at, grid management, etc.

Otherwise, we’re all just going to have to wait for Unity to release an official DOTS navigation thing. I would say it’s the primary blocker preventing people from going full DOTS / Project Tiny, but obviously there’s other tooling we’d all like to see as well. I’m not surprised it’s taking forever, because it’s just a ridiculously hard thing to get right. People are asking me about networked DOTS nav lately, so covering all use cases and corner cases for this sort of thing is like climbing Mount Everest.