Consistent crashes with RaycastCommand in 2018.3

Another regret on moving to 2018.3. This happens consistently but the complexity of everything going on makes it difficult to pinpoint the cause.

I do raycasts for combat. Periodically I do a bunch of raycasting, a few groups of a few thousand at a time. This builds up environment data for the combat ai.

The trigger seems to be new combat entities, which are kinematic rigidbodies, being created while the environment raycasting is in progress. That is just a guess. The combat AI also does raycasting for LOS, so for all I know it could be some interaction between multiple raycast batches.

The last code showing in the log was the code that fires off the environment raycasting.

It doesn’t trigger with the combat raycasting disabled.

0x0000000142974CFD (Unity) physx::Sq::AABBTree::refitMarkedNodes
0x0000000142974E1E (Unity) physx::Sq::AABBPruner::refitUpdatedAndRemoved
0x0000000142971E93 (Unity) physx::Sq::AABBPruner::commit
0x0000000142979894 (Unity) physx::Sq::SceneQueryManager::flushUpdates
0x000000014272D8E1 (Unity) physx::NpSceneQueries::multiQuery<physx::PxRaycastHit>
0x0000000142742D31 (Unity) physx::NpSceneQueries::raycast
0x0000000141853C83 (Unity) RaycastCommandJob
0x00000001408A68B3 (Unity) JobQueue::Exec
0x00000001408A6A1C (Unity) JobQueue::ExecuteJobFromHighPriorityStack
0x00000001408A6F32 (Unity) JobQueue::processJobs
0x00000001408A8F83 (Unity) JobQueue::WorkLoop
0x0000000140A70CE4 (Unity) Thread::RunThreadWrapper
0x00007FFCF3433034 (KERNEL32) BaseThreadInitThunk
0x00007FFCF5E13691 (ntdll) RtlUserThreadStart

So reducing the number of raycasts in the environment batches seems to fix it. I jumped the combat itself up to our design peak of 500 agents, which results in a lot of raycasting, but spread out so there are no large numbers per batch, and that didn’t trigger it either. So it appears to be related to large batches.

That sucks, though, because we need that environment data to generate faster. It takes way too long when the batch sizes have to be kept so small.
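For anyone wanting to try the same workaround, here is a rough sketch of what capping the batch size could look like. It is not my actual code. `ChunkedRaycasts`, `Run`, and the `MaxCommandsPerBatch` value of 256 are made up for illustration (I never pinned down which size is safe), and it assumes one hit per command. It only uses `RaycastCommand.ScheduleBatch` and `JobHandle.Complete`, and it waits on each slice before scheduling the next so only one batch is in flight at a time. Whether that avoids the crash is a guess based on the behavior described above.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public static class ChunkedRaycasts
{
    // Hypothetical cap: the thread does not establish a known-safe batch size.
    const int MaxCommandsPerBatch = 256;

    // Runs the commands in slices of at most MaxCommandsPerBatch.
    // Assumes maxHits == 1, so results.Length must equal commands.Length.
    public static void Run(NativeArray<RaycastCommand> commands,
                           NativeArray<RaycastHit> results)
    {
        for (int start = 0; start < commands.Length; start += MaxCommandsPerBatch)
        {
            int count = Mathf.Min(MaxCommandsPerBatch, commands.Length - start);

            // Temporary buffers holding just this slice.
            var sliceCommands = new NativeArray<RaycastCommand>(count, Allocator.TempJob);
            var sliceHits = new NativeArray<RaycastHit>(count, Allocator.TempJob);

            for (int i = 0; i < count; i++)
                sliceCommands[i] = commands[start + i];

            // Schedule this slice and block until it is done before moving on.
            JobHandle handle = RaycastCommand.ScheduleBatch(sliceCommands, sliceHits, 32);
            handle.Complete();

            // Copy the hits back into the caller's result array.
            for (int i = 0; i < count; i++)
                results[start + i] = sliceHits[i];

            sliceCommands.Dispose();
            sliceHits.Dispose();
        }
    }
}
```

Blocking on every slice like this is exactly why the environment data ends up generating so slowly. It trades throughput for never having a large batch outstanding.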

Surely a bug report with a repro scene would be useful for Unity to fix the issue.

If I knew how to create an isolated repro I would. It’s a large multiplayer game with way too many moving parts to submit as is.

So looking at the PhysX source and reconciling it with the behavior I’m seeing, it appears they are just violating one of the rules stated in the source, like either submitting updates without finishing or submitting from multiple threads. This error is very specific, so it really should not be that hard to track down, and I don’t think a test case adds much value in this case. Certainly not enough for them to just ignore the report without one.

It’s also a rather nasty bug because what triggers it is load across multiple jobs, so it could in theory hit even at low load. I’ve now had it hit under relatively light loads, just not as often.

Seems to be fixed in 2018.3.41f.