2023.2.17f1
I have a scene with only 1 rigidbody cube and nothing else. Yet each Physics.Simulate event takes 0.27 ms, of which WaitForJobGroupID accounts for 0.26 ms. This seems like an eternity for such a simple task.
This becomes an issue for multiplayer games with physics rollback capability. Since we must execute the physics engine multiple times in a row, this can easily chew through multiple milliseconds despite extremely basic gameplay mechanics.
Sharing work between 7 worker threads for 1 rigidbody cube? Doesn't PhysX use its own multithreading system anyway?
I'm on the verge of taking on the massive task of integrating the latest PhysX version myself, just to bypass Unity's job system and get playable networked physics.
I'd appreciate it if anyone could explain what's going on in the profiler and why this needs to happen.
Even if you have just 1 rigidbody, the entire physics engine still has to run. This means collision detection (both broad phase and narrow phase), island generation, calculating Jacobians, solving constraints, integration, etc. Each of these phases is designed to run on many items at once, but here there's only one item to process.
The cost of running the entire pipeline doesn't scale linearly with object count: it isn't 1 for 1 object, 5 for 5 objects, or 10 for 10. Systems rarely scale in a perfectly linear way, because there's a fixed base cost on top of the per-object cost. So 1 object might cost 6, 2 objects 8, 3 objects 10, and so on.
As can be clearly seen in your screenshot, barely any work sharing is taking place at all: whenever a worker thread is performing a task, all other threads are idle. The actual thread doing the work changes for each task, since that’s the nature of a thread pool: any thread that’s idle can take on the next available task.
Performance in this case would be the exact same if you had a single thread doing all the work, since there’s only 1 object, and hence no way to parallelize anything. There’s no overhead due to using jobs.
Those 0.27 ms you're looking at are the base cost of having any physics at all; WaitForJobGroupID is simply the main thread waiting for the worker threads in the pool to finish all their tasks.
Try using a larger timestep and updating the engine fewer times: instead of stepping 4 times with a timestep of 0.02, step 2 times with a timestep of 0.04. This lets you trade rollback accuracy for performance.
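For reference, here's a minimal sketch of what that could look like with manual simulation stepping (using Physics.simulationMode, available in your Unity version). The class name, field names, and the Resimulate entry point are all hypothetical, just to show the shape of the loop:

```csharp
using UnityEngine;

// Hypothetical rollback driver: steps the physics engine manually so the
// number of resimulation steps (and their timestep) can be tuned.
public class RollbackDriver : MonoBehaviour
{
    // Larger timestep = fewer Physics.Simulate() calls per rollback,
    // at the cost of rollback accuracy.
    public float rollbackTimestep = 0.04f; // instead of the default 0.02

    void Awake()
    {
        // Take stepping away from Unity's internal FixedUpdate loop.
        Physics.simulationMode = SimulationMode.Script;
    }

    // Resimulate from a confirmed past state up to the present.
    public void Resimulate(float secondsToCatchUp)
    {
        int steps = Mathf.CeilToInt(secondsToCatchUp / rollbackTimestep);
        for (int i = 0; i < steps; i++)
        {
            // Each call pays the fixed base cost discussed above,
            // so fewer, larger steps are cheaper overall.
            Physics.Simulate(rollbackTimestep);
        }
    }
}
```

With a 0.08 s rollback window this runs 2 steps at 0.04 instead of 4 steps at 0.02, halving the number of times you pay the fixed per-Simulate cost.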