My “Before Physics System Group” is extremely slow. Unfortunately, I can’t tell at a glance which system actually eats up all this time.
(this is related to the threads stating that since pre65, physics is slow - it still is, in 1.0.10)
So how can I examine this to find out where the actual waiting happens? The values vary drastically when Burst is off, so I have to profile with Burst enabled.
It feels like I’m running into this, especially with the multiple physics updates per frame, etc.
But my jobs that work with these singletons should be long finished, so I’m a bit confused as to how to investigate this slowdown further.
Unfortunately, since most of my objects are mobile and dependent on reasonably accurate collision detection, skipping BuildPhysicsWorld isn’t a viable approach.
It waits for the previous simulation step to complete; the FindOverlaps job bar is visible. The cause of the stall that forced the system to spam simulation steps happened somewhere before that.
Also, it’s weird how the solver phases are spread over the step (those multiple small dots in the Job tab). Could it be related to changes in the Unity job system?
BTW, you can disable that catch up death loop by setting Project Settings → Time → Maximum Allowed Timestep equal to the Fixed Timestep. This will expose what actually causes slowdowns.
You might have three steps happening in this frame, and the first “long” BuildPhysicsWorldSystem is waiting for the completion of something. You can see this in the blue bar below it that reads “JobHandle.Complete()”. It has to wait for the completion of the systems (and the jobs scheduled by these systems!) from the first step.
Can you zoom in on the first little green bar in front of your first “long” BuildPhysicsWorldSystem?
These Project Settings don’t have any influence on the ECS time management, nor the time step.
The time management you are referring to is implemented using the ECS rate manager interface.
And the time step (1/60 by default) is set using the Timestep property on the FixedStepSimulationSystemGroup.
The well-known fixed-stepping semantics for MonoBehaviours, which are controlled by the settings you pointed out, are implemented in ECS using different types of rate managers.
There is the FixedRateCatchUpManager, which is the one that is used by default in the FixedStepSimulationSystemGroup:
When it determines that the simulation is running behind, it performs multiple group updates within a single frame to try to keep up with real time, which works up to a point.
Then there is the VariableRateManager, which uses variable time steps for stepping things forward rather than multiple steps with fixed time steps.
Finally, there is the simplest one, the FixedRateSimpleManager, which ensures that there is always one step done per update of the FixedStepSimulationSystemGroup.
One could enforce this “single step” behavior (which I think you suggested) by setting the rate manager in the FixedStepSimulationSystemGroup to the FixedRateSimpleManager via the FixedStepSimulationSystemGroup.RateManager property.
Note that changing FixedStepSimulationSystemGroup.Timestep conveniently also updates the time step used in the rate manager, as you can see in the property’s setter in the Entities package source.
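For illustration, switching to that single-step behavior could look roughly like this (a minimal sketch; the bootstrap system name is made up, and you would normally do this once at startup):

using Unity.Entities;

// Minimal sketch: swap the default catch-up manager for a single-step-per-frame
// manager on the fixed-step group. "SingleStepBootstrapSystem" is an invented name.
public partial class SingleStepBootstrapSystem : SystemBase
{
    protected override void OnCreate()
    {
        var fixedGroup = World.GetOrCreateSystemManaged<FixedStepSimulationSystemGroup>();
        // Exactly one fixed step per group update; the simulation will simply fall
        // behind real time when the frame rate drops below 1 / Timestep.
        fixedGroup.RateManager = new RateUtils.FixedRateSimpleManager(1f / 60f);
        Enabled = false; // nothing to do per frame
    }

    protected override void OnUpdate() { }
}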
Any plans to expose this as a layman’s-terms physics config in Project Settings in a future release? It looks quite complicated to me. Anyway, for now, if I want to escape the stalling death loop, which one should I change, and how, so that it exits the loop quickly? CMarastoni suggested I implement my own FixedRateManager, but it seems that’s deprecated and renamed to RateManager?
To always do one step per frame you could use the FixedRateSimpleManager.
Or you copy the code of the FixedRateCatchUpManager, expose the min/max time-stepping parameters, and play with them to limit the number of steps per frame, or maybe even change the time step slightly if that works for your simulation (similar to the VariableRateManager).
Which one works best for you depends on your exact use case.
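If you go the copy-and-tweak route, a heavily stripped-down version of such a capped catch-up manager could look like this (illustrative only, not the package implementation; the cap, the field names and the “drop the remainder” policy are all choices you would make yourself):

using Unity.Core;
using Unity.Entities;

// Sketch of a catch-up rate manager that never runs more than MaxStepsPerFrame
// fixed steps per group update, discarding any remaining accumulated time.
public class CappedCatchUpRateManager : IRateManager
{
    const int MaxStepsPerFrame = 2; // tuning knob, pick what suits your game

    float m_Timestep;
    double m_LastFixedTime;
    int m_StepsThisFrame;
    bool m_DidPushTime;

    public CappedCatchUpRateManager(float timestep) => m_Timestep = timestep;

    public float Timestep
    {
        get => m_Timestep;
        set => m_Timestep = value;
    }

    public bool ShouldGroupUpdate(ComponentSystemGroup group)
    {
        // If we pushed a fixed time for the previous step, restore the frame time first.
        if (m_DidPushTime)
            group.World.PopTime();
        else
            m_StepsThisFrame = 0; // first call this frame

        double elapsed = group.World.Time.ElapsedTime;
        bool caughtUp = m_LastFixedTime + m_Timestep > elapsed;
        bool hitCap = m_StepsThisFrame >= MaxStepsPerFrame;

        if (caughtUp || hitCap)
        {
            if (hitCap)
                m_LastFixedTime = elapsed; // give up on catching up this frame
            m_DidPushTime = false;
            return false;
        }

        // Run one more step at the fixed time step.
        m_LastFixedTime += m_Timestep;
        m_StepsThisFrame++;
        group.World.PushTime(new TimeData(m_LastFixedTime, m_Timestep));
        m_DidPushTime = true;
        return true;
    }
}

You would then assign it just like the simple manager above: fixedGroup.RateManager = new CappedCatchUpRateManager(1f / 60f);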
That spread-out ParallelSolverJobs situation should be fixed in the next version, if it’s what I think it is: there was an issue that caused more jobs to be spawned than needed, and that is fixed now. It could be that issue, or something else; I’m not sure exactly. In any case, it’s worth re-running with the next version first to reassess.
Thanks for the details about rate managers, it’s very useful.
Regarding this jobs situation - are you referring to the next version of Physics, ECS, Collections, or the engine itself?
Just Unity Physics.
We noticed that an abundance of solver jobs was scheduled when multithreading was enabled, even when they were not needed at all (no workload whatsoever) or when a significantly smaller number would have sufficed (lower than the maximal workload). This issue, which caused slowdowns specifically in cases with low contact and joint counts, is fixed now and will be included in the next available version.
Thank you for taking the time to reply and help me dig through this. It is most sincerely appreciated.
That is one of my systems, Jovian.Systems.PIDSystem, 0.60 ms.
I’m pretty sure it has some issues, because for the love of god I can’t get Burst to compile anything outside the OnUpdate method. But IF this causes a deadlock, I need to understand why (and 0.6 ms doesn’t sound too bad for 20k entities). I use the AggressiveInlining attribute to imitate what Unity.Mathematics does; I even salvaged some adjacent code (the QuaternionToEuler below).
using System.Runtime.CompilerServices;
using Jovian.Components;
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;
using Unity.Physics;
using Unity.Physics.Systems;
using Unity.Transforms;
namespace Jovian.Systems
{
[BurstCompile]
[UpdateInGroup(typeof(BeforePhysicsSystemGroup))]
[UpdateBefore(typeof(VesselControlIntegrator))]
public partial struct PIDSystem : ISystem
{
[BurstCompile]
public void OnCreate(ref SystemState state)
{
}
[BurstCompile]
public void OnDestroy(ref SystemState state)
{
}
[BurstCompile]
public void OnUpdate(ref SystemState state)
{
// Avoid updating for first frame or other very small delta time values.
var dt = SystemAPI.Time.DeltaTime; // within the fixed-step group this is the fixed time step
if (dt <= float.Epsilon) return;
//Process all vessels that have a PID controller
foreach (var (transform, velocity, controls, pid) in SystemAPI.Query<LocalTransform, PhysicsVelocity, RefRW<ShipControls>, RefRW<PIDControl>>())
{
// Move towards desired position (or stay if position is zero)
var posError = math.lengthsq(pid.ValueRO.goalPosition) > float.Epsilon ? pid.ValueRO.goalPosition - transform.Position : default;
controls.ValueRW.translate = ComputePID(ref pid.ValueRW.posI, posError, -velocity.Linear, dt);
// Rotate towards desired orientation (or stay if orientation is zero)
var rotError = CalculateError(transform.Rotation, pid.ValueRO.goalOrientation);
controls.ValueRW.pitch = ComputeRequiredAngularAcceleration(rotError, transform.Rotation, -velocity.Angular, dt);
}
}
#region Computations
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static float3 ComputePD(float3 error, float3 delta)
{
const float gainP = 1f;
const float gainD = 1.5f;
var output = gainP * error + gainD * delta;
return output;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static float3 ComputePID(ref float3 integral, float3 error, float3 delta, float deltaTime)
{
const float gainP = 1f;
const float gainD = 2f;
const float gainI = 0f;
const float saturation = 0.2f;
integral += error * deltaTime * gainI;
integral = math.clamp(integral, -saturation, saturation);
var output = gainP * error + gainD * delta + gainI * integral;
return output;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static float3 CalculateError(quaternion currentOrientation, quaternion desiredOrientation)
{
//No rotation desired (default quaternion, not identity!)
if (math.lengthsq(desiredOrientation.value) < float.Epsilon) return float3.zero;
//World Space Quaternion rotation to reach desiredOrientation
var requiredRotation = math.lengthsq(desiredOrientation.value) > float.Epsilon ? RequiredRotation(currentOrientation, desiredOrientation) : quaternion.identity;
//World space Euler rotation.
return QuaternionToEuler(requiredRotation, math.RotationOrder.ZXY);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static float3 ComputeRequiredAngularAcceleration(float3 error, quaternion currentOrientation, float3 currentAngularVelocity, float deltaTime)
{
//Angular Velocity is in local space, but should be considered in world space.
var delta = math.mul(currentOrientation, currentAngularVelocity);
return ComputePD(error, delta);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static quaternion RequiredRotation(quaternion from, quaternion to)
{
var requiredRotation = math.mul(to, math.inverse(from));
// Flip the sign if w is negative.
// This makes sure we always rotate the shortest angle to match the desired rotation.
requiredRotation = requiredRotation.value * math.sign(requiredRotation.value.w);
return requiredRotation;
}
#endregion
#region QuaternionToEuler
// Note: taken from Unity.Animation/Core/MathExtensions.cs, which will be moved to Unity.Mathematics at some point
// after that, this should be removed and the Mathematics version should be used
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static float3 QuaternionToEuler(quaternion q, math.RotationOrder order)
{
const float epsilon = 1e-6f;
//REDACTED (just a boatload of math)
return EulerReorderBack(euler, order);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static float3 EulerReorderBack(float3 euler, math.RotationOrder order)
{
switch (order)
{
case math.RotationOrder.XZY:
return euler.xzy;
case math.RotationOrder.YZX:
return euler.zxy;
case math.RotationOrder.YXZ:
return euler.yxz;
case math.RotationOrder.ZXY:
return euler.yzx;
case math.RotationOrder.ZYX:
return euler.zyx;
case math.RotationOrder.XYZ:
default:
return euler;
}
}
#endregion
}
}
The system right after it is Jovian.Systems.VesselControlIntegrator, an older SystemBase style system still with one of those Entities.ForEach abominations.
I somehow have the nagging feeling that the “ScheduleParallel” in that one could be causing some of the issues (because it interacts with the physics components directly).
using Jovian.Components;
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;
using Unity.Physics;
using Unity.Physics.Extensions;
using Unity.Physics.Systems;
using Unity.Transforms;
namespace Jovian.Systems
{
[UpdateInGroup(typeof(BeforePhysicsSystemGroup))]
[BurstCompile]
public partial class VesselControlIntegrator : SystemBase
{
[BurstCompile]
protected override void OnUpdate()
{
var dt = World.Time.DeltaTime;
//Hey, this indents kinda nicely for once. :)
//TODO: Convert into jobified ISystem
Entities
.ForEach(
(
ref PhysicsVelocity velocity,
in PhysicsMass mass,
in LocalTransform transform,
in Propulsion propulsion,
in ShipControls controls
) =>
{
//Standard thrust addition.
var linear_accel = controls.translate;
//Transform to local space, apply thrust limiters, then transform back to world space.
linear_accel = math.mul(math.inverse(transform.Rotation), linear_accel);
linear_accel = math.clamp(linear_accel, propulsion.ThrustMin, propulsion.ThrustMax);
linear_accel = math.mul(transform.Rotation, linear_accel);
velocity.ApplyLinearImpulse(mass, linear_accel * dt);
//Todo: Velocity dependent turning.
var angular_accel = controls.pitch;
angular_accel = math.clamp(angular_accel, -propulsion.TurnMax, propulsion.TurnMax);
var world = velocity.GetAngularVelocityWorldSpace(mass, transform.Rotation);
world += angular_accel * dt;
velocity.SetAngularVelocityWorldSpace(mass, transform.Rotation, world);
/*
TODO: Bring it back :)
//Flow Logic - make it feel more like a ship in water.
var linear_magnitude = math.length(velocity.Linear);
if (linear_magnitude > 0)
{
var linear_direction = math.normalize(velocity.Linear);
//Apply trim tensor logic allowing to preserve forward velocity through turns.
FIXME: Wait a sec, this isn't local right / up, this is world right / up? can't be right!
var lateral_flow = propulsion.TrimX * math.abs(math.dot(right, linear_direction));
var vertical_flow = propulsion.TrimY * math.abs(math.dot(up, linear_direction));
var flow = (lateral_flow + vertical_flow) * linear_magnitude * dt;
var inert = linear_magnitude - flow; //this is a remainder, so no dt
var linear_flow = flow * forward;
var linear_inert = inert * linear_direction;
velocity.Linear = linear_flow + linear_inert;
}
*/
}
).ScheduleParallel();
}
}
}
Zoomed in it looks like this:
However, everything is a bit faster now; I’m unsure what caused this (33 ms per frame now, rather than 50+).
Stepping along this frame’s timeline, I see another interesting one in the “second” execution block (no idea why PIDSystem comes up there again):
However, if I turn CourseCommandSystem off, nothing improves (well, I don’t get the double frame wrap all the time, but the FPS is still ~30).
Curiously, all the waiting is now done by other systems (which drove me crazy while profiling, because it’s not great that the wait time is attributed to the one doing the waiting; it should be attributed to the one being waited for).
Both CourseCommandSystem and MoveCommandSystem use idiomatic foreach, look at a FixedList of pre-calculated waypoints (usually empty in my scenario here), and write something to the PID component. (I wonder if that’s causing some kind of deadlock? I could possibly move them earlier, to do their work before PIDSystem.)
Or is idiomatic foreach really so terrible that I should jobify these systems first? They consume almost no time on their own, only when they have to wait on whatever it is they are waiting on (the same thing CourseCommandSystem is normally waiting on, unless it is turned off).
Ultimately, LateSimulationSystemGroup is another point where the semaphore hits the fan, which is odd because these Mouse and Pip systems are VERY simple and just do a lot of distance and matrix calculations. Some of them do lead to structural changes, which they record into ECBs provided by BeginSimulationEntityCommandBufferSystem. Again, many of these are older SystemBase-derived systems, but they kind of have to be, because they read from UnityEngine.Input (mouse position, button state, modifier keys).
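Roughly, these input-driven systems have this shape (heavily simplified sketch, all names invented, not my actual code):

using Unity.Entities;
using Unity.Transforms;
using UnityEngine;

// Placeholder tag component used only by this sketch.
public struct Selected : IComponentData { }

// Main-thread input reads, some math against entity transforms, and structural
// changes deferred through the BeginSimulation ECB.
public partial class MouseSelectSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // UnityEngine.Input must be read on the main thread, hence SystemBase.
        Vector3 mousePosition = Input.mousePosition;
        bool clicked = Input.GetMouseButtonDown(0);

        var ecb = World.GetExistingSystemManaged<BeginSimulationEntityCommandBufferSystem>()
                       .CreateCommandBuffer();

        Entities.ForEach((Entity entity, in LocalTransform transform) =>
        {
            // ... distance / matrix calculations against mousePosition go here ...
            if (clicked /* && some hit test against transform passes */)
                ecb.AddComponent<Selected>(entity); // structural change, played back at BeginSimulation
        }).WithoutBurst().Run();
    }
}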
I still believe the real problem must be something before CourseCommandSystem (or MoveCommandSystem).
First off, only code within the system’s main function (OnUpdate, OnCreate etc.) and code in Jobs can currently be burst compiled. Have a look at the bursted code using the Burst Inspector window and make sure you find your code there.
If you use the right Output dropdown display option (see the documentation link above), you will get annotations that show your original function calls embedded inside the assembly code, which should help you identify whether everything was successfully Burst-compiled.
Also, to make absolutely sure that the function you want to burst compile is successfully compiled, you can add the [GenerateTestsForBurstCompatibility] attribute above the function alongside the [BurstCompile] attribute.
Second, is it possible that your application is somehow locked to 30 FPS? Maybe your time step is 1/30 s (30 Hz) and that’s why you get some waiting in the player loop to hit exactly that framerate? Just a thought…
Not that I know of. I run this on a 120 Hz display, and it used to run at over 300 fps with an earlier version of Physics. Sometimes even this one does: usually if I clear the Library folder, it will be fast for a couple of editor starts.
I just verified - no calls to Application.targetFrameRate.
I think it’s back to finding out what job hangs here. Unfortunately, it still happens even if I disable all my systems.
Make sure to also have a look at the jobs in your profiler. If a system “hangs”, it is most likely just waiting for jobs to finish that write data used as an input by the system, or for jobs that are reading data this system wants to write.
These data dependencies are very common and are modelled via job dependencies most of the time. In other cases, specific jobs are “waited on” by a system via JobHandle.Complete() which will block the execution at this point until the job has finished.
It might be obvious, but if you see a JobHandle.Complete() call in the profiler in one of your systems, that’s exactly what is happening. To see which job is being waited on, you can either inspect the system’s code or simply look at the point in the profiler’s timeline where the system continues (where the JobHandle.Complete() bar ends). The job that finishes at that point is the one this JobHandle.Complete() call was waiting on.
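To make the pattern concrete, here is a contrived sketch (names made up) of a system that needs main-thread access to data written by previously scheduled jobs; the Complete() call is exactly what shows up as that blocking bar:

using Unity.Entities;

public partial struct ReadbackSystem : ISystem
{
    public void OnUpdate(ref SystemState state)
    {
        // Jobs scheduled by earlier systems that touch the same components are
        // aggregated into state.Dependency. Completing it blocks the main thread
        // until those jobs have finished -- that block is the JobHandle.Complete()
        // bar you see under a system in the Timeline view.
        state.Dependency.Complete();

        // ... main-thread work that needs those job results would go here ...
    }
}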
I took everything out that was simulating physical entities (left in some instantiation code).
It all boils down to the initial culprit, BuildPhysicsWorld.
Funnily enough, there should be no reason for BuildPhysicsWorld to run 4 times, as 7 ms would be fairly good (at some 30k entities).
Maybe it can never climb out of the deathloop once it’s in there? I do some pretty heavy loading earlier on, spread across a large number of frames, but still heavy.
I don’t know how to read the “Jobs” category in the profiler, but from my intuition, it doesn’t look like any job is stuck or has a particularly wide bar.
The only suspicious call at the start might be:
Profiler.ParseThreadData
1.28ms
Current frame accumulated time:
1.89ms for 10 instances on thread ‘Worker 9’
9.59ms for 130 instances over 25 threads
EDIT: This call is indeed HIGHLY suspicious, the more I look at it.
In other frames, I even see
Profiler.ParseThreadData
3.51ms
Current frame accumulated time:
4.66ms for 6 instances on thread ‘Worker 6’
17.15ms for 102 instances over 25 threads
I’ll try to compare with the standalone profiler. EDIT: nah, it’s slow even without any profiler running… you got any ideas?
After pausing and unpausing the game, the “deathloop” doesn’t occur, but it’s still quite slow.
(and it shouldn’t be this slow - I’ve seen pretty much the same code, more of it, actually - run much faster in earlier Entities releases)
I’ve also removed the systems in LateSimulationSystemGroup, because now the “blame” was put on them (they just spend their time waiting, though; they can easily run in under 1 ms):
The small stippled joblets are all (and yes, also the ones in “PresentationSystemGroup”, for some weird reason - is there, like, a second Physics world scheduling jobs outside FixedStepSimulationSystemGroup?):
Solver:ParallelSolverJob (Burst)
0.000ms
Current frame accumulated time:
0.004ms for 85 instances on thread ‘Main Thread’
0.062ms for 1107 instances over 24 threads
(which is nothing, even in sum, so why do BuildPhysicsWorld and UpdateHybridChunkStructure end up waiting 100x their cumulative, parallelization-ignoring duration for them? Thread switches are expensive, but surely not THAT expensive, right?)
The one big chunk is:
Broadphase:DynamicVsDynamicFindOverlappingPairsJob (Burst)
0.710ms
Current frame accumulated time:
34.88ms for 46 instances over 25 threads
Hmm, that accumulated time for that one looks suspiciously like almost my entire frame time. I presume it’s highly parallel; and 0.710 ms is pretty good for ~30k Entities in relatively close proximity.
As I also explained in this other post, if you see a long time bar on the BuildPhysicsWorldSystem (or on any system, for that matter), that does not at all mean that it’s this system which is “spending” the time. As you can see, in your case the BuildPhysicsWorldSystem is doing the following on the main thread (the top timeline, outside of the job category):
JobHandle.Complete()
That means, the system is waiting for a job to finish.
It’s the same as in the thread I linked above. From the looks of it, in your case there are as many as 5 (!) physics steps done per frame, which is caused by the FixedRateCatchUpManager in an attempt to catch up with real-time simulation through fewer renders and more physics steps (each with the same time step, 1/60 s by default). If things start taking too long and real-time simulation cannot be achieved with a single physics step per frame, that’s what happens. BuildPhysicsWorldSystem is updated in each of these 5 steps, but starting with the second one, they all have to wait until the previous step has finished, which means that all the jobs scheduled by the previous steps have to finish (collision detection, solver, etc.).
If you don’t want this to occur, always want a single physics step per frame, and are fine with the simulation being a little slower than real time, you need to change the rate manager (as explained in an earlier post in this thread).
And the Profiler.ParseThreadData entry is nothing to worry about. It’s from the profiler itself which needs to do a bit of work to produce the timeline and collect the data. You can safely ignore these calls.
Thank you - I think I did realize that.
How do I find out which job that is? I really can’t seem to find it.
How would you go about finding that job? I thought the Profiler had some setting to show these dependency links, but since I can’t even see the job it’s waiting for here, and there is stuff in PresentationSystemGroup that still waits on physics jobs, I’m at a loss.
To rule out some other possible causes, I am examining any MonoBehaviours that still interact with EntityManager in a way that creates sync points, but there are hardly any left (apart from a scene loader). So far, to no avail.
I’m also okay with the catch-up behaviour itself, but I still get poor frame rates (<30 fps) even with just one of these updates per frame, i.e. even when there’s only a single catch-up step.
But for development, I can make the game run with one tick per frame / no catchup and be fine with that. I just need to be able to tell if I’m way outside my intended performance envelope. That looked really good until recently, when these new issues came up and physics got really slow.