System overhead

Hello. We make a mobile multiplayer game with Netcode for Entities. For low-end devices, we target 30 fps. Currently I allocate the performance budget as 16 ms for simulation (including netcode resimulation) and the other 16 ms for presentation (animation, cloth simulation, URP, etc.). We target a maximum of 8 resimulations (we use a modified Netcode for Entities with input delay, which reduces the resimulation count).
16 ms / 8 = 2 ms for one tick.
For the test, I created an empty project with a simple system.

using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

public struct TestComponent : IComponentData
{
    public float3 Value;
}

public partial struct TestSystemISystemBursted : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        foreach (var (lt, c) in SystemAPI.Query<RefRO<LocalTransform>, RefRW<TestComponent>>())
        {
            //c.ValueRW.Value = lt.ValueRO.Position;
        }
    }
}

public partial struct TestSystemISystemNonBursted : ISystem
{
    public void OnUpdate(ref SystemState state)
    {
        foreach (var (lt, c) in SystemAPI.Query<RefRO<LocalTransform>, RefRW<TestComponent>>())
        {
            //c.ValueRW.Value = lt.ValueRO.Position;
        }
    }
}

public partial class TestSystemSystemBase : SystemBase
{
    protected override void OnUpdate()
    {
        foreach (var (lt, c) in SystemAPI.Query<RefRO<LocalTransform>, RefRW<TestComponent>>())
        {
            //c.ValueRW.Value = lt.ValueRO.Position;
        }
    }
}

And I got these results for 12 entities (because we have 12 players per match =))

So if a trivial system costs 0.01 ms and I have 100 systems in the prediction group, then 8 resimulations cost 100 × 0.01 ms × 8 = 8 ms, i.e. 50% of my simulation budget. Right now we have ~70 systems (including Unity systems: physics, netcode, etc.).

Is it true that systems have colossal overhead? I will write more tests anyway.

I tested on a Xiaomi Mi 9T.

While some of the overhead may be due to profiling, it is not great on mobile. However, the NetCode-First ECS section of this post provides a trick to avoid it: The various ways to use Unity ECS: A Starter Guide


Thank you. I will try to implement a mega-system, but it will be more complex in the actual project. For example, RequireForUpdate or system dependencies require reworking my systems.

Also, I found this: Performance of 1 large system vs lots of small systems [benchmarks]

It looks sad =( What about “performance by default”?

Unity is working on parallel system scheduling. That would free up the main thread massively in the future. The problem is that the feature is too far away, so it won't help you now.

Prediction on mobile and underpowered devices is problematic. Like Dreaming said, reducing the system count and bundling up jobs as best you can is the best way to mitigate the problem. Also remove any RequireForUpdate and have a higher-level system enable/disable certain groups, so those groups don't have to individually check for components each time.
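The "higher-level system enables/disables groups" idea can be sketched like this. This is an illustrative example, not code from the thread: `AbilityPredictionGroup`, `AbilityGroupGateSystem`, and the `MatchStateActive` tag component are made-up names, and the gating condition is a placeholder for whatever match-level state your game actually has.

```csharp
using Unity.Entities;
using Unity.NetCode;

// Hypothetical tag that exists only while the gated systems have work to do.
public struct MatchStateActive : IComponentData { }

// The group whose member systems would otherwise each run their own checks.
[UpdateInGroup(typeof(PredictedSimulationSystemGroup))]
public partial class AbilityPredictionGroup : ComponentSystemGroup { }

// One cheap query here replaces N per-system RequireForUpdate checks:
// when the group is disabled, none of its members are even considered.
[UpdateInGroup(typeof(PredictedSimulationSystemGroup), OrderFirst = true)]
public partial class AbilityGroupGateSystem : SystemBase
{
    protected override void OnUpdate()
    {
        bool anyWork = !SystemAPI.QueryBuilder()
            .WithAll<MatchStateActive>().Build().IsEmpty;
        World.GetExistingSystemManaged<AbilityPredictionGroup>().Enabled = anyWork;
    }
}
```

The trade-off is one managed gate system per group versus a query check in every member system; for a prediction loop that runs 8+ times per frame, that usually favors the gate.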

Also, try testing with scheduled jobs. Those numbers don't mean much if you're doing heavy calculations on the main thread. Not scheduling jobs and then complaining about wasted main-thread time is counterproductive.


First: yes, systems add overhead. There are certain things under the hood that may not be obvious and that add extra strain. You are not hitting any of them in your example because the systems are just too simple, but here are some:

  • SystemBase is a managed type and is not Burst-compilable. You can invoke a Burst-compiled method from OnUpdate, but that has some cost (it is not free).
  • Every RequireForUpdate<> adds a query check (that entities with those components exist) before the system runs. The check is fairly fast, but many of them add up.
  • The [RequireMatchingQueriesForUpdate] attribute saves overhead when OnUpdate is expensive and no entities match the queries used by the system. It saves some CPU in that case, but still adds its own overhead.
  • SystemAPI.Query adds further pre-condition checks to the system, which are tested before the system runs if [RequireMatchingQueriesForUpdate] is present. So the more queries you add, the more tests are done.
  • Job scheduling cost on the main thread depends on the size of the job struct and its data: the fatter the struct, the larger the copy and the larger the cost.
  • The job itself also has its own overhead, and the choice between jobs and main-thread work largely depends on the workload. Sometimes it is better to just do the work on the main thread.
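The two gating mechanisms from the list can be contrasted in a short sketch. This is my own illustration under the thread's assumptions (Entities 1.x APIs); the system names are invented:

```csharp
using Unity.Burst;
using Unity.Entities;
using Unity.Transforms;

// Skips OnUpdate unless at least one entity matches the query
// registered in OnCreate. One explicit check per registered requirement.
public partial struct GatedByRequireForUpdate : ISystem
{
    public void OnCreate(ref SystemState state)
        => state.RequireForUpdate<LocalTransform>();

    [BurstCompile]
    public void OnUpdate(ref SystemState state) { /* work */ }
}

// Skips OnUpdate when none of the system's queries match; every query
// the system uses is checked, so more queries mean more pre-condition tests.
[RequireMatchingQueriesForUpdate]
public partial struct GatedByMatchingQueries : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        foreach (var lt in SystemAPI.Query<RefRO<LocalTransform>>()) { }
    }
}
```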

The prediction loop is heavy: it is called multiple times per frame (based on latency), so even 100 systems called 10-12 times easily become over a thousand system updates.


Knowing the causes is one thing; could you give some suggestions on how to alleviate them? And is there any plan to improve this in the 1.x cycle? I think people might want to inquire more about these two points after knowing the causes. :thinking:

It would be super handy to have these as extra profiler markers in the profiler. Just seeing “System Update” makes it fairly hard to guess where the time is going, and for large projects Deep Profile isn't really feasible. I understand we can use external profiling tools such as Superluminal to get the full picture, but in most cases they are a bit too detailed.
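Until such built-in markers exist, a stopgap is to instrument suspect sections with manual markers via Unity's `ProfilerMarker` API, which shows up in the standard Profiler timeline without the cost of Deep Profile. A minimal sketch (the system and marker names are made up):

```csharp
using Unity.Entities;
using Unity.Profiling;

public partial class MarkedSystem : SystemBase
{
    // Static marker: created once, cheap enough to keep in release profiling builds.
    static readonly ProfilerMarker s_Precheck =
        new ProfilerMarker("MarkedSystem.Precheck");

    protected override void OnUpdate()
    {
        using (s_Precheck.Auto())
        {
            // Main-thread work you suspect of hiding inside "System Update",
            // e.g. query checks or structural-change preparation.
        }
    }
}
```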

Is there any plan to address this client-side prediction performance issue to make it cost much less? Currently it's still quite challenging to enable this feature on mobile platforms.

There are ways to improve the system overhead:

1 - Avoid using SystemBase as much as possible.
2 - Try the [RequireMatchingQueriesForUpdate] attribute for systems that do not need to run all the time but only when their queries match. Profile to see whether it actually gives a benefit or just adds overhead for nothing, e.g. because you know the condition is true 90% of the time.
3 - Check whether the above or RequireForUpdate is better (profile).
4 - Avoid scheduling jobs that do nothing; check whether the query is empty instead. Sure, that costs something on the main thread, but far less than scheduling a job that then waits most of the time, not to mention copying the data to the job.
5 - For large job payloads, prefer ScheduleByRef.
6 - Avoid fetching component data or buffers via EntityManager.GetComponentData or GetBuffer (this waits on dependencies). Unless it is strictly necessary, prefer passing a lookup to jobs and retrieving the data there.
7 - Balance many small systems against fewer, bigger systems that do slightly more.

Those are the first ones I can think of.
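Several of these points can be combined in one small system. The sketch below is my illustration, not code from the thread: `TargetEntity`, `FollowTargetJob`, and `FollowTargetSystem` are invented names, and the "follow a target" logic is just a stand-in workload.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Transforms;

// Hypothetical component pointing at another entity to follow.
public struct TargetEntity : IComponentData
{
    public Entity Value;
}

[BurstCompile]
public partial struct FollowTargetJob : IJobEntity
{
    // Random access to other entities' data, resolved inside the job
    // instead of via EntityManager on the main thread.
    [ReadOnly] public ComponentLookup<LocalTransform> TransformLookup;

    void Execute(ref LocalTransform transform, in TargetEntity target)
    {
        if (TransformLookup.TryGetComponent(target.Value, out var t))
            transform.Position = t.Position;
    }
}

public partial struct FollowTargetSystem : ISystem
{
    EntityQuery _query;

    public void OnCreate(ref SystemState state)
        => _query = SystemAPI.QueryBuilder()
            .WithAll<LocalTransform, TargetEntity>().Build();

    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Point 4: don't pay the scheduling cost for an empty query.
        if (_query.IsEmpty) return;

        var job = new FollowTargetJob
        {
            // Point 6: pass a read-only lookup instead of calling EntityManager.
            TransformLookup = SystemAPI.GetComponentLookup<LocalTransform>(true)
        };
        // Point 5: ScheduleByRef avoids copying the job struct by value.
        state.Dependency = job.ScheduleByRef(_query, state.Dependency);
    }
}
```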


Prediction will always cost a lot; there is no “magic” fix for it.

There are two relatively easy options:
1 - Allow input lag (compensate for some ticks by delaying the local client's input). Less responsive, but on mobile 2-4 ticks may be acceptable. It does not save the day on its own, though.
2 - We allow a fuzzy comparison to avoid re-running prediction. However, there are caveats, and if you mispredict you still pay the full cost. Since you never properly budgeted for that, the game may suffer random stuttering.

Usually it is physics that drags things down a lot; without it the situation is not terrible (it depends on what you do and how many predicted entities you have).

If you have some profiling data you want to share, I'd be happy to take a look. The specs of the mobile devices you are targeting would help as well.