So.. I was using ECS the wrong way... #Story

Ziboo · April 26, 2020, 5:00pm

Hello everyone,

I wanted to share my experience with ECS with you.
I spent a lot of time reading documentation, forums, best practices, etc…
My main takeaways for being effective with ECS were:

Multi Threaded code with Jobs
Avoid sync points when possible

I was in a mindset where if a system was taking more than 0.01 ms, it was not optimized and I was doing something wrong.

For a frame of reference, my game is World oriented, so I’m simulating hundreds of different Worlds.
So my first instinct was to use Jobs a lot to parallelize everything, and to use command buffers to avoid sync points.

But I didn’t get the performance I wanted, and I worked for a week trying to optimize everything…

I modified all my systems to use .Run() instead of .ScheduleParallel() and I gained 60 FPS.
I was amazed at how fast .Run() already is when working with Entities. It’s crazy fast!

Conclusion: don’t use Jobs until you really need them!

That might be a very dumb conclusion, but I don’t think it is said enough when you read about ECS. You get the impression that you need to do all those crazy Jobs, chunk iterations, etc…

So for every beginner out there: don’t try too hard.
Build your systems with .Run() until you hit a big performance problem and really need to use Jobs.
I don’t know if it’s gonna be better in the future, but scheduling a job takes a lot of time for simple systems.

Hope this helps some people.

Cheers

PhilSA · April 26, 2020, 7:00pm

I’m gonna go out on a limb and theorize that you probably did something wrong in your tests. There’s almost no way this could be true if everything is done properly

Could you share some code examples of a system + job that runs way faster with .Run than with .ScheduleParallel?

RoughSpaghetti3211 · April 26, 2020, 7:05pm

I read it as: even without ScheduleParallel there was a 60 FPS gain, with more potential if you use ScheduleParallel. But now I’m not sure how to read that.

PhilSA · April 26, 2020, 7:11pm

Oh… I see what you mean now

I still think the whole “wait until you have performance problems before you optimize” mindset is very very often a bad idea. An optimization that takes 1-2 hours when done early can cost you months if done later

It always depends, of course. But when the optimization is as obvious as working with the Job System, I think it’s definitely worth it to make the effort to use it properly from the start

Ziboo · April 26, 2020, 7:26pm

Maybe it’s really specific to my case, but scheduling the job was indeed taking more time than just using .Run().
I’m not saying that you NEED to do that, or that profiling and optimizing are not important.
I’m just saying that .Run() can give you just what you need, and in some cases better performance.
So don’t throw away .Run() just yet.

I was just in the mindset that if I didn’t use Jobs it was wrong.

Krajca · April 26, 2020, 8:17pm

I think you need to remember that multithreading comes with the cost of copying data and scheduling jobs. Optimization doesn’t mean “now everything will be multithreaded”.

Joachim_Ante_1 · April 26, 2020, 8:21pm

When you have small entity/chunk counts, the overhead of scheduling can be higher than just executing the code with .Run(). Do note that we are doing a lot of work to make that not be so…

Specifically we are:

Adding support for completely bursted struct based systems. So a system itself can be burst compiled.
Doing a bunch of optimizations in IJobChunk & JobScheduler to reduce overhead.

Essentially you can say right now what DOTS is truly amazing at is scale on the axis of large entity counts.
But what we are focused on optimising now is speed on the axis of number of systems with small amounts of entities.
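
For readers following along, here is a minimal sketch of the choice being discussed, assuming the SystemBase / Entities.ForEach API from the Entities 0.x packages current at the time of this thread. The `Position` and `Velocity` components and the `MoveSystem` name are hypothetical and only serve the illustration. The lambda body is the same in all three cases; only the last call changes where the work runs and how much scheduling cost is paid.

```csharp
using Unity.Entities;
using Unity.Mathematics;

// Hypothetical components, not taken from any project in this thread.
public struct Position : IComponentData { public float3 Value; }
public struct Velocity : IComponentData { public float3 Value; }

public class MoveSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Copy to a local so the lambda does not capture `this`.
        float dt = Time.DeltaTime;

        Entities
            .WithName("Move")
            .ForEach((ref Position pos, in Velocity vel) =>
            {
                pos.Value += vel.Value * dt;
            })
            .Run();                 // executes immediately on the main thread, no job is scheduled
            // .Schedule();         // alternative: a single job on a worker thread
            // .ScheduleParallel(); // alternative: chunks processed in parallel across worker threads
    }
}
```

With only a handful of chunks per system, the per-job scheduling cost can outweigh the work itself, so it is worth profiling each variant rather than assuming one is always faster.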

Anonymous2026 · April 26, 2020, 8:23pm

Mike Geig mentioning that it’s sometimes more efficient to run simple jobs on the main thread than to incur the job overhead.

Ziboo · April 26, 2020, 8:30pm

I’m using Burst everywhere.
For the sync points, I’m trying to avoid them, but with the current state of the debugging tools it’s hard to find where they happen. Or maybe it’s just my lack of experience.

That’s good news.
It’s pretty much what I had I guess.
I have lots of Worlds, with lots of independent systems, and not a lot of entities (~1000) per World.

PhilSA · April 26, 2020, 8:44pm

That could make sense. If you have, let’s say, 1000 ECS worlds with 50 systems/jobs each, that would mean 50000 jobs to schedule. That could be where the overhead of ScheduleParallel comes from. But if you truly have a huge quantity of Worlds, maybe a single-ECS-world setup would perform waaaaay better, and a different strategy could be used to represent the concept of a “world”.

Still, it’d be interesting to see code examples and project settings. I could imagine .Run() performing a bit better than .ScheduleParallel() at low entity counts, but the 60fps gain is a bit suspicious (I’m assuming you went from something like 30 to 90fps, and not 400 to 460fps, which would be a relatively small gain). Maybe there’s an easy fix

Some thoughts:

Did you try running this in a build?
Is Burst Compilation enabled in the top menu option?
Are safety checks and leak detection disabled?
Is Burst compilation set to Synchronous? (If not, performance will be bad for a pretty long time after you press Play, but will eventually settle down.)
Do you exclusively use the new math types/operations from Unity.Mathematics in jobs?
Maybe your Jobs are used in unintended ways
etc, etc…

Ziboo · April 26, 2020, 9:14pm

The gain was in a build, from 40 fps to 100 fps. I know the editor has a lot of overhead.
I use Unity.Mathematics, yes.
“Maybe your Jobs are used in unintended ways”?
Maybe ^^ that’s where it’s hard to say. But like Joachim_Ante said, I have a lot of simple, short jobs, so I guess I was paying more for the scheduling than for the job itself.

A note:

I have a lot of systems that need to work on other entities: ComponentDataFromEntity, checking neighbours, etc…
I have an AMD Ryzen 9 3900X (12 cores, 24 threads). Does having a lot of cores also impact the time it takes to schedule Jobs?

l33t_P4j33t · April 26, 2020, 9:44pm

It might have something to do with the multiple worlds and system groups bit… no? There’s something funky going on there on my end, imo.
My performance gets destroyed when I run anything in the ghost prediction system group: each system takes at least 1-2 ms even if it’s just changing one rotation component on one entity, as is the case with my body rotation system.

Are you using lots of worlds to simulate different clients in netcode?


Ziboo · April 27, 2020, 12:14am

I’m not using multiplayer.
I just have multiple custom worlds.
I do remove the Rendering and Transform systems from any World that is not currently shown to the player, though, to gain performance.

RoughSpaghetti3211 · April 27, 2020, 12:37am

100% agree

Joachim_Ante_1 · April 27, 2020, 11:09am

That’s because the prediction group has to run multiple times per frame…

vildauget · April 27, 2020, 4:21pm

I can’t help but notice nobody mentioned .Schedule() as the third alternative.

In a simple system of mine (finally get some working, yay), I get 0.06 ms with .Run(), 0.14 with .ScheduleParallel(), and only 0.03 with .Schedule() on a normal frame.

It might be because I have to account for possible structural changes, so .Run() creates a new sync point, I guess.

Just don’t forget Schedule() as an option, it’ll be sad left alone in the dark.

The example system:

    protected override void OnUpdate()
    {
        // Copy fields to locals so the lambda does not capture `this`.
        var worldSquareCreateDistance = _settings.worldSquareCreateDistance;
        var worldSquares = _worldSquares;
        var ecb = m_EndSimulationEcbSystem.CreateCommandBuffer().ToConcurrent();
        var archetype = _archetype;
        Entities
            .WithName("CreateNewWorldSquares")
            .WithAll<PlayerTagComponent>()
            //.WithStructuralChanges()
            .ForEach((int entityInQueryIndex, in WorldSquarePositionComponent worldSquare) =>
        {
            // Create new worldSquares as necessary around the player's current square.
            for (int x = worldSquare.Value.x - worldSquareCreateDistance; x <= worldSquare.Value.x + worldSquareCreateDistance; x++) {
                for (int z = worldSquare.Value.y - worldSquareCreateDistance; z <= worldSquare.Value.y + worldSquareCreateDistance; z++) {
                    var key = 'x' + x.ToString() + 'z' + z.ToString();
                    if (!worldSquares.ContainsKey(key)) {
                        // Record the creation in the command buffer; it is played back at the end of the simulation group.
                        var entity = ecb.CreateEntity(entityInQueryIndex, archetype);
                        ecb.SetComponent(entityInQueryIndex, entity, new WorldSquarePositionComponent{Value = new int2(x,z)});
                        //var entity = EntityManager.CreateEntity(archetype);
                        //EntityManager.SetComponentData(entity, new WorldSquarePositionComponent{Value = new int2(x,z)});
                        worldSquares.Add(key, true);
                    }
                }
            }
        }).Schedule();

        // The command buffer system must wait for this job before playing back its commands.
        m_EndSimulationEcbSystem.AddJobHandleForProducer(Dependency);
    }

brunocoimbra · April 27, 2020, 4:32pm

It reminds me of this feature request: Request - .ScheduleAuto(dep,chunkThreshold)

PhilSA · April 27, 2020, 4:48pm

Out of curiosity, what’s a rough estimate of your number of Worlds, and the number of your own jobs that are run per World?

And what is the main reason for a separation into many Worlds in your project? Maybe you are using lots of worlds when you don’t really have to

Let’s say we call an ECS world a “World”, and your in-game worlds a “level” for the sake of readability. You could have:

a “visibleWorld” containing the entities of the level that’s currently visible
an “invisibleWorld” containing the entities of all the levels that are not visible. All in the same ECS World
Have your levels be represented by an Entity with a DynamicBuffer on it, containing all the Entities that belong to this level. This way you know which Entities to transfer to the visibleWorld when a level switch happens
If necessary, you can also have a BelongsToLevel (containing the Entity of the parent level) component on your entities so you can retrieve the parent level
The level Entity can also contain any additional data that is specific to that level

This kind of setup would definitely reduce the number of jobs to be scheduled by a lot, and would allow you to make good use of parallelization because nearly all of your entities would be in the same World. Someone correct me if I’m wrong, but I think the main reason to put things into a different World is when there are differences in the types of systems that are run, and/or the frequencies at which they are run.
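
A rough sketch of the data layout described above, assuming the Entities 0.x API. `BelongsToLevel` is the component name used in the list; `LevelEntityElement`, `LevelUtility` and `AddToLevel` are hypothetical names introduced only for this illustration, and the level entity is assumed to already have the buffer added.

```csharp
using Unity.Entities;

// Hypothetical buffer element: one entry per entity that belongs to a level.
// The buffer lives on the entity that represents the level.
public struct LevelEntityElement : IBufferElementData
{
    public Entity Value;
}

// Back-reference from an entity to the entity that represents its level.
public struct BelongsToLevel : IComponentData
{
    public Entity Level;
}

public static class LevelUtility
{
    // Registers `member` with `level` in both directions.
    // Assumes `level` already has a DynamicBuffer<LevelEntityElement>.
    public static void AddToLevel(EntityManager entityManager, Entity level, Entity member)
    {
        // Structural change first: adding a component can invalidate buffer handles.
        entityManager.AddComponentData(member, new BelongsToLevel { Level = level });

        // Fetch the buffer only after the structural change above.
        DynamicBuffer<LevelEntityElement> members = entityManager.GetBuffer<LevelEntityElement>(level);
        members.Add(new LevelEntityElement { Value = member });
    }
}
```

On a level switch, the buffer gives the list of entities to move, and the back-reference lets any system find the level of the entity it is processing.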

Ziboo · April 27, 2020, 7:35pm

I don’t have a specific count for the number of Worlds or Entities the game could have at the end.
I think that you’re right, I guess it’s possible to not use different Worlds at all.
But I decided to use different Worlds for simplicity, I guess, and/or lack of experience. For instance, I can debug entities in the Entities window using World filtering.

Here is a small example:

var cropsEntities = this.cropsStorageQuery.ToEntityArray(Allocator.TempJob); // Get all Entities that are Crops
var storageBuffers = this.GetBufferFromEntity<StorageSlot>(true); // Read-only access to all Storages (IBufferElementData)

this.Entities
    .WithNone<FlyDestination, TargetEntity>()
    .WithName("CropsGathererRobots_FindTarget")
    .WithReadOnly(storageBuffers)
    .WithDeallocateOnJobCompletion(cropsEntities)
    .ForEach((Entity entity, int entityInQueryIndex, in CropsGathererRobot robot, in StationReference stationReference) =>
    {
        Entity cropsTarget = Entity.Null;
        var maxDist = float.MaxValue;

        for (var i = 0; i < cropsEntities.Length; i++)
        {
            var cropsEntity = cropsEntities[i];

            // <--- HERE I would need to check if the cropsEntity is in the same "Fake World" as my robot Entity

            if (!storageBuffers.Exists(cropsEntity)) // Check if Crops has a Storage
                continue;

            var cropsStoragesBuffer = storageBuffers[cropsEntity];

            // Do something with Storage
        }
    }).Run();

If I go with your solution, it means that in every ForEach lambda I write, I would need to filter my components / entities per “fake world”. That could be a lot of boilerplate code, whereas Worlds just do it for me.
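
To make that boilerplate concrete, here is a minimal sketch of what the per-“fake world” check could look like, assuming the Entities 0.x API and a `BelongsToLevel` component as suggested earlier in the thread. `CropTag`, `RobotTag`, `CropTarget` and the system name are hypothetical stand-ins, not the types from my project.

```csharp
using Unity.Collections;
using Unity.Entities;

// All types below are hypothetical stand-ins for this sketch.
public struct BelongsToLevel : IComponentData { public Entity Level; }
public struct CropTag : IComponentData { }
public struct RobotTag : IComponentData { }
public struct CropTarget : IComponentData { public Entity Value; }

public class FindCropInSameLevelSystem : SystemBase
{
    EntityQuery cropsQuery;

    protected override void OnCreate()
    {
        cropsQuery = GetEntityQuery(
            ComponentType.ReadOnly<CropTag>(),
            ComponentType.ReadOnly<BelongsToLevel>());
    }

    protected override void OnUpdate()
    {
        var crops = cropsQuery.ToEntityArray(Allocator.TempJob);
        var levelOf = GetComponentDataFromEntity<BelongsToLevel>(true); // read-only lookup

        Entities
            .WithName("FindCropInSameLevel")
            .WithAll<RobotTag, BelongsToLevel>()
            .WithReadOnly(levelOf)
            .ForEach((Entity robot, ref CropTarget target) =>
            {
                Entity robotLevel = levelOf[robot].Level;
                target.Value = Entity.Null;

                for (int i = 0; i < crops.Length; i++)
                {
                    // The per-"fake world" filter: skip crops that belong to another level.
                    if (levelOf[crops[i]].Level != robotLevel)
                        continue;

                    target.Value = crops[i]; // first crop found in the same level
                    break;
                }
            })
            .Run();

        crops.Dispose(); // safe here because .Run() has already completed
    }
}
```

The extra lookup and comparison would have to be repeated in every system that touches entities from more than one level, which is the cost I am weighing against the overhead of separate Worlds.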

I could use a SharedComponentData like shown in the doc:

public class ColorCycleJob : SystemBase
{
    protected override void OnUpdate()
    {
        List<Cohort> cohorts = new List<Cohort>();
        EntityManager.GetAllUniqueSharedComponentData<Cohort>(cohorts);
        foreach (Cohort cohort in cohorts)
        {
            DisplayColor newColor = ColorTable.GetNextColor(cohort.Value);
            Entities.WithSharedComponentFilter(cohort)
                .ForEach((ref DisplayColor color) => { color = newColor; })
                .ScheduleParallel();
        }
    }
}

But I think that’s pretty much the same thing as each World scheduling the job (minus the systems overhead, for sure).

Ziboo · April 27, 2020, 7:49pm

Also, after thinking about it:

If I put everything in a single World with “fake worlds”, even a small change will affect the chunks (creating sync points, invalidating arrays), whereas if I’m using separate ECS Worlds, a World that is not really active (not a lot of things happening) at least will not affect the other Worlds.
Also, in the future, if I want to tick some Worlds slower, that would be easier too.

I would need to try it to see whether paying the Worlds / systems overhead costs less or more than having everything in one World.
