We just started development of our new game using mostly DOTS. At first I was quite happy with the performance I was getting, especially when I had thousands of entities to process. But then, after we added more and more systems, many of which only process queries of 1 or 2 entities, something became increasingly clear: there are going to be a ton of systems, and they all have a non-negligible overhead. To the point where I'm starting to believe it would be more efficient to deal with MonoBehaviours instead.
So my analysis of the issue is this. For a really simple, run-of-the-mill, Burst-compiled JobComponentSystem that only schedules a job (see example below), I'm getting 0.007 to 0.01 ms of overhead on the main thread in a build. I know, that doesn't sound bad. But I wouldn't be surprised if a finished game had thousands of these. At 10+ ms for the update loop alone, it starts to look pretty bad, and that's on a really good CPU. I know that if you organize the queries right and they are empty, the system doesn't run. But even then, the check alone seems to take around 0.001 to 0.002 ms, which is still significant at scale.
I also have only a single query in this example. If I need, for example, to get a singleton or to add/remove components, everything gets even heavier.
So my question is this: can we hope for that kind of overhead to be lower in the future? Or is this an intrinsic overhead we have to deal with in DOTS? If that's the case, I fear the "performance by default" claim would only hold for games that often deal with tons of similar entities or complex calculations. For the rest, it seems to be quite the opposite.
Example System
using Unity.Burst;
using Unity.Entities;
using Unity.Jobs;

public class DoorInteractorIdleDoorAvailabilitySystem : JobComponentSystem
{
    private EntityQuery idleEntityQuery;

    [BurstCompile]
    private struct SetIdleAvailableJob : IJobForEach<DoorInteractorAvailability>
    {
        public void Execute(ref DoorInteractorAvailability availability)
        {
            availability.IsClockwiseAvailable = true;
            availability.IsCounterClockwiseAvailable = true;
        }
    }

    protected override void OnCreate()
    {
        var idleQueryDesc = new EntityQueryDesc
        {
            All = new ComponentType[] { typeof(DoorInteractorAvailability) },
            None = new ComponentType[] { typeof(DoorOpened), typeof(DoorAnimation) },
        };
        idleEntityQuery = GetEntityQuery(idleQueryDesc);
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var idleJobHandle = new SetIdleAvailableJob().Schedule(idleEntityQuery, inputDeps);
        return idleJobHandle;
    }
}
If you’re only acting on 1 or 2 entities, you probably shouldn’t schedule a job. When doing something on multiple threads there is always overhead, and it’s not always worth it.
Also, you shouldn't have thousands of systems in production. I don't think the dependency solver would handle that well, and that level of granularity would be very hard to follow: you'd have to jump through hundreds of files to trace the logic of what's going on.
And with that many systems you’re not getting the benefits of DOTS, because each new system is likely hopping over to a new spot in memory. What’s the benefit of tightly packed linearly laid out memory, if you’re just going to query for 1 or 2 entities at a time in a thousand different systems? You’re essentially just randomly accessing everything at that point.
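As a hedged illustration of that advice, the door example from the first post could run directly on the main thread as a plain ComponentSystem, with no job scheduling at all. This is only a sketch: it reuses the poster's component names (DoorInteractorAvailability, DoorOpened, DoorAnimation) and the system name is made up.

```csharp
using Unity.Entities;

// Sketch only: assumes the components from the example system above.
public class DoorAvailabilityMainThreadSystem : ComponentSystem
{
    protected override void OnUpdate()
    {
        // Runs synchronously on the main thread; for 1-2 matching
        // entities this avoids all job-scheduling overhead.
        Entities
            .WithNone<DoorOpened, DoorAnimation>()
            .ForEach((ref DoorInteractorAvailability availability) =>
            {
                availability.IsClockwiseAvailable = true;
                availability.IsCounterClockwiseAvailable = true;
            });
    }
}
```

Whether this beats the jobified version depends on entity counts, so it is worth profiling both.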
AFAIK the framework is indeed designed to scale to thousands of systems. When there is no work to do, a system should have zero overhead, but I suspect what you are pointing out is the overhead when there are some entities (like a few).
I might be wrong here, so it would be interesting to have someone from Unity chime in.
I'm also kind of curious about this, because my primary motivation for working with DOTS is that it fits better with how I prefer to have my data and logic broken up, and if you're following the single responsibility principle, you'll likely have lots of systems. Something to keep in mind, though: even if you have thousands of systems, do you expect all of them to be running every frame? If your EntityQueries don't match any entities, the job won't run (a simplification of how systems decide whether they should update, but you get the idea), so I've been trying to be mindful of that when designing systems.
Looking forward to someday being able to see my FixedUpdate systems in the debugger, though, so I can tell at a glance which systems are running and which aren't, like I can with my default Simulation systems.
There is indeed overhead for a system with an empty query, as the check to see whether the query is empty isn't free. But yeah, I do try to be mindful of that.
Also, I avoid systems that do a ton of unrelated things, because that would be a pain to maintain (single responsibility principle), and I'm pretty sure it wouldn't be faster anyway. I also try to schedule Burst jobs as much as possible, because that seems to use the main thread less than waiting on the main thread for the query to finish and then doing the work there.
Edit: The post about mobile does make me more confident about the future performance of DOTS though. I guess I will keep myself up to date about that.
What if you could group DOTS systems by data load, so that low-load systems could be amalgamated? My theory is that the per-system overhead could then be mitigated or shared across those systems.
Or maybe some of the DOTS overhead could be reduced for them, e.g. by dropping from multi-threading to single-threading.
After all, your systems are probably just a few lines of processing code each.
Alternatively, can you manually combine systems and just switch/case depending on a flag?
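For what it's worth, a minimal sketch of that last idea might look like the following. Every name here (DoorState, DoorStateData, CombinedDoorSystem) is hypothetical, invented purely for illustration: one system update call covers work that might otherwise live in three separate small systems.

```csharp
using Unity.Entities;

public enum DoorState : byte { Idle, Opening, Closing }

public struct DoorStateData : IComponentData
{
    public DoorState Value;
}

// Hypothetical combined system: branches on a flag component
// instead of splitting the logic across three tiny systems.
public class CombinedDoorSystem : ComponentSystem
{
    protected override void OnUpdate()
    {
        Entities.ForEach((ref DoorStateData state) =>
        {
            switch (state.Value)
            {
                case DoorState.Idle:    /* idle logic */    break;
                case DoorState.Opening: /* opening logic */ break;
                case DoorState.Closing: /* closing logic */ break;
            }
        });
    }
}
```

The trade-off is one system invocation instead of three, at the cost of less clear-cut responsibilities.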
Unity has said they will support "many" systems, but I don't think they are talking about thousands. I've never heard anyone mention scaling to thousands of systems. And as far as I'm aware, no one using an ECS architecture (whether in Unity or not) organizes their game like that.
You're right. Thanks! I hadn't seen that. I still don't think an architecture with thousands of systems makes sense from an organizational or performance standpoint.
Maybe once you get past about 20 systems you should re-analyse what they actually do, as I would guess you will start to see common atomic data operations. After all, there are only about 23 basic math and logic operators.
We are 100% aware of the current performance issues with small entity counts combined with many system update calls.
It's caused by 3 separate inefficiencies in that particular scenario, all of which are either already fixed or assigned to a dev working on a fix for the next release.
We have very much noticed the same thing in the DOTS shooter production, given that client-side prediction runs multiple system update ticks per frame against a single character.
We are seeing total speedups of 10-50x in these specific low-entity-count cases with the different bug fixes / codegen optimizations applied.
I'll end by saying that the claim that "performance by default" in ECS / DoD only applies to many entities is not a correct assumption. DoD has a higher impact when you process more than one thing, but the expectation is that it is faster in all cases. If you have one thing, OO might be only a little bit slower, while when you have many things it is a lot slower.
The current issues with low entity counts are simply performance bugs that are in the process of being fixed.
First of all, the beauty of ECS is that it is relatively easy to replace multiple systems with one.
So I personally go with fine-grained systems, and only when I find a measurable overhead do I start thinking about merging systems.
I know and have worked on games which used ECS with the Single Responsibility Principle in mind and had a large number of systems.
It was not Unity ECS, but from what I have seen and tried, Unity ECS should be at least on par.
From my experience, there are three techniques you can apply to reduce the overhead of a system call:
Early exit. The World calling an update on a system should not be a huge overhead, but within the update it is better to avoid performing unnecessary work every tick and identify when your system can return early.
Write better queries. A system iterates over a group of entities; try to design your components in a way that lets your systems iterate only over the "proper" set of entities / components. In the best case your set is empty and the system has nothing to iterate over. AFAIK in this case the system will be marked as "not running" in the Entity Debugger.
Think about grouping systems (ComponentSystemGroup) and disabling whole groups when it is clear that, in the current state, those systems have nothing to iterate over.
That all said, take my advice with a grain of salt and always profile ;).
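A minimal sketch of the first two points, assuming a hypothetical Door component (the names DoorSystem and Door are illustrations, not real APIs): RequireForUpdate makes the World skip OnUpdate entirely while the query matches nothing.

```csharp
using Unity.Entities;
using Unity.Jobs;

public class DoorSystem : JobComponentSystem
{
    protected override void OnCreate()
    {
        // Early exit at the framework level: OnUpdate is skipped
        // entirely while no entity matches this query.
        RequireForUpdate(GetEntityQuery(typeof(Door)));
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        // Normal per-frame work goes here; it only runs while at
        // least one Door entity exists.
        return inputDeps;
    }
}
```

The narrower the query you pass to RequireForUpdate, the more frames the system can skip.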
From my experience, some systems that don't process hundreds of entities, run in an IJobParallelFor (or equivalent), and require a post-update command buffer have enough overhead that they'd be faster as a simple ComponentSystem on the main thread. But it depends on the use case.
If you need to run commands on the main thread directly afterwards, it may indeed be better to run on the main thread directly. But if you only need to modify values on existing components, it's definitely faster to run a job, since you don't have to wait for the query: the query runs in parallel, then your job runs, and meanwhile you're already running another system on the main thread. However, if you need to change components right after your job, you have to wait for it to complete, so scheduling a job may not be worth it.
I already use several groups similar to this:
public class InteractGroup : ComponentSystemGroup
{
    [Preserve] public InteractGroup() {}

    protected override void OnCreate()
    {
        RequireForUpdate(GetEntityQuery(typeof(Interaction)));
    }
}
It lets you group every system that works on the same component/query together; you know when all the work on those components is done, and if the query is empty, everything is culled with a single check.
Sorry for reviving this, but I just tested the same thing with the new Entities release (0.2.0), and it seems that it’s basically the same as before for a simple system (about 0.007ms or more). Is there a reason I don’t see the improvements that were supposed to be in the release?
I’m kinda starting to fear that the ECS won’t be as performant as good old MonoBehaviours for games without a bunch of systems operating on a ton of entities.
[AlwaysSynchronizeSystem]
public class RotationSpeedSystem_ForEach : JobComponentSystem
{
    // OnUpdate runs on the main thread.
    protected override JobHandle OnUpdate(JobHandle inputDependencies)
    {
        float deltaTime = Time.DeltaTime;

        // Rotate around the up vector, executed directly on the
        // main thread with Burst (no job is scheduled).
        Entities
            .ForEach((ref Rotation rotation, in RotationSpeed_ForEach rotationSpeed) =>
            {
                rotation.Value = math.mul(
                    math.normalize(rotation.Value),
                    quaternion.AxisAngle(math.up(), rotationSpeed.RadiansPerSecond * deltaTime));
            })
            .Run();

        // Nothing was scheduled, so there is no dependency to return.
        return default;
    }
}
For low entity counts + many systems, this is now the most efficient way of writing that code.
It uses Burst for the execution, but does not schedule a job. Instead it uses the Burst delegate compiler to run the job directly on the main thread without going through the job system, which at the moment has too much overhead when the code being executed is as simple as the above and just processes 1-2 entities.
[AlwaysSynchronizeSystem] above makes it so that the system isn't passed a job handle; instead, all systems that write dependent data are simply synchronized before the system runs. This is an important optimization.
Let's call this a workaround for now. The end goal is to make typing out [AlwaysSynchronizeSystem] unnecessary.
We also want to make it more automatic, so you can write the code once and then configure globally whether you actually want to schedule jobs or execute on the main thread instead. Additionally, we will make sure that scheduling overhead is less than it is today.
Do note that you should profile in a player build. If you must profile in the editor, please turn off the job debugger; especially with many systems at low entity counts, the overhead of the job debugger can get huge.
But at least for the time being there is “a way” of doing it. But clearly there is still more for us to do in this area.
The above method does indeed seem much faster: I get 0.01-0.02 ms in the profiler, compared to 0.02-0.03 ms for the jobified version (not sure how to get more granular with the profiler other than profiling manually). Plain ComponentSystems show up at about 0.05 ms. In a development build (not sure about non-dev builds, because of the profiler) these numbers jumped all over the place, but the general trend was still the same.
I find it weird that JobComponentSystem seems to run main-thread code more efficiently than ComponentSystem. With 1 entity and 1 component, the plain ComponentSystem is not only slower but also creates garbage (also in a dev build). I assumed this was because of the lambda, but the JobComponentSystem version has the same lambda style and seems to create no garbage, perhaps because of Burst. Its ForEach also runs faster, perhaps for the same reason.
So as of right now there seems to be no reason to use ComponentSystem, since JobComponentSystem can now run on the main thread. I imagine the same optimizations could be applied to ComponentSystem, but then there would be two ways to do basically the same thing, with JobComponentSystem being more powerful because it can do both (main thread and jobs). Is there still a use case for ComponentSystem? Is the plan to deprecate it later on?
Yes. With these new changes JobComponentSystem should always be used for both main thread & jobified code.
We introduced the new code-gen based Entities.ForEach only for JobComponentSystem in order to not have any breakage in existing code on upgrade.
Ultimately I think we need to merge the two into one. Having two separate ones wasn't a great idea in the first place; it somewhat undermined the idea of performance by default. Now we are moving to a place where the simplest code is also the fastest. That's ultimately where we want to be.
(Unless you really need IJobChunk-style access to the guts, but I think that's pretty rare, at least in game code.)
Fortunately we can do that in a way where it is easy to migrate and we can leave the old ones around for easy upgrade / deprecation reasons for a while longer. We hope to get this done before end of year.