So I’m working on optimizing some code for a job system with multiple IJobParallelFor execution steps. Currently I am using a single component struct that contains all required variables for each entity in an entityquery. However, the individual execution steps do not always require all the data in the component, and certainly not all of it has to have both read/write permissions. So here is my question, which is better for performance:
a. A single component that includes all the data used in all steps, copied from a single native array with both read and write permissions. After which the entities in the entity query would be updated with a single call of entityQuery.CopyFromComponentDataArray()
b. Multiple smaller components, so that each step gets less data and some of it can be marked read-only or write-only. Each component would be copied from its own native array. After which the entities in the entity query would be updated with multiple calls of entityQuery.CopyFromComponentDataArray(), one for each component.
Case (a) would be one struct with both read and write permissions that contains around 30 variables with a mix of int2’s, float2’s, bytes, floats, etc.
Case (b) would be around five smaller structs with the same data, but with more control over setting read/write permissions.
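To make the two options concrete, here is a rough sketch of what each could look like. All component and field names are hypothetical, only a handful of the roughly 30 fields are shown, and it assumes the query passed in contains the components being copied.

```csharp
using Unity.Collections;
using Unity.Entities;
using Unity.Mathematics;

// Option (a): one large component (hypothetical fields).
public struct UnitData : IComponentData
{
    public float2 Position;
    public float2 Velocity;
    public int2   Cell;
    public float  Health;
    public byte   Team;
    // ... the remaining fields would go here
}

// Option (b): the same data split by how it is accessed (hypothetical grouping).
public struct UnitMotion : IComponentData { public float2 Position; public float2 Velocity; }
public struct UnitCell   : IComponentData { public int2 Cell; }
public struct UnitStatus : IComponentData { public float Health; public byte Team; }

public static class CopyBackSketch
{
    // (a) One copy out and one copy back; every field moves in both directions.
    public static void RoundTripA(EntityQuery query)
    {
        NativeArray<UnitData> all = query.ToComponentDataArray<UnitData>(Allocator.TempJob);
        // ... schedule and complete the jobs that read and write 'all' ...
        query.CopyFromComponentDataArray(all);
        all.Dispose();
    }

    // (b) One copy out per component, but only the written component is copied back.
    public static void RoundTripB(EntityQuery query)
    {
        NativeArray<UnitMotion> motion = query.ToComponentDataArray<UnitMotion>(Allocator.TempJob);
        NativeArray<UnitCell> cells = query.ToComponentDataArray<UnitCell>(Allocator.TempJob);
        // ... schedule and complete jobs that write 'motion' and only read 'cells' ...
        query.CopyFromComponentDataArray(motion);
        motion.Dispose();
        cells.Dispose();
    }
}
```

In (b), components that a step never touches are not copied at all, and components that are only read do not need to be written back.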
I have a few other systems that use the same component, but only read/write a couple of variables from inside it. I’m guessing this causes some kind of performance hit from copying and moving all the unused data in the component; I’m just not sure if it is enough to be worried about.
This difference may be negligible, but I’m very new to DOTS so I’m still trying to understand where the biggest performance costs are with it, so any helpful info on general performance with DOTS would be welcome.
If you use Entities.ForEach with entityInQueryIndex instead of the entityQuery API, you can split the components into NativeArrays with more granularity while copying the data out of chunks.
Also, if you aren’t using the timeline view of the profiler, start using it. It will help you pin down what is actually expensive and where you should spend time optimizing. The other views are a lot less useful for DOTS.
A single struct with all the data of an entity is no different than object oriented programming.
The whole point of data oriented design is cache-friendly layout of your data in RAM. You already said yourself…
This is one of the reasons why games perform poorly when scale increases. Remember: Ultimately, in the current computer architecture we use, memory bandwidth will always be the bottleneck (“Von-Neumann Bottleneck”) - sometimes multithreading does nothing because a program is memory bandwidth bound (for example, just adding ‘1’ to an array of ints is probably going to perform just as well on 4 cores as it would on 100). When you iterate over the one array that holds the data and load in an entire struct to just - let’s say - modify the positions, you waste a ton of memory bandwidth - possibly even more so than if you programmed it in an object oriented fashion (because they are structs and not just pointers). It gets way worse when you then do some other stuff which causes the array to be evicted from the cache, and at some later point in your frame decide to iterate over the array again, and again just to modify a single value in each entity.
Additionally, in 99% of all cases operations on arrays of such structs cannot be vectorized (SIMD). You want to keep your components as primitive as possible. Single Instruction, Multiple (pieces of the same) Data (type). Some people even go as far as not storing positions as float3s but rather as three different arrays (under the hood - three different component types in ECS): x, y and z positions. Most of the time that’s really overkill and does next to nothing but sometimes it can increase performance considerably.
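As a minimal sketch of the "one array per primitive" idea, here is a Burst-compiled job that only touches x positions. The job and field names are made up for illustration, and whether Burst actually emits SIMD instructions for this loop is something to confirm in the Burst Inspector rather than assume.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// One array per scalar field (hypothetical names).
[BurstCompile]
public struct AdvanceXJob : IJob
{
    public NativeArray<float> PositionX;            // read/write
    [ReadOnly] public NativeArray<float> VelocityX; // read only
    public float DeltaTime;

    public void Execute()
    {
        // Contiguous floats in, contiguous floats out: the kind of loop
        // a vectorizing compiler has the best chance of turning into SIMD.
        for (int i = 0; i < PositionX.Length; i++)
        {
            PositionX[i] = PositionX[i] + VelocityX[i] * DeltaTime;
        }
    }
}
```

With a large struct per element, the same loop would have to stride over all the unused fields to reach each x value.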
Summing up: in ECS performance is the main focus. Many games don’t really require it. To have great performance, one needs to understand how computers work when it comes to data access at a low level, which naturally leads you to knowing how to lay out your data. I suggest you read up a little on that if things are not clear. Recently I found this article which does a nice job: https://tech.innogames.com/unitys-performance-by-default-under-the-hood/
B. But you can do A and profile it to see if it meets your performance target, and optimize afterward.
B is better because of the copying of data and the access patterns. The rule of thumb for me is: “are these variables accessed together?”. If so, they can be in the same component.
Float3 position parts (x,y,z) as different components is a somewhat extreme case, but it illustrates the point perfectly. Assuming that in your game you can only move on the x-axis, two parts of that float3 are unused, which means copying 3x as much data as needed. It adds up in cache misses as well. Similarly, in your case, you are copying data you don’t need while doing calculations on the rest of the variables.
So yeah, less is better, but remember to profile before, while, and after optimizing anything.
So an update, appreciate the responses.
I broke my larger component into several smaller components and was able to specify read/write permissions more specifically. It did significantly speed up different jobs. I also found out I needed to change the batch sizes of a few jobs which also sped things up significantly.
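For anyone reading later, the two changes look roughly like this. It is a simplified sketch with made-up names, and the batch size of 64 is only a placeholder: the right value depends on how much work each index does, so it is worth trying a few values in the profiler.

```csharp
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

public struct IntegrateJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float2> Velocities; // read only, so other jobs may read it at the same time
    public NativeArray<float2> Positions;             // read/write
    public float DeltaTime;

    public void Execute(int i)
    {
        Positions[i] += Velocities[i] * DeltaTime;
    }
}

public static class ScheduleSketch
{
    public static JobHandle Run(
        NativeArray<float2> positions,
        NativeArray<float2> velocities,
        float deltaTime,
        JobHandle dependsOn)
    {
        var job = new IntegrateJob
        {
            Velocities = velocities,
            Positions = positions,
            DeltaTime = deltaTime
        };

        // Second argument is the inner loop batch count: how many indices
        // a worker thread takes at a time. 64 is an arbitrary starting point.
        return job.Schedule(positions.Length, 64, dependsOn);
    }
}
```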
As for using Entities.ForEach with entityInQueryIndex, I haven’t had a chance to test that out. I like the idea, but I admit I’m not exactly sure how I would handle a few things if I switched to Entities.ForEach, or if it would actually be a better method. What are the reasons to prefer Entities.ForEach over IJobParallelFor?
Happy to hear you’re seeing the benefits of data oriented design! Well done.
Entities.ForEach is just an easy-to-use API - nothing more.
AFAIK, it always results in an IJobChunk, which makes sense in ECS - although it introduces overhead at the beginning of your job being executed, which sometimes makes up 90% of the total code size. As so often, abstraction comes at the cost of performance. The cost is very minor in this case, but it is still there.
If you work with NativeArrays and friends directly, I’d recommend writing your jobs (and thus choosing the specific job type yourself) and scheduling your Jobs manually.
That level of control is why I went with native arrays and IJobParallelFor initially, and it just made more sense for my use case. Currently I have several jobs scheduled in succession using the same native arrays of components to read and write to, and it’s working well enough for me, so I’ll probably stick with how I have it set up for now. Thanks for the help!
Alright. It is pretty clear you misunderstood what I said. I am not saying to move away from NativeArrays. NativeArrays are the right solution. What I am saying is that you aren’t using the right tool to create and populate NativeArrays. You are using the EntityQuery API to extract the data into NativeArrays. It is a very simple API, but it doesn’t give you much flexibility. Entities.ForEach gives you more flexibility. You allocate your NativeArrays using the count of entities from EntityQuery.CalculateEntityCount(). Then you run an Entities.ForEach and use entityInQueryIndex to tell you what index in the NativeArrays to write your data into. And then outside your Entities.ForEach, you run the same IJobFor jobs you were running on those NativeArrays as usual. The difference with using Entities.ForEach is now you can “patch up” the data as you copy it. You can convert from SoA to AoS, scale some data by a constant, or even offset the index (you might need WithNativeDisableParallelForRestriction() on your NativeArrays in your Entities.ForEach to do this).
Here’s an example: https://github.com/Dreaming381/lsss-wip/blob/master/Assets/_Code/SubSystems/Gameplay/BuildBulletsCollisionLayerSystem.cs#L21-L43
The final call requires a NativeArray of ColliderBody, which is an array-of-structs format. Entities hold the different parts of a ColliderBody in separate components. In addition, I want to scale the CapsuleColliders based on how far they traveled. I could have done all this using multiple EntityQuery.ToComponentDataArray() calls and then used an IJobFor to do this logic, but by doing this logic directly in an Entities.ForEach, I save myself the additional memcpy. I still end up with a NativeArray, and the rest of the logic uses NativeArrays from that point onward.
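Stripped down to the bare pattern, it looks something like the sketch below. This is not the code from the linked file: the Speed component, the PackedMover struct, the system name and the scale constant are all invented for illustration, and it assumes a pre-1.0 Entities package where SystemBase, Entities.ForEach and Translation are available.

```csharp
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;
using Unity.Mathematics;
using Unity.Transforms;

// Hypothetical component and output struct, for illustration only.
public struct Speed : IComponentData { public float Value; }
public struct PackedMover { public float3 Position; public float ScaledSpeed; }

// 'partial' is only required by newer Entities versions; it is harmless otherwise.
public partial class GatherMoversSystem : SystemBase
{
    EntityQuery m_Query;

    protected override void OnUpdate()
    {
        // Size the array from the same query the ForEach below uses.
        int count = m_Query.CalculateEntityCount();
        var packed = new NativeArray<PackedMover>(count, Allocator.TempJob);
        const float scale = 2f; // arbitrary constant to show "patching up" data

        Entities
            .WithStoreEntityQueryInField(ref m_Query)
            .ForEach((int entityInQueryIndex, in Translation translation, in Speed speed) =>
            {
                // Two separate components are combined into one struct,
                // and one value is scaled during the copy.
                packed[entityInQueryIndex] = new PackedMover
                {
                    Position = translation.Value,
                    ScaledSpeed = speed.Value * scale
                };
            })
            .ScheduleParallel();

        // ... schedule the usual jobs on 'packed' here, chained on Dependency ...

        // Free the array once every job that uses it has finished.
        Dependency = packed.Dispose(Dependency);
    }
}
```

Each entity writes only to its own entityInQueryIndex slot here, so no restriction needs to be disabled; that only comes up once you start offsetting the index.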