I have a bit of code that finds a sceneEntity based on two values in a component of that sceneEntity (the component was added to the sceneEntity during conversion). This code runs on the main thread using a for loop (variation 1).
I was looking to improve the performance of this by finding the entity in an Entities.ForEach (variation 2). However, that new solution seems to be slower (it takes roughly 2.5 times as long) and I was wondering why.
(Both variations are below. I measured the difference in performance with a ProfilerMarker, commenting out one variation at a time.)
public class SomeSystem : SystemBase
{
    private EntityQuery m_sceneSectionQuery;
    private SceneSystem m_sceneSystem;

    static ProfilerMarker s_PerfMarker = new ProfilerMarker("PerformanceMarker1");

    protected override void OnCreate()
    {
        base.OnCreate();
        m_sceneSystem = World.GetExistingSystem<SceneSystem>();
        m_sceneSectionQuery = GetEntityQuery(ComponentType.ReadOnly<SomeData>(), ComponentType.ReadOnly<SceneSectionData>());
        RequireForUpdate(m_sceneSectionQuery);
        RequireSingletonForUpdate<DesiredDataSingleton>();
    }

    protected override void OnUpdate()
    {
        var desiredData = GetSingleton<DesiredDataSingleton>();
        s_PerfMarker.Begin();

        //-------- Variation 1 --------
        NativeArray<Entity> availableSceneSections = m_sceneSectionQuery.ToEntityArray(Allocator.Temp);
        Entity desiredSceneSectionEntity = Entity.Null;
        for (int i = 0; i < availableSceneSections.Length; i++)
        {
            var sceneSectionEntity = availableSceneSections[i];
            var data = GetComponent<SomeData>(sceneSectionEntity);
            if (data.Foo == desiredData.desiredFoo && data.Bar == desiredData.desiredBar)
            {
                desiredSceneSectionEntity = sceneSectionEntity;
                break;
            }
        }
        availableSceneSections.Dispose();
        m_sceneSystem.LoadSceneAsync(desiredSceneSectionEntity);
        //-------- End of variation 1 --------

        //-------- Variation 2 --------
        var result = new NativeArray<Unity.Entities.Hash128>(1, Allocator.TempJob);
        JobHandle job = Entities
            // Needed so the safety system allows writing to the captured array from a parallel job:
            .WithNativeDisableParallelForRestriction(result)
            .ForEach((in SomeData someData, in SceneSectionData sceneSectionData) =>
            {
                if (someData.Foo == desiredData.desiredFoo && someData.Bar == desiredData.desiredBar)
                {
                    result[0] = sceneSectionData.SceneGUID;
                }
            }).ScheduleParallel(this.Dependency);
        job.Complete();
        var sceneGUID = result[0];
        result.Dispose();
        m_sceneSystem.LoadSceneAsync(sceneGUID);
        //-------- End of variation 2 --------

        s_PerfMarker.End();
    }
}
There are only 12 entities being processed by this in my scenario. Is the overhead from scheduling the job causing the extra time, which would then not be worth it for such a small number of entities?
Or is there another reason why variation 2 is slower? I had expected it to be faster because it uses ForEach instead of per-entity GetComponent calls.
12 entities in a ForEach is a bit meaningless to be honest. Too small a sample.
Try with 1,200 and 12k entities. You will see the difference.
Yes, scheduling introduces a small overhead, provided there is a matching query.
You can also add .WithReadOnly(desiredData) to the ForEach, to see if that improves anything.
Also, try comparing against an IJob.
Using .ScheduleParallel is definitely not worth it with a low entity count.
Try using .Run() instead. It Burst-compiles the code by default, so it should be faster.
Also, scheduling a job and completing it immediately causes a sync point, stalling the main thread. Try to avoid it.
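For illustration, a minimal sketch of avoiding that sync point, under the assumption that the result is only needed one frame later (the m_pendingResult field and m_lookupScheduled flag are illustrative names, not from the original code):

// Hypothetical sketch: schedule the lookup this frame, consume it next frame,
// so the main thread never blocks on an immediate Complete().
NativeArray<Unity.Entities.Hash128> m_pendingResult; // persistent field, allocated in OnCreate
bool m_lookupScheduled;

protected override void OnUpdate()
{
    if (m_lookupScheduled)
    {
        // The job was chained into Dependency last frame; by now it has usually finished,
        // so this completes cheaply instead of stalling.
        CompleteDependency();
        m_sceneSystem.LoadSceneAsync(m_pendingResult[0]);
        m_lookupScheduled = false;
        return;
    }

    var desiredData = GetSingleton<DesiredDataSingleton>();
    var result = m_pendingResult;
    Entities
        .ForEach((in SomeData someData, in SceneSectionData sceneSectionData) =>
        {
            if (someData.Foo == desiredData.desiredFoo && someData.Bar == desiredData.desiredBar)
                result[0] = sceneSectionData.SceneGUID;
        }).Schedule(); // chains into this.Dependency; no Complete() here
    m_lookupScheduled = true;
}

Whether deferring by a frame is acceptable depends on the use case, of course; for a scene-load trigger it often is.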
Thank you for your replies! The entity count is indeed very low. I now wonder at what entity count these things usually start to become viable, though that of course depends on what the job actually does as well…
@Antypodish .WithReadOnly(desiredData) does not seem to work, because it is a (singleton) component rather than a NativeContainer of any sort.
@VergilUa Using .Run() was indeed faster than either Schedule or ScheduleParallel, but still slower than the implementation without any ForEach. The .Run() version is now roughly one ms slower (at an average of 3.117 ms) than the for loop (at an average of 2.083 ms). I don't really know why it's slower at this point, but it hardly matters given the small difference and the small number of entities.
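Indeed, .WithReadOnly applies to captured NativeContainers, not to plain component values (those are captured by value into the job anyway). A hedched sketch of where it does apply (the lookupTable name is illustrative, not from the original code):

// Hypothetical sketch: .WithReadOnly marks a captured NativeContainer as read-only,
// letting multiple parallel job threads read it without triggering safety errors.
var lookupTable = new NativeArray<int>(16, Allocator.TempJob);

Entities
    .WithReadOnly(lookupTable)
    .ForEach((in SomeData someData) =>
    {
        int value = lookupTable[0]; // reads are fine; a write here would throw
    }).ScheduleParallel();

Dependency.Complete();
lookupTable.Dispose();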
The code for the new implementation with .Run() is:
//-------- Variation 2 --------
var result = new Unity.Entities.Hash128();
Entities
.ForEach((in SomeData someData, in SceneSectionData sceneSectionData) =>
{
if (someData.Foo == desiredData.desiredFoo && someData.Bar == desiredData.desiredBar)
{
result = sceneSectionData.SceneGUID;
}
}).Run();
m_sceneSystem.LoadSceneAsync(result);
Worth noting that there's more to consider than just the overhead of scheduling the job. By accessing a component on the main thread you're forcing any jobs that write to it to complete immediately.
Note that in most cases ComponentDataFromEntity would be faster than copying a whole array of data (which is what GetComponent does under the hood), but it depends on the use case.
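A minimal sketch of the ComponentDataFromEntity approach suggested above, assuming the same query and component names from the original code are in scope:

// Hypothetical sketch: look up SomeData per entity through a read-only
// ComponentDataFromEntity instead of fetching each component individually.
var someDataLookup = GetComponentDataFromEntity<SomeData>(isReadOnly: true);

NativeArray<Entity> entities = m_sceneSectionQuery.ToEntityArray(Allocator.Temp);
Entity found = Entity.Null;
for (int i = 0; i < entities.Length; i++)
{
    SomeData data = someDataLookup[entities[i]];
    if (data.Foo == desiredData.desiredFoo && data.Bar == desiredData.desiredBar)
    {
        found = entities[i];
        break;
    }
}
entities.Dispose();

The lookup can also be captured into an Entities.ForEach or IJob if the search needs to move off the main thread.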
Alrighty, I ran some new tests. I mitigated as many of the potential factors as I knew how to:
- Regarding jobs writing to the component forcing a main-thread completion: I don't know what I can do to check this
- Made sure the desiredData is the last item the for loop iterates over, so both implementations have to go over every entity
- Took the LoadSceneAsync out of the test (I previously ran the test ten times to account for any variable timing issues like it)
- Both implementations query two components (SomeData and SceneSectionData), via the EntityQuery and the ForEach respectively. In both cases they are read-only.
- Regarding the sync point: I believe this no longer applies to the new variation 2 implementation with .Run()
The results of the test are still quite in favour of variation 1 (with the for loop and GetComponent):
Make sure that in the Jobs → Burst menu "Synchronous Compilation" is enabled. Don't measure on the first run (generally speaking, any test on a JIT-ed platform should not be performed on the first run).
Try disabling Burst safety checks and leak detection, and see if that makes any difference.
Also, try profiling with Deep Profile enabled to see what actually takes the time.