This is whenever the profiler is capturing data. For this reason I made a toggle for disabling this type of profiling, as we’re rarely interested in it. To be fair, comments in the code state that it will eventually be toggleable by disabling the module in the profiler UI, but the internal APIs aren’t there yet, so for now the analysis runs anytime the profiler captures data.
One of the most annoying parts is that it spends a buttload of time on the LoadingWorlds for SubScenes, even though those don’t have any entities in them and the archetypes allocated in them are kind of irrelevant, since they don’t have any systems to run. The reason is that the archetype count in those worlds has grown a lot as we’ve loaded more and more subscenes.
I do admit that we don’t have the best archetype utilization, but it works for our project.
Dug a bit further. The main cost seems to come from CalculateUniqueSharedComponentValuesCount() (see the attached screenshot). Removing the call to it made the whole analysis cost “only” 11ms in total.
To give an example, our server has about 10k archetypes, and removing that analysis caused the server analysis job to drop from 50ms to 0.8ms.
It seems as though this still costs a lot even if the loading worlds have zero entities. Since there should be zero chunks in those worlds, it seems to me that the cost comes largely from the immense number of NativeHashMap allocations. All of the allocations made by that function also cause the temp memory to grow so much that it eventually results in fallback allocations.
Here’s the function.
Edit:
Final update. Returning directly from the function if the chunk count is zero for the archetype also improved the performance quite drastically. By doing that it went from 300ms down to 30ms. So it simply seems like “NumSharedComponents” for an archetype is not updated whenever chunks are removed from it.
I might rewrite the job to be a parallel job to improve perf further eventually, but now it at the very least doesn’t completely kill the editor performance.
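To illustrate the early return described in the edit above, here is a rough model in Python (not the package's C#). Everything in it is hypothetical: `Archetype`, `count_unique_shared_values`, and the `maps_allocated` counter are stand-ins, and a Python `set` plays the role of the per-archetype hash map. It only shows why skipping archetypes with zero chunks avoids one map allocation per empty archetype.

```python
from dataclasses import dataclass, field


@dataclass
class Archetype:
    # Hypothetical stand-in for an ECS archetype.
    num_shared_components: int
    # Each chunk is modelled as a tuple of shared component value indices.
    chunks: list = field(default_factory=list)


def count_unique_shared_values(archetype, stats, skip_empty):
    """Count distinct shared-value combinations across an archetype's chunks."""
    if archetype.num_shared_components == 0:
        return 0
    # The guard from the post: nothing to count if there are no chunks.
    if skip_empty and not archetype.chunks:
        return 0
    stats["maps_allocated"] += 1  # models one hash map allocation
    seen = set()
    for chunk in archetype.chunks:
        seen.add(chunk)  # models the per-chunk TryAdd()
    return len(seen)


def analyse(archetypes, skip_empty):
    """Run the count over all archetypes; return (total unique, maps allocated)."""
    stats = {"maps_allocated": 0}
    total = sum(
        count_unique_shared_values(a, stats, skip_empty) for a in archetypes
    )
    return total, stats["maps_allocated"]


# 10,000 empty archetypes (as in a loading world) plus one populated archetype.
archetypes = [Archetype(1) for _ in range(10_000)]
archetypes.append(Archetype(1, [(1,), (1,), (2,)]))

print(analyse(archetypes, skip_empty=False))
print(analyse(archetypes, skip_empty=True))
```

In this model the unguarded run allocates one map per archetype (10,001 in total), while the guarded run allocates a single map, and both report the same unique count. It says nothing about real timings, only about where the per-archetype allocations come from.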
We have very few shared components that we have created ourselves; the main bulk comes from RenderMeshes and SceneTags. The function does need to instantiate a HashMap and make the TryAdd() check for every single Chunk regardless though, as long as there is at least one SharedComponent in the archetype. The only thing that gets affected by the number of shared components would be the GetHashCode function that is used in that HashMap lookup.
I think we have one SharedComponent that exists on most of our main entities in prefabs, but it has very few unique states. Since the archetype has that one single SharedComponent, the function demands the analysis.
Hi @Zec_1, this is a known issue in Entities 0.50 that will be fixed when Entities 1.0 releases. Sorry for the inconvenience. You are correct, the calculation of the number of unique shared component values is the highest offender performance-wise. Taking a look back, the team decided to disable that calculation in Entities 1.0 until we find a better way to provide it without the performance cost.
Thanks @Zec_1 ! @robertg_unity : Honestly, I’m grateful for the reply and the acknowledgement of the bug but saying this will be fixed at the end of the year is kind of, I don’t know, what happened to hotfixes? When did your team become this inflexible?
Some patchwork fix would be okay. Essentially anything that is better than dealing with 100ms timings.
What I meant was that this is already fixed in our master branch. As for backporting the fixes, I am already looking into this, I just don’t know yet in which version it will land, so I can’t provide any version numbers yet.
In future it would be more helpful to use the code tags if asking for help, because then people can copy and paste etc. A screenshot of code is a bit…?
Not really asking for help nor did I have any modified code to provide. That code is the unmodified and costly function inside of the entities package that everyone has.
I just wanted to showcase it, since just reading it makes it quite clear that it would cost a lot to run for all archetypes every frame. I feel screenshots with proper code formatting are more readable when just showing off something. But yes, if I actually had any of my own modified code to show off I’d attach it as text. Thanks for the suggestion regardless.