As an experiment I am testing the limitations of the current state of Unity DOTS on different hardware (and I'm having loads of fun with it!). I've seen all the Unity demo videos, and they're basically always CPU bound.
I have created a test scene where I can spawn any number of cubes, which form a terrain based on Perlin noise. Each cube gets its own Perlin noise calculation every frame, written with the Job System to make full use of Burst and my CPU. The color is changed based on the height with a shader.
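For context, the per-cube calculation can be expressed as a Burst-compiled IJobParallelFor along these lines (a minimal sketch, not my exact code; `HeightJob` and its fields are illustrative names, and it assumes Unity.Mathematics' `noise.cnoise` for the Perlin noise):

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
struct HeightJob : IJobParallelFor
{
    public float Time;                                // offset so the terrain animates each frame
    [ReadOnly] public NativeArray<float2> Positions;  // XZ grid position of each cube
    public NativeArray<float> Heights;                // resulting Y position per cube

    public void Execute(int index)
    {
        float2 p = Positions[index] * 0.1f;           // scale down to a usable noise frequency
        Heights[index] = noise.cnoise(p + Time) * 5f; // 2D Perlin-style noise, scaled to world height
    }
}
```

Scheduled each frame with something like `new HeightJob { ... }.Schedule(positions.Length, 64)`, this spreads the work across all worker threads in batches of 64.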
I ran the test on a medium laptop with an i5-8250U CPU (4 cores, 8 logical processors) and a GeForce MX150 GPU. Running the test with 71,289 cubes reaches my limit of 30 FPS. Both MSI Afterburner and Windows Task Manager say my GPU is at 97% usage, so I'm hitting the limits of my GPU. Here's an image to prove my GPU is maxed out.
I've also run the test on newer hardware: an i9-8950HK CPU (6 cores, 12 logical processors) and a GTX 1080 Max-Q GPU, reaching a good 291,000 cubes before hitting my 30 FPS limit. However, in this case my GPU is not reaching its limits, and my CPU is at around 80% utilization according to Windows Task Manager. (For some reason it's lower in MSI Afterburner; I think they calculate it differently?)
How come I'm not getting to almost 100% utilization of my CPU? Is this because the main thread has to hand out jobs to the other workers, meaning there's always a little idle time?
It really depends on how you write your jobs and whether you are utilizing parallelization. Instead of looking at your system performance monitor, take a look with the Unity Profiler. That will show you how your jobs are spread across the worker threads and give you a better overview of how you are using your CPU.
Regarding the GPU, I assume you are using instancing and have enabled that flag on the material of your cubes?
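(For reference, besides ticking "Enable GPU Instancing" on the material in the Inspector, the flag can also be set from script; `cubeMaterial` here is just a placeholder for whatever material the cubes use:)

```csharp
using UnityEngine;

public class EnableInstancing : MonoBehaviour
{
    [SerializeField] Material cubeMaterial; // the cubes' shared material

    void Awake()
    {
        // Equivalent to ticking "Enable GPU Instancing" in the material Inspector.
        cubeMaterial.enableInstancing = true;
    }
}
```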
Take a look at this thread: ECS + VFX renders 10 million entities.
I've done tests on my side using this project as a starting point, and you can basically control at least 1 million entities from the CPU using ECS combined with VFX Graph.
On my side I am GPU bound.
Ah, looking at the profiler I see that the job workers are idle for about 7 ms each. It seems they're unable to perform "UpdateDynamicRenderBatches".
I'm assuming this can only be done on the main thread, so for that reason I'm never able to utilize ~100% of my CPU? That would probably mean that adding more calculations in jobs, while removing some cubes from the scene, would increase the CPU usage?
How would I do that in a build? I'm noticing a huge performance increase when running the application as a standalone.
That's the RenderMesh component for you. In my case I also had about 6 ms of usage from it on the main thread, spending most of its time on a job called 'AddBatches'.
Interestingly enough, it did that with 20k entities of one type but not with 40k entities of another type (units and buildings). So I'm not sure why it has such low performance in one case but not the other, especially since I never touched that component after creation.
The workaround is to write your own renderer system to bypass RenderMesh, which is what worked for me.
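One way to do this (a minimal sketch, not the exact system I used) is to skip RenderMesh entirely and issue draw calls yourself with Graphics.DrawMeshInstanced; `CubeRenderer` and its members are illustrative names, and it assumes you gather the cubes' transforms into Matrix4x4 arrays elsewhere (e.g. from your ECS data):

```csharp
using System.Collections.Generic;
using UnityEngine;

public class CubeRenderer : MonoBehaviour
{
    public Mesh cubeMesh;
    public Material cubeMaterial; // must have enableInstancing = true
    readonly List<Matrix4x4[]> batches = new List<Matrix4x4[]>();

    public void SetTransforms(Matrix4x4[] transforms)
    {
        // DrawMeshInstanced accepts at most 1023 matrices per call,
        // so split the full set into batches of that size.
        batches.Clear();
        for (int i = 0; i < transforms.Length; i += 1023)
        {
            int count = Mathf.Min(1023, transforms.Length - i);
            var batch = new Matrix4x4[count];
            System.Array.Copy(transforms, i, batch, 0, count);
            batches.Add(batch);
        }
    }

    void Update()
    {
        foreach (var batch in batches)
            Graphics.DrawMeshInstanced(cubeMesh, 0, cubeMaterial, batch, batch.Length);
    }
}
```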