Occlusion Culling not yielding any real performance gains. What am I missing?

Hello fellow DOTS-enjoyers,
I have a procedurally generated island that consists of multiple entities with meshes, and I am spawning different kinds of vegetation on it.
Unity 2022.3.16f1, Entities + Entities Graphics 1.0.16
CPU: 13th Gen Intel(R) Core™ i9-13900H, 2600 MHz, 14 Core
GPU: 4070 Laptop Version

According to the Rendering Debugger everything works as intended:
Zoomed In:


Zoomed Out with the culling frozen:

Zoomed Out with frame unfrozen:

First of all, you can see that the shadows are not being culled. I am not sure what is causing this, but it cannot be the whole problem: I did the same thing without the camera rendering shadows and the result was the same.

The profiler looks like this in all cases (with shadows), with MainLightShadow being a bit lighter on the CPU when zoomed out so that there are no shadow casters. But that difference is only 0.3-0.5 ms.

I would also assume that the island size does not have much of an impact on performance when zoomed in and only rendering about 10 trees, because in theory everything else is culled, but this is not the case.
Funnily enough, it does not seem to make a difference to the CPU whether the trees are culled or not. Even when I add the “DisableRendering” or “Disabled” component to the vegetation and terrain entities so that they disappear from the game view, performance does not improve.
I am also kind of surprised at how CPU-heavy the whole thing is. There is no game logic, as you can see in the profiler. I know that culling and batching need the CPU, but in a scene with only static objects I feel like the GPU should be doing most of the work.

I feel like I am missing something here. In theory this setup should be a good use case for gaining performance by only rendering the visible terrain chunks and trees.
What am I doing wrong?

Help would be very much appreciated :)

Cheers

I think you’re mixing up occlusion culling and frustum culling.
Frustum culling is always enabled and cannot be disabled.
Occlusion culling skips rendering of objects that are hidden behind other objects, which isn’t applicable in the pictures I see. In that case it just takes up unnecessary CPU time.

Sorry, yes, I mean frustum culling. I did not add any “Occluder” components, and in the profiler the occlusion culling system is not using much CPU.
But shouldn’t frustum culling improve performance when zoomed in, when only a fraction of the trees have to be rendered (or just some mushrooms, like in the first picture)?

All of your trees are still drawn to the shadow map. Directional shadows are drawn from the perspective of the light, not the camera. (Notice that objects outside the camera view or even behind the camera can still cast shadows in front of you.) You should be able to confirm this in the frame debugger/rendering debugger.

Your best bet would be to reduce shadow distance and cascade count. (For a top-down view, 1 cascade should be sufficient.)
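
As a rough sketch of how the shadow distance could be changed from code: this assumes the project uses URP (the MainLightShadow marker in the profiler hints at that, but it is an assumption), and `ShadowQualitySettings` / `SetShadowDistance` are made-up names for illustration.

```csharp
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal;

// Hypothetical helper, not part of Unity: sets the URP max shadow distance.
public static class ShadowQualitySettings
{
    public static void SetShadowDistance(float distance)
    {
        // Only applies when the active render pipeline asset is a URP asset (assumption).
        if (GraphicsSettings.currentRenderPipeline is UniversalRenderPipelineAsset urp)
        {
            // Shadow casters beyond this distance from the camera are not drawn to the shadow map.
            urp.shadowDistance = Mathf.Max(0f, distance);
        }
    }
}
```

Calling something like `ShadowQualitySettings.SetShadowDistance(50f)` from a graphics options menu would be one way to use it. Note that in the Editor this modifies the pipeline asset itself, so the value may persist after leaving play mode.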

Thanks for the suggestion. The shadow cascade count was already down to 1. And as described above, even with shadows turned off completely there are no real performance gains :(

When it comes to profiling in ECS, you almost always want to use the timeline view of the profiler, so that you can see which things are happening on the main thread and which things are happening in jobs. It gives you a much better picture of how things are working and what the potential problems are. I suggest you switch to that view, take a screenshot, and circle or highlight something you’d like to reduce on the CPU-side.

It is still unclear to me if you are CPU or GPU bound, but the timeline view will make that more obvious by showing whether the main thread or the render thread does more waiting around.

First of all, thanks for the timeline tip. After studying it for a while it makes a lot more sense to me why I should look at that. Now I can see that it is in fact the shadows which take up a fifth of the frame.
Maybe shadow rendering distance should be an adjustable graphics option in the game, so players can set it according to their hardware. Or is there another way of culling shadows other than the render distance?
Besides that, I am not sure what the jobs in the presentation system group are doing that takes up almost half a millisecond. That is something I would want to get rid of if possible. Other than that, I think there is not much to do.

9625787--1367006--upload_2024-2-5_21-46-54.png

I definitely need to spend more time with the timeline.
All in all, I am still surprised at how CPU-bound I am with a scene that is basically only environment. It seems like a lot of stuff is happening on the main thread with the render pipeline.
Sorry if my question is unclear or even dumb, this is the first time I am profiling a DOTS project.

First off, you have to expand these threads to see why EntitiesGraphicsSystem is waiting.
9626339--1367141--upload_2024-2-5_20-5-4.png
There’s clearly more than just rendering happening here. I think there’s some physics and transforms which I wouldn’t expect for a static scene.

This is also actually Entities Graphics code. And you may be able to improve it by using ISharedComponentData to group nearby entities together (a tiling system). However, it currently does not appear to be the bottleneck, so let’s ignore it for now.
9626339--1367147--upload_2024-2-5_20-10-57.png

And then there’s this:
9626339--1367150--upload_2024-2-5_20-13-39.png
Something is very, very wrong here. There’s not supposed to be this many waves of jobs happening at this time. If you don’t mind, I would really like a copy of this project to
A) See if I can reproduce using Entities Graphics
B) See if I can reproduce with my framework (which has a modified Entities Graphics that changes some of this logic)
C) If I can reproduce both, fix whatever is causing it

If you can’t share, then you can save the profile capture and send that to me. Or if that’s still too much to ask, then at least expand the jobs so that I can see what they all are.

Your confusion is 100% warranted! What you are experiencing is probably not what is supposed to be happening.

These are the jobs:
9626585--1367192--upload_2024-2-6_8-46-6.png
It seems it is the ChildLocalToWorld computation that takes up a lot of time. Could this be because every tree / bush has 4 LODs as child entities? I removed all LODs except for one from the trees and it seems to improve performance:
9626585--1367201--upload_2024-2-6_8-55-35.png
I would not have expected that just rendering the full model at all times would yield better performance than using LODs.

Then there is the ComputeBoundsJob. I am not sure why it needs to recompute the bounds every frame for non-moving objects like terrain and vegetation.

I also created a repository for this project. I can send you the link and/or the profiler data that I saved.
And again, thanks so much for taking the time to go through all of this with me :)

LODs sacrifice a little CPU performance to improve GPU performance. However, since you are CPU-bound, going the opposite direction makes sense for improving performance. The other issue here is that because you are instantiating prefabs, the parent hierarchy is being preserved. I’d suggest that you add a system after the TransformSystemGroup (or early on in the frame that follows the frame that does the proc gen) that removes LocalTransform and Parent from all your LOD entities.
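
A minimal sketch of such a system, assuming Entities 1.0 and that the LOD children can be identified by `MeshLODComponent` from Entities Graphics. The system name is made up, and this is untested against the project in question, so treat it as a starting point rather than a drop-in fix.

```csharp
using Unity.Entities;
using Unity.Rendering;
using Unity.Transforms;

// Hypothetical system: detaches static LOD entities from their parent hierarchy
// once the transform systems have written their LocalToWorld.
[UpdateInGroup(typeof(SimulationSystemGroup))]
[UpdateAfter(typeof(TransformSystemGroup))]
public partial struct DetachStaticLodChildrenSystem : ISystem
{
    private EntityQuery _lodChildren;

    public void OnCreate(ref SystemState state)
    {
        // Assumption: every LOD mesh entity has MeshLODComponent and is still parented.
        _lodChildren = SystemAPI.QueryBuilder()
            .WithAll<MeshLODComponent, Parent, LocalTransform, LocalToWorld>()
            .Build();

        // Skip OnUpdate entirely when nothing matches, e.g. after the first pass.
        state.RequireForUpdate(_lodChildren);
    }

    public void OnUpdate(ref SystemState state)
    {
        // LocalToWorld is kept, so the entities stay where the hierarchy placed them,
        // but the transform systems no longer process them every frame.
        var toRemove = new ComponentTypeSet(
            ComponentType.ReadWrite<Parent>(),
            ComponentType.ReadWrite<LocalTransform>());
        state.EntityManager.RemoveComponent(_lodChildren, toRemove);
    }
}
```

This is a structural change on the main thread, so it costs something on the frame after spawning. It also only makes sense for vegetation that never moves again, since without LocalTransform the LocalToWorld of those entities is no longer updated.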

Oh. You have occlusion culling enabled in your project. Turn that off. Again, that sacrifices CPU performance to improve GPU performance, but it sacrifices much more CPU performance for much less GPU performance compared to LODs. It is also neither very stable nor conservative.

I don’t believe sharing the project is necessary anymore. I think I solved all the mysteries.

If you have any further questions about why things are happening the way they are, let me know!

Yes, thank you so much for all the insights. I will try removing the parent relations and see if that helps with the LODs.
Cheers!