[quote=“LaneFox, post:260, topic: 930675, username:LaneFox”]
We run it through a pipeline that converts it into glTF. It detects mesh duplicates and reuses the mesh for those instances, so there’s definitely some room for it to work, and definitely room for occlusion culling to make huge improvements.
But, I do agree it’s not as simple as their test cases using mostly copies of a few objects. I just don’t quite understand why perf is actually worse.
[/quote]
As I understand it, this GPU occlusion culling implementation seems to be made mostly for heavily GPU-bottlenecked scenarios with a lot of expensive overdraw. Such scenarios are honestly pretty hard to imagine given how CPU-bottlenecked the engine usually is (at least on desktop).
But yeah, if I recall correctly, it was mentioned at the Unity 6 conference that it uses an additional pass dedicated to building some kind of occlusion buffer, so increased draw calls are expected if you enable this GPU occlusion culling option. It’s a tradeoff between draw calls (CPU load) and overdraw (GPU load).
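To make that tradeoff concrete, here is a toy model, not a measurement. It assumes the CPU and GPU work in parallel so the slower one sets the frame time, and every number in it (cost per draw call, GPU times, the draw call multiplier) is made up for illustration.

```python
def frame_time_us(draw_calls, cpu_us_per_draw, gpu_us):
    """Toy model: the slower of CPU submission and GPU work sets the frame time."""
    cpu_us = draw_calls * cpu_us_per_draw
    return max(cpu_us, gpu_us)


def with_occlusion_pass(draw_calls, cpu_us_per_draw, gpu_us,
                        draw_call_factor, overdraw_saved_us):
    """Occlusion culling modeled as: more draw calls, less GPU overdraw."""
    return frame_time_us(draw_calls * draw_call_factor,
                         cpu_us_per_draw,
                         gpu_us - overdraw_saved_us)


# Hypothetical CPU-bound scene: 4000 draws at 4 us each (16 ms CPU), 8 ms GPU.
cpu_bound_before = frame_time_us(4000, 4, 8000)
cpu_bound_after = with_occlusion_pass(4000, 4, 8000, 2, 3000)

# Hypothetical GPU-bound scene: 500 draws (2 ms CPU), 20 ms GPU of heavy overdraw.
gpu_bound_before = frame_time_us(500, 4, 20000)
gpu_bound_after = with_occlusion_pass(500, 4, 20000, 2, 8000)

print(cpu_bound_before, cpu_bound_after)  # CPU-bound case gets slower
print(gpu_bound_before, gpu_bound_after)  # GPU-bound case gets faster
```

In this model the extra draw calls only pay off when the GPU was the bottleneck to begin with, which would match perf getting worse in a CPU-bound scene.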
I’m pretty curious about your scene setup/assets for the render to still have that many draw calls in the better case, though. I’d expect a stadium to have a lot of standardized/reusable parts in general. Did you check the Frame Debugger to see if there’s anything you can do to collapse the draw calls further?
You might want to disable static batching to avoid conflicting with instancing and probably the SRP Batcher as well.
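As a rough way to estimate how far the draw calls could collapse, here is a small sketch that groups renderers by (mesh, material). The grouping key and the example scene are my assumptions; real batching has more break conditions (shader variants, per-batch instance limits, and so on), so treat the result as a lower bound rather than what the Frame Debugger will show.

```python
from collections import defaultdict


def count_draw_calls(renderers):
    """renderers: iterable of (mesh_id, material_id) pairs.

    Returns (naive, instanced): one draw per renderer versus one draw
    per unique (mesh, material) combination.
    """
    groups = defaultdict(int)
    naive = 0
    for mesh_id, material_id in renderers:
        groups[(mesh_id, material_id)] += 1
        naive += 1
    return naive, len(groups)


# Hypothetical stadium: many identical seats and railings, one unique roof.
stadium = ([("seat", "plastic")] * 1000
           + [("railing", "steel")] * 40
           + [("roof", "steel")])

naive, instanced = count_draw_calls(stadium)
print(naive, instanced)
```

If the instanced count for the real scene is still in the thousands, the bottleneck is the number of unique mesh/material pairs rather than the batching settings.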
We may just misunderstand what it’s for then. The expectation was to just have regular ol’ Occlusion Culling, but processed on the GPU, which is the opposite: moving load from CPU to GPU.
If the feature works as you describe, it seems rather misleading in name.
[quote=“LaneFox, post:262, topic: 930675, username:LaneFox”]
We may just misunderstand what it’s for then. The expectation was to just have regular ol’ Occlusion Culling, but processed on the GPU, which is the opposite: moving load from CPU to GPU.
If the feature works as you describe, it seems rather misleading in name.
[/quote]
In the current renderer context, I agree. This implementation might work better with MultiDrawIndirect/Work Graph rendering, which allows draw calls to be collapsed much further, so most of the load would be on the GPU (although better parallelized) and an improvement in overall frame time might become noticeable. Multiplying draw calls even by 3 shouldn’t affect performance in a meaningful way when you only have a few dozen. Sadly, Unity still has limitations when it comes to batching (especially for non-MeshRenderer objects), so if anything, this GPU Occlusion Culling option was added too early?
I honestly have no idea where things are going right now. With the way skinning and textures are handled currently, MultiDrawIndirect on its own should have a limited impact. It’s nice that some overhead was shaved off by making some buffers persistent (a feature as old as OpenGL 4), but that’s only one part of the problem (driver overhead). Though I have to say that my understanding of a “modern” rendering pipeline is probably too dated. Vulkan (and probably D3D12) has different costs for various actions compared to OpenGL and D3D11, so what I know would work on the old rendering APIs might not be appropriate in the newer ones.
When you say “the system is also compatible with Umbra occlusion culling”, does that mean it also works with Unity’s CullingGroup API? (Unity - Manual: CullingGroup API)
So someone please correct me if I am wrong on this, but it sounds like the following.
You can have some things use Umbra occlusion culling where it still offers slightly better performance or customization.
The next part is mentioned in the documentation, which I will link at the end.
The new GPU Resident Drawer is built on the BatchRendererGroup API to draw GameObjects with GPU instancing.
If the objects being drawn are not compatible with the GPU Resident Drawer, then it will fall back to Unity’s regular drawing with GPU instancing, and thus they would still be usable with the CullingGroup API.
Does GPU Resident Drawer replace GPU Instancing in the Material settings? Why are there two instancing solutions? Is it intended to be used with both enabled?
Speaking of which, do we have to have GPU instancing enabled on a material for GPU Resident Drawer to render it optimally, or is this setting just ignored when using GPU Resident Drawer?
This is Unity’s demo.
I created 3 cubes for character injuries, which should be drawn as a Hybrid Batch Group. However, an error occurred: “GPU Instancing Shader variant flip”. As a result, only SRP Batch could be used.
This is a brand-new batch failure error, and I don’t know what it means.
Hiding PlayerArmature and showing it again restores the Hybrid Batch Group.
It’s frustrating that the stats and scene view aren’t being updated with culled data. Likewise, you won’t see Tris update when looking at Stats in the game window. Also, the “Test overlay” color box isn’t helpful because it doesn’t tell you what the colors mean. You can kind of tell red is occluded, but what is cyan or blue?
The only place you can see it is in Rendering Debugger / GPU Resident Drawer. Check “Display Culling Stats” and look under “Occlusion Culling Events”; you’ll see Culled Instances there. You can also see them in the Frame Debugger.
They do update the Tris value in Stats when you turn the “GPU Occlusion Culling” option on and off in the project settings/rendering area. However, I think this is calculated as part of however they’re computing the hi-Z. Like you said, if you mask the frustum of the camera, well, it should be near 0.
The big question is, why isn’t FPS showing an increase for culled GameObjects? I think it has something to do with the GPU culling data not going back to the CPU, so the system is unaware that fewer tris are being drawn, etc.
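For anyone unsure what the hi-Z mentioned above refers to, here is a generic sketch of hierarchical-Z occlusion testing. I do not know how Unity implements it; this only shows the general technique. It assumes a depth convention of 0.0 = near and 1.0 = far (with reversed Z the comparisons flip), a square power-of-two depth buffer, and a screen-space rectangle per object. All names are made up for the example.

```python
def build_hiz(depth):
    """depth: square 2D list whose side is a power of two (0.0 near, 1.0 far).

    Returns a list of mip levels. Each coarser texel stores the FARTHEST
    depth of the 2x2 block below it, which keeps the test conservative.
    """
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def is_occluded(levels, x0, y0, x1, y1, nearest_depth):
    """Rect corners are inclusive level-0 pixel coordinates."""
    # Pick the first mip level where the rect spans at most 2x2 texels.
    level = 0
    while level < len(levels) - 1 and (
            (x1 >> level) - (x0 >> level) > 1 or
            (y1 >> level) - (y0 >> level) > 1):
        level += 1
    mip = levels[level]
    farthest_occluder = max(
        mip[y][x]
        for y in range(y0 >> level, (y1 >> level) + 1)
        for x in range(x0 >> level, (x1 >> level) + 1))
    # Hidden only if the object's nearest point is behind every occluder texel.
    return nearest_depth > farthest_occluder


# 4x4 depth buffer: a wall at depth 0.2 covers the left half, the right is empty.
depth = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]
hiz = build_hiz(depth)

behind_wall = is_occluded(hiz, 0, 0, 1, 3, 0.5)       # fully behind the wall
straddling_edge = is_occluded(hiz, 1, 1, 2, 2, 0.5)   # pokes out past the wall
in_front = is_occluded(hiz, 0, 0, 1, 1, 0.1)          # closer than the wall
print(behind_wall, straddling_edge, in_front)
```

Note that the whole test runs against GPU-side depth data. If the result never goes back to the CPU, CPU-side counters would have no way to reflect it.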
Why not just use Mesh.CombineMeshes at runtime, assuming your CAD models all have the same material? Or do you need 100% accurate GPU occlusion culling?
I also have a dynamic CAD setup with a script that combines all models with the same material at runtime into “chunks”, which are baked in a grid format so that camera culling still works (as opposed to one huge mesh that is always rendered and always in memory). Mesh.CombineMeshes is like 95% faster than any built-in batching or instancing option. Static batching, still in 2024, is really poorly optimized, and it doesn’t help how complicated the documentation is regarding using static batching with GPU instancing, dynamic batching, and now all these Unity 6 batching features that 100% slow down performance.
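The grouping step of that chunking approach could look roughly like the sketch below. It only decides which objects end up in which chunk; the actual merge would be done by Mesh.CombineMeshes on the Unity side and is not shown. The grid on the XZ plane, the cell size, and all the names are assumptions for the example, not the actual script.

```python
from collections import defaultdict


def chunk_key(position, material_id, cell_size):
    """Grid cell on the XZ plane plus material: one combined mesh per key."""
    x, _, z = position
    return (material_id, int(x // cell_size), int(z // cell_size))


def group_into_chunks(objects, cell_size):
    """objects: iterable of (name, material_id, (x, y, z)).

    Returns {(material_id, cell_x, cell_z): [names]}.
    """
    chunks = defaultdict(list)
    for name, material_id, position in objects:
        chunks[chunk_key(position, material_id, cell_size)].append(name)
    return dict(chunks)


# Hypothetical CAD parts with a 10-unit grid.
parts = [
    ("bolt_a", "steel", (1.0, 0.0, 2.0)),
    ("bolt_b", "steel", (3.5, 0.0, 9.0)),
    ("panel", "glass", (4.0, 1.0, 4.0)),
    ("bolt_c", "steel", (25.0, 0.0, 2.0)),
]

chunks = group_into_chunks(parts, 10.0)
for key, names in chunks.items():
    print(key, names)
```

Smaller cells keep frustum culling effective at the cost of more draw calls, while larger cells do the opposite, so the cell size is the main knob to tune.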