DOTS Occlusion?

How do I do occlusion culling in ECS? One of the biggest resource hogs in my game is my grass, which is split into 32x32 blocks that can be rendered/culled. Right now the only culling is frustum culling, but as you can see in those screenshots, that’s hardly close to enough. I can’t use Unity’s occlusion culling, not only because those blocks are generated at runtime, but also because there is no way to query the Umbra system for the occlusion state of an arbitrary bounding box. I also can’t just skip drawing the grass above me, because the grass on cliff corners is visible from below.


Note: Maybe this question should be asked on another forum? The system that renders the blocks of grass is written in ECS, so maybe here is fine?

Check this out. It’s fully custom but done with ECS, and I think the code is good: GitHub - zcvdf/culling: Unity ECS implementation of a typical Culling system including Frustrum Culling and Occlusion Culling.
(not my project, just found it on github)

I can give it a look, but considering it was last updated in 2020, I expect it to not be compatible at all with the current versions

It is compatible and updated, I tested it. But it is also not complete, and Unity has said that the version they have for 1.0+ is far, far better.

Also… I don’t think you’re tackling grass right as a problem space. My unsolicited advice would be to look into where the performance hit actually is. Most grass in games isn’t placed like that but recycled, and over distance it shrinks into the ground so they can use fewer draw calls and have less overdraw. Are you using a depth prepass, and is it mobile? And what version of Unity? Many of these factors will affect vegetation rendering far more than culling would.

I actually have no idea how to do that and even made a thread asking how to go about it: https://discussions.unity.com/t/886679

No idea what a depth prepass is. Does it work well with instanced rendering? Google mostly gives me “Should I do it or not” rather than “What it is”, but it seems to have something to do with culling based on the camera depth buffer.

For the other questions:
There is a LOD system, where the 32x32 blocks don’t render all 256 blades, but smaller and smaller fractions (128, 64, 32, etc.), and each blade becomes a billboard for 2, 3, 4, etc. blades of grass side by side. The blades are already placed in the 32x32 blocks in a randomish order, so just not rendering the last 128 thins out the grass evenly. This is done both in the compute shader that builds the render list (it places less of the block in the render list) and in the vertex shader (it changes size and UV to render more blades of grass per billboard based on distance class).
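A minimal sketch of the halving scheme described above, written in Python for readability rather than as shader code. The function name, the `lod_step` distance and the `max_lod` cap are assumptions for illustration, not values from the actual project.

```python
from typing import Tuple


def blades_to_render(total_blades: int, distance: float,
                     lod_step: float = 20.0, max_lod: int = 5) -> Tuple[int, int]:
    """Return (blade_count, blades_per_billboard) for a block at `distance`.

    Each LOD level halves the number of instances and widens the billboard
    by one blade. `lod_step` and `max_lod` are hypothetical tuning values.
    """
    lod = min(int(distance // lod_step), max_lod)
    count = max(total_blades >> lod, 1)   # 256, 128, 64, ...
    blades_per_billboard = lod + 1        # 1, 2, 3, ...
    return count, blades_per_billboard
```

Because the blades are stored in a random order, taking the first `count` entries of a block is enough to thin it out evenly.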

Are you using the Hybrid Renderer to render grass or DrawMeshInstancedIndirect?

DrawMeshInstancedIndirect

Try the simplest solution first: render only what is within a given radius.

If I understand correctly, you have chunks of terrain, each 32x32, with grass that is drawn by a shader? And you want to hide the chunks that are too far away.

This solution will, however, still render stuff behind the camera, so you may want something like a dot product between the camera’s look direction and the direction to the instance, with a tolerance, or something similar, to exclude instances behind you.
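A rough sketch of that dot product check, in Python for illustration. The function name and the meaning of `tolerance` (a cosine threshold) are assumptions, and `cam_forward` is assumed to be normalized.

```python
import math


def is_in_front(cam_pos, cam_forward, point, tolerance=0.0):
    """True if `point` is in front of the camera, within a tolerance.

    `tolerance` is a cosine threshold: 0.0 keeps the whole front
    hemisphere, negative values also keep things slightly behind.
    """
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        return True  # point is at the camera position: keep it
    cos_angle = (dx * cam_forward[0] + dy * cam_forward[1]
                 + dz * cam_forward[2]) / length
    return cos_angle >= tolerance
```

When testing a whole chunk by its center, a slightly negative tolerance helps avoid dropping chunks whose center is just behind the camera while part of the chunk is still in view.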

In that case, why not do occlusion culling in a compute shader by testing the bounding boxes against the depth buffer?
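To make the idea concrete, here is a CPU-side sketch in Python of the test a compute shader would run per bounding box. It assumes the box has already been projected to an inclusive screen-space rectangle and a nearest depth, and that depth runs from 0 (near) to 1 (far). All names are hypothetical.

```python
def is_occluded(depth, rect, box_near_depth):
    """True if every pixel covered by `rect` is closer than the box.

    depth: 2D list indexed as depth[y][x], 0 = near, 1 = far (assumed).
    rect: (x0, y0, x1, y1), inclusive pixel bounds clamped to the screen.
    box_near_depth: depth of the box corner nearest to the camera.
    """
    x0, y0, x1, y1 = rect
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if depth[y][x] >= box_near_depth:
                # The scene here is at or behind the front of the box,
                # so some of the box may be visible.
                return False
    return True
```

On platforms where Unity uses a reversed depth buffer, the comparison would need to be flipped.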

Because that sounds like some pretty impressive witchcraft, and while I am interested, I stumbled onto this: https://docs.unity3d.com/Manual/CullingGroupAPI.html

So hopefully this officially documented Unity way of doing it, which somehow nobody seemed to know existed, will work.
Otherwise I will try to figure out the bounding box vs depth buffer thing.

I knew it existed, but you asked about ECS so I am assuming your potential occluders would be entities, in which case the CullingGroup API won’t take them into account and all your grass instances would report as visible.

You may want to investigate Hi-Z culling. It’s 100% GPU side, so ECS-agnostic. Zero scene setup. The downside is that ymmv per scene and per platform, so it’s not a silver bullet. For my project the culling cost was roughly the same as the brute-force rendering cost. So again, ymmv.
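For anyone unfamiliar with the term, a minimal Python sketch of the Hi-Z idea follows. It assumes a square, power-of-two depth buffer with 0 = near and 1 = far, and a box already projected to a screen rectangle. The function names are made up for illustration and this is not how any particular asset implements it.

```python
def build_hiz(depth):
    """Build a mip chain where each texel stores the FARTHEST depth
    of the 2x2 texels below it (conservative for occlusion tests)."""
    mips = [depth]
    while len(mips[-1]) > 1:
        src = mips[-1]
        n = len(src) // 2
        mips.append([[max(src[2 * y][2 * x], src[2 * y][2 * x + 1],
                          src[2 * y + 1][2 * x], src[2 * y + 1][2 * x + 1])
                      for x in range(n)] for y in range(n)])
    return mips


def hiz_occluded(mips, rect, box_near_depth):
    """Test a screen rect against the mip where it spans at most 2x2 texels."""
    x0, y0, x1, y1 = rect
    size = max(x1 - x0, y1 - y0) + 1
    # ceil(log2(size)), clamped to the coarsest available mip
    level = min((size - 1).bit_length(), len(mips) - 1)
    mip = mips[level]
    for y in range(y0 >> level, (y1 >> level) + 1):
        for x in range(x0 >> level, (x1 >> level) + 1):
            if mip[y][x] >= box_near_depth:
                return False  # something at or behind the box: maybe visible
    return True
```

The test is conservative: it can report a hidden box as visible, but never the other way around, and it reads at most four texels per box regardless of how large the box is on screen.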

I personally used this asset, and even reported some bugs in their occlusion shaders.
https://discussions.unity.com/t/700427

It is not that hard to implement at all.
The problem with the CullingGroup API is that it runs single threaded.
So it is of no use for large instance counts, if you want to run it efficiently.
You are better off rolling your own solution.
Or you may want to start playing with tricks, such as grouping things together and culling those groups instead.

I think it’s multi threaded if I recall. @superpig threw it in one day, as you do if memory serves.

Is group.onStateChanged multithreaded, or can it be?
Also, group.QueryIndices looks like it cannot accept a NativeArray.
So I would say … it is meh at best, for use with DOTS.

I am curious, which part is / can be multithreaded.

The Unity culling part, it’s just C++ side as far as I know.

Hippo is correct - CullingGroup uses the job system for the actual culling computations on the C++ side.

The state change events are sent on the main thread after culling is completed. When I first wrote it, the idea was that you’d use the state change events to activate/deactivate GameObjects, which you can only do from the main thread anyway. That was back in Unity 5.2 and things have obviously changed somewhat since then…

If you wanted to roll your own version in C# with Burst these days, I think the only part you’d really have trouble with is integrating it with Umbra (which, ironically, is how CullingGroup started - I wanted a way to do sphere-visibility queries against the Umbra data). If you aren’t using Umbra and just want frustum culling, I think it’d be pretty easy to replicate.
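As a rough illustration of the frustum-only part, here is a sphere-versus-planes test sketched in Python (a Burst job would have the same shape). It assumes each plane is given as `(nx, ny, nz, d)` with a normalized normal pointing into the frustum. The function name is hypothetical.

```python
def sphere_in_frustum(planes, center, radius):
    """True if the sphere is inside or intersecting the frustum.

    planes: iterable of (nx, ny, nz, d), normals pointing inward (assumed),
    so a point p is inside a plane when dot(n, p) + d >= 0.
    """
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False  # entirely on the outside of this plane
    return True
```

This is the usual conservative test: spheres near a frustum corner can be reported visible when they are just outside, which is normally acceptable for culling.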

“Having trouble” might be an understatement: https://docs.unity3d.com/ScriptReference/UnityEngine.UmbraModule.html
It pretty much explicitly says that there is no public API.

The thing is, I have about 10k clusters of up to 1024 blades of grass (the clusters were a bit bigger than I was remembering), so even if it’s main thread only, I at most need to update 10k bitflags per frame on the main thread, and only if the player is on the corner of the map spinning 180° every frame. This could be reduced even further if I group every 8~32 clusters into one big cluster for occlusion culling.
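A small Python sketch of that grouping idea, assuming clusters are stored so that consecutive indices are spatial neighbours. `group_visible` and `cluster_visible` are hypothetical callbacks standing in for whatever visibility query ends up being used.

```python
def cull_grouped(cluster_count, group_size, group_visible, cluster_visible):
    """Return (flags, tests): one visibility bit per cluster, plus the
    number of visibility queries made. Groups are tested first, and the
    clusters of a hidden group are skipped entirely."""
    flags = bytearray((cluster_count + 7) // 8)
    tests = 0
    for start in range(0, cluster_count, group_size):
        tests += 1
        if not group_visible(start // group_size):
            continue  # whole group hidden: its bits stay 0
        for i in range(start, min(start + group_size, cluster_count)):
            tests += 1
            if cluster_visible(i):
                flags[i >> 3] |= 1 << (i & 7)
    return flags, tests
```

With 10k clusters in groups of 16, a frame where most groups are hidden costs a few hundred queries instead of 10k.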

Once I have some time in between fixing bugs, I will try implementing it.

Final unsolicited advice 🙂

Honestly, if you’re doing compute grass, it is better to just not draw than to draw and then cull, which is basically your approach. You’re doing the work and then throwing it away.

Ideally, if you are using a compute shader to generate the grass (say you took it from minionsart), then your best bet is to modify your approach so that you have just a few batches that surround the player and are recycled. This is so fast you don’t need to cull.

And if you wanted to ‘cull’ in compute, you could do a simple dot product check from the world space camera to the vert or blade position you’re generating, and reject most of the work with one line of code in the compute shader. Culling in compute just means avoiding generating the data to begin with.

10K clusters is a bit too many, just for grass. Even merging these or culling them is expensive.

The grass is generated and cached. I don’t generate it every frame.
I also need it both in C# and on the GPU, since I can cut it for items and it bends when you walk over it.
Frustum culling (and hopefully Umbra view culling) happens in C#. The clusters that are not culled are sent (start blade index + count / LODDistanceDivider) to a compute shader that unrolls them into a list of indexes for the instanced shader, so it can get the position/type/etc. from the ComputeBuffer that caches the whole thing.
I have no idea what the consequences are of each blade being all over the place in GPU memory, but that’s how I do it.
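For readers following along, the unroll step could be sketched like this in Python. This is an assumed reading of the description above, where each visible cluster contributes its first `count / LODDistanceDivider` blades. The tuple layout and function name are hypothetical.

```python
def unroll_clusters(clusters):
    """Flatten visible clusters into one list of blade indices.

    clusters: iterable of (start_index, blade_count, lod_divider).
    Each cluster contributes its first blade_count // lod_divider blades.
    """
    indices = []
    for start, count, lod_divider in clusters:
        visible = count // lod_divider
        indices.extend(range(start, start + visible))
    return indices
```

The instanced shader would then use entry `i` of this list to look up blade data in the cached buffer. A GPU version would typically need a prefix sum over the per-cluster counts so each thread knows where to write.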
