URP, 10 million default cubes — are these expected results? (using that sample script)
The sample script does not use Burst, to keep the sample simple. If you improve it and use Burst jobs to fill the large arrays (I think in the sample it's just the visible instance indices), you should see much better performance.
Alternatively, if you just want to test best-case performance, you can make a cached copy of the array and use UnsafeUtility.MemCpy to copy it into place. This is not what a real game would do, since it assumes static visibility, but for a simple test like this it could be OK.
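A minimal sketch of that trick, assuming the visibility never changes (the method and array names here are illustrative, not from the sample script):

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

static class StaticVisibilityCopy
{
    // Instead of looping over all instances every frame, copy a cached
    // visible-index array into the culling output in one MemCpy.
    // Only valid if visibility is static — a real game can't assume this.
    public static unsafe void CopyCachedVisibility(
        NativeArray<int> cachedVisibleIndices,
        NativeArray<int> outputIndices)
    {
        UnsafeUtility.MemCpy(
            outputIndices.GetUnsafePtr(),
            cachedVisibleIndices.GetUnsafeReadOnlyPtr(),
            cachedVisibleIndices.Length * sizeof(int));
    }
}
```

This removes the per-frame fill cost entirely, which is why it only makes sense as a best-case benchmark.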
Oh, now you have done it! This is just the ticket, and I think it will help a ton of devs! Thank you, and I hope URP gets some kind of new static path like this for classic projects. So many would get a free boost from it. Until then, I am going to poke around with gratitude!
Thank you!
Just to be 100% sure I got it right: when we upload data (or register it) using the API, does it remain on the GPU until we unload it?
The new code seems to solve some issues and I think it will work correctly with multiple lights.
@mgear I also tried to do your test using GPU culling, but my GPU (GeForce GTX 1050 Ti) cannot handle so much data, unfortunately.
I did a test with around 2 million cubes (127x127x127); here are my results:
CPU time to emit the visibility is 0.034ms and to emit the draw calls is 0.017ms
Total time on CPU with updating of the GPU data is 0.11ms
I'm sure as soon as I fix the GPU occlusion culling it will run even faster.
Of course, my version doesn't correctly handle all lights and UV lightmaps per instance, so getting these speed improvements inside Unity will always be better. I'm waiting for the release of DOTS so I can fully use these new features.
Thank you for sharing!
Will there be default BRG optimizations enabled? I don't know much about graphics programming, so I'm just checking whether we will only have an API to use, or if this will also have a default implementation in the engine.
I could just add my assets to the scene as usual and maybe follow certain rules and have it work.
I just have to ask: what method did you use to implement GPU occlusion culling, and how?
For GraphicsBuffers you create and upload data to (matrices, overridden properties and so on), the data is persistent, and it is up to you to update and manage it.
Meshes and materials are a bit more complicated. Once you register them, they will be recognized by the system, but if some code deletes them, the BRG will tag them as deleted and simply stop executing any draw commands that reference the deleted mesh/material.
The data you provide in the culling callback is transient. It's only used for one frame and then freed.
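A hedged sketch of those three lifetimes using the public BatchRendererGroup API (Unity 2022+); the class layout and buffer size are illustrative, not an official example:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

public class BrgLifetimeExample
{
    BatchRendererGroup m_Brg;
    GraphicsBuffer m_InstanceData;   // persistent: we own and update it
    BatchMeshID m_MeshID;            // valid until the Mesh is destroyed
    BatchMaterialID m_MaterialID;

    public void Init(Mesh mesh, Material material)
    {
        m_Brg = new BatchRendererGroup(OnPerformCulling, System.IntPtr.Zero);

        // Persistent GPU buffer for matrices/overridden properties.
        // It stays on the GPU until we Dispose() it.
        m_InstanceData = new GraphicsBuffer(GraphicsBuffer.Target.Raw,
                                            1024, sizeof(float));

        // Registration hands out IDs; if the mesh or material is later
        // deleted, draw commands referencing it are silently skipped.
        m_MeshID = m_Brg.RegisterMesh(mesh);
        m_MaterialID = m_Brg.RegisterMaterial(material);
    }

    Unity.Jobs.JobHandle OnPerformCulling(
        BatchRendererGroup group,
        BatchCullingContext context,
        BatchCullingOutput output,
        System.IntPtr userContext)
    {
        // Anything written into 'output' here is transient: it is used for
        // one frame and then freed by the engine.
        return default;
    }

    public void Dispose()
    {
        m_InstanceData.Dispose();
        m_Brg.Dispose();
    }
}
```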
We are not replacing any Unity part with this as it is now. It's the foundation for the Hybrid Renderer to be able to render entities, and it is usable if you want to write a custom renderer. The scripts linked in this thread are just examples of what is possible, and we may use something like this in the future to speed up general rendering.
Awesome, hope this happens!
Hi @YuriyPopov ,
The occlusion culling follows the standard approach from Nanite: use the previous frame's depth to do the culling for the current frame.
Render all instances that were visible last frame, then, using the new depth buffer, do another visibility pass and render the rest of the objects.
Right now I have an issue with the algorithm and it is not working as expected, so I need to debug it.
But the current implementation does frustum culling, size culling and LOD switching, all on the GPU. Still one draw call per material, but it doesn't care if you have different meshes.
If I can have more than one light it will be usable, so that's why I'm waiting for improvements in DOTS.
I hope I answered your question.
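For readers unfamiliar with the two-pass scheme mentioned above, here is a rough host-side outline. All kernel, buffer, and parameter names are hypothetical — this is not the poster's code, just the shape of the technique:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Two-pass (Nanite-style) occlusion culling, host side only.
// The compute shaders themselves are not shown.
public class TwoPassOcclusionCulling
{
    public ComputeShader culling;       // assumed kernels, see below
    public RenderTexture depthPyramid;  // HiZ pyramid built from depth
    public GraphicsBuffer rejectedList; // pass-1 rejects, re-tested in pass 2

    public void Execute(CommandBuffer cmd, int instanceCount)
    {
        int pass1 = culling.FindKernel("CullAgainstPrevDepth");
        int pass2 = culling.FindKernel("CullAgainstNewDepth");

        // Pass 1: test every instance against LAST frame's depth pyramid.
        // Instances visible last frame go straight into the indirect draw;
        // the rest are appended to rejectedList.
        cmd.SetComputeTextureParam(culling, pass1, "_DepthPyramid", depthPyramid);
        cmd.DispatchCompute(culling, pass1, (instanceCount + 63) / 64, 1, 1);
        // ... indirect draw of the pass-1 survivors here ...

        // Rebuild the depth pyramid from the freshly rendered depth, then
        // Pass 2: re-test only the rejected instances against the NEW depth.
        // Objects disoccluded this frame get drawn now.
        cmd.SetComputeTextureParam(culling, pass2, "_DepthPyramid", depthPyramid);
        cmd.DispatchCompute(culling, pass2, (instanceCount + 63) / 64, 1, 1);
        // ... indirect draw of the pass-2 survivors ...
    }
}
```

The key property is that no frame ever waits on a CPU readback: both passes and both draws stay on the GPU.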
I don't get how you do GPU culling with this API at all. Do you first gather data, feed a compute shader, then read the buffer back on the CPU and emit the draw commands?
Just for clarification, the code I wrote is not using this API (only URP), but I think it can be adapted. The basic idea is to store all the data in a persistent buffer on the GPU, then execute a compute shader that generates a list of the instances that need to be rendered. The CPU emits one indirect draw per material, and the compute shader generates the actual data for each draw call.
I've sent you a private message with more info about this, if you would like to know more (I don't want to pollute this thread more than I already have).
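A sketch of that flow, assuming one material for brevity (kernel and field names are made up; this is not the poster's actual code):

```csharp
using UnityEngine;

// GPU-driven drawing: the CPU never reads visibility back. The compute
// shader fills the indirect args buffer; the CPU just issues one
// DrawMeshInstancedIndirect per material.
public class GpuDrivenDraw
{
    public ComputeShader cull;     // assumed kernel "BuildVisibleList"
    public Material material;
    public Mesh mesh;
    GraphicsBuffer instanceData;   // persistent per-instance data on the GPU
    GraphicsBuffer args;           // 5 uints of indirect draw arguments

    public void Draw(Bounds bounds, int instanceCount)
    {
        int k = cull.FindKernel("BuildVisibleList");

        // The kernel appends visible instance indices to a list and writes
        // the surviving instance count into args on the GPU.
        cull.Dispatch(k, (instanceCount + 63) / 64, 1, 1);

        // One indirect draw per material; the instance count comes from the
        // args buffer the compute shader just filled, not from the CPU.
        Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, args);
    }
}
```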
I have found that Unity already exposes a new API for this, Graphics.RenderMeshIndirect, and it will support the multi-draw paradigm in the future.
Please make the BRG interface similar to this API, so it can easily enable usage of the low-level multi-draw API in the future.
Maybe expose a few different ways to store draw commands, like in the new Graphics.RenderX method family, so we can provide draw commands from a GraphicsBuffer and other sources.
The goal is to efficiently draw many different mesh instances with the same material, like:
- draw all different props on level in one go
- draw all chunks of VoxelWorld in one go
- draw one district of level (one constructor) in one go
- perform culling and LOD selection on the GPU side, write commands, and render fast batches from a GraphicsBuffer
- …
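For context, a minimal RenderMeshIndirect sketch (Unity 2022.1+). The command buffer is filled from the CPU here, but it could just as well be written by a compute shader, which is where the wish list above points:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

public class IndirectExample : MonoBehaviour
{
    public Mesh mesh;
    public Material material;
    GraphicsBuffer commands;

    void Start()
    {
        commands = new GraphicsBuffer(GraphicsBuffer.Target.IndirectArguments,
            1, GraphicsBuffer.IndirectDrawIndexedArgs.size);
        var args = new GraphicsBuffer.IndirectDrawIndexedArgs[1];
        args[0].indexCountPerInstance = mesh.GetIndexCount(0);
        args[0].instanceCount = 1000;   // illustrative count
        commands.SetData(args);
    }

    void Update()
    {
        var rp = new RenderParams(material)
        {
            worldBounds = new Bounds(Vector3.zero, 1000f * Vector3.one)
        };
        // commandCount > 1 is where a true multi-draw backend would pay off.
        Graphics.RenderMeshIndirect(rp, mesh, commands, commandCount: 1);
    }

    void OnDestroy() => commands?.Dispose();
}
```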
We have an experimental implementation for almost all of that (regular mesh draw, direct procedural, indirect procedural) in a branch. On DX11, multiple indirect draws are emulated as a loop on the CPU side, and of course there is no way to provide a late command count as a buffer, so all commands in the range will always be executed.
How and when this will actually land is still not decided. We want to get the interface right and ensure it actually covers everything. We'll get back to you once it hits some future Unity version beta, but it won't happen during the Unity 22.X stream.
Will this API be used for the Hybrid Renderer in ECS 0.5?
And will it support point lights in the next version?
This API is not used by the version in 0.5; it is used by the next version.
Point light support depends on URP; the Hybrid Renderer requires a screen-space technique for local lights, such as deferred or Forward+.
Will this API support GLES 3.1 in ECS 1.0?
We are aiming for this, and are currently working on it.
However, it is possible that due to technical reasons, the GLES3.1 version might work slightly differently and have different performance characteristics.
The Hybrid Renderer in ECS 0.17 is already pretty fast… Can we expect better performance by adopting this API when ECS reaches 1.0?