Resource binding in Unity is not persistent; it's stateless. Every explicit draw call (e.g. Graphics.DrawMesh), compute dispatch, and ray tracing dispatch binds all the resources it needs. In ray tracing dispatches, the resources and parameters come from various places: materials (used by the Renderers in the RayTracingAccelerationStructure), resources and values set globally (for example with Shader.SetGlobalXXX), property blocks set with Renderer.SetPropertyBlock, and other settings on the MeshRenderer.
The cost of RayTracingShader.Dispatch depends on how many Renderers are in the RayTracingAccelerationStructure (i.e. how complex the scene is) and on how many cores your CPU has.
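To make the stateless model concrete, here is a toy Python sketch (not Unity code). Each dispatch walks every renderer and rebuilds its binding table from globals, material and property block, so the CPU work scales with renderer count times dispatch count. The `Renderer` class, the property names `_BaseColor` and `_SkyTexture`, and the merge order (globals, then material, then property block) are hypothetical choices for illustration; Unity's real precedence rules may differ, and the sketch does not model multithreading across cores.

```python
from dataclasses import dataclass, field


@dataclass
class Renderer:
    """Hypothetical stand-in for a Renderer in the acceleration structure."""
    material: dict                                        # values from the material
    property_block: dict = field(default_factory=dict)    # per-renderer overrides


def build_bindings(renderer: Renderer, global_values: dict) -> dict:
    # Assumed merge order for this sketch only: globals, then material,
    # then property block (later sources overwrite earlier ones).
    bindings = dict(global_values)
    bindings.update(renderer.material)
    bindings.update(renderer.property_block)
    return bindings


def dispatch(renderers: list, global_values: dict) -> list:
    # Stateless binding: nothing is kept from a previous dispatch, so the
    # table for every renderer is rebuilt from its sources each time.
    return [build_bindings(r, global_values) for r in renderers]


def binding_setups(num_renderers: int, num_dispatches: int) -> int:
    # Per-renderer binding setups per frame in this model: every dispatch
    # walks every renderer, so the work scales with both counts.
    return num_renderers * num_dispatches


scene = [
    Renderer(material={"_BaseColor": "red"}),
    Renderer(material={"_BaseColor": "blue"}, property_block={"_BaseColor": "green"}),
]
tables = dispatch(scene, {"_SkyTexture": "sky"})
print(tables[1])                  # {'_SkyTexture': 'sky', '_BaseColor': 'green'}
print(binding_setups(1000, 8))    # 8000
print(binding_setups(1000, 1))    # 1000
```

In this model, collapsing eight dispatches into one cuts the per-frame binding work by the same factor, which is the saving the question below is asking about.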
Is this something HDRP ray tracing could benefit from? Would the total cost be reduced (as the OP points out) if we scheduled all ray tracing effect ‘jobs’ into one final, collected dispatch (in HDRaytracingDeferredLightLoop?) instead of issuing a separate DispatchRays call for every effect?
A collected dispatch could definitely bring performance gains where ray binning is concerned.
I did some digging into this about a month ago, and as far as I could see (and measure), the current solution seems inefficient and can actually result in a net loss in many cases, but please correct me if I'm wrong.
Because rays are binned separately for every effect, and in XR this is done twice (once per eye), the binning overhead adds up. With the full ray tracing stack (RTGI, RTR, RTAO, RTSS), that is 4 effects × 2 eyes = 8 separate ray binning passes.
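The pass count is simple arithmetic, sketched here in Python. The function name is hypothetical, and the merged case assumes a collected dispatch would still bin once per eye; if both eyes were binned together it would drop to one pass.

```python
def binning_passes(num_effects: int, num_views: int, merged: bool = False) -> int:
    """Ray binning passes per frame.

    Separate dispatches bin once per effect per view; in a merged dispatch
    (assumed here to still run once per view) all effects share one pass.
    """
    return num_views if merged else num_effects * num_views


effects = ["RTGI", "RTR", "RTAO", "RTSS"]
print(binning_passes(len(effects), num_views=2))               # 8
print(binning_passes(len(effects), num_views=2, merged=True))  # 2
```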
Because the rays are also dispatched separately, every set of binned rays starts BVH traversal from scratch (8 times in this case), so they don't get the cache hits that binning is meant to produce.
If we merged all of these and binned and dispatched the rays at once, BVH cache reuse would be as high as attainable and the repeated binning passes would be eliminated.
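Here is a rough Python sketch of why merging helps. It is a generic binning scheme written for illustration, not the Battlefield V or HDRP implementation, and it runs on the CPU where a real engine would bin in a compute pass. The bin key (8-pixel screen tiles plus a 4×4 grid of spherical direction angles) and the `Ray` fields are assumptions. Binning each effect on its own keeps coherent rays from different effects apart; one binning pass over all rays places them next to each other, so they would traverse similar BVH nodes back to back. The sketch does not model the BVH cache itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Ray:
    effect: str   # effect that generated the ray, e.g. "RTR"
    px: int       # pixel the ray originates from
    py: int
    dx: float     # unit direction
    dy: float
    dz: float


def direction_bin(ray: Ray, n: int = 4) -> int:
    # Quantize the direction into an n x n grid of spherical angles.
    theta = math.acos(max(-1.0, min(1.0, ray.dz)))   # polar angle in [0, pi]
    phi = math.atan2(ray.dy, ray.dx) + math.pi       # azimuth in [0, 2*pi]
    i = min(int(theta / math.pi * n), n - 1)
    j = min(int(phi / (2.0 * math.pi) * n), n - 1)
    return i * n + j


def bin_key(ray: Ray, tile: int = 8) -> tuple:
    # Rays from the same screen tile heading in a similar direction share a key.
    return (ray.py // tile, ray.px // tile, direction_bin(ray))


def bin_rays(rays: list, tile: int = 8) -> list:
    # Sort so that rays with the same key are adjacent in the dispatch order.
    return sorted(rays, key=lambda r: bin_key(r, tile))


rtr = [Ray("RTR", 0, 0, 0.0, 0.0, 1.0), Ray("RTR", 100, 0, 0.0, 0.0, 1.0)]
rtao = [Ray("RTAO", 1, 1, 0.0, 0.0, 1.0), Ray("RTAO", 1, 1, 0.0, 0.0, -1.0)]

# Separate: each effect is binned on its own, so coherent rays from
# different effects never end up next to each other.
separate = [bin_rays(rtr), bin_rays(rtao)]
# Merged: one binning pass over all rays; the +z RTR and RTAO rays
# from the same tile become neighbours.
merged = bin_rays(rtr + rtao)
print([r.effect for r in merged])   # ['RTR', 'RTAO', 'RTAO', 'RTR']
```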
(For more info about ray binning, check out this Battlefield V presentation from GDC 2019, starting from page 20)
With a large number of entities, B is much faster than A. If the meshes are not merged, B can be considered the fastest; how a merged mesh performs is not certain.
I used Google Translate, so I don't know whether the translation is accurate.