Why do two DispatchRays(n) calls and one DispatchRays(n * 2) call not take the same amount of time?

Hi,
I am trying to implement screen-space soft shadows using DXR, but I have run into performance issues.

RayTracingMode is Static
The RayTracingAccelerationStructure is not rebuilt repeatedly
Same settings in both cases
5k cubes in the scene


context.ExecuteCommandBuffer(buffer);
buffer.Clear();
------------------------------------------------------
context.ExecuteCommandBuffer(buffer);
buffer.Clear();

How can I reduce the time taken by multiple DispatchRays calls?
Or should I use only a single DispatchRays call?

Hi!

Resource binding in Unity is not persistent; it’s stateless. Every explicit draw call (e.g. Graphics.DrawMesh), compute dispatch, and ray tracing dispatch binds all the resources it needs. In ray tracing dispatches, the resource and parameter setup comes from various places: materials (used by Renderers in the RayTracingAccelerationStructure), resources and values set using Graphics.SetGlobalXXX, property blocks set using Renderer.SetPropertyBlock, or other settings in the MeshRenderer.

The cost of RayTracing.Dispatch depends on how many Renderers are in the RayTracingAccelerationStructure (how complex the scene is) and how many cores your CPU has.

Is this potentially something HDRP Ray Tracing could benefit from? Would the total cost be reduced (like OP is pointing out) if we would schedule all ray tracing effects ‘jobs’ for a final “collected” dispatch (in the HDRaytracingDeferredLightLoop?) instead of doing a separate DispatchRays call for every effect?

Did you manage to find an answer to your question? It would be interesting to know whether this is a possible way to optimize RTX effects.

A collected dispatch could definitely bring performance gains when it comes to ray binning.

I did some digging into this about a month ago, and as far as I could see (and measure), the current solution seems to be inefficient and can actually result in a net loss in many cases, but please correct me if I’m wrong.

  1. Because the rays are binned separately for every effect, and this is done twice (both eyes) in XR, the ray binning overhead cost will add up. If you are running the full ray tracing stack (RTGI, RTR, RTAO, RTSS), it will execute 8 separate ray binning passes.

  2. Because the rays are dispatched separately, every set of binned rays will start BVH traversal from the beginning (8x in this case), so they will not trigger the cache hits we are trying to achieve.

If we merged all of these together, and binned and dispatched the rays at once, we would get the maximum attainable number of BVH cache hits and eliminate the overhead of the repeated binning passes.
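To make the idea of a single merged binning pass more concrete, here is a minimal CPU-side sketch in plain C#. It is not how HDRP or the Battlefield V presentation implements binning: the `Ray` record, the effect ids, and the choice of binning by direction octant are all assumptions made purely for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical ray record: which effect produced the ray, and its direction.
public struct Ray
{
    public int Effect;          // illustrative ids, e.g. 0 = RTAO, 1 = RTR
    public float Dx, Dy, Dz;    // ray direction

    public Ray(int effect, float dx, float dy, float dz)
    {
        Effect = effect; Dx = dx; Dy = dy; Dz = dz;
    }
}

public static class RayBinning
{
    // Assumed binning key: octant index 0..7 built from the direction's sign bits.
    public static int Bin(Ray r)
    {
        return (r.Dx < 0f ? 1 : 0) | (r.Dy < 0f ? 2 : 0) | (r.Dz < 0f ? 4 : 0);
    }

    // One binning pass over the rays of ALL effects (a stable counting sort).
    // Returns ray indices ordered so that rays in the same bin are adjacent.
    public static int[] BinAll(IReadOnlyList<Ray> rays)
    {
        var start = new int[9];
        for (int i = 0; i < rays.Count; i++) start[Bin(rays[i]) + 1]++;   // count per bin
        for (int b = 0; b < 8; b++) start[b + 1] += start[b];             // prefix sum -> bin offsets
        var order = new int[rays.Count];
        for (int i = 0; i < rays.Count; i++) order[start[Bin(rays[i])]++] = i;
        return order;
    }
}
```

Calling `RayBinning.BinAll` once on the combined ray list of every effect replaces one binning pass per effect, and rays with similar directions end up next to each other regardless of which effect emitted them.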

(For more info about ray binning, check out this Battlefield V presentation from GDC 2019, starting from page 20)

I don’t know how HDRP works, so I can’t say what the result would be there.
But in a custom SRP, combining the dispatches helps when the number of entities is large.

A:
C#:
buffer.DispatchRays(shadername, "job1", x, y, 1, camera);
buffer.DispatchRays(shadername, "job2", x, y, 1, camera);

B:
C#:
buffer.DispatchRays(shadername, "job3", x, y, 2, camera);
shader:
// z is the third component of the dispatch index (0 or 1 for a depth of 2)
if (z == 0)
{
    job1Code
    return;
}
job2Code
return;

With a large number of entities, B is much faster than A. If the meshes are not merged, B can be considered the fastest option; I am not sure how the two compare when the meshes are merged.
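As a rough sketch of variant B on the C# side, the snippet below records a single dispatch of depth 2 instead of two dispatches of depth 1. The class name, method name, and the "job3" ray generation shader are hypothetical. It also assumes the shader pass, the acceleration structure, and the output textures have already been set on the shader, and that the namespace of `RayTracingShader` matches your Unity version (it lives in either `UnityEngine.Rendering` or `UnityEngine.Experimental.Rendering`).

```csharp
using UnityEngine;
using UnityEngine.Rendering;
// Depending on the Unity version, RayTracingShader may instead be in
// UnityEngine.Experimental.Rendering; adjust the using directive if needed.

public static class MergedRayDispatch
{
    // Records ONE dispatch of depth 2 rather than two dispatches of depth 1.
    // "job3" is the hypothetical ray generation shader from variant B, which
    // picks the job to run from the z component of the dispatch index.
    public static void Record(
        ScriptableRenderContext context,
        CommandBuffer buffer,
        RayTracingShader shader,
        uint width,
        uint height,
        Camera camera)
    {
        buffer.DispatchRays(shader, "job3", width, height, 2, camera);
        context.ExecuteCommandBuffer(buffer);
        buffer.Clear();
    }
}
```

In the ray generation shader, the z value used for the branch would come from `DispatchRaysIndex().z`. Because the per-dispatch resource binding now happens once instead of twice, this is where the CPU-side saving described above would be expected to come from.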

I used Google Translate, so I am not sure the translation is accurate.