Why use compute shaders for ray marching/tracing?

A good example is this Ray Tracing tutorial from Gamasutra. I implemented some ray marching recently, but without using any kernels; I simply do all the operations in the shader. My question is: isn’t the shader already intrinsically parallelized by the GPU, since it’s computing everything in the fragment function?
Why would I gain anything (and how much would I gain) using a compute shader instead of a simpler fragment shader?
Thanks

I have the same question. Any results?

I haven’t delved too deeply myself but my understanding is that there are some things you can’t do in a fragment shader.

It’s mainly around the lack of access to buffers, or of any ability to precalculate things or store intermediate calculations for later reuse. A fragment shader is dumb and only knows about its own single pixel. With a compute shader there are many possibilities for optimization that wouldn’t be possible in a fragment shader.

Search GitHub for “raymarch” (or “sdf” etc.) and “compute shader”; there are a few projects that might give you some ideas.

Actually, you can bind buffers to the pixel/frag stage to read or write. You can even compute stuff in a compute shader and then bind that result to the fragment stage to read from, without transferring data around.
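
To make that pattern a bit more concrete, here is a rough CPU-side sketch in Python. It is not shader code and not any engine's API. The names `compute_pass` and `fragment_pass`, the unit sphere SDF, and the camera position are all made up for illustration. The first function stands in for a compute dispatch that ray marches at a low resolution and writes distances into a buffer; the second stands in for the fragment stage, which only reads that buffer for each full-resolution pixel.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin (assumed scene)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def march(origin, direction, max_steps=64, max_dist=20.0, eps=1e-3):
    """Sphere tracing: advance along the ray by the SDF value until hit or miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sphere_sdf(p)
        if dist < eps:
            return t           # hit: distance travelled along the ray
        t += dist
        if t > max_dist:
            break
    return math.inf            # miss

def compute_pass(width, height):
    """Stand-in for a compute dispatch: one 'thread' per low-res buffer cell."""
    buf = [math.inf] * (width * height)
    for y in range(height):
        for x in range(width):
            u = (x + 0.5) / width * 2.0 - 1.0
            v = (y + 0.5) / height * 2.0 - 1.0
            length = math.sqrt(u * u + v * v + 1.0)
            direction = (u / length, v / length, 1.0 / length)
            buf[y * width + x] = march((0.0, 0.0, -3.0), direction)
    return buf

def fragment_pass(buf, buf_w, buf_h, width, height):
    """Stand-in for the fragment stage: each pixel only reads the precomputed buffer."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            bx = x * buf_w // width    # nearest low-res cell for this pixel
            by = y * buf_h // height
            row.append(1 if buf[by * buf_w + bx] < math.inf else 0)
        image.append(row)
    return image

depth = compute_pass(8, 8)                      # march 64 rays ...
image = fragment_pass(depth, 8, 8, 16, 16)      # ... shade 256 pixels
```

The point of the split is that the expensive marching runs at whatever resolution you choose, while the per-pixel stage stays cheap. On a real GPU the buffer would stay in video memory between the two stages.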

Some benefits of compute off the top of my head:

  • Control over the computation resolution and hardware resource distribution instead of it simply being the pixels the triangle falls on.
  • Asynchronous or simple pre-computing of a result.
  • Can easily implement reduction computations, where the output of one compute pass is a smaller number of elements that are then processed by another compute kernel, and so on, leading to more efficient calculations.
  • Can compute arbitrary data that may not relate to a specific pixel, such as vertex data (the vertex pass only knows about its current vertex; geo/tess can only see up to 6 vertices with adjacency (3 in Unity) and would be more wasteful in many circumstances) or any other computation that would benefit from highly parallel processing.
  • Is specifically designed for input/output to arbitrary buffers and allows for further optimizing through the use of thread/work groups and group-shared memory.
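
As a rough illustration of the reduction and work-group points above, here is a small CPU simulation in Python. It is a sketch, not real kernel code: `reduce_pass`, `reduce_all` and the group size of 4 are hypothetical, and the loops stand in for threads that would run in parallel on a GPU. Each pass folds every group of elements into one value using a tree reduction on a local copy (the stand-in for group-shared memory), and passes repeat until one element is left.

```python
def reduce_pass(data, group_size, op):
    """One simulated dispatch: each work group folds up to group_size elements into one."""
    out = []
    for start in range(0, len(data), group_size):
        # Local copy plays the role of group-shared memory.
        group = list(data[start:start + group_size])
        stride = 1
        while stride < len(group):
            # On a GPU, every i in this loop would be handled by a separate thread.
            for i in range(0, len(group) - stride, stride * 2):
                group[i] = op(group[i], group[i + stride])
            stride *= 2
        out.append(group[0])
    return out

def reduce_all(data, group_size=4, op=min):
    """Chain passes until a single element remains; returns (result, pass count)."""
    if not data:
        raise ValueError("reduce_all needs at least one element")
    passes = 0
    while len(data) > 1:
        data = reduce_pass(data, group_size, op)
        passes += 1
    return data[0], passes

# Example: 100 made-up per-ray values, e.g. hit distances.
values = [(i * 37) % 101 + 1 for i in range(100)]
nearest, passes = reduce_all(values, group_size=4, op=min)
total, _ = reduce_all(values, group_size=4, op=lambda a, b: a + b)
```

With a group size of 4, the 100 elements shrink to 25, then 7, then 2, then 1, so each pass has far less work than the previous one. A fragment shader has no natural way to express that shrinking output size.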

And I’m sure there’s much more.
