ComputeShader version of Physics.Raycast

Hi all,

I am new to ComputeShaders.
In my application, I wrote a script that calls Physics.Raycast many (>10,000) times.
However, it is well-known that Physics.Raycast is CPU, and thus slow.

I searched Google and found that Unity's ComputeShader is something similar to CUDA,
and I would like to write a ComputeShader version of Physics.Raycast.

I would be glad if anyone here could give me advice or references on this.
Thanks a lot, and have a nice day!


This is super possible. It’s actually something I have done. It is NOT easy, however. I would greatly recommend this blog:

And this post:

Unfortunately, compute shaders are a relatively new technique and resources can be hard to come by.

As for whether compute shaders will speed up your raycasts, the answer, of course, is “it depends.” From what you describe you have many rays, and hopefully few objects in your scene. So you’ll likely want to parallelize by having a single thread per ray. GPUs are capable of many, many threads, but as @Benproductions1 said, each thread is not actually faster (and is in fact quite a bit slower) than a CPU thread. So if each thread has to look for an intersection with 1000s of objects, it’s still going to be slow. One way that Unity (along with other engines) gets around this problem on the CPU is through octree culling:

…but I’m getting ahead of myself. Assuming that you don’t have too many objects in your scene, here’s a simple raycast compute shader:

#pragma kernel RayCast

struct Ray
{
    float3 position;
    float3 direction;
};
StructuredBuffer<Ray> rays;

// Three vertices define a triangle in space
float3 vertA;
float3 vertB;
float3 vertC;

// a 32 by 1 thread group, generally you want to fill up thread groups
// which are either 32 or 64 threads wide, GPU dependent
[numthreads(32,1,1)]
void RayCast (uint3 id : SV_DispatchThreadID)
{
    // Our Ray
    float3 pos = rays[id.x].position;
    float3 dir = rays[id.x].direction;

    // The normal vector of the plane defined by the triangle
    float3 norm = normalize(cross(vertB - vertA, vertC - vertA));

    // The distance of the ray to an intersection with the plane
    // This is in units relative to the length of ray.direction
    // (if the ray is parallel to the plane the division yields inf/NaN,
    // and the comparisons below then reject it)
    float k = dot(vertA - pos, norm) / dot(dir, norm);
    // The point in space where the ray intersects the (infinite) plane
    float3 I = pos + k * dir;
    // Convert to barycentric coordinates
    // This will find if the intersection is actually within the triangle
    float triangleArea = dot(norm, cross(vertB - vertA, vertC - vertA));
    float areaIBC = dot(norm, cross(vertB - I, vertC - I));
    float baryA = areaIBC / triangleArea;
    float areaICA = dot(norm, cross(vertC - I, vertA - I));
    float baryB = areaICA / triangleArea;
    float baryC = 1 - baryA - baryB;
    if (baryA > 0 && baryB > 0 && baryC > 0 && k >= 0)
    {
        // The ray intersects this triangle
    }
}

This would have to run for EVERY triangle that the ray potentially intersects, either with a two-dimensional compute shader (num rays x num triangles), or a for loop within the compute shader. Either way, you’re looking at some serious computations.
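To make the "for loop" variant concrete, here is a minimal CPU-side sketch in C++ of what one thread per ray would do: run the same plane-plus-barycentric test against every triangle and keep the nearest hit. It is a reference for checking the math, not the shader itself. The names Vec3, Triangle, intersect and closestHits are made up for this example, and the normal is left unnormalized because its length cancels out of every ratio.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator*(float s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Ray { Vec3 position, direction; };
struct Triangle { Vec3 a, b, c; };  // hypothetical layout, one entry per triangle

// Same test as the shader above. Returns the distance k along the ray
// (in units of the direction's length), or -1 if the ray misses.
static float intersect(const Ray& ray, const Triangle& t) {
    Vec3 n = cross(t.b - t.a, t.c - t.a);  // unnormalized plane normal
    float area = dot(n, n);                // scaled triangle area term
    float denom = dot(ray.direction, n);
    if (area == 0.0f || denom == 0.0f) return -1.0f;  // degenerate or parallel
    float k = dot(t.a - ray.position, n) / denom;
    if (k < 0.0f) return -1.0f;            // plane is behind the ray origin
    Vec3 I = ray.position + k * ray.direction;
    float baryA = dot(n, cross(t.b - I, t.c - I)) / area;
    float baryB = dot(n, cross(t.c - I, t.a - I)) / area;
    float baryC = 1.0f - baryA - baryB;
    return (baryA > 0 && baryB > 0 && baryC > 0) ? k : -1.0f;
}

// One result per ray: distance to the nearest triangle, or -1 if nothing is hit.
// The body of the outer loop is the work a single GPU thread would do.
static std::vector<float> closestHits(const std::vector<Ray>& rays,
                                      const std::vector<Triangle>& tris) {
    std::vector<float> result(rays.size(), -1.0f);
    for (std::size_t r = 0; r < rays.size(); ++r) {
        bool found = false;
        float best = 0.0f;
        for (const Triangle& t : tris) {
            float k = intersect(rays[r], t);
            if (k >= 0.0f && (!found || k < best)) { best = k; found = true; }
        }
        if (found) result[r] = best;
    }
    return result;
}
```

The cost is still (number of rays) x (number of triangles) intersection tests; the loop only decides where that work happens.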

Now, this doesn’t get you 100% of the way there. You have to figure out what the hell you’re going to return, and for that you’re going to run into race conditions. Yup, there’s gonna be some race conditions, good luck!
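One common way around the race, sketched here on the CPU in C++ as an illustration rather than as what I actually shipped: give every ray its own result slot and only ever lower it with an atomic minimum. For non-negative floats, the raw bit pattern sorts in the same order as the value, so the distance can be stored as an unsigned integer. In HLSL the counterparts would be asuint and InterlockedMin on a uint buffer; below, std::atomic stands in for that, and recordHit and kNoHit are names invented for this example.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as an unsigned integer and back.
static std::uint32_t floatBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
static float bitsFloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Sentinel for "no hit yet": larger than the bits of any non-negative float.
static const std::uint32_t kNoHit = 0xFFFFFFFFu;

// Atomically lower `slot` to `distance` if it is nearer than the stored value.
// Assumes distance >= 0, which holds for hits in front of the ray origin.
static void recordHit(std::atomic<std::uint32_t>& slot, float distance) {
    if (distance == 0.0f) distance = 0.0f;  // turn -0.0 into +0.0 so the bit order holds
    std::uint32_t candidate = floatBits(distance);
    std::uint32_t current = slot.load();
    // On failure compare_exchange_weak reloads `current`, so the loop retries
    // only while this candidate is still the smaller value.
    while (candidate < current &&
           !slot.compare_exchange_weak(current, candidate)) {
    }
}
```

Many threads can call recordHit on the same slot in any order and the smallest distance wins. A slot still holding kNoHit after the dispatch means that ray hit nothing, which also gives you the bool. If you use the for-loop version with one thread per ray, each thread writes only its own slot and no atomics are needed.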

“Physics.Raycast is CPU, and thus slow”

Just to be clear, a GPU is not fast, it’s merely concurrent. Think of a GPU as having about 32 very old (i.e. slow), very small CPUs. Unless you’re doing something like 1k+ (arbitrary) raycasts per frame, using the GPU will probably be slower.

Before you ask about how you can implement Physics.Raycast on the GPU, you’d be better off re-implementing it in a normal script first, just so you can understand the amount of work behind it.

To gain any sort of speed boost, you will quite literally have to rebuild a large section of Unity’s physics engine, PhysX, on the GPU. This includes spatial trees and the like. You don’t have access to any physics or other such API functions from any type of shader, so unless you’re either ready to rebuild the PhysX collision system or willing to settle for a horridly slow GPU implementation, I suggest you stick with what you currently have.
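To give a feel for the smallest building block of such a spatial tree, here is a sketch in C++ of the generic ray versus axis-aligned box "slab" test. This is not PhysX's code, just the textbook check a tree traversal uses to skip every triangle inside a box the ray never touches. Aabb and rayHitsBox are names invented for this example.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };  // hypothetical box: minimum and maximum corners

// Returns true if the ray (origin + t * dir, t >= 0) passes through the box.
static bool rayHitsBox(Vec3 origin, Vec3 dir, const Aabb& box) {
    const float o[3] = {origin.x, origin.y, origin.z};
    const float d[3] = {dir.x, dir.y, dir.z};
    const float lo[3] = {box.lo.x, box.lo.y, box.lo.z};
    const float hi[3] = {box.hi.x, box.hi.y, box.hi.z};

    float tEnter = 0.0f;                                    // latest entry so far
    float tExit = std::numeric_limits<float>::infinity();   // earliest exit so far
    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] == 0.0f) {
            // Ray is parallel to this pair of planes: it can only hit
            // if the origin already lies between them.
            if (o[axis] < lo[axis] || o[axis] > hi[axis]) return false;
            continue;
        }
        float t0 = (lo[axis] - o[axis]) / d[axis];
        float t1 = (hi[axis] - o[axis]) / d[axis];
        if (t0 > t1) std::swap(t0, t1);
        tEnter = std::max(tEnter, t0);
        tExit = std::min(tExit, t1);
        if (tEnter > tExit) return false;  // the slab intervals do not overlap
    }
    return true;
}
```

An octree or bounding-volume hierarchy applies this test at each node and only descends into (and eventually tests the triangles of) boxes that pass. Building and updating that tree is the part that takes real work.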

Thank you for your answer.
That’s horrible news.

And as I mentioned, I have to perform more than 10K raycasts (probably more later, to obtain a better result in my application), and that is for each light source that I have in the scene.

And of course I understand that a single GPU thread is not necessarily faster than a single CPU thread.
What I am after is exactly the parallelism, since the raycasts differ from each other only in their initial direction.

So how can I improve this without the help of a ComputeShader?
Any advice?

Hello, how can I call this RayCast method with the appropriate arguments, and how do I retrieve a bool indicating whether the raycast hit its target?