A few quick rules to help you, and future ComputeShader users, wrap your head around compute shaders.
-
After calling computeShader.Dispatch(0,1,1,1); and then reading the results back on the CPU (for example with ComputeBuffer.GetData), your CPU waits for the GPU to return a completed status.
-
This is critical information that confuses many new users, especially those who have experience with CPU threads but none with GPU threads. Let me say that again… the main thread sleeps until the GPU returns a completed status!! I wish someone had told me this from the beginning!
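To make the stall concrete, here is a minimal Unity C# sketch. It assumes a compute shader asset assigned in the Inspector, with a kernel named CSMain that writes one float per thread into a buffer named Result. Those names and the component name are placeholders I made up for illustration. As far as I understand it, Dispatch only queues the work, and the main thread actually waits at the point where the results are read back with GetData.

```csharp
using UnityEngine;

// Hypothetical example component; attach it to any GameObject.
public class DispatchAndReadBack : MonoBehaviour
{
    // Assumed: a compute shader with a kernel named "CSMain" that writes
    // to an RWStructuredBuffer<float> named "Result".
    public ComputeShader computeShader;

    void Start()
    {
        // Assumed: the kernel runs 400 threads in its single thread group.
        const int threadCount = 400;

        int kernel = computeShader.FindKernel("CSMain");
        var buffer = new ComputeBuffer(threadCount, sizeof(float));
        computeShader.SetBuffer(kernel, "Result", buffer);

        // Queue one thread group on each axis.
        computeShader.Dispatch(kernel, 1, 1, 1);

        // Reading the results back is where the main thread waits for the GPU.
        var results = new float[threadCount];
        buffer.GetData(results);

        Debug.Log("First result: " + results[0]);

        // Compute buffers are not garbage collected, so release them by hand.
        buffer.Release();
    }
}
```

If the wait shows up as a hitch, the readback is the first place I would look.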
-
Each thread has an ID, and that ID acts as the iterator in a loop!! You need to manage the iterator in a way that corresponds with traditional looping. (See Figure 1.0 below)
Figure 1.0
for (int y = 0; y < 20; y++)
{
    for (int x = 0; x < 20; x++)
    {
        ComplexMathProblem(x, y);
    }
}
Iteration Management
Figure 1.0 shows a standard xy loop. I will explain how to emulate this on a GPU below.
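computeShader.Dispatch(0,1,1,1);
dispatches 1 thread group along x, 1 along y and 1 along z, so a single thread group in total. The first argument (0) is the index of the kernel to run.
The last three arguments are the dimensions of the grid of thread groups, not the number of threads inside each group.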
Have a look at Microsoft’s explanation of this layout:
Figure 1.1
To use a compute shader effectively, we need an intimate understanding of what our loop does, because we have to convert that iteration into a new kind of iteration that looks nothing like a for loop: each pass of the loop becomes its own thread. Figure 1.2 explains.
Figure 1.2
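// 20 x 20 x 1 threads per thread group
[numthreads(20, 20, 1)]
void CSMain(uint3 id : SV_GroupThreadID)
{
    // The thread's ID within its group replaces the loop counters.
    int x = id.x;
    int y = id.y;
    ComplexMathProblem(x, y);
}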
Figure 1.2 depicts the exact same looping operation as Figure 1.0, but without the for loop.
Everything that occurs inside the for loop of Figure 1.0 will occur inside CSMain here, without the for loop.
Each thread will (at the same time and in no particular order) execute the contents of CSMain. The order doesn't really matter: each thread has an ID, and all of them will execute. All you need to do is look at each thread as one iteration of your loop.
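One way to convince yourself of this is to emulate the thread IDs on the CPU. The sketch below is plain C#, not shader code, and it runs sequentially rather than in parallel. Kernel is a made-up stand-in for the body of CSMain. It visits the thread IDs in a deliberately shuffled order and checks that every (x, y) pair from the loop in Figure 1.0 is still handled exactly once.

```csharp
using System;
using System.Linq;

// Hypothetical CPU-side emulation; nothing here is a Unity API.
public static class ThreadEmulation
{
    const int Width = 20, Height = 20;

    // Stand-in for the body of CSMain: records which (x, y) it handled.
    static void Kernel(int x, int y, int[,] visits)
    {
        visits[x, y]++;
    }

    public static int[,] Run(int seed)
    {
        var visits = new int[Width, Height];
        var rng = new Random(seed);

        // Every thread ID exactly once, in shuffled order,
        // because the GPU promises no particular ordering.
        var ids = Enumerable.Range(0, Width * Height)
                            .OrderBy(_ => rng.Next())
                            .ToArray();

        foreach (int id in ids)
        {
            Kernel(id % Width, id / Width, visits);
        }
        return visits;
    }

    public static void Main()
    {
        int[,] visits = Run(42);
        bool allOnce = visits.Cast<int>().All(count => count == 1);
        Console.WriteLine(allOnce); // True
    }
}
```

The result does not depend on the order, which is exactly why the loop body can be handed to independent threads.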
For workloads too large for a single dispatch, you can dispatch to the GPU multiple times, or you may have to wait longer for the GPU to return the data. This could result in frame drops, so it's important that you optimize your code and do as much on the GPU as possible.
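Now notice, I am using SV_GroupThreadID, which is the thread's ID within its own thread group. That works here because only one group is dispatched; with more than one group, SV_DispatchThreadID gives each thread a unique ID across all groups.
In [numthreads(20, 20, 1)] I have asked the GPU for 20 x threads and 20 y threads (and 1 z thread) per group.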
This is a total of 400 threads.
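The arithmetic behind that number can be sketched in plain C#. The class and method names are made up for illustration and are not part of Unity. The sketch assumes only that the thread count on each axis is the number of groups dispatched times the numthreads value for that axis, and that a problem wider than one group needs the group count rounded up.

```csharp
using System;

// Hypothetical helper; not part of Unity.
public static class ThreadMath
{
    // Threads launched along one axis: groups dispatched times numthreads.
    public static int ThreadsOnAxis(int groups, int threadsPerGroup)
    {
        return groups * threadsPerGroup;
    }

    // Groups needed to cover `items` work items on one axis, rounded up.
    public static int GroupsNeeded(int items, int threadsPerGroup)
    {
        return (items + threadsPerGroup - 1) / threadsPerGroup;
    }

    public static void Main()
    {
        // Dispatch(0, 1, 1, 1) combined with [numthreads(20, 20, 1)]
        int x = ThreadsOnAxis(1, 20);
        int y = ThreadsOnAxis(1, 20);
        int z = ThreadsOnAxis(1, 1);
        Console.WriteLine(x * y * z); // 400

        // A 100-wide problem with 20 threads per group needs 5 groups.
        Console.WriteLine(GroupsNeeded(100, 20)); // 5
    }
}
```

When the problem size is not an exact multiple of the group size, the rounded-up dispatch launches a few extra threads, so the kernel would need a bounds check before touching any data.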
I hope this helps. I am still learning too, we all are! It's a science after all!