I’m launching a bunch of simultaneous computes (i.e., calling Dispatch() on multiple compute shader instances in parallel).
Each ComputeShader instance is created via Instantiate(). I keep several of these in a pool. The pool size comes from a config value, but for clarity’s sake let’s say I’m keeping 5 compute shader instances.
So the pattern is: a central processor has a queue of work, and it fetches an available compute shader from the pool. When the work is done, the shader instance is returned to the pool. So while we’re doing hundreds of computes in total, we have at most 5 running at the same time.
Well… the problem is, after calling Dispatch(), how can I tell when that compute has completed? I’m not doing any readbacks or trying to get data back to the CPU. The work is just sent off to the GPU: fire and forget.
But I don’t know when to return a compute shader instance back to the pool, if I can’t tell that the work is done.
(The whole point is that I don’t want to overload the GPU by running a gazillion computes at the same time. I want to spread the work out and Dispatch() at most 5 simultaneously.)
Someone on Stack Overflow answered that Dispatch() is synchronous… if that’s the case, then we know the work is done as soon as the call returns… but I find this surprising, because isn’t the whole point of sending work off to the GPU to NOT block the CPU thread?
Does anyone have concrete knowledge on this?
Thanks!
Hey!
If you want to verify that the GPU has finished executing a compute shader, you can do it right away by blocking the CPU until the GPU finishes, or you can check in a later frame, which is the preferred way. Either way, you need a GraphicsFence.
The first option destroys performance because the CPU and GPU run in lockstep instead of in parallel while your compute shaders execute. If you use a CommandBuffer, you can do something like this:
```csharp
using System.Threading;          // Thread.Sleep
using UnityEngine;               // Graphics, GL
using UnityEngine.Rendering;     // CommandBuffer, GraphicsFence

cmdBuffer.DispatchCompute(...);
GraphicsFence gpuFence = cmdBuffer.CreateGraphicsFence(GraphicsFenceType.AsyncQueueSynchronisation, SynchronisationStageFlags.ComputeProcessing);
Graphics.ExecuteCommandBuffer(cmdBuffer);
// Not sure, but you might need a GL.Flush() here.
while (!gpuFence.passed)
{
    // Wait 1 ms to let the GPU execute the commands.
    Thread.Sleep(1);
}
// The GPU has finished executing the compute shader at this point.
```
If you don’t use CommandBuffers, use the equivalent methods on Graphics instead (for example, Graphics.CreateGraphicsFence).
The second option is more performant, but you need to create and store a GraphicsFence for each Dispatch, check in a following frame whether GraphicsFence.passed is true, and if so return the associated compute shader to the pool. I haven’t checked any of this!!!
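A minimal sketch of that second option, written as plain C# so the bookkeeping is easy to follow. `InFlightLimiter` is a hypothetical helper, not a Unity API: it holds queued jobs, starts at most `maxInFlight` of them, and retires a job once a caller-supplied predicate reports its fence as done. In Unity you might create it as `new InFlightLimiter<GraphicsFence>(5, f => f.passed)`, have each queued delegate record the dispatch into a CommandBuffer, create a fence right after it, execute the buffer and return the fence, then call `Poll()` once per frame from `Update()`. Which fence type supports polling `passed` may depend on your Unity version (newer versions document `GraphicsFenceType.CPUSynchronisation` for GPU-to-CPU checks), so treat that part as an assumption to verify.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: caps how many GPU jobs are in flight at once.
// In Unity (assumption), TFence would be GraphicsFence and isDone would be f => f.passed;
// check which GraphicsFenceType your Unity version allows you to poll from the CPU.
public sealed class InFlightLimiter<TFence>
{
    private readonly int maxInFlight;
    private readonly Func<TFence, bool> isDone;
    private readonly Queue<Func<TFence>> pending = new Queue<Func<TFence>>();
    private readonly List<TFence> inFlight = new List<TFence>();

    public InFlightLimiter(int maxInFlight, Func<TFence, bool> isDone)
    {
        if (maxInFlight < 1) throw new ArgumentOutOfRangeException(nameof(maxInFlight));
        this.maxInFlight = maxInFlight;
        this.isDone = isDone ?? throw new ArgumentNullException(nameof(isDone));
    }

    public int InFlightCount => inFlight.Count;
    public int PendingCount => pending.Count;

    // Queue a job. The delegate should issue the dispatch and return a fence created right after it.
    public void Enqueue(Func<TFence> dispatchAndFence) => pending.Enqueue(dispatchAndFence);

    // Call once per frame: drop jobs whose fence has passed, then start queued jobs up to the cap.
    public void Poll()
    {
        inFlight.RemoveAll(f => isDone(f));
        while (inFlight.Count < maxInFlight && pending.Count > 0)
            inFlight.Add(pending.Dequeue()());
    }
}
```

Because the limiter only tracks fences, it never waits on the GPU; a job that has not finished simply stays in flight until a later Poll() sees its fence pass.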
By the way, why do you need a pool of compute shaders? Are those instances of the same compute shader?
My approach is pretty simple… I have a wrapper class (plain C#) that encapsulates a ComputeShader instance, with the shader instance created via Instantiate(). It’s the same shader for all instances.
I create 5 instances of the wrapper class, and they are taken from and returned to the pool as part of a work queue, so that no more than 5 of these computes are running at the same time.
I’m probably misunderstanding the statefulness (or lack thereof) of a compute shader instance. Are you suggesting I can just keep one instance somewhere, and do multiple Dispatch() calls on it?
Dispatch is an asynchronous method.
It’s just a guess, but many beginner-oriented code samples call ComputeBuffer.GetData right after Dispatch to retrieve the results synchronously. Because GetData blocks until the GPU has finished writing the buffer, the code looks as if Dispatch itself were synchronous, which might have led Stack Overflow users to misunderstand how it works.
(Of course, ComputeBuffer.GetData can significantly degrade the overall performance of your game, so avoid using it in production code.)
There’s no need to create new instances of ComputeShader. Use the original asset as is.
Additionally, pooling is unnecessary. A ComputeShader simply issues commands, so you can call Dispatch five times on a single ComputeShader.
If you want to check when the work is complete, use GraphicsFence.passed, as explained above. If you want to retrieve the results, use AsyncGPUReadback.Request.
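For the readback case, a sketch of requesting results without blocking the CPU. `ReadbackExample` is a hypothetical component, and the kernel name `CSMain`, the buffer name `Result` and the `[numthreads(64,1,1)]` thread-group size are assumptions about the shader. `AsyncGPUReadback.Request` returns immediately and invokes the callback once the data is available, typically a few frames later.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical component: dispatch a kernel, then read its output back asynchronously.
public class ReadbackExample : MonoBehaviour
{
    public ComputeShader shader;   // assign the compute shader asset in the Inspector
    const int Count = 1024;
    ComputeBuffer output;

    void Start()
    {
        output = new ComputeBuffer(Count, sizeof(float));
        int kernel = shader.FindKernel("CSMain");      // assumed kernel name
        shader.SetBuffer(kernel, "Result", output);     // assumed buffer name
        shader.Dispatch(kernel, Count / 64, 1, 1);      // assumes [numthreads(64,1,1)]

        if (!SystemInfo.supportsAsyncGPUReadback)
        {
            Debug.LogWarning("Async GPU readback is not supported on this platform.");
            return;
        }
        // Returns immediately; OnReadback runs once the GPU data has been copied back.
        AsyncGPUReadback.Request(output, OnReadback);
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        if (request.hasError) { Debug.LogError("GPU readback failed."); return; }
        // Treat this NativeArray as valid only inside the callback; copy it if you need it later.
        var data = request.GetData<float>();
        Debug.Log($"First value: {data[0]}");
    }

    void OnDestroy() => output?.Release();
}
```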
This morning I’m looking at my compute shader instance setup (each one created via Instantiate()), and based on what you and @Arithmetica are saying, the instances are basically stateless and I can just reuse one, calling Dispatch() multiple times…
However, the part that gets me is that I’m setting input buffers with unique data for each Dispatch(), as well as output buffers, of course.
(etc., a lot more buffers)
So I’m assuming I would need to keep these buffers completely separate for each dispatch, but that I could still call SetBuffer and Dispatch on one single shader instance?
So the behaviour would be that calling SetBuffer() + Dispatch() sends everything off to the GPU together with the buffer bindings, so that if I immediately call SetBuffer() + Dispatch() again with a different buffer/data, the dispatch that was already issued is not affected?
Hope this makes sense.
Sure, no problem.
For example, take a look at this official Unity test code that performs parallel sorting.
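A sketch of that pattern: one ComputeShader asset, five dispatches in the same frame, each bound to its own buffers. `MultiDispatchExample` is a hypothetical component, and the kernel name `CSMain`, the `Input`/`Output` buffer names and `[numthreads(64,1,1)]` are assumptions about the shader. Each Dispatch uses the bindings that are current when it is called, so rebinding for the next job does not change dispatches already issued; what must not be shared between concurrently running jobs is the buffers themselves, since two dispatches writing the same output buffer would overwrite each other.

```csharp
using UnityEngine;

// Hypothetical component: one ComputeShader asset (no Instantiate()), several dispatches,
// each with its own input and output buffers.
public class MultiDispatchExample : MonoBehaviour
{
    public ComputeShader shader;   // the original asset
    const int Jobs = 5;
    const int Count = 4096;
    readonly ComputeBuffer[] inputs = new ComputeBuffer[Jobs];
    readonly ComputeBuffer[] outputs = new ComputeBuffer[Jobs];

    void Start()
    {
        int kernel = shader.FindKernel("CSMain");   // assumed kernel name
        for (int i = 0; i < Jobs; i++)
        {
            inputs[i] = new ComputeBuffer(Count, sizeof(float));
            outputs[i] = new ComputeBuffer(Count, sizeof(float));
            inputs[i].SetData(MakeInput(i));

            // The dispatch below captures these bindings; rebinding on the next
            // iteration does not affect it.
            shader.SetBuffer(kernel, "Input", inputs[i]);     // assumed buffer names
            shader.SetBuffer(kernel, "Output", outputs[i]);
            shader.Dispatch(kernel, Count / 64, 1, 1);        // assumes [numthreads(64,1,1)]
        }
    }

    static float[] MakeInput(int job)
    {
        var data = new float[Count];
        for (int j = 0; j < Count; j++) data[j] = job * Count + j;
        return data;
    }

    // Release buffers only once the GPU no longer needs them (here: when the component goes away).
    void OnDestroy()
    {
        foreach (var b in inputs) b?.Release();
        foreach (var b in outputs) b?.Release();
    }
}
```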