Compute Shaders without slowing the game

Hi,
I’m working on my own marching cubes/infinite world engine on the CPU. I wanted to optimize my game and learned about compute shaders. I was excited about the idea of using my GPU to generate meshes and noise, and started experimenting with it. I quickly learned that compute shaders, like pretty much every other Unity class, can’t be used from other threads.
I’m looking for a way to do some heavy work on the GPU without freezing the main thread until the task is done. I understand this restriction probably exists to prevent two CPU threads from executing the same shader at the same time. Still, it’s a problem when the goal is an infinite terrain rather than generating everything at startup.
So, is there any solution, or am I stuck using the CPU?

Or maybe there is a way to call a function when a dispatch is done?

Wait, is the freeze caused by Dispatch, because the main thread has to wait for it to finish? Or is it GetData, because the compute shader has to finish its task before I can retrieve the data? If it’s the latter, can I just call GetData later without the slowdown?

Even if THAT is the case, I’m pretty sure you can only get the data on the main thread, right?

Yeah, I can’t get the data from a second thread.

I’m fairly sure your algorithm can be refactored to run without freezes. Very long GPU workloads (more than about 2 seconds, the default Windows TDR timeout) can even cause a driver crash.

Could you describe your algorithm in detail?

Do you compute stuff in every update? If so, is it in Update() or FixedUpdate()? Compute shaders should be dispatched from Update() so they stay in sync with the graphics pipeline.

After you dispatch a compute shader, the main thread just carries on. GetData is different: it blocks the CPU until the GPU has finished the work and copied the data back, so that the data is available in the current frame. If that is your issue, try an asynchronous readback instead of GetData.

But if your GPU work is heavy and takes longer than the CPU-side work, the CPU will still end up waiting for the GPU to finish before it can run the rendering for the next frame.

My goal is to generate my infinite procedural world as fast as possible. I would like to run more than one job of the same compute shader at the same time, because I’m pretty sure my GPU can handle it, but I’m not sure how yet. Also, I do my work in Update.
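A minimal sketch of queuing several jobs of the same kernel per frame from Update(). It assumes a hypothetical kernel named GenerateDensity declared with [numthreads(8,8,8)], with an RWStructuredBuffer<float> _Density and a float3 _ChunkOrigin. ChunkDispatcher, RequestChunk and maxDispatchesPerFrame are illustrative names, not Unity API. Each chunk gets its own buffer, and Unity captures the current bindings at each Dispatch call, so the CPU can queue several chunks back to back without waiting. Whether those dispatches actually overlap on the GPU is up to the driver and hardware.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch: queue several chunk jobs of the same kernel per frame.
// "GenerateDensity", "_Density" and "_ChunkOrigin" are hypothetical names
// that would have to match your own .compute file.
public class ChunkDispatcher : MonoBehaviour
{
    public ComputeShader densityShader;     // assigned in the Inspector
    public int maxDispatchesPerFrame = 4;   // hypothetical per-frame budget

    const int ChunkSize = 32;               // voxels per axis (assumption)
    const int ThreadsPerGroup = 8;          // must match [numthreads(8,8,8)]

    readonly Queue<Vector3Int> pendingChunks = new Queue<Vector3Int>();
    readonly Dictionary<Vector3Int, ComputeBuffer> densityBuffers =
        new Dictionary<Vector3Int, ComputeBuffer>();
    int kernel;

    void Start()
    {
        kernel = densityShader.FindKernel("GenerateDensity");
    }

    public void RequestChunk(Vector3Int coord) => pendingChunks.Enqueue(coord);

    void Update()
    {
        int groups = ChunkSize / ThreadsPerGroup;
        for (int i = 0; i < maxDispatchesPerFrame && pendingChunks.Count > 0; i++)
        {
            Vector3Int coord = pendingChunks.Dequeue();
            var buffer = new ComputeBuffer(ChunkSize * ChunkSize * ChunkSize, sizeof(float));
            densityBuffers[coord] = buffer;

            // Bindings and parameters are taken at each Dispatch, so every chunk
            // writes into its own buffer at its own origin.
            densityShader.SetBuffer(kernel, "_Density", buffer);
            densityShader.SetVector("_ChunkOrigin", (Vector3)(coord * ChunkSize));

            // Dispatch only records a GPU command; the CPU does not wait here.
            densityShader.Dispatch(kernel, groups, groups, groups);
        }
    }

    void OnDestroy()
    {
        foreach (var buffer in densityBuffers.Values) buffer.Release();
    }
}
```

Capping the dispatches per frame keeps a sudden burst of newly visible chunks from turning into one very long GPU frame.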

I can’t figure out whether the CPU is getting stopped or it’s just the pipeline. If it’s the pipeline, where can I find information about the async version of GetData you mentioned? I can’t find anything related to it in the documentation.

Did you ever find a solution? I’m having the same problem.

First of all, if you “can’t figure out” what is getting stopped, it means you are not profiling. Before taking any kind of action, surround the pieces of code you suspect are causing issues with Profiler.BeginSample() and Profiler.EndSample() calls to measure how long they take.
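A minimal sketch of that advice. It assumes a hypothetical kernel MeshGen with [numthreads(64,1,1)] that writes float3 positions into a buffer named _Vertices. The sample names are arbitrary labels that show up in the Profiler window. If most of the time lands in the GetData sample rather than the Dispatch sample, the main thread is waiting on the GPU, not on the dispatch call itself.

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Sketch: wrap the suspicious calls in named profiler samples.
// "MeshGen" and "_Vertices" are hypothetical kernel/property names.
public class ReadbackProfiling : MonoBehaviour
{
    public ComputeShader shader;

    const int VertexCount = 65536;
    ComputeBuffer vertexBuffer;
    Vector3[] vertices;
    int kernel;

    void Start()
    {
        kernel = shader.FindKernel("MeshGen");
        vertexBuffer = new ComputeBuffer(VertexCount, sizeof(float) * 3);
        vertices = new Vector3[VertexCount];
        shader.SetBuffer(kernel, "_Vertices", vertexBuffer);
    }

    void Update()
    {
        Profiler.BeginSample("Terrain.Dispatch");
        shader.Dispatch(kernel, VertexCount / 64, 1, 1); // assumes [numthreads(64,1,1)]
        Profiler.EndSample();

        // GetData blocks until the GPU has finished and copied the data back.
        Profiler.BeginSample("Terrain.GetData");
        vertexBuffer.GetData(vertices);
        Profiler.EndSample();
    }

    void OnDestroy() => vertexBuffer.Release();
}
```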

Ideally you should redesign your algorithms to avoid reading data back to the CPU at all. You said you are creating meshes, so I assume you are generating lists of positions/indices/normals/etc. in the compute shader, reading them back and feeding them into a Unity Mesh object. That will never be fast: you’re copying data from the GPU to the CPU, then doing whatever processing Unity needs to convert it into mesh data, and then uploading it back to the GPU.

Check https://docs.unity3d.com/ScriptReference/Rendering.AsyncGPUReadback.html
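A minimal sketch of AsyncGPUReadback with a completion callback, reusing a hypothetical MeshGen kernel and _Vertices buffer (assumptions, not from the thread). Instead of blocking like GetData, AsyncGPUReadback.Request queues the copy and Unity invokes the callback on the main thread once the data is ready, usually a few frames later. This callback is also the closest thing to "calling a function when the dispatch is done".

```csharp
using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: dispatch, then read the result back without stalling the main thread.
// "MeshGen" and "_Vertices" are hypothetical kernel/property names.
public class AsyncChunkReadback : MonoBehaviour
{
    public ComputeShader shader;

    const int VertexCount = 65536;
    ComputeBuffer vertexBuffer;
    int kernel;
    bool requestInFlight;

    void Start()
    {
        kernel = shader.FindKernel("MeshGen");
        vertexBuffer = new ComputeBuffer(VertexCount, sizeof(float) * 3);
        shader.SetBuffer(kernel, "_Vertices", vertexBuffer);
    }

    void Update()
    {
        if (requestInFlight) return; // one job at a time in this sketch

        shader.Dispatch(kernel, VertexCount / 64, 1, 1); // assumes [numthreads(64,1,1)]
        // Queues the GPU-to-CPU copy; Update keeps running while it completes.
        AsyncGPUReadback.Request(vertexBuffer, OnReadback);
        requestInFlight = true;
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        requestInFlight = false;
        if (request.hasError)
        {
            Debug.LogWarning("GPU readback failed");
            return;
        }
        // Treat this NativeArray as valid only inside the callback;
        // copy what you need (for example into a Mesh) before returning.
        NativeArray<Vector3> data = request.GetData<Vector3>();
        Debug.Log($"Read back {data.Length} vertices");
    }

    void OnDestroy()
    {
        AsyncGPUReadback.WaitAllRequests(); // don't release a buffer that is still being read
        vertexBuffer.Release();
    }
}
```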


This is good IF you need the data back on the CPU. In my case I don’t, so why can’t we just have a way to check whether the compute shader is done?

You don’t have to. If you dispatch a compute shader that writes into a buffer, then dispatch another compute shader that reads from that same buffer, the second dispatch will only execute once the first one has completed.
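A minimal sketch of chaining two kernels through a shared buffer with no CPU-side wait. The kernel names Density and March, the property names, and [numthreads(8,8,8)] are hypothetical. The second Dispatch reads the buffer the first one wrote. The ordering between the two is handled on the GPU side, so the CPU just issues both commands and moves on.

```csharp
using UnityEngine;

// Sketch: Density writes a scalar field, March reads it and writes vertices.
// Kernel and property names are hypothetical and must match your .compute file.
public class ChainedKernels : MonoBehaviour
{
    public ComputeShader terrain;

    const int N = 32;                   // voxels per axis (assumption)
    const int MaxVerticesPerCell = 15;  // marching cubes: up to 5 triangles per cell

    ComputeBuffer density, vertices;
    int densityKernel, marchKernel;

    void Start()
    {
        densityKernel = terrain.FindKernel("Density");
        marchKernel = terrain.FindKernel("March");

        density = new ComputeBuffer(N * N * N, sizeof(float));
        vertices = new ComputeBuffer(N * N * N * MaxVerticesPerCell, sizeof(float) * 3);

        terrain.SetBuffer(densityKernel, "_Density", density);   // written by Density
        terrain.SetBuffer(marchKernel, "_DensityIn", density);   // read by March
        terrain.SetBuffer(marchKernel, "_Vertices", vertices);   // written by March

        int groups = N / 8; // assumes [numthreads(8,8,8)] in both kernels
        terrain.Dispatch(densityKernel, groups, groups, groups);
        // No fence or readback here: this dispatch runs after the one above.
        terrain.Dispatch(marchKernel, groups, groups, groups);
    }

    void OnDestroy()
    {
        density.Release();
        vertices.Release();
    }
}
```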


What is the right pattern when you have separate input/output buffers, like an input buffer and an append buffer?

I’ve seen tight loops where they call SetCounterValue(0), set the buffers, call Dispatch(), then call CopyCount, and then use the buffers either in a second compute shader or on the rendering side. But it’s not clear whether they are just hoping the work is done, or whether something undocumented is going on, say CopyCount forcing the CPU side to wait.

The SetCounterValue and CopyCount commands are scheduled to execute on the GPU, just like a Dispatch command. The CPU never sees the value and isn’t stalled by calling them. In fact, if you have multithreaded rendering and/or graphics jobs enabled, these commands will be issued to the GPU from other threads.
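A minimal sketch of the pattern from the question, in which the triangle count never leaves the GPU. It assumes a hypothetical March kernel that appends to an AppendStructuredBuffer of triangles named _Triangles, with three float3 positions per triangle. It also assumes a material whose shader reads the same buffer, using SV_InstanceID as the triangle index and SV_VertexID (0 to 2) as the corner. CopyCount writes the append counter into the instanceCount slot of the indirect arguments, so DrawProceduralIndirect draws as many triangles as the compute pass produced.

```csharp
using UnityEngine;

// Sketch: append-buffer generation plus indirect draw, with no CPU readback.
// "March", "_Triangles" and the material's shader are hypothetical.
public class AppendBufferTerrain : MonoBehaviour
{
    public ComputeShader terrain;
    public Material material;

    const int N = 32;                              // voxels per axis (assumption)
    const int MaxTriangles = N * N * N * 5;        // marching cubes: up to 5 per cell
    const int TriangleStride = sizeof(float) * 9;  // 3 vertices * float3

    ComputeBuffer triangles, drawArgs;
    int kernel;

    void Start()
    {
        kernel = terrain.FindKernel("March");
        triangles = new ComputeBuffer(MaxTriangles, TriangleStride, ComputeBufferType.Append);

        // Indirect args: vertexCountPerInstance, instanceCount, startVertex, startInstance.
        drawArgs = new ComputeBuffer(4, sizeof(uint), ComputeBufferType.IndirectArguments);
        drawArgs.SetData(new uint[] { 3, 0, 0, 0 });

        terrain.SetBuffer(kernel, "_Triangles", triangles);
        material.SetBuffer("_Triangles", triangles);
        Generate();
    }

    void Generate()
    {
        triangles.SetCounterValue(0);                               // queued GPU command
        int groups = N / 8;                                         // assumes [numthreads(8,8,8)]
        terrain.Dispatch(kernel, groups, groups, groups);           // queued GPU command
        ComputeBuffer.CopyCount(triangles, drawArgs, sizeof(uint)); // counter -> instanceCount
    }

    void Update()
    {
        var bounds = new Bounds(transform.position + Vector3.one * (N * 0.5f), Vector3.one * N);
        Graphics.DrawProceduralIndirect(material, bounds, MeshTopology.Triangles, drawArgs);
    }

    void OnDestroy()
    {
        triangles.Release();
        drawArgs.Release();
    }
}
```

Calling Generate() again, for example after moving the chunk, repeats the same three GPU commands. None of them makes the CPU wait.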