Check if a ComputeShader.Dispatch() command is completed on GPU before doing second kernel dispatch

Hi,

I am trying to move from my CPU-based procedural planet generation approach to a GPU-based one (for plane calculation and rendering).
Being fairly new to shader programming, I am now at the stage where I can hand over generation constants in a buffer to a compute shader, precalculate the plane in the compute shader (vertex positions and normals, based on noise), and hand the results over via the buffer for rendering (replacing the vertex positions of a prototype mesh with the ones from the buffer).

But right now I haven't been able to render more than one plane at once. I guess this is due to some dependency I am not aware of (maybe the order of creating the buffers, materials, and DrawMesh calls), or maybe in the end I need one GameObject per DrawMesh call?

Any hint on what I might be doing wrong would be very helpful and appreciated.

My current (not working) approach is that I moved the buffers into the QuadtreeTerrain class (a quadtree node), as well as the material (I am not sure whether individual materials are necessary).

    class QuadtreeTerrain {
        // Quadtree classes
        public QuadtreeTerrain parentNode; // The parent quadtree node
        public QuadtreeTerrain childNode1; // A child quadtree node
        public QuadtreeTerrain childNode2; // A child quadtree node
        public QuadtreeTerrain childNode3; // A child quadtree node
        public QuadtreeTerrain childNode4; // A child quadtree node
        // Buffer
        public ComputeBuffer generationConstantsBuffer;
        public ComputeBuffer patchGeneratedDataBuffer;
        // Material
        public Material material;
            ....
    }

In the SpaceObjectProceduralPlanet script, which is attached to a single GameObject, I then hold six quadtree instances (QuadtreeTerrain).

    public class SpaceObjectProceduralPlanet : MonoBehaviour {
        ....
        // QuadtreeTerrain
        private QuadtreeTerrain quadtreeTerrain1;
        private QuadtreeTerrain quadtreeTerrain2;
        private QuadtreeTerrain quadtreeTerrain3;
        private QuadtreeTerrain quadtreeTerrain4;
        private QuadtreeTerrain quadtreeTerrain5;
        private QuadtreeTerrain quadtreeTerrain6;
   
        // We initialize the buffers and the material used to draw.
        void Start()
        {
            ...
            // QuadtreeTerrain
            this.quadtreeTerrain1 = new QuadtreeTerrain(0, edgeVector1, edgeVector2, edgeVector3, edgeVector4, quadtreeTerrainParameter1);
            this.quadtreeTerrain2 = new QuadtreeTerrain(0, edgeVector2, edgeVector5, edgeVector4, edgeVector7, quadtreeTerrainParameter2);
            this.quadtreeTerrain3 = new QuadtreeTerrain(0, edgeVector5, edgeVector6, edgeVector7, edgeVector8, quadtreeTerrainParameter3);
            this.quadtreeTerrain4 = new QuadtreeTerrain(0, edgeVector6, edgeVector1, edgeVector8, edgeVector3, quadtreeTerrainParameter4);
            this.quadtreeTerrain5 = new QuadtreeTerrain(0, edgeVector6, edgeVector5, edgeVector1, edgeVector2, quadtreeTerrainParameter5);
            this.quadtreeTerrain6 = new QuadtreeTerrain(0, edgeVector3, edgeVector4, edgeVector8, edgeVector7, quadtreeTerrainParameter6);
            CreateBuffers(this.quadtreeTerrain1);
            CreateBuffers(this.quadtreeTerrain2);
            CreateBuffers(this.quadtreeTerrain3);
            CreateBuffers(this.quadtreeTerrain4);
            CreateBuffers(this.quadtreeTerrain5);
            CreateBuffers(this.quadtreeTerrain6);
            CreateMaterial(this.quadtreeTerrain1);
            CreateMaterial(this.quadtreeTerrain2);
            CreateMaterial(this.quadtreeTerrain3);
            CreateMaterial(this.quadtreeTerrain4);
            CreateMaterial(this.quadtreeTerrain5);
            CreateMaterial(this.quadtreeTerrain6);
            Dispatch(this.quadtreeTerrain1);
            Dispatch(this.quadtreeTerrain2);
            Dispatch(this.quadtreeTerrain3);
            Dispatch(this.quadtreeTerrain4);
            Dispatch(this.quadtreeTerrain5);
            Dispatch(this.quadtreeTerrain6);
        }
   
        // We compute the buffers.
        void CreateBuffers(QuadtreeTerrain quadtreeTerrain)
        {
            .... preparing generation constants
            quadtreeTerrain.generationConstantsBuffer.SetData(generationConstants);
            // Buffer Output
            quadtreeTerrain.patchGeneratedDataBuffer = new ComputeBuffer(nVerts, 16 + 12 + 4 + 12);
        }
   
        //We create the material
        void CreateMaterial(QuadtreeTerrain quadtreeTerrain)
        {
            Material material = new Material(shader);
            material.SetTexture("_MainTex", this.texture);
            material.SetFloat("_Metallic", 0);
            material.SetFloat("_Glossiness", 0);
            quadtreeTerrain.material = material;
        }
   
        //We dispatch threads of our CSMain1 and CSMain2 kernels.
        void Dispatch(QuadtreeTerrain quadtreeTerrain)
        {
            // Set Buffers
            computeShader.SetBuffer(_kernel, "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
            computeShader.SetBuffer(_kernel, "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
            // Dispatch first kernel
            _kernel = computeShader.FindKernel("CSMain1");
            computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
            // Dispatch second kernel
            _kernel = computeShader.FindKernel("CSMain2");
            computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
        }
   
        // We set the material before drawing and call DrawMesh on OnRenderObject
        void OnRenderObject()
        {
            this.quadtreeTerrain1.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain1.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain1.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
   
            this.quadtreeTerrain2.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain2.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain2.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
   
            this.quadtreeTerrain3.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain3.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain3.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
   
            this.quadtreeTerrain4.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain4.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain4.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
   
            this.quadtreeTerrain5.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain5.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain5.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
   
            this.quadtreeTerrain6.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain6.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain6.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
        }
   
        //When this GameObject is disabled we must release the buffers.
        private void OnDisable()
        {
            ReleaseBuffer();
        }
   
        //Release buffers and destroy the material when play has been stopped.
        void ReleaseBuffer()
        {
            // Destroy everything recursive in the quadtrees.
            this.quadtreeTerrain1.generationConstantsBuffer.Release();
            this.quadtreeTerrain1.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain2.generationConstantsBuffer.Release();
            this.quadtreeTerrain2.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain3.generationConstantsBuffer.Release();
            this.quadtreeTerrain3.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain4.generationConstantsBuffer.Release();
            this.quadtreeTerrain4.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain5.generationConstantsBuffer.Release();
            this.quadtreeTerrain5.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain6.generationConstantsBuffer.Release();
            this.quadtreeTerrain6.patchGeneratedDataBuffer.Release();
            DestroyImmediate(this.quadtreeTerrain1.material);
            DestroyImmediate(this.quadtreeTerrain2.material);
            DestroyImmediate(this.quadtreeTerrain3.material);
            DestroyImmediate(this.quadtreeTerrain4.material);
            DestroyImmediate(this.quadtreeTerrain5.material);
            DestroyImmediate(this.quadtreeTerrain6.material);
        }
   
        void Update() {
            // Do nothing
        }
   
    }

Of course this is very brute-force, but it should work before I proceed, as I first need to figure out how to handle the buffers and draw calls and where to put them.
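
The stride passed to new ComputeBuffer above (16 + 12 + 4 + 12) is added up by hand, which is easy to get out of sync with the struct. As a minimal sketch, the same number can be derived from the struct itself with Marshal.SizeOf. The Float3/Float4 types below are hypothetical stand-ins for UnityEngine.Vector3/Vector4 so the example compiles outside Unity, and PatchGeneratedData is an assumed name mirroring the per-vertex output layout.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for UnityEngine.Vector3 / Vector4 so the sketch
// compiles outside Unity; inside Unity, use the engine types directly.
[StructLayout(LayoutKind.Sequential)]
public struct Float3 { public float x, y, z; }

[StructLayout(LayoutKind.Sequential)]
public struct Float4 { public float x, y, z, w; }

// Assumed mirror of the per-vertex output: position, normal, noise, patchCenter.
[StructLayout(LayoutKind.Sequential)]
public struct PatchGeneratedData
{
    public Float4 position;    // 16 bytes
    public Float3 normal;      // 12 bytes
    public float noise;        //  4 bytes
    public Float3 patchCenter; // 12 bytes
}

public static class StrideCheck
{
    // Size in bytes of one buffer element as the marshaller lays it out.
    public static int StrideOf<T>() where T : struct
    {
        return Marshal.SizeOf<T>();
    }

    public static void Main()
    {
        Console.WriteLine(StrideOf<PatchGeneratedData>()); // prints 44
    }
}
```

In a Unity script the result of Marshal.SizeOf could be passed as the stride argument of the ComputeBuffer constructor instead of the hand-written sum.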


I tried to get closer to the problem. It seems to depend on which Dispatch() call I make first to precalculate a ComputeBuffer in the compute shader before it is sent to the vertex buffer.
It seems the first ComputeShader.Dispatch() call overrules all following ones, as only the results from the first call are drawn, although as far as I understand I am using different buffers.
Edit: To be more precise: both meshes are drawn, but they seem to share the same locations and probably the same buffer. I noticed this because the number of rendered triangles doubled with each Graphics.DrawMesh call added.

Each “QuadtreeTerrain” class has two compute buffer references.

  class QuadtreeTerrain {
  public ComputeBuffer generationConstantsBuffer;
  public ComputeBuffer patchGeneratedDataBuffer;
  }

In SpaceObjectProceduralPlanet I initialize the buffers by calling CreateBuffers(QuadtreeTerrain) in Start(); the called function creates the buffers of each object (“new ComputeBuffer()”). Afterwards I run the compute shader for each object by calling Dispatch(QuadtreeTerrain), also in Start().
Then, in “OnRenderObject()”, I send the buffers to the renderer.
For ease of reading I reduced the number of different buffers to 2.
Any hint why the first Dispatch() call overrules all others is very much appreciated.

using UnityEngine;
using System.Collections;
using System.Threading;
using System.Collections.Generic;

[RequireComponent(typeof(GameObject))]
public class SpaceObjectProceduralPlanet : MonoBehaviour {

    public int seed;
    public Position position;
    public string name;
    public float radius;
    public float diameter;
    public Transform m_Transform;
    private int LOD;
    // Primitive
    private AbstractPrimitive primitive;
    private enum PrimitiveState { IN_PRECALCULATION, PRECALCULATED, DONE };
    private PrimitiveState primitiveState;
    // QuadtreeTerrain
    private QuadtreeTerrain quadtreeTerrain1;
    private QuadtreeTerrain quadtreeTerrain2;
    // Plane Template
    public Mesh prototypeMesh;
    public Mesh prototypeMesh2;
    public Plane plane;
    public Texture2D texture;
    // ComputeShader
    public Shader shader;
    public ComputeShader computeShader;
    private ComputeBuffer generationConstantsBuffer;
    private ComputeBuffer patchGeneratedDataBuffer;
    private int _kernel;
    // Constants
    public static int nVertsPerEdge { get { return 224; } }     //Should be multiple of 32
    public static int nVerts { get { return nVertsPerEdge * nVertsPerEdge; } }
    public int THREADS_PER_GROUP_X { get { return 32; } }
    public int THREADS_PER_GROUP_Y { get { return 32; } }
    public int THREADGROUP_SIZE_X { get { return nVertsPerEdge / THREADS_PER_GROUP_X; } }
    public int THREADGROUP_SIZE_Y { get { return nVertsPerEdge / THREADS_PER_GROUP_Y; } }
    public int THREADGROUP_SIZE_Z { get { return 1; } }

    struct PatchGenerationConstantsStruct
    {
        public int nVertsPerEdge;
        public float scale;
        public float spacing;
        public Vector3 patchCubeCenter;
        public Vector3 cubeFaceEastDirection;
        public Vector3 cubeFaceNorthDirection;
        public float planetRadius;
        public float terrainMaxHeight;
        public float noiseSeaLevel;
        public float noiseSnowLevel;
    }

    struct patchGeneratedDataStruct
    {
        public Vector4 position;
        public Vector3 normal;
        public float noise;
        public Vector3 patchCenter;
    }

    // Initial call. We setup the shaders and prototype meshes here.
    void Awake () {
        // Transform
        m_Transform = transform;

        // Mesh prototype
        this.prototypeMesh = MeshServiceProvider.setupNavyFishDummyMesh(nVertsPerEdge);
        this.prototypeMesh2 = MeshServiceProvider.setupNavyFishDummyMesh(nVertsPerEdge);
        // Plane Template (not used right now as we have the prototype mesh)
        this.plane = new Plane(nVertsPerEdge, Vector3.back);
        // Shader
        this.shader = Shader.Find("Custom/ProceduralPatch3");
        // ComputeShader
        this.computeShader = (ComputeShader)Resources.Load("Shaders/Space/Planet/Custom/ProceduralPatchCompute3");
        // Texture
        this.texture = (Texture2D)Resources.Load("Textures/space/planets/seamless/QuadtreeTerrainTexture.MugDry_1024") as Texture2D;
    }

    // We initialize the buffers and the material used to draw.
    void Start()
    {
        // Edge coordinates for initialization
        Vector3 edgeVector1 = new Vector3(-1, +1, -1);
        Vector3 edgeVector2 = new Vector3(+1, +1, -1);
        Vector3 edgeVector3 = new Vector3(-1, -1, -1);
        Vector3 edgeVector4 = new Vector3(+1, -1, -1);
        Vector3 edgeVector5 = new Vector3(+1, +1, +1);
        Vector3 edgeVector6 = new Vector3(-1, +1, +1);
        Vector3 edgeVector7 = new Vector3(+1, -1, +1);
        Vector3 edgeVector8 = new Vector3(-1, -1, +1);
        // Parameters
        QuadtreeTerrainParameter parameter = new QuadtreeTerrainParameter();
        parameter.nVertsPerEdge = nVertsPerEdge;
        parameter.scale = 2.0f / nVertsPerEdge;
        parameter.spacing = 2.0f / nVertsPerEdge;
        parameter.planetRadius = 6371.0f; // 6371000.0f; = earth
        parameter.terrainMaxHeight = 15.0f;
        parameter.noiseSeaLevel = 0.0f;
        parameter.noiseSnowLevel = 0.8f;
        QuadtreeTerrainParameter quadtreeTerrainParameter1 = parameter.clone();
        quadtreeTerrainParameter1.cubeFaceEastDirection = new Vector3(1, 0, 0);
        quadtreeTerrainParameter1.cubeFaceNorthDirection = new Vector3(0, 1, 0);
        QuadtreeTerrainParameter quadtreeTerrainParameter2 = parameter.clone();
        quadtreeTerrainParameter2.cubeFaceEastDirection = new Vector3(0, 0, 1);
        quadtreeTerrainParameter2.cubeFaceNorthDirection = new Vector3(0, 1, 0);
        // QuadtreeTerrain
        this.quadtreeTerrain1 = new QuadtreeTerrain(0, edgeVector1, edgeVector2, edgeVector3, edgeVector4, quadtreeTerrainParameter1);
        this.quadtreeTerrain2 = new QuadtreeTerrain(0, edgeVector2, edgeVector5, edgeVector4, edgeVector7, quadtreeTerrainParameter2);
        CreateBuffers(this.quadtreeTerrain1);
        CreateBuffers(this.quadtreeTerrain2);
        CreateMaterial(this.quadtreeTerrain1);
        CreateMaterial(this.quadtreeTerrain2);

        // Only the mesh of the first Dispatch(..) call is drawn. E.g. if the first call is commented out, the second mesh (quadtreeTerrain2) is drawn.
        //Dispatch(this.quadtreeTerrain1);
        Dispatch(this.quadtreeTerrain2);
    }

    void Update()
    {

    }

    // We compute the buffers.
    void CreateBuffers(QuadtreeTerrain quadtreeTerrain)
    {
        // Buffer Patch Generation Constants
        quadtreeTerrain.generationConstantsBuffer = new ComputeBuffer(4, // 1x int (4 bytes) for one index, index = 0
            4 +     // nVertsPerEdge (int = 4 bytes),
            4 +     // scale (float = 4 bytes),
            4 +     // spacing (float = 4 bytes),
            12 +    // patchCubeCenter (float3 = 12 bytes),
            12 +    // cubeFaceEastDirection (float3 = 12 bytes),
            12 +    // cubeFaceNorthDirection (float3 = 12 bytes),
            4 +     // planetRadius (float = 4 bytes),
            4 +     // terrainMaxHeight (float = 4 bytes),
            4 +     // noiseSeaLevel (float = 4 bytes),
            4);     // noiseSnowLevel (float = 4 bytes),
        PatchGenerationConstantsStruct[] generationConstants = new PatchGenerationConstantsStruct[1];
        generationConstants[0].nVertsPerEdge = quadtreeTerrain.parameters.nVertsPerEdge;
        generationConstants[0].scale = quadtreeTerrain.parameters.scale;
        generationConstants[0].spacing = quadtreeTerrain.parameters.spacing;
        generationConstants[0].patchCubeCenter = quadtreeTerrain.centerVector;
        generationConstants[0].cubeFaceEastDirection = quadtreeTerrain.parameters.cubeFaceEastDirection;
        generationConstants[0].cubeFaceNorthDirection = quadtreeTerrain.parameters.cubeFaceNorthDirection;
        generationConstants[0].planetRadius = quadtreeTerrain.parameters.planetRadius;
        generationConstants[0].terrainMaxHeight = quadtreeTerrain.parameters.terrainMaxHeight;
        generationConstants[0].noiseSeaLevel = quadtreeTerrain.parameters.noiseSeaLevel;
        generationConstants[0].noiseSnowLevel = quadtreeTerrain.parameters.noiseSnowLevel;
        quadtreeTerrain.generationConstantsBuffer.SetData(generationConstants);
        // Buffer Output
        quadtreeTerrain.patchGeneratedDataBuffer = new ComputeBuffer(nVerts, 16 + 12 + 4 + 12); // Output buffer contains vertex position (float4 = 16 bytes),
                                                                                                // normals (float3 = 12 bytes),
                                                                                                // noise (float = 4 bytes)
                                                                                                // patchCenter (float3 = 12 bytes)
    }

    //We create the material
    void CreateMaterial(QuadtreeTerrain quadtreeTerrain)
    {
        quadtreeTerrain.material = new Material(shader);
        quadtreeTerrain.material.SetTexture("_MainTex", this.texture);
        quadtreeTerrain.material.SetFloat("_Metallic", 0);
        quadtreeTerrain.material.SetFloat("_Glossiness", 0);
    }

    //The meat of this script, it sets the buffers for the compute shader.
    // We then dispatch threads of our CSMain1 and 2 kernels.
    void Dispatch(QuadtreeTerrain quadtreeTerrain)
    {
        // Set Buffers
        computeShader.SetBuffer(_kernel, "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
        computeShader.SetBuffer(_kernel, "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
        // Dispatch first kernel
        _kernel = computeShader.FindKernel("CSMain1");
        computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
        // Dispatch second kernel
        _kernel = computeShader.FindKernel("CSMain2");
        computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
    }

    //After all rendering is complete we dispatch the compute shader and then set the material before drawing.
    void OnRenderObject()
    {
        this.quadtreeTerrain1.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain1.patchGeneratedDataBuffer);
        Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain1.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
        this.quadtreeTerrain2.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain2.patchGeneratedDataBuffer);
        Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain2.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
    }


    //When this GameObject is disabled we must release the buffers.
    private void OnDisable()
    {
        ReleaseBuffer();
    }

    //Release buffers and destroy the material when play has been stopped.
    void ReleaseBuffer()
    {
        // Destroy everything recursive in the quadtrees.
        this.quadtreeTerrain1.generationConstantsBuffer.Release();
        this.quadtreeTerrain1.patchGeneratedDataBuffer.Release();
        this.quadtreeTerrain2.generationConstantsBuffer.Release();
        this.quadtreeTerrain2.patchGeneratedDataBuffer.Release();
        DestroyImmediate(this.quadtreeTerrain1.material);
        DestroyImmediate(this.quadtreeTerrain2.material);
    }

}
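
One detail in the Dispatch() method above may explain the symptom: SetBuffer is called with _kernel before _kernel has been assigned by FindKernel. ComputeShader.SetBuffer binds a buffer to one specific kernel index, so on the first call the buffers are bound to whatever index _kernel holds by default (0), and on later calls to the index left over from the previous call (CSMain2), which would leave CSMain1 bound to the first object's buffers. The following is a minimal sketch of a dispatch that resolves both kernel indices first and binds the buffers to each kernel. The PatchDispatcher class and its parameter names are hypothetical, and it has not been run against the original project.

```csharp
using UnityEngine;

// Hypothetical helper, not part of the original script.
public static class PatchDispatcher
{
    public static void Dispatch(
        ComputeShader computeShader,
        ComputeBuffer generationConstantsBuffer,
        ComputeBuffer patchGeneratedDataBuffer,
        int groupsX, int groupsY, int groupsZ)
    {
        // Resolve both kernel indices before binding anything.
        int kernel1 = computeShader.FindKernel("CSMain1");
        int kernel2 = computeShader.FindKernel("CSMain2");

        // Buffers are bound per kernel index, so bind them for CSMain1 ...
        computeShader.SetBuffer(kernel1, "generationConstantsBuffer", generationConstantsBuffer);
        computeShader.SetBuffer(kernel1, "patchGeneratedDataBuffer", patchGeneratedDataBuffer);
        computeShader.Dispatch(kernel1, groupsX, groupsY, groupsZ);

        // ... and again for CSMain2 before dispatching it.
        computeShader.SetBuffer(kernel2, "generationConstantsBuffer", generationConstantsBuffer);
        computeShader.SetBuffer(kernel2, "patchGeneratedDataBuffer", patchGeneratedDataBuffer);
        computeShader.Dispatch(kernel2, groupsX, groupsY, groupsZ);
    }
}
```

Called once per QuadtreeTerrain with that object's two buffers, this keeps each dispatch tied to the buffers of the object it is meant to fill.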

The problem seems to center on the second Dispatch() call, to the second kernel, which I issue immediately after the first one.

In CSMain1 I initially calculate the position of a vertex based on some noise.
In CSMain2 I want to calculate the normals and some other things (terraintype etc.)

My problem:
I am not sure when I can make the second Dispatch() call to the second kernel.
If I use the following code, the planes (calculated in the first kernel, CSMain1) do not show up correctly.

// Set Buffers CSMain1
computeShader.SetBuffer(_kernel[0], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
computeShader.SetBuffer(_kernel[0], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
// Dispatch first kernel CSMain1
computeShader.Dispatch(_kernel[0], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
// Set Buffers CSMain2
computeShader.SetBuffer(_kernel[1], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
computeShader.SetBuffer(_kernel[1], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
// Dispatch second kernel CSMain2
computeShader.Dispatch(_kernel[1], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);

It works when I comment the second Dispatch() call out.

// Set Buffers CSMain1
computeShader.SetBuffer(_kernel[0], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
computeShader.SetBuffer(_kernel[0], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
// Dispatch first kernel CSMain1
computeShader.Dispatch(_kernel[0], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
// Set Buffers CSMain2
//computeShader.SetBuffer(_kernel[1], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
//computeShader.SetBuffer(_kernel[1], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
// Dispatch second kernel CSMain2
//computeShader.Dispatch(_kernel[1], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);

I guess the problem is that the second C# Dispatch() call (to the second kernel) is issued while the first kernel is still being processed, although nothing happens in the second kernel at the moment.

How do you determine and orchestrate the Dispatch() calls of two or more kernels on the CPU in C# code in Unity?

Both stages, and especially the second kernel, are now invoked correctly. For the next person who runs into this problem:
The error was that in the compute shader I had both pragma definitions at the top, followed by both functions, like this:

#pragma kernel CSMain1
#pragma kernel CSMain2

[numthreads(threadsPerGroup_X,threadsPerGroup_Y,1)]

void CSMain1 (uint3 id : SV_DispatchThreadID)
{
// code
}

void CSMain2 (uint3 id : SV_DispatchThreadID)
{
// code
}

Things started to work when I put the code in different order:

#pragma kernel CSMain1

[numthreads(threadsPerGroup_X,threadsPerGroup_Y,1)]

void CSMain1 (uint3 id : SV_DispatchThreadID)
{
// code
}

#pragma kernel CSMain2

[numthreads(threadsPerGroup_X,threadsPerGroup_Y,1)]

void CSMain2 (uint3 id : SV_DispatchThreadID)
{
// code
}

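A few of the (already rare) compute shader tutorials describe my initial layout above, which did not work. So anyone who has the same problem might try reordering the code as shown. Note that in the working version each kernel function also has its own [numthreads(...)] attribute; the attribute applies only to the function that directly follows it, so that is likely the relevant difference rather than the position of the #pragma lines.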


Hello there! I have had similar issues, and my solution is:
Don’t use GetData in real time! (It took me 2 months to find the reason, which I can explain another time if you like.)
Instead:
Make an array of compute buffers, a minimum of 2: one for read, one for write.
Create an RW structure in the compute shader, filled with junk data to be overwritten by the compute shader.
Wait until the next frame (this is important; I suggest a coroutine that yields WaitForEndOfFrame).
Lastly, copy the contents of the write buffer into the read buffer.
Then use these cloned buffers for whatever you need; GetData does work on static buffers in this case.

What do you mean don’t use GetData(…) in realtime? Is there a way to get the data without calling GetData(…) that we should do instead?

What happens if the ComputeShader takes more than 2 frames…?

Can today be another time? I’m also trying to run a shader multiple times one after another, but I cannot find a way to check whether a shader is done. Does GetData() wait for the shader to complete? I’m also trying to run this in the editor, so I cannot use yield return new WaitForEndOfFrame().

GetData simply overwrites the CPU buffer with whatever is available. The best way to ensure you have completion is a loop and an extra variable for a checksum; then continue.

Hey SirShelly, I am trying to figure this out but I’m really new to compute shaders, how can I make the checksum variable and extract it?

Cheers!

Makes you really wonder where people find this information out…

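Yes it does: GetData() waits for the shader to complete.

No it doesn’t: GetData does not simply return whatever happens to be available.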

Let’s try and dispel some myths here… :)

If you dispatch a ComputeShader, then call ComputeBuffer.GetData on a buffer written by the shader, you should see the data that the ComputeShader wrote. There is no “it’s still in progress” or anything like that. It should be the data as it is after the ComputeShader ran. No exceptions. Anything different is a bug, which should be reported.

FWIW, you shouldn’t use ComputeBuffer.GetData for anything other than debugging purposes, because reading data back from the GPU is slow, and doing it immediately after dispatching the ComputeShader is making the CPU wait for the GPU to finish doing something. Graphics pipelines are not designed to send data back to the CPU quickly. GPUs like to be told what to do by CPUs, and then left to get on with it. The exception to this guideline is if you use the AsyncGPUReadback API. This API lets you ask the GPU to send something back, without waiting for it to happen. Then, it’s up to you to ask if it’s done yet, and not block the CPU if the data isn’t ready yet. The GPU will usually send it back in something like 1-3 frames. If you’re in a situation where you feel like you need the data back immediately, it may be time to rethink what you’re doing and do some redesigning of your algorithm.
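
As a rough illustration of the AsyncGPUReadback approach described above, the sketch below requests a buffer's contents and handles them in a callback some frames later instead of blocking in GetData. The ReadbackExample component, the ResultBuffer property, and the assumption that the buffer holds floats are all hypothetical.

```csharp
using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical component; the buffer is assumed to be created, filled by a
// compute shader, and released elsewhere.
public class ReadbackExample : MonoBehaviour
{
    // Assumed to contain float elements.
    public ComputeBuffer ResultBuffer { get; set; }

    // Queue a non-blocking readback; the callback runs once the data arrives.
    public void RequestResults()
    {
        AsyncGPUReadback.Request(ResultBuffer, OnReadback);
    }

    private void OnReadback(AsyncGPUReadbackRequest request)
    {
        if (request.hasError)
        {
            Debug.LogWarning("GPU readback failed.");
            return;
        }

        // The returned array is only valid for a short time, so copy out
        // anything that is needed later.
        NativeArray<float> data = request.GetData<float>();
        if (data.Length > 0)
        {
            Debug.Log("First value: " + data[0]);
        }
    }
}
```

The design choice here is that the CPU never waits: RequestResults returns immediately and the result is consumed whenever it becomes available.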


@richardkettlewell Apologies if that came off sounding rude, thanks so much for your explanation, that helps a lot! :)

This would be really great info for the docs page on ComputeBuffer.GetData if possible, I’ll leave a suggestion on that page as well.


Thanks for the reply @richardkettlewell , I’ve found it extremely difficult to find info on compute shaders, it’s like researching the holy grail :), so thank you Richard!


I’m sorting out getting this added to the docs :)


Is there a way to know whether the ComputeShader kernel has completed, without issuing a GetData() call?
I’m asking because I have a compute shader that needs to be called thousands of times per second. So I’m calling Dispatch() in FixedUpdate(), inside a loop:

        for (int i = 0; i < numberOfStepsPerFixedUpdate; i++)
        {
            shader.Dispatch(khUpdateSimulation, Mathf.CeilToInt((float)vertices.Length / THREADGROUP_SIZE), 1, 1);
        }

The problem is that I want to run as many loop iterations as possible by changing numberOfStepsPerFixedUpdate dynamically, but I need to make sure I’m not issuing more iterations than the GPU is able to process. I tried several workarounds, but none is perfect:

  1. Adding a dummy.GetData() call to retrieve a dummy buffer that I don’t need, and timing how long the loop has run. If it takes too long, reduce the iterations; if too little, increase them.
  2. Monitoring the frame rate, and lowering the loop count if the frame rate goes down.

But idea number 1 adds a very costly (performance-wise) GetData() call that I really don’t need, so it is not a good solution.
Idea number 2 does not really work well, because if the frame rate goes down for whatever other reason (CPU?) I get an unwanted reduction of the loop count.

I would need something that tells me “okay, now your dispatch calls have completed” without a performance hit…
Any ideas?
Thank you

I forgot: at this time, the target platform is Windows and the API is DirectX 11.
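
The group count in the Dispatch loop above rounds up, so a vertex count that is not a multiple of the group size is still covered. A minimal sketch of the same calculation in integer arithmetic follows; the ThreadGroups class and GroupCount name are hypothetical, and the kernel is assumed to bounds-check the thread id against the item count.

```csharp
using System;

// Hypothetical helper for computing dispatch sizes.
public static class ThreadGroups
{
    // Number of groups needed so that groups * threadsPerGroup >= itemCount.
    public static int GroupCount(int itemCount, int threadsPerGroup)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException(nameof(itemCount));
        if (threadsPerGroup <= 0) throw new ArgumentOutOfRangeException(nameof(threadsPerGroup));
        return (itemCount + threadsPerGroup - 1) / threadsPerGroup;
    }

    public static void Main()
    {
        Console.WriteLine(GroupCount(224, 32));  // prints 7
        Console.WriteLine(GroupCount(1000, 64)); // prints 16
    }
}
```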

I don’t know of any way to find out whether a dispatch has completed on the GPU. GetData is not a practical way to find out either, because it is slow.

I don’t know exactly what you are trying to achieve, but if you want to maximize the dispatch iterations while keeping a target frame rate, you can run a growing number of iterations and stop growing it when the frame rate reaches the target, including a margin. That will only work if your compute shader threads have little execution divergence, though. Also, your graphics card will melt your computer ^^
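
The grow-until-target idea above can be sketched as a small controller that is fed the measured frame time once per frame. This is only an illustration: the AdaptiveStepCount class, its parameters, and the choice to also shrink the count when the frame time exceeds the target are assumptions, not something from the thread.

```csharp
using System;

// Hypothetical controller for the number of dispatch iterations per update.
public sealed class AdaptiveStepCount
{
    private readonly double targetFrameMs;
    private readonly double marginMs;
    private readonly int minSteps;
    private readonly int maxSteps;

    public int Steps { get; private set; }

    public AdaptiveStepCount(double targetFrameMs, double marginMs, int minSteps, int maxSteps)
    {
        if (minSteps < 1 || maxSteps < minSteps)
            throw new ArgumentException("Require 1 <= minSteps <= maxSteps.");
        this.targetFrameMs = targetFrameMs;
        this.marginMs = marginMs;
        this.minSteps = minSteps;
        this.maxSteps = maxSteps;
        Steps = minSteps;
    }

    // Call once per frame with the measured frame time in milliseconds.
    public int Update(double measuredFrameMs)
    {
        if (measuredFrameMs < targetFrameMs - marginMs)
            Steps = Math.Min(maxSteps, Steps + 1); // headroom left: grow
        else if (measuredFrameMs > targetFrameMs)
            Steps = Math.Max(minSteps, Steps - 1); // over budget: back off
        // Inside the margin band the count is left unchanged.
        return Steps;
    }
}

public static class AdaptiveStepCountDemo
{
    public static void Main()
    {
        var controller = new AdaptiveStepCount(16.0, 2.0, 1, 4);
        foreach (double frameMs in new[] { 10.0, 10.0, 15.0, 20.0 })
        {
            Console.Write(controller.Update(frameMs) + " "); // prints 2 3 3 2
        }
        Console.WriteLine();
    }
}
```

As noted in the thread, a frame-time signal cannot tell GPU load apart from CPU load, so a controller like this reacts to any slowdown, not only to the compute work.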

Sorry to necro, but a somewhat related question:

@richardkettlewell, would you mind answering/confirming a couple of GPU-side ordering questions/hypotheses?

  • If I call ComputeShader.Dispatch and use a buffer written to by that compute shader in a standard rendering shader (being invoked by Graphics.DrawMeshX), that compute is guaranteed to finish before the draw call happens, is that correct? (basing this off of the GraphicsFence docs, specifically “GPUFences do not need to be used to synchronise a GPU task writing to a resource that will be read as an input by another”)
  • If I dispatch a compute shader kernel several times in a row via CommandBuffer.Dispatch, and all of the dispatches write to the same AppendStructuredBuffer, is each of the dispatches guaranteed to finish before the next dispatch runs?

  1. Yes
  2. Yes

:)


FYI to earlier posters: the doc page for GetData was updated some time ago to state that it always returns up-to-date results: Unity - Scripting API: ComputeBuffer.GetData
