ComputeShaders With Multiple Kernels

Is it possible to bind a single ComputeBuffer to two ComputeShader kernels? So far I cannot seem to read from the RWStructuredBuffer in the 2nd kernel; I just get 0s for elements I wrote to in the first kernel. From reading Microsoft DirectCompute threads, there are mentions of unbinding a structured buffer from one kernel before binding it to a second kernel, or at least that is my interpretation. I’m assuming that the ComputeShader.SetBuffer() function does the binding, but how do I unbind? Or am I misinterpreting what I am seeing?

Thanks for any advice anyone can give.

[Edit] I found the problem, and it was not related to binding the same buffer to two kernels; rather, it was a problem with SetBuffer and me mistyping the compute shader’s RWStructuredBuffer name. Interestingly, even though I had not successfully bound the buffer to the shader kernel, I was still able to write to it and read from it within the first kernel. When I tried to read from it in a second kernel, I just got 0s. Things now seem to be working fine with two kernels and a single ComputeBuffer and RenderTexture bound to both kernels.
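In case it helps anyone else, here is a minimal sketch of the working setup (the kernel, buffer, and class names are placeholders, not my actual code):

using UnityEngine;

public class TwoKernelExample : MonoBehaviour
{
    public ComputeShader shader;   // the .compute asset containing both kernels
    const int count = 1024;

    void Start()
    {
        // Declared once in the shader, e.g.:  RWStructuredBuffer<float> SharedBuffer;
        var sharedBuffer = new ComputeBuffer(count, sizeof(float));

        int kernelA = shader.FindKernel("KernelA");
        int kernelB = shader.FindKernel("KernelB");

        // SetBuffer binds per kernel, so the same ComputeBuffer is set for every
        // kernel that uses it, and the name string must exactly match the
        // declaration in the .compute file. No explicit unbind is needed.
        shader.SetBuffer(kernelA, "SharedBuffer", sharedBuffer);
        shader.SetBuffer(kernelB, "SharedBuffer", sharedBuffer);

        // Assuming both kernels use [numthreads(64,1,1)]
        shader.Dispatch(kernelA, count / 64, 1, 1);   // kernel A writes
        shader.Dispatch(kernelB, count / 64, 1, 1);   // kernel B reads the same data

        sharedBuffer.Release();
    }
}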


What does the code look like?
I’m trying to swap buffers and data in the compute shader, but I might be running into the same problem:
Unity can’t find the CSMain kernel.

#pragma kernel CSMain
#pragma kernel SwapBuffer

RWStructuredBuffer<float> output;
StructuredBuffer<float> buffer;

int width;
int height;
StructuredBuffer<float> emitters;
StructuredBuffer<float> obstacles;

[numthreads(8,8,1)]
void CSMain (uint2 id : SV_DispatchThreadID)
{
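    // Sum the emitter value and the four neighbouring cells, then mask the result by the obstacle value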
    int em = emitters[id.x + width * id.y];
    float p = em
        + buffer[id.x + 1 + width * id.y]
        + buffer[id.x - 1 + width * id.y]
        + buffer[id.x + width * (id.y + 1)]
        + buffer[id.x + width * (id.y - 1)]
        ;

    output[id.x] = p * obstacles[id.x + width * id.y];
}

[numthreads(64,1,1)]
void SwapBuffer (uint2 id : SV_DispatchThreadID)
{
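    // Note: 'buffer' is declared above as a read-only StructuredBuffer, so this assignment
    // will not compile; it would need to be an RWStructuredBuffer to be writable here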
    buffer[id.x] = output[id.x];
}

I am currently struggling with the exact same issue. However, the solution or cause you described (a typo in the buffer name) does not seem to be the problem here. I am sure both kernels are running (I can successfully read out results from both of them using AsyncGPUReadback.Request(…), and from those results I am sure that both kernels are running).
However, the results that kernel 1 writes into the TemporaryBuffer, even though they contain values when read back with AsyncGPUReadback.Request(…), are not available in kernel 2 (it just sees 0s). So it’s the same issue knchaffin had.

This is how I set the buffers and dispatch to the two kernels.

// Output Texture
patchData.normalMapGPU = new RenderTexture(patchConstants.nPixelsPerEdge, patchConstants.nPixelsPerEdge, 24, RenderTextureFormat.ARGBHalf);
patchData.normalMapGPU.enableRandomWrite = true;
patchData.normalMapGPU.Create();

// Output Buffers
patchData.temporaryBuffer = new ComputeBuffer(patchConstants.nPixelsPerEdgeWithSkirt, 12 + 4, ComputeBufferType.Default);   // Temporary Buffer

// Set Buffers for kernel [1]
this.shader.SetBuffer(this.kernel[1], "PatchConstantsBuffer", patchConstantsBuffer);
this.shader.SetBuffer(this.kernel[1], "BodyConstantsBuffer", bodyConstantsBuffer);
this.shader.SetBuffer(this.kernel[1], "TemporaryBuffer", patchData.temporaryBuffer);

// Dispatch kernel [1]
this.shader.Dispatch(this.kernel[1], patchConstants.nPixelsPerEdgeWithSkirt, patchConstants.nPixelsPerEdgeWithSkirt, 1);   // (Typically 258,258,1)

// Set Buffers for kernel [2]
this.shader.SetBuffer(this.kernel[2], "PatchConstantsBuffer", patchConstantsBuffer);
this.shader.SetBuffer(this.kernel[2], "TemporaryBuffer", patchData.temporaryBuffer);
this.shader.SetTexture(this.kernel[2], "NormalMapTexture", patchData.normalMapGPU);

// Dispatch kernel [2]
this.shader.Dispatch(this.kernel[2], patchConstants.nPixelsPerEdge, patchConstants.nPixelsPerEdge, 1);                // (Typically 256,256,1)

// Request the result
patchData.request = AsyncGPUReadback.Request(patchData.normalMapGPU);
//patchData.request = AsyncGPUReadback.Request(patchData.temporaryBuffer); // For test purposes

And this is how I work with it in the two kernels.

// The structure of the temporary buffer to move the data between the kernels during the highres calculations
    struct TemporaryStruct
    {
        float3 position;
        float noise;
    };
 
 
    //Various input buffers, and an output buffer that is written to by the kernel
    StructuredBuffer<PatchConstantsStruct>   PatchConstantsBuffer;
    StructuredBuffer<BodyConstantsStruct>    BodyConstantsBuffer;
    RWStructuredBuffer<TemporaryStruct>      TemporaryBuffer;
    RWStructuredBuffer<OutputStruct>         OutputBuffer;
    RWTexture2D<float4>                      NormalMapTexture;
 
 
 
    // Second kernel to create the position grid for the pixels. We need to create a separate kernel as we want to add the result to the temporary buffer instead of the OutputBuffer
    #pragma kernel CSMain2
 
    [numthreads(1, 1, 1)]
    void CSMain2(uint3 id : SV_DispatchThreadID)
    {
        // Get the constants
        PatchConstantsStruct patchConstants = PatchConstantsBuffer[0];
        BodyConstantsStruct bodyConstants = BodyConstantsBuffer[0];
 
        // Get outBuffOffset
        int outBuffOffset = id.x + id.y * patchConstants.nPerEdgeWithSkirt;
 
        // Get the PatchNormalizedCoord
        float3 patchNormalizedCoord = PatchNormalizedCoord(id.x, id.y, patchConstants.nPerEdge, patchConstants.spacing, patchConstants.eastDirection, patchConstants.northDirection, patchConstants.centerVector);
 
        // Calculate its 'real world' size:
        float3 patchCoord = patchNormalizedCoord * bodyConstants.radiusMeter;
 
        // We determine the 'planet-space' value for the patchCubeCenter:
        // Note we won't be using this variable for the time being, but in case we need the center coordinate we have it here
        float3 patchCenter = normalize(patchConstants.centerVector) * bodyConstants.radiusMeter; // patchCenter now sits on the surface of a planet-sized sphere.
 
        // Next we generate the noise value using the patch's 'real-world' coordinate (patchCoord)
        int octaves = bodyConstants.octaves + patchConstants.level;
        octaves = clamp(octaves, 0, 10);
        float noise = FBM(patchCoord, octaves, bodyConstants.frequency, bodyConstants.amplitude, bodyConstants.lacunarity, bodyConstants.persistence);
 
        // We create the height value taking max height into account
        float height = (noise * 2) - 1;                // terrainHeight now ranges from -1 to + 1;
        height = clamp(height, -1, +1);             // We clamp the height to make sure it does not overshoot -1 or +1
        height *= bodyConstants.maxHeightMeter;        // terrainHeight now ranges from -terrainMaxHeight to +terrainMaxHeight.
 
        // This final step adds (or subtracts) the real terrain height from the real world-sized (but centered) patch.
        // Note we apply this to both variables, the normal one and the centered one.
        patchCoord += patchNormalizedCoord * height;
 
        // Result
        TemporaryBuffer[outBuffOffset].position = patchCoord;
        TemporaryBuffer[outBuffOffset].noise = noise;
    }
 
 
 
    // Third kernel to create the normalmap and the slope
    #pragma kernel CSMain3
 
    [numthreads(1, 1, 1)]
    void CSMain3(uint3 id : SV_DispatchThreadID)
    {
        // Get the constants
        PatchConstantsStruct patchConstants = PatchConstantsBuffer[0];
 
        // Get offsets
        int inBuffOffset = (id.x + 1) + (id.y + 1) * patchConstants.nPerEdgeWithSkirt;
        int outBuffOffset = id.x + id.y * patchConstants.nPerEdge;
 
        // Create Normals (Indexes)
        // Create the necessary indexes of surrounding vertices
        int inBuffOffsetNorth = inBuffOffset + 1 * patchConstants.nPerEdgeWithSkirt;
        int inBuffOffsetEast = inBuffOffset + 1;
        int inBuffOffsetSouth = inBuffOffset - 1 * patchConstants.nPerEdgeWithSkirt;
        int inBuffOffsetWest = inBuffOffset - 1;
 
        // Method normals
        float3 sideA, sideB, sideC, sideD;
        float3 normalForward, normalBackward, normal;
        sideA = TemporaryBuffer[inBuffOffsetNorth].position - TemporaryBuffer[inBuffOffset].position;
        sideB = TemporaryBuffer[inBuffOffsetEast].position - TemporaryBuffer[inBuffOffset].position;
        normalForward = cross(sideA, sideB);
        sideC = TemporaryBuffer[inBuffOffsetSouth].position - TemporaryBuffer[inBuffOffset].position;
        sideD = TemporaryBuffer[inBuffOffsetWest].position - TemporaryBuffer[inBuffOffset].position;
        normalBackward = cross(sideC, sideD);
        normal = normalBackward + normalForward;
        //normal = normalize(normalized);
 
        // Create Texture
        float3 normalRGB = float3(normal.x, normal.z, normal.y) / 2 + float3(0.5f, 0.5f, 0.5f);
        uint2 textureID = uint2(id.x, id.y);
        NormalMapTexture[textureID] = float4(normalRGB, 1);
    }

There are limits to what can be written and read on the GPU, but it’s unusual to see this with buffers. Try binding the buffer to a separate reference for each kernel and update your code accordingly:

CS:
RWStructuredBuffer<TemporaryStruct> TemporaryBufferWrite;
RWStructuredBuffer<TemporaryStruct> TemporaryBufferRead;

C#:
this.shader.SetBuffer(this.kernel[1], "TemporaryBufferWrite", patchData.temporaryBuffer);
this.shader.SetBuffer(this.kernel[2], "TemporaryBufferRead", patchData.temporaryBuffer);

Thanks for the advice, grizzly. Unfortunately the result is the same: the data does not reach kernel 2.
For testing purposes I added typos to the buffer names in the SetBuffer() calls in C# and checked whether, for whatever reason, a name that does not match the compute shader would go unnoticed, which could then have been the cause.
But the typo was flagged for each of the ComputeBuffer names, so that is definitely not the reason.

I am beginning to wonder if the AsyncGPUReadback usage might be the reason. Is a ComputeBuffer wiped in a new frame?
I issue the AsyncGPUReadback request after dispatching kernels 1 and 2 to asynchronously read the NormalMap that is created in kernel 2. Maybe after kernel 1 finishes its job and writes to TemporaryBuffer, the data is wiped before kernel 2 starts?

Just a guess, but it’s the only idea I have left…

No, ComputeBuffer data will persist until explicitly wiped/destroyed.

It’s not clear from the snippet you’ve provided how the request is handled. Depending on its size, data can take a few frames to become available. For testing purposes I would suggest using GetData to retrieve the data immediately.
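For example, a quick synchronous check right after the two dispatches could look something like this (a sketch only; the C#-side struct is my guess at your 12 + 4 byte layout, and GetData stalls the pipeline, so it’s for debugging only):

// C# mirror of the shader-side struct: float3 position (12 bytes) + float noise (4 bytes)
struct TemporaryStruct
{
    public Vector3 position;
    public float noise;
}

// Blocking readback, for testing only
var temp = new TemporaryStruct[patchData.temporaryBuffer.count];
patchData.temporaryBuffer.GetData(temp);
Debug.Log(temp[0].position + " / " + temp[0].noise);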

If you can provide more code, or better, a working example to test I’ll happily have a look.

I use the same buffer between two kernels and it works fine. First kernel clears the buffer, the second writes to it. However, I use the same variable name for both kernels (the variable is declared just once in the shader). Try that instead of binding the same buffer to two variables to see if it works.
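Roughly like this, as a minimal sketch (the kernel and buffer names are made up, not my actual project code):

CS:
#pragma kernel ClearKernel
#pragma kernel WriteKernel

// Declared once; both kernels refer to the same name
RWStructuredBuffer<float> SharedBuffer;

[numthreads(64,1,1)]
void ClearKernel (uint3 id : SV_DispatchThreadID)
{
    SharedBuffer[id.x] = 0;      // first kernel clears
}

[numthreads(64,1,1)]
void WriteKernel (uint3 id : SV_DispatchThreadID)
{
    SharedBuffer[id.x] = id.x;   // second kernel writes
}

C#:
shader.SetBuffer(shader.FindKernel("ClearKernel"), "SharedBuffer", sharedBuffer);
shader.SetBuffer(shader.FindKernel("WriteKernel"), "SharedBuffer", sharedBuffer);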

Yes, it should work, but apparently it does not. Read/write limitations on certain resources (textures) produce similar results, hence the separate-reference workaround above, which has solved that problem before, was initially suggested.


For documentation purposes, if someone stumbles across this problem: grizzly was so kind as to look at a working example and found the issue. It was the size of the “TemporaryBuffer” buffer :eek:, which was sized for one dimension instead of two. The kernels index it as a 2D grid (id.x + id.y * nPerEdgeWithSkirt), so it needs nPixelsPerEdgeWithSkirt * nPixelsPerEdgeWithSkirt elements.
That’s really evil, as the compiler does not complain about it. Buffer sizes are really worth looking at twice (and a fourth time) when searching for an issue :wink:

patchData.temporaryBuffer = new ComputeBuffer(patchConstants.nPixelsPerEdgeWithSkirt, 12 + 4, ComputeBufferType.Default);

has to become

patchData.temporaryBuffer = new ComputeBuffer(patchConstants.nPixelsPerEdgeWithSkirt * patchConstants.nPixelsPerEdgeWithSkirt, 12 + 4, ComputeBufferType.Default);