Compute shader synchronization

Hello!

I need your help today! I'm starting to work with compute shaders on a really simple use case:
I have a depth camera and I want to calculate the bounding box of an object close to the camera.

But I have too many pixels to process, so I want to use GPGPU (a compute shader) and parallelization to compute this.

I currently have a problem: when I run my program, I get the same min and max coordinates. So I think that all my groups and threads are writing to my StructuredBuffers at the same time.

Do you have any idea how to fix that?

Thanks a lot!

PS: Sorry for my English, I'm French :smile:

Here is the code of my compute shader:

#pragma kernel ComputeBoundingBox
//We define the size of a group in the x, y and z directions, z direction will just be one
#define thread_group_size_x 1024
#define thread_group_size_y 1
#define thread_group_size_z 1
//Size of the depthData frame (no trailing semicolon, otherwise it gets pasted into every expression that uses the macro)
#define width 512
#define height 424

//DataBuffer = depthData of the camera
//minBuffer, maxBuffer, array of size 3 with min/max x, y and z
//mask = image area to process
RWStructuredBuffer<float> dataBuffer;
globallycoherent RWStructuredBuffer<float> minBuffer;
globallycoherent RWStructuredBuffer<float> maxBuffer;
RWStructuredBuffer<float> mask;


[numthreads(thread_group_size_x, thread_group_size_y, thread_group_size_z)]
void ComputeBoundingBox(uint3 id : SV_DispatchThreadID)
{
    //Per-thread values are declared as locals: non-static globals are implicitly constant in HLSL
    float xValue = (id.x + 1) % width;
    float yValue = (id.x + 1) / width;
    float zValue = dataBuffer[id.x];

    if (mask[id.x] > 0.49)
    {
        if (zValue > 500 && zValue < 1500)
        {
            if (xValue < minBuffer[0])
                minBuffer[0] = xValue;
            else if (xValue > maxBuffer[0])
                maxBuffer[0] = xValue;
            if (yValue < minBuffer[1])
                minBuffer[1] = yValue;
            else if (yValue > maxBuffer[1])
                maxBuffer[1] = yValue;
            if (zValue < minBuffer[2])
                minBuffer[2] = zValue;
            else if (zValue > maxBuffer[2])
                maxBuffer[2] = zValue;
        }
    }
}

Yes, this does require synchronization. You can try the interlocked operations, for example the InterlockedExchange function (see the HLSL reference on Microsoft Learn). Anyway, I really recommend that you rethink this code, because performance is going to be extremely poor:

  • All your threads and thread groups are going to fight for locks, since basically all the work is in critical sections
  • You have lots of branching, which means only a few threads in each group will actually be doing work
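
To make the two points above a bit more concrete, here is a rough sketch of one way the kernel could be restructured. It is not tested against the original project and rests on several assumptions that are not in the original post: the min/max buffers are changed to uint (HLSL's InterlockedMin and InterlockedMax only work on int/uint, not float), the depth value is truncated to an integer, the host resets minBuffer to 0xFFFFFFFF and maxBuffer to 0 before every dispatch, and the pixel coordinates are taken as id.x % WIDTH and id.x / WIDTH. The names GROUP_SIZE, gMin and gMax are made up for the example. Each group first reduces into groupshared memory, so only three threads per group touch the global buffers.

```hlsl
#pragma kernel ComputeBoundingBox

// Assumed sizes: 256 threads per group, 512 x 424 depth frame
#define GROUP_SIZE 256
#define WIDTH 512u
#define HEIGHT 424u

StructuredBuffer<float> dataBuffer;   // depth values, one per pixel
StructuredBuffer<float> mask;         // > 0.49 means "process this pixel"

// Assumption: bounds are stored as uint so that integer atomics can be used.
// The host must set minBuffer[i] = 0xFFFFFFFF and maxBuffer[i] = 0 before each dispatch.
RWStructuredBuffer<uint> minBuffer;   // [0] = x, [1] = y, [2] = z
RWStructuredBuffer<uint> maxBuffer;   // [0] = x, [1] = y, [2] = z

// Per-group partial results, shared by all threads of one group
groupshared uint gMin[3];
groupshared uint gMax[3];

[numthreads(GROUP_SIZE, 1, 1)]
void ComputeBoundingBox(uint3 id : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    // The first three threads of the group reset the per-group bounds
    if (gi < 3)
    {
        gMin[gi] = 0xFFFFFFFF;
        gMax[gi] = 0;
    }
    GroupMemoryBarrierWithGroupSync();

    // Skip threads that fall outside the frame
    if (id.x < WIDTH * HEIGHT)
    {
        float z = dataBuffer[id.x];
        if (mask[id.x] > 0.49 && z > 500 && z < 1500)
        {
            // Truncating the depth to uint drops its fractional part
            uint3 p = uint3(id.x % WIDTH, id.x / WIDTH, (uint)z);

            // Min and max are updated independently (no "else"),
            // so a single pixel can set both bounds
            InterlockedMin(gMin[0], p.x);
            InterlockedMax(gMax[0], p.x);
            InterlockedMin(gMin[1], p.y);
            InterlockedMax(gMax[1], p.y);
            InterlockedMin(gMin[2], p.z);
            InterlockedMax(gMax[2], p.z);
        }
    }
    GroupMemoryBarrierWithGroupSync();

    // Only three threads per group merge into the global buffers
    if (gi < 3)
    {
        InterlockedMin(minBuffer[gi], gMin[gi]);
        InterlockedMax(maxBuffer[gi], gMax[gi]);
    }
}
```

With these sizes the host would dispatch (WIDTH * HEIGHT + GROUP_SIZE - 1) / GROUP_SIZE groups in x, which is 848 for a 512 x 424 frame. If no pixel passes the mask and depth test, the buffers keep their initial values (0xFFFFFFFF and 0), so the host can detect an empty bounding box that way.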