EDIT: Compute on Metal, GroupMemoryBarrier() not working?

I’m working on a compute shader that does skinning, and I get different results on OS X/Metal depending on seemingly arbitrary code changes, such as altering the order of unrelated lines. Are there some known rules to follow here, or is the HLSL-to-Metal translation just broken?

The first version below works and the second doesn’t, even though the only difference is the order of two lines near the end. parents[] is a read-only StructuredBuffer.

// Version 1: works
[numthreads(128,1,1)]
void PoseToMatrices (uint3 id : SV_DispatchThreadID)
{
    uint bone;
    int parent;
    half4x4 localMatrix;
    half4x4 fullMatrix;
   
    bone = id.x;
    if (bone >= boneCount)
    {
        return;
    }
   
    localMatrix = CalcLocalMatrix(bone);
    localMatrices[bone] = localMatrix;
       
    // Can't use GroupMemoryBarrierWithGroupSync with early return.
    GroupMemoryBarrier();
   
    parent = parents[bone];
    fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
    WriteMatrix(bone, fullMatrix);
   
    bone = bone + 128;
    if (bone < boneCount)
    {
        parent = parents[bone];
        localMatrix = CalcLocalMatrix(bone);

        fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
        WriteMatrix(bone, fullMatrix);
    }
}

// Version 2: identical except for the order of the two lines in the final if block; doesn't work
[numthreads(128,1,1)]
void PoseToMatrices (uint3 id : SV_DispatchThreadID)
{
    uint bone;
    int parent;
    half4x4 localMatrix;
    half4x4 fullMatrix;
   
    bone = id.x;
    if (bone >= boneCount)
    {
        return;
    }
   
    localMatrix = CalcLocalMatrix(bone);
    localMatrices[bone] = localMatrix;
       
    // Can't use GroupMemoryBarrierWithGroupSync with early return.
    GroupMemoryBarrier();
   
    parent = parents[bone];
    fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
    WriteMatrix(bone, fullMatrix);
   
    bone = bone + 128;
    if (bone < boneCount)
    {
        localMatrix = CalcLocalMatrix(bone);
        parent = parents[bone];

        fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
        WriteMatrix(bone, fullMatrix);
    }
}

I’ve done some more experimenting, and it seems like GroupMemoryBarrier() isn’t actually doing anything. I have groupshared data that I thought it would protect, but maybe I’m misunderstanding something, since I’m new to compute shaders. Or maybe it’s a bug in the translation to Metal.

#if HAVE_SIMDGROUP_BARRIER
simdgroup_barrier(mem_flags::mem_threadgroup);
#else
threadgroup_barrier(mem_flags::mem_threadgroup);
#endif

As I said, I’m a compute shader novice, but this bit from the “compiled” shader just seems wrong. Shouldn’t GroupMemoryBarrier() always translate to something that spans the entire thread group, not just the SIMD group?

Line 3055: it seems like SYNC_THREADS_IN_GROUP should be set there and isn’t.

Just a guess, but maybe the GroupMemoryBarrierWithGroupSync() function will work where GroupMemoryBarrier() won’t?

I believe threadgroup_barrier(mem_flags::mem_threadgroup) is exactly equivalent to GroupMemoryBarrierWithGroupSync(), and the simdgroup_barrier(mem_flags::mem_threadgroup) call is roughly equivalent to GroupMemoryBarrier(), because neither of those two guarantees execution sync across the whole thread group, only group-shared memory read/write sync.
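To make the difference concrete, here is a minimal HLSL sketch in which each thread reads a groupshared value written by a different thread. The kernel name NeighbourRead and the buffer output are hypothetical, not from this thread. Because the value comes from another thread, the barrier has to wait until every thread in the group has done its write, which is what GroupMemoryBarrierWithGroupSync() adds; GroupMemoryBarrier() on its own does not wait for the other thread to get there.

```hlsl
// Hypothetical minimal example: each thread writes one groupshared slot,
// then reads the slot written by its neighbour in the same thread group.
#define GROUP_SIZE 128

RWStructuredBuffer<float> output;   // hypothetical output buffer
groupshared float sharedValues[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void NeighbourRead (uint3 gtid : SV_GroupThreadID, uint3 dtid : SV_DispatchThreadID)
{
    uint i = gtid.x;
    sharedValues[i] = (float)i * 2.0;

    // GroupMemoryBarrier() here would only order this thread's own
    // groupshared accesses; it would not wait for the neighbour thread
    // to have executed its write, so the read below could see stale data.
    // The WithGroupSync variant also makes every thread in the group wait
    // here. Every thread reaches this call because there is no early return.
    GroupMemoryBarrierWithGroupSync();

    uint neighbour = (i + 1) % GROUP_SIZE;
    output[dtid.x] = sharedValues[neighbour];
}
```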

If you aren’t able to use GroupMemoryBarrierWithGroupSync() you’ll probably have to restructure your code to allow it.
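One possible restructuring of the original kernel, as a sketch only. It assumes that localMatrices is a groupshared array (declared here with a hypothetical MAX_BONES size of 256, matching the two-bones-per-thread scheme), that the whole skeleton is processed by a single thread group with boneCount <= 256, and that boneCount, parents, CalcLocalMatrix(), CalcFullMatrix() and WriteMatrix() are declared elsewhere as in the original shader. Instead of returning early, each thread loops over the bones it owns, so every thread reaches the barrier. Unlike the original, it also stores the local matrices of bones 128 and above before the barrier, which matters if CalcFullMatrix() reads parent matrices from localMatrices (again an assumption).

```hlsl
// Sketch only. Assumptions (not from the original post):
//  - localMatrices is groupshared and MAX_BONES = 256 (2 bones per thread);
//  - one thread group handles the whole skeleton, boneCount <= MAX_BONES;
//  - boneCount, parents, CalcLocalMatrix, CalcFullMatrix and WriteMatrix
//    are declared elsewhere, as in the original shader.
#define THREADS 128
#define MAX_BONES 256

groupshared half4x4 localMatrices[MAX_BONES];

[numthreads(THREADS,1,1)]
void PoseToMatrices (uint3 gtid : SV_GroupThreadID)
{
    uint bone;

    // Phase 1: store the local matrix of every bone this thread owns
    // (bone and bone + THREADS). No early return, so all threads
    // in the group reach the barrier below.
    for (bone = gtid.x; bone < boneCount; bone += THREADS)
    {
        localMatrices[bone] = CalcLocalMatrix(bone);
    }

    // Wait until every thread in the group has finished phase 1 and its
    // groupshared writes are visible to the whole group.
    GroupMemoryBarrierWithGroupSync();

    // Phase 2: combine with the parent chain, which may read
    // local matrices written by other threads.
    for (bone = gtid.x; bone < boneCount; bone += THREADS)
    {
        int parent = parents[bone];
        half4x4 fullMatrix = CalcFullMatrix(bone, parent, localMatrices[bone]);
        WriteMatrix(bone, fullMatrix);
    }
}
```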


No, unfortunately that doesn’t help; I’ve tried it both ways. On the other hand, it does produce the code I expected from the original version:
threadgroup_barrier(mem_flags::mem_threadgroup);
So I guess the actual bug is more complicated than I thought.

The error is related to group-shared memory read/write sync: I’m getting invalid data when reading from a groupshared array after the barrier.

@ecurtz Old thread, I know. Does this mean groupshared memory is working on iOS now? I tested it way back but haven’t lately.

Unfortunately, I gave up on that method and rewrote the shader, so I don’t know whether this was ever fixed (or even whether it was working as intended and I misinterpreted something in the documentation). If you do try it, please let us know your results!
