Compute Shader odd behaviour on iOS

Hello guys, are there any shader gurus here? I am seeing odd Compute Shader behaviour that differs between the Editor and iOS. What I am trying to achieve is something similar to Fog of War: I register new objects on the grid and pass their positions to the Compute Shader, which generates a mask that reveals the underlying texture around these objects.
Here is how it looks in Editor:
[Attached images: Editor 1.png, Editor 2.png]
This is how it looks on iOS:


As you can see, on iOS the mask is only partially generated. Most noticeably, the mask is rendered properly only for the last element of the buffer. Also, after some time, parts of the mask start appearing in stripes. At this point I have run out of ideas.
Here is the code for MaskRenderer.cs (I cleaned it up). It registers cells, creates a compute buffer, and dispatches the compute shader.

private void Awake()
{
    cells = new List<GridCell>();
    //Create a new render texture for the mask
    maskTexture = new RenderTexture(TextureSize, TextureSize, 0, RenderTextureFormat.ARGB32,
        RenderTextureReadWrite.Linear);
    maskTexture.enableRandomWrite = true;
    maskTexture.Create();

    //Set the texture dimension and the mask texture in the compute shader
    computeShader.SetInt(textureSizeId, TextureSize);
    computeShader.SetTexture(0, maskTextureId, maskTexture);

    //We are using the mask texture and the map size in multiple materials
    //Setting it as a global variable is easier in this case
    Shader.SetGlobalTexture(maskTextureId, maskTexture);
    Shader.SetGlobalFloat(mapSizeId, MapSize);

    bufferElements = new List<CellBufferElement>();
}

//Setup all buffers and variables
private int _prevCount;

private void LateUpdate()
{
    //Recreate the buffer since the visibility updates
    bufferElements.Clear();
    foreach (GridCell cell in cells)
    {
        CellBufferElement element = new CellBufferElement
        {
            PositionX = cell.transform.position.x,
            PositionY = cell.transform.position.z,
            Visibility = cell.Visibility
        };
        bufferElements.Add(element);
    }

    if (bufferElements.Count == 0) return;
    if (buffer == null)
    {
        buffer = new ComputeBuffer(bufferElements.Count * 3, sizeof(float));
        _prevCount = bufferElements.Count;
    }
    else if (_prevCount != bufferElements.Count)
    {
        buffer.Dispose();
        buffer = new ComputeBuffer(bufferElements.Count * 3, sizeof(float));
        _prevCount = bufferElements.Count;
    }
   
    //Set the buffer data and pass it to the compute shader
    buffer.SetData(bufferElements);
    computeShader.SetBuffer(0, cellBufferId, buffer);

    //Set other variables needed in the compute function
    computeShader.SetInt(cellCountId, bufferElements.Count);
    computeShader.SetFloat(radiusId, Radius / MapSize);
    computeShader.SetFloat(blendId, BlendDistance / MapSize);

    //Execute the compute shader
    //Our thread group size is 8x8=64,
    //thus we have to dispatch (TextureSize / 8) * (TextureSize / 8) thread groups
    computeShader.Dispatch(0, Mathf.CeilToInt(TextureSize / 8.0f), Mathf.CeilToInt(TextureSize / 8.0f), 1);
}

Here is the compute shader itself:

#pragma kernel CSMain

//General variables
int _CellCount;
int _TextureSize;
float _MapSize;
float _Radius;
float _Blend;

//Buffer containing position (x,y) and visibility(z) of the cells
StructuredBuffer<float> _CellBuffer;

//Mask output texture
RWTexture2D<float4> _Mask;

//Kernel function that "renders" the mask based on the grid cell buffer
[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    //Reset the pixel value at the start
    _Mask[id.xy] = float4(0, 0, 0, 1);
    //Loop through each cell
    for (int i = 0; i < _CellCount; i++)
    {
        //Calculate the texel and cell center position in uv space [0;1] and the distance between them
        float2 UVPos = id.xy / (float)_TextureSize;
        float2 centerUVPos = float2(_CellBuffer[3 * i], _CellBuffer[3 * i + 1]) / _MapSize;
        float UVDistance = length(UVPos - centerUVPos);

        //Calculate a smooth visibility value for the current cell
        float val = smoothstep(_Radius + _Blend, _Radius, UVDistance) * _CellBuffer[3 * i + 2];
        //Add it to the result if there isn't already a higher visibility value for the current texel
        val = max(_Mask[id.xy].r, val);
        _Mask[id.xy] = float4(val, _Mask[id.xy].g, _Mask[id.xy].b, 1);   
    }
}

In case you are wondering, I have tried different numthreads combinations: [1,1,1], [2,2,1], [4,4,1]. The texture size is divisible by 2.
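A side note on the dispatch size: the group count is rounded up with Mathf.CeilToInt, so a texture size that is not a multiple of the group size would make some threads land outside the texture. This is probably not what causes the stripes here (with [1,1,1] there is no overshoot at all), but a common precaution is an early-out at the top of the kernel. The following is only a sketch; the guard is the one addition, and all names are taken from the shader above.

```hlsl
#pragma kernel CSMain

int _TextureSize;

//Mask output texture
RWTexture2D<float4> _Mask;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    //Threads from the rounded-up groups may fall outside the texture; skip them
    if (id.x >= (uint)_TextureSize || id.y >= (uint)_TextureSize)
        return;

    //Safe to touch the texel from here on
    _Mask[id.xy] = float4(0, 0, 0, 1);
}
```

How out-of-range texture writes are handled may differ between graphics APIs, so the guard keeps the kernel from relying on any one of them.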

The problem is definitely beyond me. I am hoping for someone's help.


You are reading from and writing to _Mask[id.xy] multiple times. I am not familiar with Metal, but what you are doing is definitely problematic. Writes to group memory and device memory from compute shaders are basically asynchronous during shader execution: unless you call one of the barrier functions, you are not guaranteed to read back what any thread wrote.

Are you running your editor on Windows? It’s possible the Direct3D compiler noticed you writing and reading the same location over and over and is using registers to store the intermediate results before writing the final result to the memory. Meanwhile the iOS Metal compiler was more literal and left the memory accesses as-is.

Instead of writing intermediate results to the buffer, use a local variable, and then write the final result after your loop is done. This should work more consistently across different compilers.
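To illustrate the suggestion, here is a minimal sketch of the kernel from the first post, assuming the original layout of 3 floats per cell. The variable name `visibility` is made up for this example; everything else comes from the posted shader. It is one possible way to restructure the loop, not a verified fix for every device.

```hlsl
#pragma kernel CSMain

//General variables
int _CellCount;
int _TextureSize;
float _MapSize;
float _Radius;
float _Blend;

//Buffer containing position (x,y) and visibility(z) of the cells
StructuredBuffer<float> _CellBuffer;

//Mask output texture
RWTexture2D<float4> _Mask;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    //Texel position in uv space [0;1]; it does not depend on the cell, so compute it once
    float2 UVPos = id.xy / (float)_TextureSize;

    //Local accumulator (hypothetical name); _Mask is never read inside the loop
    float visibility = 0;

    for (int i = 0; i < _CellCount; i++)
    {
        float2 centerUVPos = float2(_CellBuffer[3 * i], _CellBuffer[3 * i + 1]) / _MapSize;
        float UVDistance = length(UVPos - centerUVPos);

        //Smooth visibility value for the current cell
        float val = smoothstep(_Radius + _Blend, _Radius, UVDistance) * _CellBuffer[3 * i + 2];

        //Keep the highest visibility seen so far
        visibility = max(visibility, val);
    }

    //Single write per texel, after the loop is done
    _Mask[id.xy] = float4(visibility, 0, 0, 1);
}
```

The point of the design is that each thread touches the texture exactly once, so the result no longer depends on how a particular compiler treats repeated reads and writes of the same texel.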

Thank you for your reply. Could you please elaborate a little bit more on “local variable”? I am confused.

I figured it out. You were right! I had to use a local variable instead. Thank you. LOVE U!

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    //Accumulate the result in a local variable instead of reading and writing _Mask
    float4 c = float4(0,0,0,1);
    //Loop through each cell
    for (int i = 0; i < _CellCount; i++)
    {
        //Calculate the texel and cell center position in uv space [0;1] and the distance between them
        float2 UVPos = id.xy / (float)_TextureSize;
        float2 centerUVPos = float2(_CellBuffer[4 * i], _CellBuffer[4 * i + 1]) / _MapSize;
        float UVDistance = length(UVPos - centerUVPos);

        //Calculate a smooth visibility value for the current cell
        float val = smoothstep(_Radius + _Blend, _Radius, UVDistance) * _CellBuffer[4 * i + 2];
        //This version stores 4 floats per cell; the fourth selects the output channel (red if > 0.5, green otherwise)
        float main = _CellBuffer[4 * i + 3];
        //Keep the value only if it is higher than what the chosen channel already holds
        if(main > 0.5f)
        {
            val = max(c.r, val);
            c = float4(val, c.g, c.b, 1);    
        }
        else
        {
            val = max(c.g, val);
            c = float4(c.r, val, c.b, 1);
        }
    }
    _Mask[id.xy] = c;
}