Crash when using groupshared memory + constant buffer on OpenGLES3

I just spent my entire day trying to figure out why a ComputeShader was crashing my project, and I can now finally reproduce it consistently.

I declare a groupshared memory buffer and dispatch a single thread group to fill it with numbers. Works perfectly (also on device, in this case Meta Quest2). I introduce a constant buffer read, and the Editor crashes with the log message:

Assertion failed on expression: ‘cbStateIndex < m_ConstantBufferStates.size()’

The crash only happens on OpenGLES3 (Android). I can’t make Vulkan work with groupshared at all. It works fine on DirectX11, but that defeats the purpose since I’m developing for Quest2.

EDIT:
I have now tested the same setup in 2020.3.31f1 and that works. However, that does not help me because other dependencies in my project prevent me from downgrading.

EDIT2:
I realised I used a SetFloat() that should have been a SetInt() … see post below.
```csharp
using UnityEngine;

public class Test : MonoBehaviour
{
    void Start()
    {
        Debug.Log( "Groupshared memory working: " + AssertGroupSharedMemory() );
    }

    bool AssertGroupSharedMemory()
    {
        const int count = 10;

        // Create compute shader and find kernel.
        var computeShader = Instantiate( Resources.Load<ComputeShader>( "Test" ) );
        int kernel = computeShader.FindKernel( "Execute" );

        // Create, fill and set a buffer of numbers.
        var buffer = new ComputeBuffer( count, sizeof( int ) );
        int[] numbers = new int[ count ];
        for( int i = 0; i < count; i++ ) numbers[ i ] = i;
        buffer.SetData( numbers );
        computeShader.SetBuffer( kernel, "_Buffer", buffer );

        // Upload a value to a Unity managed constant buffer.
        const int constantValue = 1;
        computeShader.SetInt( "_ConstantValue", constantValue );

        // Compute the result on the CPU.
        int correctResult = 0;
        for( int i = 0; i < numbers.Length; i++ ){
            correctResult += numbers[ i ] + constantValue;
        }

        // Dispatch a single thread group (thread group size == groupshared buffer size).
        computeShader.Dispatch( kernel, 1, 1, 1 );

        // Read back the data.
        int[] data = new int[ count ];
        buffer.GetData( data );

        // Check whether CPU and GPU agree.
        bool success = data[ 0 ] == correctResult;

        // Clean up after the party.
        buffer.Release();
        Destroy( computeShader );

        return success;
    }
}
```

And the ComputeShader “Test.compute” located in a Resources folder:

```hlsl
#pragma kernel Execute

#define COUNT 10

RWStructuredBuffer<int> _Buffer;

int _ConstantValue;

groupshared int sharedNumbers[ COUNT ];

[numthreads(COUNT,1,1)]
void Execute(
    uint gi : SV_GroupIndex // Local index within group
){
    // Read number and store it in shared memory.
    sharedNumbers[ gi ] = _Buffer[ gi ];

    // Wait until all threads in this group reach this line.
    GroupMemoryBarrierWithGroupSync();

    // First thread (in group) does the rest of the work.
    if( gi > 0 ) return;

    int sum = 0;
    for( int i = 0; i < COUNT; i++ ){
        sum += sharedNumbers[ i ];
        sum += _ConstantValue; //CRASH!! This line triggers the crash.
    }

    // Store for later readback (at index 0).
    _Buffer[ gi ] = sum;
}
```
Is this a bug, or am I missing something?

I am trying to compare 2020.3.31 and 2021.2.16 to find out why the former works and the latter does not. I set up two projects with the same settings.

For some reason, in 2020 the ComputeShader is compiled for both Direct3D11 and OpenGLES3, while in 2021 it is only compiled for OpenGLES3.

Why? Like I wrote, I have the exact same settings, including Graphics API in Player Settings. Direct3D11 for Windows, and OpenGLES3 for Android.

2020.3.31

[Attached screenshot: Compile2020.jpg]

```
**** Platform OpenGL ES 3:
Compiled code for kernel Execute
keywords: <none>
#version 310 es
#define HLSLCC_ENABLE_UNIFORM_BUFFERS 1
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
#define UNITY_UNIFORM
#else
#define UNITY_UNIFORM uniform
#endif
#define UNITY_SUPPORTS_UNIFORM_LOCATION 1
#if UNITY_SUPPORTS_UNIFORM_LOCATION
#define UNITY_LOCATION(x) layout(location = x)
#define UNITY_BINDING(x) layout(binding = x, std140)
#else
#define UNITY_LOCATION(x)
#define UNITY_BINDING(x) layout(std140)
#endif
UNITY_BINDING(0) uniform CGlobals {
    float _ConstantValue;
};
struct _Buffer_type {
    int[1] value;
};
layout(std430, binding = 0) buffer _Buffer {
    _Buffer_type _Buffer_buf[];
};
int u_xlati0;
int u_xlati1;
float u_xlat2;
int u_xlati2;
bool u_xlatb2;
shared struct {
    uint value[1];
} TGSM0[10];
layout(local_size_x = 10, local_size_y = 1, local_size_z = 1) in;
void main()
{
    u_xlati0 = int(_Buffer_buf[gl_LocalInvocationIndex].value[(0 >> 2) + 0]);
    TGSM0[gl_LocalInvocationIndex].value[(0 >> 2)] = uint(u_xlati0);
    memoryBarrierShared();
    barrier();
    if(gl_LocalInvocationIndex != uint(0)) {
        return;
    }
    u_xlati0 = int(0);
    for(int u_xlati_loop_1 = int(0) ; u_xlati_loop_1<10 ; u_xlati_loop_1++)
    {
        u_xlati2 = int(TGSM0[u_xlati_loop_1].value[(0 >> 2) + 0]);
        u_xlati2 = u_xlati2 + u_xlati0;
        u_xlat2 = float(u_xlati2);
        u_xlat2 = u_xlat2 + _ConstantValue;
        u_xlati0 = int(u_xlat2);
    }
    _Buffer_buf[gl_LocalInvocationIndex].value[(0 >> 2)] = u_xlati0;
    return;
}
**** Platform Direct3D 11:
Compiled code for kernel Execute
keywords: <none>
binary blob size 552:
//
// Generated by Microsoft (R) D3D Shader Disassembler
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Output
      cs_5_0
      dcl_globalFlags refactoringAllowed
      dcl_constantbuffer CB0[1], immediateIndexed
      dcl_uav_structured u0, 4
      dcl_input vThreadIDInGroupFlattened
      dcl_temps 1
      dcl_tgsm_structured g0, 4, 10
      dcl_thread_group 10, 1, 1
   0: ld_structured_indexable(structured_buffer, stride=4)(mixed,mixed,mixed,mixed) r0.x, vThreadIDInGroupFlattened.x, l(0), u0.xxxx
   1: store_structured g0.x, vThreadIDInGroupFlattened.x, l(0), r0.x
   2: sync_g_t
   3: if_nz vThreadIDInGroupFlattened.x
   4:   ret
   5: endif
   6: mov r0.xy, l(0,0,0,0)
   7: loop
   8:   ige r0.z, r0.y, l(10)
   9:   breakc_nz r0.z
  10:   ld_structured r0.z, r0.y, l(0), g0.xxxx
  11:   iadd r0.z, r0.z, r0.x
  12:   itof r0.z, r0.z
  13:   add r0.z, r0.z, cb0[0].x
  14:   ftoi r0.x, r0.z
  15:   iadd r0.y, r0.y, l(1)
  16: endloop
  17: store_structured u0.x, vThreadIDInGroupFlattened.x, l(0), r0.x
  18: ret
// Approximately 0 instruction slots used
```

In 2021.2.16

[Attached screenshot: Compile2021.jpg]

```
**** Platform OpenGL ES 3:
Compiled code for kernel Execute
keywords: <none>
#version 310 es

#define HLSLCC_ENABLE_UNIFORM_BUFFERS 1
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
#define UNITY_UNIFORM
#else
#define UNITY_UNIFORM uniform
#endif
#define UNITY_SUPPORTS_UNIFORM_LOCATION 1
#if UNITY_SUPPORTS_UNIFORM_LOCATION
#define UNITY_LOCATION(x) layout(location = x)
#define UNITY_BINDING(x) layout(binding = x, std140)
#else
#define UNITY_LOCATION(x)
#define UNITY_BINDING(x) layout(std140)
#endif
UNITY_BINDING(0) uniform CGlobals {
    float _ConstantValue;
};
struct _Buffer_type {
    int[1] value;
};

layout(std430, binding = 0) buffer _Buffer {
    _Buffer_type _Buffer_buf[];
};
int u_xlati0;
int u_xlati1;
float u_xlat2;
int u_xlati2;
bool u_xlatb2;
shared struct {
    uint value[1];
} TGSM0[10];
layout(local_size_x = 10, local_size_y = 1, local_size_z = 1) in;
void main()
{
    u_xlati0 = int(_Buffer_buf[gl_LocalInvocationIndex].value[(0 >> 2) + 0]);
    TGSM0[gl_LocalInvocationIndex].value[(0 >> 2)] = uint(u_xlati0);
    memoryBarrierShared();
    barrier();
    if(gl_LocalInvocationIndex != uint(0)) {
        return;
    }
    u_xlati0 = int(0);
    for(int u_xlati_loop_1 = int(0) ; u_xlati_loop_1<10 ; u_xlati_loop_1++)
    {
        u_xlati2 = int(TGSM0[u_xlati_loop_1].value[(0 >> 2) + 0]);
        u_xlati2 = u_xlati2 + u_xlati0;
        u_xlat2 = float(u_xlati2);
        u_xlat2 = u_xlat2 + _ConstantValue;
        u_xlati0 = int(u_xlat2);
    }
    _Buffer_buf[gl_LocalInvocationIndex].value[(0 >> 2)] = u_xlati0;
    return;
}
```

Hi!
It would be great if you could file a bug report. We’ll take a look at what’s happening.
Thank you!

Thanks, it’s already been done.
https://fogbugz.unity3d.com/default.asp?1413012_1dsqcjek57bg531m

I hope there is another solution than to wait for a new version of Unity, because this really is a stopper for my current project.

If we find a workaround we’ll post here :)


It seems the crash produced by the code above was caused by a type mismatch. I was adding a float from a constant buffer to an integer before storing the result (as a float) into the groupshared memory of integers without first casting to an integer. Curiously, this crash did not happen in 2020.3.

However, I still keep getting crashes in my project, and the same failed assertion ‘cbStateIndex < m_ConstantBufferStates.size()’ in my Editor log.

I will return if I can make a reproducible example.

@aleksandrk

Ok. It turns out that this has nothing to do with groupshared. It is more related to my earlier post about compute shaders not recompiling instantly in 2021.

Below is an example that crashes Unity 2021.2.16f1 on my Windows machine. It works fine on the same machine in 2020.3.31f1. The example just fills a compute buffer with a value from a constant buffer. Nothing special at all.

The crash does not happen at every Play execution (driving me nuts). It happens only right after the compute shader has been edited. When I edit and run other compute shaders that do not use constant buffer values, I see that changes are applied only after the second execution. This would explain why the constant buffer (in the compute shader below) is in some kind of unknown state the first time it is executed. For this reason, after the Editor is restarted, the compute shader has been compiled and the crash does not happen … but when I edit it and hit Play again: crash.

I would greatly appreciate it if someone would test the example below as described above. I feel like I am rapidly losing my mind here. Oh, and remember that the platform target has to be Android with Graphics API OpenGLES3.

EDIT:
The same issue seems to apply to 2022.1.0b13. So no relief there.

EDIT2:
Because this perhaps has something to do with the triggering of compilation, I checked that I am using the same code editor in 2020 and 2021. I am. It is Visual Studio Editor v. 16.4.3 and the Unity package by the same name v. 2.0.4.

EDIT3:
I have tried disabling “Asynchronous Shader Compilation” in the Editor settings. It made no difference. I also noticed that it sometimes takes more than two Play executions before the edited shader is compiled.

```csharp
using UnityEngine;

public class Test2 : MonoBehaviour
{
    void Start()
    {
        const int count = 10;

        // Create compute shader and find kernel.
        var computeShader = Instantiate( Resources.Load<ComputeShader>( "Test2" ) );
        int kernel = computeShader.FindKernel( "Execute" );

        // Create and set a buffer.
        var buffer = new ComputeBuffer( count, sizeof( float ) );
        computeShader.SetBuffer( kernel, "_Buffer", buffer );

        // Upload a value to a Unity managed constant buffer.
        computeShader.SetFloat( "_ConstantValue", 1f );

        // Dispatch 10 thread groups (each group with one thread).
        computeShader.Dispatch( kernel, count, 1, 1 );

        // Readback the data and log.
        float[] data = new float[ count ];
        buffer.GetData( data );
        Debug.Log( string.Join( ", ", data ) );

        // Clean up after the party.
        buffer.Release();
        Destroy( computeShader );
    }
}
```

And the ComputeShader “Test2.compute” located in a Resources folder:

```hlsl
#pragma kernel Execute

RWStructuredBuffer<float> _Buffer;

float _ConstantValue;

[numthreads(1,1,1)]
void Execute( uint ti : SV_DispatchThreadID)
{
    _Buffer[ ti ] = _ConstantValue;
}
```

We’ll investigate it as soon as the bug report reaches us :)


As a user, I can’t find a way to link the bug case to this thread. If you have the ability to do so, you are most welcome. CASE 1413012. Ticket 1413012_1dsqcjek57bg531m.

Yeah, I already did a couple of days ago :)


I tried to update my project again from 2020.3.31f1 to 2021.2.17f1 and noticed something odd. After conversion, I hit Play and to my surprise everything worked. Then I opened one of the compute shaders, added a single space character at the end of a line, went back to the Editor and hit Play again … and this resulted in an immediate crash, leaving the same error message as mentioned earlier in the Editor log. So perhaps something with shader recompilation?

Maybe it’s not clearing some runtime data immediately… We’ll investigate :)

Good news! The bug was reproduced and is now active on the issue tracker :)

Yep, I already looked at it.
A temporary workaround is to not use uniforms / constant buffers in your shader. It also happens only if the graphics API the Editor uses is not in the graphics API list for the current build target.
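
Whether a project is in the situation described above can be checked from an Editor script. The following is only a minimal sketch: the class name `GraphicsApiCheck` and the menu path are hypothetical, and it assumes the file is placed where Editor code is allowed. It compares the graphics API the Editor is currently running on with the list configured for the active build target.

```csharp
#if UNITY_EDITOR
using System.Linq;
using UnityEditor;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper, not part of the original thread.
public static class GraphicsApiCheck
{
    // The menu path is an arbitrary choice for this sketch.
    [MenuItem( "Tools/Check Editor Graphics API" )]
    static void Check()
    {
        // Graphics API the Editor itself is currently rendering with.
        GraphicsDeviceType editorApi = SystemInfo.graphicsDeviceType;

        // Graphics APIs configured in Player Settings for the active build target.
        BuildTarget target = EditorUserBuildSettings.activeBuildTarget;
        GraphicsDeviceType[] targetApis = PlayerSettings.GetGraphicsAPIs( target );

        bool listed = targetApis.Contains( editorApi );
        Debug.Log(
            "Editor API: " + editorApi +
            ", build target: " + target +
            ", target APIs: " + string.Join( ", ", targetApis ) +
            ", Editor API in target list: " + listed
        );
    }
}
#endif
```

With the setup from this thread (Editor on Windows, build target Android with OpenGLES3 only), the last value would be expected to be false, which matches the condition under which the crash was reported.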

Thank you for looking at this, @aleksandrk.

That’s like a car without wheels for me I’m afraid, haha =)

Well, you could put your data in a structured buffer :)
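
A minimal sketch of that suggestion, applied to the Test2 example from earlier in the thread. The class name `Test2StructuredBuffer` and the buffer name `_Constants` are hypothetical, and the sketch assumes Test2.compute has been edited as shown in the comment so that it no longer declares any loose uniforms.

```csharp
using UnityEngine;

// Hypothetical variant of Test2 that avoids the constant buffer entirely.
// Assumes Test2.compute has been changed to:
//
//   #pragma kernel Execute
//   RWStructuredBuffer<float> _Buffer;
//   StructuredBuffer<float> _Constants; // replaces "float _ConstantValue;"
//
//   [numthreads(1,1,1)]
//   void Execute( uint ti : SV_DispatchThreadID )
//   {
//       _Buffer[ ti ] = _Constants[ 0 ];
//   }
public class Test2StructuredBuffer : MonoBehaviour
{
    void Start()
    {
        const int count = 10;

        // Create compute shader and find kernel.
        var computeShader = Instantiate( Resources.Load<ComputeShader>( "Test2" ) );
        int kernel = computeShader.FindKernel( "Execute" );

        // Create and set the output buffer.
        var buffer = new ComputeBuffer( count, sizeof( float ) );
        computeShader.SetBuffer( kernel, "_Buffer", buffer );

        // Instead of SetFloat(), upload the value in a one-element structured buffer.
        var constants = new ComputeBuffer( 1, sizeof( float ) );
        constants.SetData( new float[] { 1f } );
        computeShader.SetBuffer( kernel, "_Constants", constants );

        // Dispatch 10 thread groups (each group with one thread).
        computeShader.Dispatch( kernel, count, 1, 1 );

        // Read back the data and log it.
        float[] data = new float[ count ];
        buffer.GetData( data );
        Debug.Log( string.Join( ", ", data ) );

        // Release both buffers and the shader instance.
        constants.Release();
        buffer.Release();
        Destroy( computeShader );
    }
}
```

The trade-off is an extra buffer to create, fill and release for every set of values that would otherwise be plain `SetFloat()` / `SetInt()` calls.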


Fixed in Unity 2022.2.0a11! Goodbye 2020.3! =D

It will be backported :)
Although it looks like it wasn’t reproducible on 2020.3.