Need help with a Compute Shader!

I am just learning to write Compute Shaders (I am not very familiar with shaders in general), and I am trying to do a simple raycast and write the result to a render texture from a Compute Shader. Everything works and I get the result I want. The ray-triangle intersections run really fast, in less than half a second. However, the moment I try to write a new color to the render texture, performance breaks down: the time needed jumps to 5 seconds. I couldn’t break out of the loop without worsening the performance even more. I couldn’t even set a bool flag inside the loop and then use it after the loop to update the texture colors.

The performance gets really bad. How should I update the render texture colors?

Here is the shader code. Any help is appreciated.

// =============================================================================

//--------------------------------------------------------------------
#pragma kernel MainCS

//--------------------------------------------------------------------
struct Triangle
{
    float3 v0;
    float3 v1;
    float3 v2;
    float3 n;
};

// Precomputed and set from C# script
struct Pixel
{
    float3   position;
    float3   direction;
    int      index;
    float    pixelColor;
};

//-----------------------------------------------------------------------------
#define blocksize 8

// variables
int imageSize;

// buffers
RWStructuredBuffer<Pixel>        pixels    : register(u0); // UAV
RWTexture2D<float4>              rendTex   : register(u1); // UAV
const StructuredBuffer<Triangle> tris      : register(t0); // SRV


// This kernel writes a color to the current pixel if the ray intersects any of
// the triangles in the tris buffer. In general it works well, but it is slow.
// The intersection part without the write to the render texture is SUPER FAST.
// As soon as I write to the texture, it gets SUPER SLOW.
// Random write is enabled on the render texture from the C# script.

[numthreads(blocksize,blocksize,1)]
void MainCS (uint3 id : SV_DispatchThreadID, uint3 Gid : SV_GroupID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex )
{
    // Get the current pixel ID - pixels is 1D array
    int pixelID = (int)(id.y * imageSize + id.x);

    // Ray
    float3 rayO = pixels[pixelID].position;
    float3 rayD = pixels[pixelID].direction;

    // Intersection variables
    float3 pt0, pt1, pt2, edge0, edge1, edge2, cross1, cross2, cross3, n;
    float angle1, angle2, angle3;
    float r, _a, b;
    float3 w0, I;

    bool bIntersect = false;

    [loop][allow_uav_condition]
    for (uint tr = 0; tr < tris.Length; tr++)
    {
        // Some calculations
        pt0 = tris[tr].v0; pt1 = tris[tr].v1; pt2 = tris[tr].v2;
        edge0 = rayO - pt0; edge1 = rayO - pt1; edge2 = rayO - pt2;

        // First check - is the ray intersecting the triangle
        if (dot(rayD, cross(edge0, edge1)) >= 0.0 ||
            dot(rayD, cross(edge1, edge2)) >= 0.0 ||
            dot(rayD, cross(edge2, edge0)) >= 0.0) continue;

        // Finding the intersection point
        n = normalize(cross(pt0 - pt1, pt0 - pt2));
        w0 = rayO - pt0;
        _a = -dot(n, w0);
        b  =  dot(n, rayD);
        r  = _a / b;
        I = rayO + rayD * r;
       
        // Second check - validating the hit point
        if (_a < 0.0)
        {
            // Here I want to update the texture colors

            // ==============================================
            // Variant 1 =======================================
            // Only update the texture, without break;
            // Gives the proper result but is SLOW - 3 seconds
            rendTex[id.xy] = float4(1.0, 0.0, 0.0, 1.0);
            // If break; is added - MUCH SLOWER
            break;

            // ===============================================
            // Variant 2 - Part 1 ==================================
            // Raising the flag to true - fast
            if(!bIntersect)
            {
                bIntersect = true;
            }
        }
    }

// Variant 2 - Part 2 - When using the flag, updating the render texture color is SUPER SLOW but accurate
    if(bIntersect)
        rendTex[id.xy] = float4(1.0, 0.0, 0.0, 1.0);
   
// Variant 3 - not really a variant, but writing to the texture is fast
// Applying a fixed color to the texture outside the loop, without using the bIntersect flag, is SUPER FAST
    rendTex[id.xy] = float4(1.0, 0.0, 0.0, 1.0); // SUPER FAST - BUT INACCURATE
}

Has anyone had a problem breaking out of a loop in a Compute Shader, or problems with speed when writing to a render texture? Another strange thing: if all the if statements are removed from the loop and only the texture update is left, the performance is good.
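
For reference, the flag-based variant described above (Variant 2) can be condensed into a single kernel with exactly one texture write per thread. This is only a minimal sketch under stated assumptions, not a confirmed fix: `HitMaskCS` and `triCount` are hypothetical names (the count would have to be set from the C# side), the image is assumed to be square with side `imageSize`, `pixels` is bound read-only because the kernel never writes to it, and the black color for a miss is a choice made here so that every pixel gets a defined value.

```hlsl
#pragma kernel HitMaskCS

struct Triangle { float3 v0; float3 v1; float3 v2; float3 n; };
struct Pixel    { float3 position; float3 direction; int index; float pixelColor; };

int imageSize;  // image side length in pixels, as in the original shader
int triCount;   // hypothetical: number of triangles, supplied from the C# script

StructuredBuffer<Pixel>    pixels;   // assumption: read-only access is enough here
StructuredBuffer<Triangle> tris;
RWTexture2D<float4>        rendTex;

[numthreads(8, 8, 1)]
void HitMaskCS(uint3 id : SV_DispatchThreadID)
{
    // Skip threads outside the image when imageSize is not a multiple of 8
    if (id.x >= (uint)imageSize || id.y >= (uint)imageSize)
        return;

    Pixel p = pixels[id.y * (uint)imageSize + id.x];
    bool hit = false;

    // The loop condition ends the search at the first hit, so no break is needed
    for (uint t = 0; t < (uint)triCount && !hit; t++)
    {
        float3 v0 = tris[t].v0;
        float3 v1 = tris[t].v1;
        float3 v2 = tris[t].v2;

        float3 e0 = p.position - v0;
        float3 e1 = p.position - v1;
        float3 e2 = p.position - v2;

        // Same first check as in the original kernel
        if (dot(p.direction, cross(e0, e1)) >= 0.0 ||
            dot(p.direction, cross(e1, e2)) >= 0.0 ||
            dot(p.direction, cross(e2, e0)) >= 0.0)
            continue;

        // Same second check as in the original kernel
        float3 n = normalize(cross(v0 - v1, v0 - v2));
        float  a = -dot(n, e0);
        if (a < 0.0)
            hit = true;
    }

    // Single write per thread: red on a hit, black on a miss
    rendTex[id.xy] = hit ? float4(1.0, 0.0, 0.0, 1.0) : float4(0.0, 0.0, 0.0, 1.0);
}
```

With 8x8 thread groups, this would be dispatched with `imageSize / 8` groups (rounded up) in both x and y. One possible explanation for the timings, which nothing in this thread confirms: when no written value depends on the loop, the shader compiler is free to remove the loop entirely, so the "fast" measurements may not include the intersection work at all.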

Did you ever get this to work? And if so, what was the issue? I tried running the code, but there’s not enough of it for me to debug properly ;(

I can’t tell if I’m not passing in stuff properly or it’s just not working, but I can’t get it to work at all…