Compute shaders strange error

I have a compute shader and an accompanying C# script that together modify the y coordinate of an array of vertices. I kept the example simple so it is clear.

Although it runs without errors, the shader seems to forget the first vertex of my shape (except when that shape is a closed volume?).

Here is the C# class :

Mesh m;
//public bool stopProcess = false; //Unused in this version of the example
MeshCollider coll;
public ComputeShader csFile; //the compute shader file, added the Unity way
Vector3[] arrayToProcess; //An array of vectors I'll use to store data
ComputeBuffer cbf; //the buffer CPU->GPU (an early version with exactly
                   //the same result had only this one)
ComputeBuffer cbfOut; //the buffer GPU->CPU
int vertexLength;

void Awake() { //Assigning my stuff
   coll = gameObject.GetComponent<MeshCollider>();
   m = GetComponent<MeshFilter>().sharedMesh;
   vertexLength = m.vertices.Length;
   arrayToProcess = m.vertices; //setting the first version of the vertex array (copy of mesh)
}

void Start () {
   cbf = new ComputeBuffer(vertexLength,32); //Buffer in
   cbfOut = new ComputeBuffer(vertexLength,32); //Buffer out
}

void Update () {
   csFile.Dispatch(0,vertexLength,vertexLength,1); //Dispatching (I think my mistake is here)
   cbfOut.GetData(arrayToProcess); //getting back my processed vertices
   m.vertices = arrayToProcess; //assigning them to the mesh
   //coll.sharedMesh = m; //collider stuff, useless in this demo
}

And my compute shader script :

#pragma kernel CSMain

RWStructuredBuffer<float3> Board : register(s[0]);
RWStructuredBuffer<float3> BoardOut : register(s[1]);

float time;

[numthreads(1,1,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    float valx = (sin((time*4)+Board[id.x].x));
    float valz = (cos((time*2)+Board[id.x].z));
    Board[id.x].y = (valx + valz)/5;
    BoardOut[id.x] = Board[id.x];
}

At the beginning I was reading from and writing to the same buffer. When I ran into this issue I tried separate buffers, but with no success: I still have the same problem.

Maybe I misunderstood the way compute shaders are supposed to be used (and I know I could use a vertex shader but I just want to try compute shaders for further improvements.)

To complete what I said, I suppose it is related to the way vertices are indexed in the Mesh.vertices array.

I tried a LOT of different block/thread configurations, but nothing seems to solve the issue. Combinations tried:

Block            Thread   
60,60,1         1,1,1
1,1,1           60,60,3
10,10,3         3,1,1

and some others I do not remember. I think the best configuration should be something with a good balance like :

Block : VertexCount,1,1 Thread : 3,1,1

About the closed volume: I’m not sure about that, because with a cube (8 vertices) everything seems to move accordingly, but with a shape that has an odd number of vertices, the first vertex (or the last, I have not checked yet) seems not to be processed.

I tried it with many different shapes, but subdivided planes make it the most obvious: one corner never moves.


After further study I found out that the compute shader simply does not process the last vertices of the mesh (not the first, I checked). It seems related to the buffer type. I still don’t get why RWStructuredBuffer should be an issue or how I am misusing it. Is it reserved for streams? I can’t understand the MSDN doc on this one.

This is a duplicate of a question I asked on SO: "Unity Compute Shaders Vertex Index error" (Stack Overflow)

Question answered properly here : StackOverflow

I’m familiar with compute shaders but have never touched Unity. Having looked over the documentation for compute shaders in Unity, a couple of things stand out.

The cbf and cbfOut ComputeBuffers are created with a stride of 32 (bytes?). Both your StructuredBuffers contain float3s which have a stride of 12 bytes, not 32. Where has 32 come from?
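To make the 12-byte figure concrete, here is a minimal plain C# sketch (no Unity required). `Float3` is a hypothetical stand-in for Unity's Vector3 / HLSL's float3, assumed to be three tightly packed 32-bit floats; the class and method names are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for UnityEngine.Vector3 / HLSL float3:
// three 32-bit floats laid out one after another.
[StructLayout(LayoutKind.Sequential)]
public struct Float3
{
    public float X;
    public float Y;
    public float Z;
}

public static class StrideCheck
{
    // Stride computed by hand: 3 components * 4 bytes each.
    public static int ManualStride()
    {
        return 3 * sizeof(float);
    }

    // Stride as reported by the runtime for the struct layout above.
    public static int MarshalledStride()
    {
        return Marshal.SizeOf<Float3>();
    }
}
```

If this assumption holds for Vector3, the buffers in the question would presumably be created with a stride of 12 (for example `new ComputeBuffer(vertexLength, 12)`) rather than 32.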

When you dispatch your compute shader you’re requesting a two-dimensional dispatch (vertexLength, vertexLength, 1), but you’re operating on a 1D array of float3s. You end up with a race condition in which many different threads each think they’re responsible for updating the same element of the array. Although awful for performance, if you want a thread group size of [numthreads(1,1,1)], then you should dispatch (vertexLength, 1, 1) thread groups when calling Dispatch (i.e., Dispatch(60,1,1) with numthreads(1,1,1)).

For better performance, the number of threads in your thread group should be a multiple of 64 for best efficiency on AMD hardware. You then need only dispatch ceil(numVertices/64) thread groups and insert some logic into the shader to ensure id.x is not out of bounds for any given thread.
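The overlap can be illustrated on the CPU. The sketch below is only a simulation in plain C#, not Unity or GPU code: it assumes numthreads(1,1,1) (one thread per group) and a kernel that indexes the array by id.x alone, as in the question. `DispatchSimulation` and `CountWrites` are hypothetical names.

```csharp
using System;

public static class DispatchSimulation
{
    // Counts how many simulated threads write each array element when the
    // kernel uses only id.x as its index. With one thread per group,
    // id.x equals the group's x coordinate.
    public static int[] CountWrites(int elementCount, int groupsX, int groupsY)
    {
        var writes = new int[elementCount];
        for (int y = 0; y < groupsY; y++)
        {
            for (int x = 0; x < groupsX; x++)
            {
                // The y coordinate is ignored by the kernel, so every row
                // of groups touches the same elements again.
                if (x < elementCount)
                {
                    writes[x]++;
                }
            }
        }
        return writes;
    }
}
```

With 60 elements, `CountWrites(60, 60, 60)` reports 60 writers per element, while `CountWrites(60, 60, 1)` reports exactly one, which is the one-dimensional dispatch described above.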
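As a rough sketch of the group-count arithmetic and the bounds check, here is a CPU-side simulation in plain C#. It is not shader code: the names `GroupedDispatch`, `GroupCount` and `Run` are hypothetical, and the loop merely stands in for the threads a dispatch with 64 threads per group would launch.

```csharp
using System;

public static class GroupedDispatch
{
    public const int ThreadsPerGroup = 64;

    // Integer form of ceil(elementCount / 64).
    public static int GroupCount(int elementCount)
    {
        return (elementCount + ThreadsPerGroup - 1) / ThreadsPerGroup;
    }

    // Simulates every launched thread and marks the elements it processes.
    // threadsLaunched can exceed elementCount, hence the guard.
    public static bool[] Run(int elementCount, out int threadsLaunched)
    {
        var processed = new bool[elementCount];
        threadsLaunched = GroupCount(elementCount) * ThreadsPerGroup;
        for (int id = 0; id < threadsLaunched; id++)
        {
            if (id >= elementCount)
            {
                continue; // out-of-bounds thread: do nothing
            }
            processed[id] = true;
        }
        return processed;
    }
}
```

For a mesh with 121 vertices this gives 2 groups, so 128 threads are launched and the last 7 are skipped by the guard. In the shader itself the equivalent guard would be an early return when id.x is not less than the vertex count, which would have to be passed in as an extra value.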


The documentation for the ComputeBuffer constructor is here: Unity ComputeBuffer Documentation. While it doesn’t explicitly say “stride” is in bytes, it’s the only reasonable assumption.

Original answer from Adam Miles on StackOverflow

I don’t have a Pro version here and I don’t even have a DX11 card :D. So I can’t say much about compute shaders; however, you might try a simple shader just to prove that it processes each element.

For example, pass in an array with all elements set to (0,0,0) and let the shader just do

    BoardOut[id.x].x = 1.0;
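
On the C# side, the readback could then be scanned for elements the shader never touched. This is a minimal sketch in plain C#, assuming the x components of the returned elements have been copied into a float array; `ProcessedCheck` and `FindUnprocessed` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ProcessedCheck
{
    // Returns the indices whose x component was not set to 1.0 by the shader,
    // i.e. the elements that no thread wrote.
    public static List<int> FindUnprocessed(float[] xValues)
    {
        var missing = new List<int>();
        for (int i = 0; i < xValues.Length; i++)
        {
            if (xValues[i] != 1.0f)
            {
                missing.Add(i);
            }
        }
        return missing;
    }
}
```

An empty list would mean every element was written; any index that shows up points at the vertices being skipped.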