Calculating Normals of a Mesh in Compute Shader

I’m trying to achieve the exact same result as Unity’s built-in `Mesh.RecalculateNormals()` method, but in a compute shader.

I can get the correct result in a C# script (script attached below).

I cannot get the same result with a compute shader, even though I’m using the same algorithm (compute shader attached below).

Can someone point out what I’m missing or doing wrong? (A picture of a sample result with the compute shader is attached below.)

Some notes:

• I get different results every time I dispatch the compute shader.
• I have used both Unity’s default sphere and a simple sphere created in Blender. Same results.
• My ambient light color is black, which is why the bottom half of the sphere is completely black. It does not affect the results.

Here’s how I calculate normals in C# - CPU:

``````private void CalculateNormalsCPU()
{
    var sphereMesh = MeshFilter.mesh;
    var vertices = sphereMesh.vertices;
    var triangles = sphereMesh.triangles;
    var triangleCount = triangles.Length / 3;

    var normals = new Vector3[vertices.Length];

    for (var i = 0; i < triangleCount; i++)
    {
        var triangleIndex = i * 3;
        var vertex1 = vertices[triangles[triangleIndex]];
        var vertex2 = vertices[triangles[triangleIndex + 1]];
        var vertex3 = vertices[triangles[triangleIndex + 2]];

        var side1 = vertex2 - vertex1;
        var side2 = vertex3 - vertex1;

        var triangleNormal = Vector3.Normalize(Vector3.Cross(side1, side2));

        normals[triangles[triangleIndex]] += triangleNormal;
        normals[triangles[triangleIndex + 1]] += triangleNormal;
        normals[triangles[triangleIndex + 2]] += triangleNormal;
    }

    for (int i = 0; i < vertices.Length; i++)
    {
        normals[i] = normals[i].normalized;
    }
    sphereMesh.normals = normals;
}
``````

Here is how I prepare and dispatch my Compute Shader:

``````private void CalculateNormalsComputeShader()
{
    var sphereMesh = MeshFilter.mesh;
    var vertexCount = sphereMesh.vertexCount;
    var triangleCount = sphereMesh.triangles.Length / 3;
    sphereMesh.normals = new Vector3[vertexCount];

    var trianglesBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, sphereMesh.triangles.Length, sizeof(int));
    trianglesBuffer.SetData(sphereMesh.triangles);

    sphereMesh.vertexBufferTarget |= GraphicsBuffer.Target.Raw;
    var vertexBuffer = sphereMesh.GetVertexBuffer(0);
    var stride = sphereMesh.GetVertexBufferStride(0);

    // Bind and dispatch. The field name "NormalsComputeShader" and the
    // group size of 64 must match the shader asset and its [numthreads].
    var calcKernel = NormalsComputeShader.FindKernel("CalculateNormals");
    var normKernel = NormalsComputeShader.FindKernel("NormalizeNormals");

    NormalsComputeShader.SetInt("VertexCount", vertexCount);
    NormalsComputeShader.SetInt("TriangleCount", triangleCount);
    NormalsComputeShader.SetInt("Stride", stride);
    NormalsComputeShader.SetBuffer(calcKernel, "Triangles", trianglesBuffer);
    NormalsComputeShader.SetBuffer(calcKernel, "VertexBuffer", vertexBuffer);
    NormalsComputeShader.SetBuffer(normKernel, "VertexBuffer", vertexBuffer);

    NormalsComputeShader.Dispatch(calcKernel, Mathf.CeilToInt(triangleCount / 64f), 1, 1);
    NormalsComputeShader.Dispatch(normKernel, Mathf.CeilToInt(vertexCount / 64f), 1, 1);

    vertexBuffer.Dispose();
    trianglesBuffer.Dispose();
}
``````
``````

``````#pragma kernel CalculateNormals
#pragma kernel NormalizeNormals

#define PI 3.14159265359
#define TAU 6.28318530718

uint VertexCount;
uint TriangleCount;
uint Stride;

StructuredBuffer<uint> Triangles;
RWByteAddressBuffer VertexBuffer;

// Group size must match the Dispatch call on the C# side.
[numthreads(64, 1, 1)]
void CalculateNormals(uint3 id : SV_DispatchThreadID)
{
if (id.x >= TriangleCount) return;

uint triangleIndex = id.x * 3;

    uint indexVertex1 = Triangles[triangleIndex];
    uint indexVertex2 = Triangles[triangleIndex + 1];
    uint indexVertex3 = Triangles[triangleIndex + 2];

    // Position is assumed to sit at byte offset 0 of each vertex.
    float3 vertex1 = asfloat(VertexBuffer.Load3(indexVertex1 * Stride));
    float3 vertex2 = asfloat(VertexBuffer.Load3(indexVertex2 * Stride));
    float3 vertex3 = asfloat(VertexBuffer.Load3(indexVertex3 * Stride));

float3 side1 = vertex2 - vertex1;
float3 side2 = vertex3 - vertex1;

float3 triangleNormal = normalize(cross(side1, side2));

float3 normalVertex1 = asfloat(VertexBuffer.Load3(indexVertex1 * Stride + 12));
VertexBuffer.Store3(indexVertex1 * Stride + 12, asuint(normalVertex1 + triangleNormal));

float3 normalVertex2 = asfloat(VertexBuffer.Load3(indexVertex2 * Stride + 12));
VertexBuffer.Store3(indexVertex2 * Stride + 12, asuint(normalVertex2 + triangleNormal));

float3 normalVertex3 = asfloat(VertexBuffer.Load3(indexVertex3 * Stride + 12));
VertexBuffer.Store3(indexVertex3 * Stride + 12, asuint(normalVertex3 + triangleNormal));
}

[numthreads(64, 1, 1)]
void NormalizeNormals(uint3 id : SV_DispatchThreadID)
{
if (id.x >= VertexCount) return;
uint vid = id.x * Stride;

float3 normal = asfloat(VertexBuffer.Load3(vid + 12));
VertexBuffer.Store3(vid + 12, asuint(normalize(normal)));
}
``````

The threads in a compute shader run out of order, with many of them running simultaneously. The read-modify-write you do on each vertex normal is a race condition: two threads can load the same old value, each add their triangle normal, and then both store, losing one of the contributions. That is exactly why you get different results on every dispatch.

The easiest way would be to quantize the floats to ints (e.g. multiply each component by 2^16 or so) and then use atomic operations (`InterlockedAdd`) to add them directly to the memory location. This would be quite fast, and you wouldn’t need to change your algorithm at all. For example, instead of…

``````float3 normalVertex1 = asfloat(VertexBuffer.Load3(indexVertex1 * Stride + 12));
VertexBuffer.Store3(indexVertex1 * Stride + 12, asuint(normalVertex1 + triangleNormal));
float3 normalVertex2 = asfloat(VertexBuffer.Load3(indexVertex2 * Stride + 12));
VertexBuffer.Store3(indexVertex2 * Stride + 12, asuint(normalVertex2 + triangleNormal));
float3 normalVertex3 = asfloat(VertexBuffer.Load3(indexVertex3 * Stride + 12));
VertexBuffer.Store3(indexVertex3 * Stride + 12, asuint(normalVertex3 + triangleNormal));
``````

You would write (untested):

``````float QUANTIZE_FACTOR = 32768.0;
int3 quantizedNormal = (int3) (triangleNormal * QUANTIZE_FACTOR);
int ignore;

VertexBuffer.InterlockedAdd(indexVertex1 * Stride + 12, quantizedNormal.x, ignore);
VertexBuffer.InterlockedAdd(indexVertex1 * Stride + 16, quantizedNormal.y, ignore);
VertexBuffer.InterlockedAdd(indexVertex1 * Stride + 20, quantizedNormal.z, ignore);
VertexBuffer.InterlockedAdd(indexVertex2 * Stride + 12, quantizedNormal.x, ignore);
VertexBuffer.InterlockedAdd(indexVertex2 * Stride + 16, quantizedNormal.y, ignore);
VertexBuffer.InterlockedAdd(indexVertex2 * Stride + 20, quantizedNormal.z, ignore);
VertexBuffer.InterlockedAdd(indexVertex3 * Stride + 12, quantizedNormal.x, ignore);
VertexBuffer.InterlockedAdd(indexVertex3 * Stride + 16, quantizedNormal.y, ignore);
VertexBuffer.InterlockedAdd(indexVertex3 * Stride + 20, quantizedNormal.z, ignore);
``````
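One thing to watch with this scheme: after the interlocked pass, the three words at offset +12 hold quantized signed ints, not float bit patterns, so the normalize pass has to read them back as ints before normalizing. A sketch of what that could look like, assuming the same buffer layout and kernel names as above (untested):

``````[numthreads(64, 1, 1)]
void NormalizeNormals(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= VertexCount) return;
    uint vid = id.x * Stride;

    // Reinterpret the accumulated words as signed ints, then convert to
    // float. The quantization factor cancels out under normalize(), so no
    // divide is needed first.
    int3 quantized = int3(VertexBuffer.Load3(vid + 12));
    float3 normal = normalize(float3(quantized));

    // From here on the buffer holds regular float normals again.
    VertexBuffer.Store3(vid + 12, asuint(normal));
}
``````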

EDIT: Should be `int` not `uint`, but you probably figured that out.


This should definitely work, but as I understand it, `InterlockedAdd` does have some performance impact. I saw another trick in the Ziva implementation: they calculate the normals per face, not per vertex, so there are no race conditions in the compute shader. I’d guess this is the fastest way of doing normal calculation.
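For what it’s worth, that per-face idea avoids atomics entirely when the mesh is un-indexed (vertices duplicated per triangle), because then each vertex belongs to exactly one triangle and every thread writes to locations no other thread touches. A rough sketch of my reading of the trick, reusing the raw vertex-buffer layout from above (this is not Ziva’s actual code):

``````// One thread per triangle; with duplicated (un-shared) vertices each
// write target is unique, so plain stores are safe -- no InterlockedAdd
// and no separate normalize pass needed.
[numthreads(64, 1, 1)]
void CalculateFlatNormals(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= TriangleCount) return;

    uint i0 = Triangles[id.x * 3];
    uint i1 = Triangles[id.x * 3 + 1];
    uint i2 = Triangles[id.x * 3 + 2];

    float3 v0 = asfloat(VertexBuffer.Load3(i0 * Stride));
    float3 v1 = asfloat(VertexBuffer.Load3(i1 * Stride));
    float3 v2 = asfloat(VertexBuffer.Load3(i2 * Stride));

    float3 faceNormal = normalize(cross(v1 - v0, v2 - v0));

    VertexBuffer.Store3(i0 * Stride + 12, asuint(faceNormal));
    VertexBuffer.Store3(i1 * Stride + 12, asuint(faceNormal));
    VertexBuffer.Store3(i2 * Stride + 12, asuint(faceNormal));
}
``````

Note the trade-off: this gives flat (faceted) shading, not the smooth averaged normals that `Mesh.RecalculateNormals()` produces on a mesh with shared vertices.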

I am tempted to do something like this because my app/game deforms a large mesh and needs to recompute normals every frame. @ay_ahmet, did you see a performance improvement compared to the Unity-provided function? I can’t tell whether Unity does the job on the CPU or the GPU.