Compute shaders

Hey guys,

I was hoping to try using compute shaders to do some computationally heavy work.
Let's take something simple like terrain generation … if done on the CPU you need to run the generation on a second thread. It's usually noise based, and maybe 1 or 2 octaves might get you a result within about 10 seconds on a typical patch of terrain.

But what if I wanted to make really detailed terrains that require, say, 7 or 8 octaves of noise?
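For context, the CPU version I have in mind is basically summed octaves of Perlin noise … something like this rough sketch (using Mathf.PerlinNoise as a stand-in, not my actual generator):

using UnityEngine;

public static class NoiseSketch
{
   // Sum several octaves of Perlin noise: each octave doubles the frequency and
   // halves the amplitude. Every extra octave is another noise sample per point,
   // which is what makes 7 or 8 octaves so expensive on the CPU.
   public static float FractalNoise(float x, float z, int octaves)
   {
      float value = 0f, amplitude = 1f, frequency = 1f, max = 0f;
      for (int i = 0; i < octaves; i++)
      {
         value += Mathf.PerlinNoise(x * frequency, z * frequency) * amplitude;
         max += amplitude;
         amplitude *= 0.5f;
         frequency *= 2f;
      }
      return value / max; // normalised back to roughly 0..1
   }
}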

So that got me thinking: what if I could do something like this in a compute shader …

#pragma kernel Generate

RWStructuredBuffer<float3> vertexBuffer : register(u0);

float3[] genVertsAt(uint2 xzPos)
{
   //TODO: put some height generation code here.
   //      could even run marching cubes / dual contouring code.
}

[numthreads(32, 1, 32)]
void Generate (uint3 threadId : SV_GroupThreadID, uint3 groupId : SV_GroupID)
{
   uint3 currentXZ =  groupId * uint3(32, 1, 32) + threadId;
   vertexBuffer.append(genVertsAt(currentXZ.xz));
}

And then something like this in a Unity script …

using UnityEngine;
using System.Collections;

public class Test : MonoBehaviour
{
   public ComputeShader Generator;
   public MeshTopology Topology;

   void OnEnable()
   {
      var computedMeshPoints = ComputeMesh();
      CreateMeshFrom(computedMeshPoints);
   }

   private Vector3[] ComputeMesh()
   {
      var size = 32*32;
      var buffer = new ComputeBuffer(size, 12, ComputeBufferType.Append);
      Generator.SetBuffer(0, "vertexBuffer", buffer);
      Generator.Dispatch(0, 1, 0, 0);
      var results = new Vector3[size];
      buffer.GetData(results);
      buffer.Dispose();
      return results;
   }

   private void CreateMeshFrom(Vector3[] generatedPoints)
   {
      var filter = GetComponent<MeshFilter>();
      var renderer = GetComponent<MeshRenderer>();

      if (generatedPoints.Length > 0)
      {
         var mesh = new Mesh { vertices = generatedPoints };
         var indices = new int[generatedPoints.Length];

         //TODO: build this different based on topology of the mesh being generated
         for (int i = 0; i < indices.Length; i++)
            indices[i] = i;

         mesh.SetIndices(indices, Topology, 0);

         mesh.RecalculateNormals();
         mesh.Optimize();
         mesh.RecalculateBounds();
      }
      else
      {
         filter.sharedMesh = null;
      }
   }
}

I can’t seem to find a way to do this that works though.

In my case I'm building "floating islands", so I need to consider more than just x and z values before I can go generating a y. I've been trying to figure out a way to generate voxel volumes and then generate a mesh from the voxel volume all in compute shaders, but unless I can beat the basics I may have to rethink my plan.

Anyone fancy giving this a try and then maybe giving me some ideas on how to write it?

I also would love to hear from anyone who has used any of the following …

Compute shaders (generally speaking).
Append buffers (handy when generating mesh data from a voxel volume).
Getting data from the GPU back onto the CPU.

I'm keen to find a way to do this without relying on Texture3D, because I would like a solution that works for users who don't have Unity Pro.
Seems like such an odd thing to limit … it's just a render target after all.

EDIT: Looking at it a second time, it appears I misread your compute shader, and that you do intend to have it generate verts for the topology. You can probably disregard this post but I'll leave it here, you know, for posterity.

I’m really not the best person to respond to this, but here goes.
As far as I can tell, you're only generating a vertex per index. I haven't used MeshTopology, but I'm not under the impression that it will triangulate your vertex data for you (if it does, I should really look into using it). I believe you have to give it an array where every 3 verts defines a triangle (or 4 for a quad, depending on your topology) if you want the mesh to have a surface. As far as I can tell, you're just generating an array of row-ordered verts. You might need to pass it to a second kernel that spits out quad or tri verts from it. That might be the only thing holding you back here. Then again, I've never done this, so what do I know.
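Just to illustrate what I mean: on the CPU, turning a row-ordered grid of verts into triangle indices would look roughly like this (my own sketch, names included, not something from your code):

public static class GridTriangulationSketch
{
   // Builds triangle indices for a (width x height) grid of row-ordered vertices,
   // two triangles per cell. The result can be fed to
   // mesh.SetIndices(indices, MeshTopology.Triangles, 0).
   public static int[] BuildGridTriangles(int width, int height)
   {
      var indices = new int[(width - 1) * (height - 1) * 6];
      int i = 0;
      for (int z = 0; z < height - 1; z++)
      {
         for (int x = 0; x < width - 1; x++)
         {
            int v = z * width + x;        // top-left vertex of this cell
            indices[i++] = v;             // first triangle
            indices[i++] = v + width;
            indices[i++] = v + 1;
            indices[i++] = v + 1;         // second triangle
            indices[i++] = v + width;
            indices[i++] = v + width + 1;
         }
      }
      return indices;
   }
}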
For what it’s worth, there’s also the option of spitting out a displacement map instead, or, as I’ve noticed much of the compute shader crowd are fond of doing, passing the data into a geometry shader to really leave the CPU out of the equation.

Also, I could very easily be wrong, but I was under the impression that compute shaders aren’t supported in the free version.

P.S. When are these browser spell-checkers going to learn about shaders being a thing?

The code was more "pseudo code" than verbatim …
As far as I can tell, calling buffer.append() and giving it a float3[] is technically invalid anyway … I would make a call for each float3 I wanted to append to the buffer.

MeshTopology was more to tell the calling code how the index information should be constructed (currently it's just fixed, based on the order of the verts being correct, so this would likely only work with points).

I see this being the second part of a 4-stage rendering process in a voxel-based Unity package I'm working on.

The idea was that I would have a voxel buffer and a vertex buffer, then I would do something like …

  1. Generate a voxel buffer on the GPU.
  2. Pass that to the mesh generator (pseudo code above) to generate a vertex buffer.
  • Vertex buffers could be generated from a portion of the voxel data.
  3. Pull the resulting mesh data back from the GPU and build a Unity mesh from it.
  4. Add material information and render.

Three of those stages involve shaders (1, 2, and 4) … stage 3 simply takes a buffer result and works on it in CPU code.

My thinking is that this approach could be much faster than a completely CPU-based approach, but I can't seem to figure out how to work with append buffers.

I misread the compute shader initially, thinking it was spitting out 1 vertex per index. I'm not clear on what problem you're having with the append buffer, but it looks like you've only given it enough space for 1 vertex per index (32 * 32 elements at a stride of 12, i.e. one Vector3 each). If you're going to make cubes at each point, or even just quads, you're going to need to give it a lot more space.
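Just to put numbers on "a lot more space", sizing the buffer for one quad per cell might look something like this (illustrative figures only, not taken from your project):

using UnityEngine;

public static class BufferSizingSketch
{
   // One quad per cell needs 6 vertices (two triangles); a full cube would need
   // up to 36. At 12 bytes per float3, a 32 x 32 patch of quads is 6144 elements.
   public static ComputeBuffer AllocateQuadAppendBuffer(int cellsX, int cellsZ)
   {
      int vertsPerQuad = 6;
      int stride = sizeof(float) * 3; // 12 bytes, matching the float3s in the shader
      return new ComputeBuffer(cellsX * cellsZ * vertsPerQuad, stride, ComputeBufferType.Append);
   }
}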

Also,

Generator.Dispatch(0, 1, 0, 0);

It looks like you're dispatching it with 0 thread groups (1 * 0 * 0), so no threads run.
Seems like you'd want

Generator.Dispatch(0, 32, 1, 32);

I believe you are correct (good catch).
Here's some more "pseudo code" that attempts to get across what I'm trying to achieve …

using UnityEngine;
using System.Collections;

public class GPUGeneratedGameObject : MonoBehaviour {

   ComputeBuffer buffer;
   public ComputeShader VoxelGenerator;
   public ComputeShader MeshGenerator;
   public MeshTopology Topology;

   void OnEnable()
   {
      ComputeVoxels();
      var verts = GenerateMeshVerts();
      BuildMeshFrom(verts);
   }

   private void ComputeVoxels()
   {
      var size = 80 * 80 * 80; // one element per voxel across the whole volume, not just one (8 * 8 * 8) thread group
      buffer = new ComputeBuffer(size, 12, ComputeBufferType.Append);
      VoxelGenerator.SetBuffer(0, "voxelBuffer", buffer); // same buffer the mesh generator reads below
      VoxelGenerator.Dispatch(0, 10, 10, 10); // total voxel array size = (80 * 80 * 80), with each "chunk" being (8 * 8 * 8)
   }

   private Vector3[] GenerateMeshVerts()
   {
      var size = 4096;
      var vertBuffer = new ComputeBuffer(size, 12, ComputeBufferType.Append);
      MeshGenerator.SetBuffer(0, "voxelBuffer", buffer);
      MeshGenerator.SetBuffer(0, "vertexBuffer", vertBuffer);
      // when expanding later it might be worth adding more params to handle pulling a vert buffer for a 
      // "chunk" of voxels instead of the whole array ...
      MeshGenerator.Dispatch(0, 10, 10, 10);
      var results = new Vector3[size];
      vertBuffer.GetData(results);
      vertBuffer.Dispose();

      return results;
   }

   void BuildMeshFrom(Vector3[] vertices)
   {
      //TODO: handle the vert array being too big or empty
      var filter = GetComponent<MeshFilter>();
      var mesh = new Mesh { vertices = vertices };
      mesh.SetIndices(CreateIndicesFor(vertices), Topology, 0);
      filter.sharedMesh = mesh;
   }

   int[] CreateIndicesFor(Vector3[] verts)
   { 
      var result = new int[verts.Length];
      switch (Topology)
      { 
         //TODO: build based on topology
      }

      return result;
   }

   void OnDisable()
   {
      buffer.Dispose();
   }
}

This question was really about the second compute shader and how I can make a vertex buffer for a mesh when I don't know the number of verts to begin with.
I would "crawl" the voxel buffer in the mesh generation shader and emit triangles / quads for each visible voxel surface, but since I can't seem to get even the simplest of values back from the shader / buffer to the CPU, I've hit a bit of a brick wall.

Any ideas?

My understanding of …
Generator.Dispatch(0, 32, 1, 32);

… is that you are saying …

shader.Dispatch(kernelIndex, xThreadGroups, yThreadGroups, zThreadGroups);

… where each group will have the thread counts given by the [numthreads(x, y, z)] attribute on the kernel function.

I could of course be wrong.

EDIT:
I tried this by the way; it didn't resolve the problem :(
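To spell out the arithmetic as I understand it (and I may still have this wrong):

// With [numthreads(32, 1, 32)] on the kernel, this launches
// 32 * 1 * 32 = 1024 thread groups of 32 * 1 * 32 = 1024 threads each,
// i.e. a 1024 x 1 x 1024 grid of threads in total.
Generator.Dispatch(0, 32, 1, 32);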

I'm a little unclear as to what the actual problem you're experiencing is. Can you tell us what is happening? Are you getting errors, or is it just not giving you the result you're expecting (and if so, what are you seeing vs. what you want to be seeing)?

Just a long shot, but I do notice that in your pseudo-code you use ‘RWStructuredBuffer’ and ‘.append’. If that’s what you have in your actual compute shader, I don’t think that will work. You’d need ‘AppendStructuredBuffer’ and ‘.Append’ (capital A), I believe.

I actually tested your code, with a few changes.

#pragma kernel Generate

AppendStructuredBuffer<float3> vertexBuffer : register(u0);

[numthreads(32, 1, 32)]
void Generate(uint3 threadId : SV_GroupThreadID, uint3 groupId : SV_GroupID)
{	
	uint3 currentXZ = groupId * uint3(32, 1, 32) + threadId;
	vertexBuffer.Append(float3(currentXZ.x / 10.0, sin(currentXZ.x / 10.0)*cos(currentXZ.z / 10.0), currentXZ.z/10.0));
}

using UnityEngine;
using System.Collections;

public class Test : MonoBehaviour
{
	public ComputeShader Generator;
	public MeshTopology Topology = MeshTopology.Points;
	
	void Start()
	{
		var computedMeshPoints = ComputeMesh();
		CreateMeshFrom(computedMeshPoints);
	}
	
	private Vector3[] ComputeMesh()
	{
		var size = 32*32;
		var buffer = new ComputeBuffer(size, 12, ComputeBufferType.Append);
		Generator.SetBuffer(0, "vertexBuffer", buffer);
		Generator.Dispatch(0, 1, 1, 1);
		var results = new Vector3[size];
		buffer.GetData(results);
		buffer.Dispose();
		return results;
	}
	
	private void CreateMeshFrom(Vector3[] generatedPoints)
	{
		var filter = GetComponent<MeshFilter>();
		var renderer = GetComponent<MeshRenderer>();
		
		if (generatedPoints.Length > 0)
		{
			var mesh = new Mesh { vertices = generatedPoints };
			var indices = new int[generatedPoints.Length];

			for (int i = 0; i < indices.Length; i++)
				indices[i] = i;
			
			mesh.SetIndices(indices, Topology, 0);
			filter.mesh = mesh;
		}
		else
		{
			filter.sharedMesh = null;
		}
	}
}

And got this:
[attached screenshot: mesh.png]

Hey Dan,

Odd … I copied your code into Unity and got a blank screen when I ran it.
I put a breakpoint on line 23 of the CPU code and it was showing an array of 1024 Vector3(0,0,0) values. Any idea why / what might cause that?

Scrap that … all this time it turns out I was referring to a second copy of the shader file in Unity (I must have screwed up setting this up in step 1).

I’m gonna go cry about how pathetic a screw up that was and hide in shame now.

Thanks for your help though :)
Massively useful.

Odd question though before I go die …

Why do you basically have to declare it like an array and know what the size is before you can call "Append"? Surely the buffer should start at size 0 and then grow each time you call "Append" on it, more like a generic List in C#?

I don't imagine that the shader units themselves are physically capable of dynamically allocating memory, by which I mean I don't imagine they have instructions that would allow them to allocate memory themselves. Geometry shaders can, in a sense, accomplish this, in that they can produce a variable-length output (up to some limit, which is probably pre-allocated in full). The problem with dynamic memory allocation is that it's slow and infrequently required, so there probably hasn't been much demand for it. There are ways to fake it, but every way I've heard of still requires you to allocate a fixed-length buffer first; then you can sort of emulate dynamic memory management of that buffer between multiple outputs. Of course, you could also just keep track of how much you actually used after each compute, and periodically check whether it's worth re-allocating the buffer to a different size, either to free up excessive unused space or to increase your cushion.
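In Unity terms, "keep track of how much you actually used" would look something like this sketch, using ComputeBuffer.CopyCount to read back the append buffer's hidden counter (the class and method names here are mine; check the exact calls against your Unity version):

using UnityEngine;

public static class AppendCountSketch
{
   // Reads back how many elements a kernel actually Append()ed into an append buffer.
   public static int GetAppendCount(ComputeBuffer appendBuffer)
   {
      // A one-element buffer to receive the append buffer's internal counter.
      using (var countBuffer = new ComputeBuffer(1, sizeof(int), ComputeBufferType.IndirectArguments))
      {
         ComputeBuffer.CopyCount(appendBuffer, countBuffer, 0);
         var count = new int[1];
         countBuffer.GetData(count);
         return count[0];
      }
   }
}

// Usage sketch:
//   vertBuffer.SetCounterValue(0);          // reset the counter before dispatching
//   MeshGenerator.Dispatch(0, 10, 10, 10);
//   int vertCount = AppendCountSketch.GetAppendCount(vertBuffer);
//   var results = new Vector3[vertCount];   // only as big as what was actually written
//   vertBuffer.GetData(results);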

Oh right …
It does seem odd, though, that you get something that behaves like a list but is actually more like an array "under the bonnet", so to speak, lol.

I'm making progress now that I've been pointed in the right direction … thank you for that. I've been asking about that block of code for weeks; most people just give me vague answers like "go read a book" or "here's a link to a page you won't understand", so it's good to get some common sense for a change!!!