Context:
I have grass that you can cut, like in The Legend of Zelda: Breath of the Wild.
You can also build stuff where the grass used to be; this 'destroys' the grass.
Destroying grass is executed by recalculating whether there should be grass or not in the whole 8x8x8 block.
The issue arises when the total number of blades changes: the 'length' value for each blade of grass is no longer at the same index. The only way I know to fix this is an O(n*n) brute force that checks whether each new blade exists in the old grass buffer and, if it does, copies the length value over.
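Roughly what that brute force looks like (a minimal sketch; all buffer and field names here are placeholders, and I'm assuming positions are packed into a single uint):

```
// One thread per NEW blade, each scanning the entire old buffer: O(n*n).
struct Blade
{
    uint  position;  // x | y << 8 | z << 16, local to the 8x8x8 tile
    float length, bendDirection, bendStrength, normalRotation;
};

StructuredBuffer<Blade>   OldGrass;
RWStructuredBuffer<Blade> NewGrass;
uint _oldCount, _newCount;

[numthreads(64, 1, 1)]
void MatchOldToNew(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _newCount) return;
    Blade b = NewGrass[id.x];
    for (uint i = 0; i < _oldCount; i++)     // O(n) scan per new blade
    {
        if (OldGrass[i].position == b.position)
        {
            b.length = OldGrass[i].length;   // carry the old length over
            NewGrass[id.x] = b;
            break;
        }
    }
}
```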
I intend to first organize the grass into 1x1x1 blocks, and then do the O(n*n) brute force within those blocks instead of over the whole 8x8x8 block.
The issue lies in how to organize the grass by position. I'd like to try to do the whole thing in a single compute shader, but I am not sure how to go about it. An octree solution would probably force me to do tons of CPU<->GPU calls and queries for buffer sizes and whatnot, so here is the question:
Is there a way to do the whole procedure in a single kernel, or to have a kernel call a ton of other kernels? That is, a way to dispatch shaders from shaders?
I would move this question to the shader topic; you'd have a better chance of getting answers there, since people in that forum actually know something about shaders. That's rarer here, since this is more of a gameplay scripting topic.
If you want to move your question, just hit the report button at the bottom of your initial post and ask the moderators to move it.
Apparently not, since Compute Shaders are actually used (at least in my case) to offload expensive gameplay functions rather than to make something look nice, which is what most people in the Shaders forum are attempting to do.
After a quick look at Wikipedia, Morton codes seem great for datasets of fixed size (for instance, where you know there is exactly one blade of grass at every position), but that is not the case for my data.
The maximum amount of grass in each 8x8x8 tile is 16,777,216 blades, but there are hardly ever more than 8,000. And since they were added by a compute shader, the order is almost completely random.
To make matters worse, there is no way to know where the 'top left back 1/8th' of the data set starts or ends, since it's impossible to predict its size.
Well then, why is it used for sparse voxel octrees? The keyword being sparse. Look for NVIDIA's paper about doing it on the GPU.
Turns out I'm reading one that mentions Morton: https://serv.cusp.nyu.edu/~hvo/papers/sfc.pdf
Probably not relevant, but it covers a case of "sparse" yet spatially relevant data.
The thing is, I'm reading a lot of papers that are all about taking incoherently produced data and giving it order with Morton codes. I'm not a specialist, though.
Sounds like crazy mathmagic. This Morton thing gives me plenty of ideas on how it can be used for arrays of data whose size you already know, but it does not give me any idea on how to fix my current issue.
I could sort both sets by Morton code (or any sort key, like x + y * 256 + z * 65536), and then I could solve it in O(n). But then I can't use multithreading, and I also have no idea how multithreaded sorting on the GPU works.
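For reference, the Morton encode itself is only a few bit tricks. This is the standard 3D bit-interleave (valid for coordinates up to 10 bits, so the 0-255 range here fits); the linear key x + y * 256 + z * 65536 would sort just as well, Morton only adds spatial locality:

```
// Spreads the bits of v out so they occupy every third bit.
uint Part1By2(uint v)
{
    v &= 0x000003FF;
    v = (v ^ (v << 16)) & 0xFF0000FF;
    v = (v ^ (v <<  8)) & 0x0300F00F;
    v = (v ^ (v <<  4)) & 0x030C30C3;
    v = (v ^ (v <<  2)) & 0x09249249;
    return v;
}

// Interleaves x, y, z into a single Morton key.
uint Morton3D(uint3 p)
{
    return Part1By2(p.x) | (Part1By2(p.y) << 1) | (Part1By2(p.z) << 2);
}
```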
I think I will:
1: GPU: count how many blades of grass are in each 0.5x0.5x0.5 block (which would be written to a 4096-uint array)
2: CPU: allocate the needed ComputeBuffers (maybe cache them)
3: GPU: split the whole set into 4096 sets (please tell me arrays of AppendBuffers are a thing, otherwise I will need to do it in batches of 32 or something)
4: (Optional) GPU?: sort each of these sets
5: GPU: do each set linearly, but each set gets its own GPU thread
I might end up needing to make kernels for "0.5x0.5x0.5 blocks with [0, 32) blades", "0.5x0.5x0.5 blocks with [32, 128) blades", etc.
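Roughly what I have in mind for steps 1 and 3, in case arrays of AppendBuffers turn out not to be a thing (as far as I know you can't index an array of them from inside a kernel): count per cell, prefix-sum the counts into per-cell offsets on the CPU or GPU, then scatter with InterlockedAdd. All names here are made up:

```
struct Blade
{
    uint  position;  // x | y << 8 | z << 16, 0..255 per axis within the tile
    float length, bendDirection, bendStrength, normalRotation;
};

StructuredBuffer<Blade>   Grass;        // the unordered blades
RWStructuredBuffer<uint>  CellCount;    // 4096 entries, zeroed before dispatch
RWStructuredBuffer<uint>  CellCursor;   // starts as the prefix sums of CellCount
RWStructuredBuffer<Blade> SortedGrass;  // same size as Grass
uint _bladeCount;

// 256 positions per axis, 16 positions per 0.5 block -> 16x16x16 = 4096 cells
uint CellIndex(uint packedPos)
{
    uint3 c = uint3(packedPos & 0xFF, (packedPos >> 8) & 0xFF, (packedPos >> 16) & 0xFF) / 16;
    return c.x + c.y * 16 + c.z * 256;
}

[numthreads(64, 1, 1)]   // step 1: histogram of blades per cell
void CountPerCell(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _bladeCount) return;
    InterlockedAdd(CellCount[CellIndex(Grass[id.x].position)], 1);
}

[numthreads(64, 1, 1)]   // step 3: scatter blades into contiguous per-cell ranges
void ScatterToCells(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _bladeCount) return;
    uint slot;
    InterlockedAdd(CellCursor[CellIndex(Grass[id.x].position)], 1, slot);
    SortedGrass[slot] = Grass[id.x];
}
```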
Sounds like a terrible solution, but if I can make it work in less than 4 frames outside the main thread, it should be good enough, for now.
Hopefully someone gives me a better solution
I am even starting to think that I would get more performance doing this on the CPU instead of jumping through all of these hoops.
I kind of misread the original post anyway. I'm sure there is a solution with Morton, BUT why destroy the blades of grass at all? In Zelda they leave a small nub; why not shrink the blade and have constant performance?
I admittedly don't know too much about shaders, so if I say something stupid, please ignore it. But AFAIK they usually solve this problem with texture maps, no? So wouldn't it be a solution for you to take the grass-size value or the grass-existence value from a texture map, and just change the texture when you want to change the grass in certain areas? (Effectively, just draw something on the texture.)
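Something like this is what I imagine (a sketch only; I don't know your setup, so the texture and names are made up):

```
// One texel per grass cell; 1 = full height, 0 = cut. "Changing" the grass
// is then just drawing into this texture instead of rebuilding buffers.
Texture2D<float> CutMask;
SamplerState sampler_CutMask;

float BladeLength(float restLength, float2 tileUV)
{
    return restLength * CutMask.SampleLevel(sampler_CutMask, tileUV, 0);
}
```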
Also, it looks like the removal problem is similar to what they do for occlusion culling; watch the Horizon Zero Dawn talk on this subject, it also explains a bit more about how to handle the GPU.
Also, Morton code is designed so that access patterns scale with the GPU cache, hence why it's becoming so widely used on GPUs. Since GPUs are organized in groups of thread cores, generally in multiples of 2^n, Morton ordering allows you to query all relevant data with cache-friendly linearity.
Although I could just not draw them, or keep them always cut, the blades of grass below a barn will most likely never grow again for the rest of the game. Might as well remove them. If it were not for the length/bend values of all the other blades of grass in the same tile, this feature would already be finished.
I've been procrastinating on learning how occlusion culling, and even shadows, work for so long that I think I'll never actually get around to learning it.
I think I may have misunderstood you, but you absolute madman, you might actually have solved it.
Each blade of grass can exist anywhere within the 256x256x256 positions of an 8x8x8 tile.
Since they most often only occupy a single plane of this 3D space, I only store:
position uint(x, y, z, 0)
length float
bendDirection float
bendStrength float
normalRotation float
This usually uses up 256*256*1*(4*5) bytes of video memory (1.25 MB).
Storing this data in 3D would use up 256*256*256*(4*4) bytes, a whopping 256 MB of video memory; that's why I didn't do it.
But a single 1x1x1 block would only take 16*16*16*(4*4) bytes, 64 KB of memory.
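Spelled out, the two layouts being compared look like this (a sketch; packing x, y, z into a single uint is my reading of the 4*5-byte math):

```
// Sparse: one 20-byte record per existing blade.
struct BladeSparse
{
    uint  position;        // x | y << 8 | z << 16 (assumed packing)
    float length;
    float bendDirection;
    float bendStrength;
    float normalRotation;
};
// Typical single-plane tile: 256*256*1 blades * 20 B = 1.25 MB.

// Dense: the position becomes the array index, leaving 16 bytes per cell.
struct BladeDense
{
    float length;
    float bendDirection;
    float bendStrength;
    float normalRotation;
};
// Whole tile: 256^3 cells * 16 B = 256 MB. One 1x1x1 block: 16^3 * 16 B = 64 KB.
```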
So the solution is simple:
1: GPU: query which of the 512 1x1x1 blocks within the 8x8x8 tile actually contain any grass (it will usually be between 12 and 28 of them)
2: CPU: queue up the 12-28 operations of:
2.a: go over all the old data; if it's within the current 1x1x1 block, save the data in 3D
2.b: go over all the new data; if it's within the current 1x1x1 block, read from the 3D data
2.a and 2.b do not require me to wait on any data from the GPU; that only happens once, when I check which blocks actually need to be saved in 3D. A sketch of the two kernels is below.
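A rough sketch of the two kernels (all names are made up; the dense grid is one float4 per cell, matching the 4*4 bytes above, and gets one dispatch pair per occupied 1x1x1 block):

```
#define CELLS 16   // cells per axis in one 1x1x1 block, per the numbers above

struct Blade
{
    uint  position;  // x | y << 8 | z << 16, tile-local
    float length, bendDirection, bendStrength, normalRotation;
};

StructuredBuffer<Blade>    OldGrass;
RWStructuredBuffer<Blade>  NewGrass;
RWStructuredBuffer<float4> DenseBlock;  // CELLS^3 entries: length, dir, strength, rot;
                                        // cleared to length = -1 between blocks
uint3 _blockMin;                        // the current block's corner, tile coordinates
uint  _oldCount, _newCount;

uint3 Unpack(uint p) { return uint3(p & 0xFF, (p >> 8) & 0xFF, (p >> 16) & 0xFF); }
bool  InBlock(uint3 p) { return all(p >= _blockMin) && all(p < _blockMin + CELLS); }

uint DenseIndex(uint3 p)
{
    uint3 l = p - _blockMin;
    return l.x + l.y * CELLS + l.z * CELLS * CELLS;
}

[numthreads(64, 1, 1)]   // 2.a: save old blades that fall in this block into 3D
void SaveOld(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _oldCount) return;
    Blade b = OldGrass[id.x];
    uint3 p = Unpack(b.position);
    if (InBlock(p))
        DenseBlock[DenseIndex(p)] =
            float4(b.length, b.bendDirection, b.bendStrength, b.normalRotation);
}

[numthreads(64, 1, 1)]   // 2.b: read the 3D data back into the new blades
void LoadNew(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _newCount) return;
    Blade b = NewGrass[id.x];
    uint3 p = Unpack(b.position);
    if (!InBlock(p)) return;
    float4 d = DenseBlock[DenseIndex(p)];
    if (d.x < 0) return;   // sentinel: no old blade here, keep the new values
    b.length = d.x; b.bendDirection = d.y; b.bendStrength = d.z; b.normalRotation = d.w;
    NewGrass[id.x] = b;
}
```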
If no better idea comes up, I will check if this one works tomorrow