Why does Unity URP Cluster use ConstantBuffer instead of StructuredBuffer?

  • Recently I was looking at the Unity URP source code, and I found that for Tile/Cluster rendering URP uses ConstantBuffers to store the Tile/Cluster data:
CBUFFER_START(urp_ZBinBuffer)
        float4 urp_ZBins[MAX_ZBIN_VEC4S];
CBUFFER_END
CBUFFER_START(urp_TileBuffer)
        float4 urp_Tiles[MAX_TILE_VEC4S];
CBUFFER_END

“A constant buffer is extremely fast if all threads in a warp access the same value. But if the threads read from different spots, the reads are serialized, a phenomenon called constant waterfalling, which makes the reading slow (and causes headaches for people doing bone animations). In all the scenarios you described above, every thread reads from the same address, so I’d go with constant buffers, except maybe for the third scenario (in case you have many, many lights).
Structured buffers, on the other hand, utilize the unified cache architecture, which means the first read is slow, but all subsequent reads are very fast (if the requested data is already in the cache).”

So I think it’s better to use a StructuredBuffer here. Or is there something I don’t know about URP that makes a ConstantBuffer the better choice?
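To make the two options concrete, here is a minimal sketch of how the same array could be declared either way behind a compile-time switch. Everything in it is hypothetical (the macro EXAMPLE_USE_STRUCTURED_BUFFER, the size, and the identifiers are made up for illustration and are not URP's real names), and the comments describe typical GPU behaviour rather than a guarantee for any specific hardware.

```hlsl
// Illustrative only: the macro, the size and all identifiers are
// hypothetical and do not correspond to URP's actual declarations.
#define MAX_EXAMPLE_VEC4S 1024

#if defined(EXAMPLE_USE_STRUCTURED_BUFFER)
    // Typically read through the regular cached memory path, which
    // tolerates threads of a warp using different indices.
    StructuredBuffer<float4> exampleTiles;
#else
    // Fixed-size array in a constant buffer, typically fastest when all
    // threads of a warp read the same element.
    cbuffer ExampleTileBuffer
    {
        float4 exampleTiles[MAX_EXAMPLE_VEC4S];
    };
#endif

// The call site is the same for both variants.
float4 LoadTileVec4(uint vec4Index)
{
    return exampleTiles[vec4Index];
}
```

Because the indexing syntax is identical, the choice between the two comes down to the access pattern and the platform, not to how the shader code reads.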

OK, then I looked at some more code:

  • I found a useStructured option for AdditionalLightDatas, so why is there no such option for TileBuffer and ZBinBuffer?

  • Also, when I compiled the shader for the OpenGL3x platform, the generated code contained a lot of type conversions from uint to int:



    These were compiled from:

    So are there really no performance problems with so many type conversions?

  • And when I fetch an element from an array using a uint as the index, I find that the index is converted to int:

cbuffer acbuffer
{
    float4 arrayDatas[1024];
}
uint index = ...
float data = arrayDatas[index].x;

// will be compiled to (GLSL):
uint u_xlatu0;
int u_xlati0;

u_xlatu0 = ...
u_xlati0 = int(u_xlatu0);   // constructor-style cast; the index is now an int
float data = arrayDatas[u_xlati0].x;

Is there any way to avoid this?
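For context on what that conversion does: a uint-to-int conversion in GLSL preserves the bit pattern, and an HLSL cast behaves the same way, so for any index that is inside the array bounds the value is unchanged and only the static type differs. The hypothetical snippet below (ExampleBuffer, exampleData and LoadFirstComponent are made-up names) simply spells that cast out in the HLSL source. It is a sketch of the semantics only: it does not change what the cross-compiler emits, and whether a given driver generates any instruction for the cast has to be checked on the target hardware.

```hlsl
// Hypothetical names; this only illustrates the semantics of the cast.
cbuffer ExampleBuffer
{
    float4 exampleData[1024];
};

float LoadFirstComponent(uint index)
{
    // Explicit uint -> int cast: the 32 bits are unchanged, so for
    // index < 1024 the signed and unsigned values are identical.
    int signedIndex = (int)index;
    return exampleData[signedIndex].x;
}
```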

  • Do these type conversions also occur on the Vulkan platform?

As a premise, accessing constant buffers is significantly faster than accessing structured buffers.
Additionally, even with a large constant buffer array, it’s usually not an issue as long as the region accessed by a single warp (also known as a wavefront) is narrow.
The URP developers likely determined that there is good locality between tiles and warps, meaning that the threads of one warp mostly read the same tile entries.
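The locality argument can be sketched as follows. This is not URP's actual code: the tile size, buffer layout, and every identifier are assumptions chosen for illustration. The sketch also assumes that a warp shades a small, spatially compact block of pixels, which is common but implementation-defined.

```hlsl
// Hypothetical sketch; sizes, layout and names are assumptions.
#define TILE_SIZE      32u     // assumed tile size in pixels
#define MAX_TILE_WORDS 4096    // assumed capacity in 32-bit words

cbuffer ExampleTileBuffer
{
    float4 exampleTiles[MAX_TILE_WORDS / 4]; // 4 packed uint words per float4
    uint2  exampleTileCount;                 // number of tiles in x and y
};

// Fetch one packed 32-bit word from the float4 array.
uint LoadTileWord(uint wordIndex)
{
    uint4 words = asuint(exampleTiles[wordIndex >> 2]); // select the float4
    return words[wordIndex & 3];                        // select the component
}

// Every pixel inside the same 32x32 tile computes the same tileIndex, so
// threads that shade neighbouring pixels mostly read the same address.
uint LoadLightMaskForPixel(uint2 pixelCoord)
{
    uint2 tile = pixelCoord / TILE_SIZE;
    uint tileIndex = tile.y * exampleTileCount.x + tile.x;
    return LoadTileWord(tileIndex);
}
```

Only warps that straddle a tile border read more than one element, which is the narrow access region described above.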

Oh, thanks!
I took a closer look at Unity URP’s Tile + ZBin clustering. They do use a number of methods to keep the reads as contiguous as possible.
As for the cost of the type conversions in the compiled shader, I will run some comparative tests when I have time.
Finally, it would be nice if there were a way to make the compiled shader code use uint directly as the array index.