In this message however, you're talking about the 32-byte packing and 7 textures to sample at the pixel level. How can those 32 bytes / 7 textures fit in a single texel fetched from that 3D texture in VRAM?
I realize my message was a bit misleading. APV doesn't just do 1 trilinear texture tap when looking up probe data, it does 3 if you are only using L1 SH, or 7 if you are using L2 SH. What I mean is that it only does hardware trilinear texture taps, never manual interpolation. The textures are laid out like this:
// L0+L1:
Texture3D<half4> L0_L1Rx; // FP16
Texture3D<unorm float4> L1G_L1Ry; // R8G8B8A8
Texture3D<unorm float4> L1B_L1Rz; // R8G8B8A8
// L2:
Texture3D<unorm float4> L2_0; // R8G8B8A8
Texture3D<unorm float4> L2_1; // R8G8B8A8
Texture3D<unorm float4> L2_2; // R8G8B8A8
Texture3D<unorm float4> L2_3; // R8G8B8A8
This is quite VRAM efficient, that was my main point. Each probe takes up a total of 32 bytes (8 for the FP16 texture plus 4 for each of the six RGBA8 textures), or just 16 bytes for L1-only probes.
If you were to sample each probe individually and do manual trilinear interpolation, you would be doing 24-56 texture taps (8 corner probes x 3 or 7 textures) instead of 3-7.
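To make that concrete, here is a minimal sketch of the fast path, assuming normalized volume coordinates and a trilinear sampler; the sampler and function names are mine, not Unity's actual shader code:
SamplerState s_trilinearClamp; // assumed trilinear sampler
// One hardware tap per texture; the GPU blends the 8 surrounding probes for us.
void SampleAPVL1(float3 uvw, out half4 l0_l1rx, out float4 l1g_l1ry, out float4 l1b_l1rz) {
    l0_l1rx  = L0_L1Rx.SampleLevel(s_trilinearClamp, uvw, 0);
    l1g_l1ry = L1G_L1Ry.SampleLevel(s_trilinearClamp, uvw, 0);
    l1b_l1rz = L1B_L1Rz.SampleLevel(s_trilinearClamp, uvw, 0);
    // For L2, four more SampleLevel calls into L2_0..L2_3 bring the total to 7.
}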
at the cost of not even knowing where the probes are (which is needed for the DDGI approach).
You do know where the probes are with APV, they are just locked onto a grid, and you can't easily reject contribution from only some of the probes in the neighborhood of a pixel, since you are always interpolating all 8 of them with trilinear filtering. The best you can do is warp the sampling position, which is what APV's "leak reduction" does.
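As a rough illustration of what that warping can look like (this is just my own sketch of a normal/view bias, not Unity's actual leak-reduction code; all names and parameters are assumptions):
// Push the lookup point off the surface, and optionally toward the viewer,
// by a fraction of the probe spacing before the trilinear fetch, so the
// 8-probe cell is less likely to straddle a wall.
float3 BiasAPVSamplePosition(float3 positionWS, float3 normalWS, float3 viewDirWS,
                             float probeSpacing, float normalBias, float viewBias) {
    return positionWS
         + normalWS  * (normalBias * probeSpacing)
         + viewDirWS * (viewBias   * probeSpacing);
}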
With the typical DDGI approach, each probe is much larger, often 8x8 or 16x16 octahedral maps. The size will depend on the texture format you use. The original paper used R11G11B10F for irradiance and RG16F for depth moments. With just 8x8 maps, that's 8x8x4x2 = 512 bytes per probe. And to interpolate this data, you must manually sample 8 such octahedral maps and calculate the trilinear weights yourself. It's just a tradeoff, one setup isn't inherently better than the other. DDGI handles leaking better but also isn't a silver bullet.
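For comparison, a hedged sketch of what that manual interpolation looks like (my own simplified version, with one texture-array slice per probe; real DDGI also applies backface and visibility weights from the depth moments, which I omit here):
Texture2DArray<float3> probeIrradiance; // one 8x8 octahedral tile per probe (assumed layout)
SamplerState s_linearClamp;
int3   probeCounts;     // probes per axis
float3 gridOriginWS;
float3 probeSpacingWS;

// Standard octahedral mapping of a unit direction to [0,1]^2.
float2 OctEncode(float3 n) {
    n /= (abs(n.x) + abs(n.y) + abs(n.z));
    if (n.z < 0.0) {
        float2 signNZ = float2(n.x >= 0.0 ? 1.0 : -1.0, n.y >= 0.0 ? 1.0 : -1.0);
        n.xy = (1.0 - abs(n.yx)) * signNZ;
    }
    return n.xy * 0.5 + 0.5;
}

int ProbeIndex(int3 c) {
    return c.x + probeCounts.x * (c.y + probeCounts.y * c.z);
}

float3 SampleDDGIIrradiance(float3 positionWS, float3 normalWS) {
    float3 gridPos   = (positionWS - gridOriginWS) / probeSpacingWS;
    int3   baseCoord = (int3)floor(gridPos);
    float3 alpha     = saturate(gridPos - (float3)baseCoord); // trilinear blend factors
    float2 octUV     = OctEncode(normalWS);

    float3 sum  = 0.0;
    float  wSum = 0.0;
    [unroll]
    for (int i = 0; i < 8; i++) {
        // Visit the 8 corner probes of the cell and weight them by hand.
        int3 offset = int3(i & 1, (i >> 1) & 1, (i >> 2) & 1);
        int3 coord  = clamp(baseCoord + offset, 0, probeCounts - 1);
        float3 w3   = lerp(1.0 - alpha, alpha, (float3)offset);
        float  w    = w3.x * w3.y * w3.z;
        sum  += w * probeIrradiance.SampleLevel(s_linearClamp, float3(octUV, ProbeIndex(coord)), 0);
        wSum += w;
    }
    return sum / max(wSum, 1e-4);
}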
I'm also very curious about the SH L1 representation that Unity offers. How does it compare to L2 in terms of bandwidth/texture taps (and any other relevant considerations)?
I explained it above for the case of APV. L1 SH is 4 coefficients per color channel, L2 SH is 9. For L1, you thus have 4x3 = 12 coefficients total, which can be stored exactly in 3 RGBA textures. Those coefficients should be HDR, but we use a trick introduced by some of the Frostbite guys to compress the L1 coefficients into LDR range. It makes use of the fact that you can upper-bound the L1 terms using the L0 terms when storing irradiance. I'll spare you the derivation; it basically boils down to this:
// Remap the (per color channel) L1 vector into [0,1] using the L0 upper bound,
// so it fits in a UNORM texture. Assumes irradianceL0 > 0.
float3 CompressL1(float irradianceL0, float3 irradianceL1) {
    return irradianceL1 * (sqrt(3) / (4 * irradianceL0)) + 0.5;
}

float3 DecompressL1(float irradianceL0, float3 compressedIrradianceL1) {
    return (compressedIrradianceL1 - 0.5) * ((4 * irradianceL0) / sqrt(3));
}
For L2 SH, there's no such trick; the 5x3 = 15 additional coefficients are stored in 4 more textures (L2_0 through L2_3 above). So 3 vs 7 taps for L1 vs L2.
L1 is typically good enough for indirect lighting. Irradiance is a very "smooth/blurry" signal, so you don't need a lot of precision. You can think of L1 SH as roughly equivalent to storing a light color and a dominant light direction. This means it cannot represent more than 1 "peak", like if you have 2 strong light sources illuminating the same point in a dark room. L2 is better at that.
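If it helps with the intuition, the "color + dominant direction" analogy can be made literal; here is a tiny sketch (my own code, the name and luminance weights are assumptions) of how a direction falls out of the L1 coefficients:
// The three L1 coefficients of each color channel form a vector pointing
// toward where most of the incoming light comes from. Two strong, opposing
// lights largely cancel here, which is the single-peak limitation above.
float3 DominantDirectionFromL1(float3 L1r, float3 L1g, float3 L1b) {
    // Weight channels by rough luminance; normalize(0) is undefined, so in
    // practice you would guard against a fully dark probe.
    float3 dir = L1r * 0.2126 + L1g * 0.7152 + L1b * 0.0722;
    return normalize(dir);
}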