Adaptive Probe Volume URP (Leaks & other weird behaviors)

Hi there, is the APV implementation in URP similar to HDRP?

As a first attempt, I am honestly quite disoriented by the results.
The way it behaves seems pretty weird.


The debugged probe colors make no sense at all, and it seems that the leak prevention measures are not effective at all.

A couple of months ago I implemented GI probes in the Built-in Render Pipeline, following the NVIDIA DDGI talk.

In a similar environment, I am getting this result.

Here are a few questions for the URP APV team.

  1. Are probes inside geometry being invalidated?
    It seems like yes when Leak Prevention is set to "None", but not with "Validity and Normal Based".

  2. Are you using a light-probe-to-face-normal dot product to reduce leaks? (Even if the answer is yes, it seems ineffective in "Validity and Normal Based" mode.)

  3. Are depth and depth squared stored inside the probe data? And are you using them to reduce leaks?

I am not criticizing your team's work. Since I have experience with DDGI probes, I just wanted to learn more about the leak prevention measures and why they seem to not be very effective.

Also, it looks like HDRP has many more leak prevention options.
Are these features coming to Unity 6 anytime soon?

Best regards

Your probes look like they might be noisy. The sample counts in the Lighting Window (Direct, Indirect, Environment Samples) affect the probe bake. What are they set to?

APV has a few different ways of mitigating leaks. Some work at bake time, some at runtime.

At bake time, you have Virtual Offset (instead of baking invalid probes at their original position, push the position used for bake out of geometry), Dilation (fill invalid probes with data from valid neighboring probes), and Rendering Layer Masks (define different subsets of probes to sample from for each rendering layer). These are all configurable via the Lighting Window, in the Adaptive Probe Volume tab.

At runtime, you have Normal Bias (push the sampling position along the normal), View Bias (push the sampling position along the view direction), and "Leak Reduction" (avoid sampling invalid probes due to trilinear filtering, either by warping UVs or by doing the filtering in software). These are all configurable via the volume system, under Adaptive Probe Volume Options.
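
To give a rough idea of what the runtime bias settings do, here is a minimal sketch (not the actual APV shader code; the function and parameter names are made up for illustration):

// Hypothetical sketch of Normal Bias / View Bias; not the actual APV implementation.
// normalBias and viewBias correspond conceptually to the volume settings above.
float3 GetBiasedProbeSamplingPosition(float3 positionWS, float3 normalWS,
                                      float3 viewDirWS, // from surface towards camera
                                      float normalBias, float viewBias)
{
    // Push the lookup position out of the surface along the normal, and towards
    // the camera along the view direction, so the trilinear probe fetch is less
    // likely to pick up probes sitting behind or inside the geometry.
    return positionWS + normalWS * normalBias + viewDirWS * viewBias;
}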

So to try to answer your questions:

  • Are probes inside geometry being invalidated?
    • At bake time: Yes, if you have virtual offset enabled
    • At runtime: Yes, if you have leak reduction enabled
  • Are you using a light-probe-to-face-normal dot product to reduce leaks?
    • Not entirely sure what you mean here, perhaps you could elaborate? I think the closest we have are the aforementioned View Bias and Normal Bias settings.
  • Are depth and depth squared stored inside the probe data? And are you using them to reduce leaks?
    • No. It was explored at one point, but decided against. I'm not entirely sure why, but I believe it was deemed too expensive memory-budget-wise to store the extra data.

Hello there.
Thanks a lot for the answer. I really appreciate your time.

Btw, I am using Unity 2023.

My bake settings are:
Direct: 32
Indirect: 256
Environment: 256

Just a quick follow up question.
At bake time, if Virtual Offset is not enabled, will probes inside geometry not be invalidated?

Regarding the probe-to-face-normal dot product, let me explain.
When your surface shader loops through all the surrounding probes to check which ones should contribute to lighting, a probe whose "probe to pixel" direction is aligned with the pixel normal (i.e. a probe behind the surface) should not contribute.
It is called Smooth Backface in the NVIDIA talk:

float weight = 1.0;
{
     // Smooth backface: fade out probes that lie behind the shaded surface.
     // Square(x) is a helper that returns x * x.
     float3 directionToProbe = normalize(probePosition - trueSurfacePosition);
     // Remap the dot product from [-1, 1] to [0, 1], square it, and add a small
     // floor (0.2) so probes behind the surface never contribute exactly zero.
     float backfaceWeight = Square(max(0.0001, (dot(directionToProbe, surfaceNormal) + 1.0) * 0.5)) + 0.2;
     weight *= backfaceWeight;
}

In the NVIDIA talk, the implementation is:

float backfaceWeight = (dot(directionToProbe, N) + 1) * 0.5;
weight *= backfaceWeight;

Regarding the depth and depth squared, it's such a shame that you removed it from the implementation.
Could we have a checkbox option to decide whether we want that in our game or not?
Yes, it uses one additional texture to store the depth and depth squared, but it's totally worth it.
Is the HDRP implementation different?

One potential side effect of that is needing way too many probes in the scene to achieve the same results.
You can get really amazing results with a pretty low probe count (large spacing between probes) if the leaks are prevented correctly.

Here is an example of my probe system. I am using a spacing of 3 meters between probes.
I have also implemented lerping between different probe scenarios.

Would it be possible to talk to the APV team on a call or something to exchange ideas about this topic?
Best regards,

During bake, if a probe sees enough backface to be considered invalid:

  • If Virtual Offset is enabled, we attempt to make the probe 'valid' by baking it at a different position. Virtual Offset mitigates the kind of leaking where dark probes inside geometry leak their darkness onto nearby surfaces of that geometry. After a probe has been 'fixed', it is considered valid from that point on.
  • If Virtual Offset is disabled, the probe remains in an invalid status. This invalid status can be used to inform runtime leak reduction.

Well, we don't. One of the main pillars of APV's design was to prioritize runtime performance across all the supported platforms. The probe data is stored in a big 3D texture atlas, with one texel per probe. When it comes time to sample the nearest 8 probes in shader, we use a single (hardware) trilinear texture tap. What you suggest would increase the number of texture taps by a factor of 8.
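
To make the difference concrete, here is a rough sketch (texture names and layout are invented for illustration, this is not the actual APV code) of a single hardware trilinear tap into a probe atlas versus fetching the 8 neighboring probes by hand:

// Sketch only: one hardware trilinear tap lets the GPU blend the 8 surrounding probes.
Texture3D<float4> _ProbeAtlas;          // hypothetical probe atlas, one texel per probe
SamplerState      sampler_LinearClamp;  // trilinear sampler

float4 SampleProbesHardware(float3 atlasUVW)
{
    return _ProbeAtlas.SampleLevel(sampler_LinearClamp, atlasUVW, 0);
}

// The per-probe alternative: 8 point fetches plus manual trilinear weights,
// which is what you would need in order to reject or reweight individual probes.
float4 SampleProbesManual(float3 atlasUVW, float3 atlasResolution)
{
    float3 coord     = atlasUVW * atlasResolution - 0.5;
    float3 weights   = frac(coord);
    int3   baseTexel = int3(floor(coord));

    float4 result = 0;
    for (int i = 0; i < 8; i++)
    {
        int3   offset = int3(i & 1, (i >> 1) & 1, (i >> 2) & 1);
        float3 w3     = lerp(1 - weights, weights, (float3)offset);
        float  w      = w3.x * w3.y * w3.z;   // trilinear weight of this probe
        result += w * _ProbeAtlas.Load(int4(baseTexel + offset, 0));
    }
    return result;
}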

It isn’t something we have plans to do, but you can always submit a feature request here and it’ll be considered.

To give some more context: It’s quite a large amount of extra data, relatively speaking. DDGI stores everything in octahedral maps. APV uses a much more compact representation. The irradiance signal takes up 32 bytes per probe (using a compressed spherical harmonic representation, if you are curious). For depth, we’d have to use something like octahedral maps as well. Assuming FP16 format and 8x8 maps, a single probe’s depth+depth^2 would take up 8x8x2x2=256 bytes. Relative to the existing data, this is massive. Keep in mind scenes using APV tend to have much denser probe grids than what you typically see with DDGI. For any realistic scene, the depth+depth^2 data would likely be around 85% of your VRAM budget.

Other than that, actually using the depth would force us to handle each neighboring probe individually - apropos what I mentioned about trilinear filtering earlier.

No, not really. The vast majority of the code is shared.

Sorry, this isn’t something we typically offer. If your employer uses Unity, you can go through enterprise support channels.

The probe system you’ve been teasing looks pretty neat, btw :wink: Out of curiosity, what kind of hardware are you targeting?

@Pema-Malling

Hello there, I really appreciate your super detailed answer on this topic!

Well, that's an interesting implementation you guys have, even though it sounds a bit like reinventing the wheel...
But I guess R&D is the duty of any tech-leading company, so it's probably worth doing research on this topic.
At the end of the day, if your implementation works, that is all that matters!

Thank you very much!
It is made to work on mobile platforms. We successfully tested baking on large scenes and probe blending with no particular issues.

A few details about my own implementation:
I am not storing a 32-byte irradiance signal, just a simple baked RGB8 UNorm of the irradiance result.
The reason is that a baked diffuse (or diffuse ambient, not sure what to call it) rarely needs values exceeding 1, so I think 32 bytes is overkill.

Each probe is an 8x8 octahedral map.
Depth and depth squared are also 8x8, stored as RG32 SFloat.

To be honest, I could probably compress more; I haven't really tested that many different compression types at this point.


This scene uses two 1024x1024 maps, one for the color and one for the depth.


And I am barely using the full capacity of the maps.

Compared to old-school full lightmap baking and how cumbersome it is to unwrap all the UVs etc., DDGI uses less memory and needs no UV unwrapping, for pretty stunning results.

Anyway, as I said, if your implementation is smooth and gives great results, that is all that matters!

Good luck to your team :wink:

Just to clarify what I meant by "32 byte irradiance signal": We use a compressed spherical harmonics encoding that stores irradiance coming from each direction in 32 bytes per probe. That's 32 bytes for the entire probe, not just 1 pixel of some larger texture. Spherical harmonics are a basis that can be used to encode spherical functions, and can be thought of as a more compact alternative to cubemaps or octahedral maps. L2 spherical harmonics require 9 coefficients per color channel. We store these packed in 7 textures, 1 of them being FP16 and the others being R8G8B8A8, so (2 bytes x 4 channels) + (6 textures x 1 byte x 4 channels) = 32 bytes. If you used, for example, an 8x8 octahedral R8G8B8A8 map instead, that'd be 8x8x4 = 256 bytes per probe, rather than 32. If you stored just a single RGB value instead, you'd get no directional information. Spherical harmonics are very well suited to storing irradiance since it's a low-frequency signal. If you tried to encode radiance, depth or occlusion instead, it wouldn't really work, since those are too high frequency (for L2).

Thanks for sharing. Looks nice. I’ll jot some of your suggestions down as potential future research areas.

@Pema-Malling
Interesting! I tried that myself actually, but the math didn't add up and I was unable to pack the L2 into a texture.
That is definitely the best option and gives the smoothest result!

Wow, thank you and good luck to the team!

Hi Pema, I’m learning a lot from this post as I’m investigating the use of APVs for one of our projects.

I'm confused by these two replies in this thread:

Here you mention the 3D texture and the single texture fetch that lets you interpolate the color data directly from that single, trilinearly filtered fetch.
What I understand is that the 3D texture is a direct representation of the 3D probe structure in the scene. Sampling that texture trilinearly results in sampling all probes at once (which is very efficient) at the cost of not even knowing where the probes are (which is needed for the DDGI approach).

In this message, however, you're talking about the 32-byte packing and 7 textures to sample at the pixel level. How can those 32 bytes / 7 textures fit in a single texel fetched from that 3D texture in VRAM?

I'm also very curious about the SH L1 representation that Unity offers. How does it compare to L2 in terms of bandwidth/texture taps (and any other relevant considerations)?

BTW: Huge fan of your work. Just found out about your HLSL web interpreter/debugger, really neat stuff!

@jujunosuke

Your implementation looks absolutely stunning, great work!

I’m really interested in knowing more about your mileage with DDGI.

The first question is: how do you get these indirect shadows? Is DDGI responsible for that, or is it a side effect of the visibility term computed from the depth data, as seen in the talk? Nobody really talks about that, but to me, aside from leaking, the biggest problem with APV is capturing a sub-probe shadow signal (because it technically can't).

For clarification, I'm talking about the sort of ambient occlusion we can see in your gifs, especially on the statue. Do you use any other technique in combination with DDGI to capture that level of detail?

Was this gif built and tested on a mobile platform, or are you talking about the baking step being successful here?
I wonder how DDGI holds up on mobile, as its approach seems more expensive than APV's.

In this message, however, you're talking about the 32-byte packing and 7 textures to sample at the pixel level. How can those 32 bytes / 7 textures fit in a single texel fetched from that 3D texture in VRAM?

I realize my message was a bit misleading. APV doesn't just do 1 trilinear texture tap when looking up probe data; it does 3 if you are only using L1 SH, or 7 if you are using L2 SH. What I mean is that it only does trilinear texture taps, never manual interpolation. The textures are laid out like this:

// L0+L1:
Texture3D<half4> L0_L1Rx;         // FP16
Texture3D<unorm float4> L1G_L1Ry; // R8G8B8A8
Texture3D<unorm float4> L1B_L1Rz; // R8G8B8A8

// L2:
Texture3D<unorm float4> L2_0;     // R8G8B8A8
Texture3D<unorm float4> L2_1;     // R8G8B8A8
Texture3D<unorm float4> L2_2;     // R8G8B8A8
Texture3D<unorm float4> L2_3;     // R8G8B8A8

This is quite VRAM efficient, that was my main point. Each probe takes up a total of 32 bytes, or just 16 bytes for L1 probes.

If you were to sample each probe individually and do manual trilinear interpolation, you would be doing 24-56 texture taps instead of 3-7.
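
As a rough sketch of what that looks like for the L1 part (the names mirror the layout above, but the real decode, including the compression trick described below, lives in ProbeVolume.hlsl; this simplified version pretends the data is already linear and omits normalization constants):

// Illustrative only; not the actual APV sampling code.
SamplerState s_trilinear_clamp; // assumed trilinear sampler

float3 EvaluateL1Probes(float3 atlasUVW, float3 normalWS)
{
    // Three hardware trilinear taps blend the data of the 8 nearest probes.
    float4 l0_l1Rx  = L0_L1Rx.SampleLevel(s_trilinear_clamp, atlasUVW, 0);
    float4 l1G_l1Ry = L1G_L1Ry.SampleLevel(s_trilinear_clamp, atlasUVW, 0);
    float4 l1B_l1Rz = L1B_L1Rz.SampleLevel(s_trilinear_clamp, atlasUVW, 0);

    float3 L0   = l0_l1Rx.rgb;                               // band 0 (DC)
    float3 L1_R = float3(l0_l1Rx.w, l1G_l1Ry.w, l1B_l1Rz.w); // band 1, red channel
    float3 L1_G = l1G_l1Ry.rgb;                              // band 1, green channel
    float3 L1_B = l1B_l1Rz.rgb;                              // band 1, blue channel

    // L1 SH irradiance evaluation: DC term plus a directional term per channel
    // (constants and decompression omitted for brevity).
    float3 irradiance;
    irradiance.r = L0.r + dot(L1_R, normalWS);
    irradiance.g = L0.g + dot(L1_G, normalWS);
    irradiance.b = L0.b + dot(L1_B, normalWS);
    return max(irradiance, 0);
}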

at the cost of not even knowing where the probes are (which is needed for the DDGI approach).

You do know where the probes are with APV, they are just locked onto a grid, and you can't easily reject contribution from only some of the probes in the neighborhood of a pixel, since you are always interpolating all 8 of them with trilinear filtering. The best you can do is warp the sampling position, which is what APV's "leak reduction" does.

With typical DDGI approach, each probe is much larger, often 8x8 or 16x16 octahedral maps. The size will depend on the texture format you use. Original paper used R11G11B10F for irradiance and RG16F for depth moments. With just 8x8 maps, that’s 8x8x4x2 = 512 bytes per probe. And to interpolate this data, you must manually sample 8 such octahedral maps and calculate the trilinear weights. It’s just a tradeoff, one setup isn’t inherently better than the other. DDGI handles leaking better but also isn’t a silver bullet.
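
For comparison, a rough sketch of the manual interpolation a DDGI-style lookup has to do per pixel (FetchProbeIrradiance is a placeholder for sampling one probe's octahedral map; the real DDGI weights also fold in backface and visibility terms):

// Sketch of DDGI-style manual trilinear interpolation; names are illustrative.
float3 FetchProbeIrradiance(int3 probeIndex) { return 0; } // placeholder stub

float3 SampleDDGIProbes(float3 positionWS, float3 gridOrigin, float probeSpacing)
{
    float3 gridCoord = (positionWS - gridOrigin) / probeSpacing;
    int3   baseProbe = int3(floor(gridCoord));
    float3 alpha     = frac(gridCoord);        // position inside the 8-probe cell

    float3 irradiance = 0;
    float  weightSum  = 0;
    for (int i = 0; i < 8; i++)
    {
        int3   offset = int3(i & 1, (i >> 1) & 1, (i >> 2) & 1);
        float3 w3     = lerp(1 - alpha, alpha, (float3)offset);
        float  weight = w3.x * w3.y * w3.z;    // manual trilinear weight

        // Real DDGI multiplies in smooth-backface and Chebyshev visibility weights
        // here, then samples this probe's octahedral irradiance map individually.
        irradiance += weight * FetchProbeIrradiance(baseProbe + offset);
        weightSum  += weight;
    }
    return irradiance / max(weightSum, 1e-4);
}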

I'm also very curious about the SH L1 representation that Unity offers. How does it compare to L2 in terms of bandwidth/texture taps (and any other relevant considerations)?

I explained it above for the case of APV. L1 SH is 4 coefficients per color channel, L2 SH is 9. For L1, you thus have 4x3=12 coefficients total. That can be stored exactly in 3 RGBA textures. Those coefficients should be HDR, but we use a trick introduced by some of the Frostbite guys to compress one of the textures to LDR range. It makes use of the fact that you can upper-bound the L1 terms using the L0 terms when storing irradiance. I’ll spare you the derivation, it basically boils down to this:

// Remap the HDR L1 coefficients of one color channel into [0, 1] by normalizing
// against the corresponding L0 (DC) coefficient, then centering around 0.5.
float3 CompressL1(float irradianceL0, float3 irradianceL1) {
    return irradianceL1 * (sqrt(3) / (4 * irradianceL0)) + 0.5;
}
// Inverse mapping, applied in the shader after sampling the LDR texture.
float3 DecompressL1(float irradianceL0, float3 compressedIrradianceL1) {
    return (compressedIrradianceL1 - 0.5) * ((4 * irradianceL0) / sqrt(3));
}

For L2 SH, there’s no such trick, the 5x3=15 additional coefficients are stored in 4 more HDR textures. So 3 vs 7 taps for L1 vs L2.

L1 is typically good enough for indirect lighting. Irradiance is a very 'smooth/blurry' signal, so you don't need a lot of precision. You can think of L1 SH as roughly equivalent to storing a light color and a dominant light direction. This means it cannot represent more than 1 'peak', like if you have 2 strong light sources illuminating the same point in a dark room. L2 is better at that.
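
To make that intuition concrete, here is a rough analogy (not how APV actually evaluates lighting; the coefficient names are generic):

// Intuition only: treat an L1 SH probe as an ambient color plus one dominant
// light direction. Real SH evaluation has additional normalization constants.
void ApproximateL1AsLight(float3 shL0, float3 shL1_R, float3 shL1_G, float3 shL1_B,
                          out float3 ambientColor, out float3 dominantDirection)
{
    ambientColor = shL0;  // band 0: the average color over the sphere

    // A luminance-weighted combination of the band-1 vectors points roughly
    // towards the brightest region of the incoming light.
    float3 dir = shL1_R * 0.299 + shL1_G * 0.587 + shL1_B * 0.114;
    dominantDirection = (length(dir) > 1e-5) ? dir / length(dir) : float3(0, 1, 0);
}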

Thank you so much for that reply, it’s such high quality information.

For context: my project needs to ship both on PC (high visual quality needed) and mobile (Meta Quest 3, so very low-end in comparison, and bandwidth will need attention).

APVs seemed like a really good candidate for lighting because we also need a Time of Day feature as well as dynamic weather.

At first glance, APVs have:

  • Much lower bandwidth than lightmaps
  • Probe density bakes can be scaled per platform
  • Light scenario blending is a godsend for Time of Day/weather

Using that technique for both platforms could be a huge gain for the production.

The real problem I faced is that APVs can't really replace lightmaps IMO, because I can't get that static shadow information, which is a higher-frequency signal than the probe grid (plus shadows can be diagonal to the grid).

There are a bunch of videos online with really good-looking APV setups, but I'm starting to think that they always use other techniques (SSGI?) in combination with APVs to make up for the lack of finer static GI information (which, honestly, makes sense).

I guess the question is, if you were to try and replace lightmaps by APVs, how would you handle that problem?
Because if we’re using dynamic lights for shadows, that can’t really run on mobile since we’re in Forward rendering.

I might be reading into this, but it sounds like you have experience with DDGI; did you or the R&D team prototype it by any chance? If so, I'm trying to understand how they get those indirect shadows, which is a sub-probe-frequency signal.
Do you have any idea how that works?

That 24-56 tap figure is only if we're using SH representations for color, right? And actually that's just color. Pretty sure SH is too low frequency for the depth signal, so you would add another 8 taps for the octahedral depth map, if I'm not mistaken.

If we’re talking about octahedral maps, we’re back to 8*2 = 16 samples per pixel.
But granted, it’s a tradeoff:

  • Memory:
    • DDGI: 512b
    • APVs: 32b (so DDGI uses 16x more memory per probe)
    • However, DDGI claims to get away with much larger spacing between probes. And since probe count grows cubically with density, needing 3x the spacing actually means up to 27x fewer probes/memory.
      I just can't really believe how 3m-spaced probes can give such a good result, but it's good to keep in mind that DDGI uses fewer probes.
  • Taps:
    • DDGI: 16
    • APVs: 3 or 7
  • ALUs:
    • DDGI: I guess many more ALUs, since it's doing the trilinear interpolation manually + computing the visibility term per probe (so that's done 8 times).
    • APVs: Not sure about the ALU overhead here. I guess it's simply decompressing the SH in the fragment shader, but that does not look that expensive?

What do you mean by warping the position? And what are the criteria for warping it?

Thanks a lot for your time!

To illustrate what I mean about the shadows and the sub-probe-frequency problem:

Here we can clearly see the voxelized look of the grid, because we're in a high-contrast lighting environment and APV is the sole lighting technique used.

Is there anything I’m inherently doing wrong here?
How can I replace lightmaps with APVs in a scenario such as this one?

The real problem I faced is that APVs can't really replace lightmaps IMO, because I can't get that static shadow information, which is a higher-frequency signal than the probe grid (plus shadows can be diagonal to the grid).

I guess the question is, if you were to try and replace lightmaps by APVs, how would you handle that problem?
Because if we’re using dynamic lights for shadows, that can’t really run on mobile since we’re in Forward rendering.

This isn’t specific to APV, it’s inherent to probe-grid-based solutions. Shadows are too high frequency to be represented well by a low resolution probe grid. It’s like using an extremely low resolution lightmap, you’d get similar aliasing.

As you’ve pointed out, APV can store occlusion / shadow information in probes. The primary purpose for this is to provide low-quality fallback shadows for distant objects when using distance shadowmask mode.

For crisp shadows, you’ll need to use lightmaps, realtime lights (shadow maps) or something like raytraced shadows. APV is only really designed to provide indirect illumination.

I might be reading into this, but it sounds like you have experience with DDGI; did you or the R&D team prototype it by any chance? If so, I'm trying to understand how they get those indirect shadows, which is a sub-probe-frequency signal.
Do you have any idea how that works?

I’m familiar with the algorithm. DDGI provides indirect (global) illumination (it’s in the name). For direct shadows you’d typically use shadowmaps or raytraced shadows. The main innovation of DDGI is using stored depth moments to mitigate leaking between probes, for example through walls. One shouldn’t confuse that with shadows.
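
For reference, the leak test from the DDGI paper uses those two stored moments for a Chebyshev-style visibility estimate, roughly like this (a sketch of the published technique, not Unity code):

// Chebyshev visibility weight from the DDGI paper (sketch).
// mean = stored depth, meanSq = stored depth^2, both read from the probe's
// octahedral depth map in the direction of the shaded point.
float ChebyshevVisibilityWeight(float distToProbe, float mean, float meanSq)
{
    if (distToProbe <= mean)
        return 1.0; // the shaded point is closer than what the probe saw: visible

    float variance = abs(meanSq - mean * mean);
    float d        = distToProbe - mean;
    float weight   = variance / (variance + d * d);
    return max(weight * weight * weight, 0.0); // cubed to sharpen the falloff
}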

That 24-56 tap figure is only if we're using SH representations for color, right? And actually that's just color. Pretty sure SH is too low frequency for the depth signal, so you would add another 8 taps for the octahedral depth map, if I'm not mistaken.

Yes. Storing depth or depth^2 in low order SH is a terrible idea.

What do you mean by warping the position? And what are the criteria for warping it?

I mean instead of sampling each probe individually and manually interpolating, you can take the position that you would have sampled the grid at and 'shift' it a bit, biasing it away from bad probes, based on some heuristic. So you still use hardware trilinear interpolation, but you influence the results by changing the interpolation position. APV stores a 1-bit validity term for each probe for this, which indicates whether the probe sees too many backfaces to be considered valid, like probes inside of walls.
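
Conceptually it looks something like this (a heavily simplified sketch; the real logic lives in the ProbeVolume.hlsl file linked below):

// Simplified sketch of validity-based warping; not the actual APV implementation.
// samplePosInCell is the interpolation position inside the 8-probe cell, in [0,1]^3,
// and validity[i] is the 1-bit validity flag of neighbor probe i.
float3 WarpSamplingPosition(float3 samplePosInCell, float validity[8])
{
    float3 warped = samplePosInCell;
    for (int i = 0; i < 8; i++)
    {
        if (validity[i] < 0.5) // invalid probe, e.g. one baked inside a wall
        {
            // Push the interpolation position away from the invalid corner so the
            // hardware trilinear filter gives that probe (almost) no weight.
            float3 corner = float3(i & 1, (i >> 1) & 1, (i >> 2) & 1);
            warped += (samplePosInCell - corner) * 0.25; // heuristic push strength
        }
    }
    return saturate(warped);
}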

You can look at the code here (Graphics/Packages/com.unity.render-pipelines.core/Runtime/Lighting/ProbeVolume/ProbeVolume.hlsl at master · Unity-Technologies/Graphics · GitHub and Graphics/Packages/com.unity.render-pipelines.core/Runtime/Lighting/ProbeVolume/ProbeVolume.hlsl at master · Unity-Technologies/Graphics · GitHub).

Is there anything I’m inherently doing wrong here?
How can I replace lightmaps with APVs in a scenario such as this one?

No. It’s just a limitation of this kind of algorithm, you aren’t crazy. Use lightmaps or real time lights. Eventually we’d like to add cached shadow maps to URP which would also be a nice option. I think there are some 3rd party solutions for that too.

Thanks a lot, this really made everything click for me.
Your time and knowledge are highly appreciated! :pray:

SSGI does smooth it out. APV is the "cheapest" baked lighting (I don't count light probes), so the quality is low. The upsides are fast bake times, memory, and per-pixel lighting, meaning you can get specular/normal detail. So, basically, it's for when you are unwilling to set up lightmaps and need per-pixel lighting. Let's be real - it's for mobile games that are OK with Half-Life 1 lighting.

Thanks for the explanation and the illustrative screenshots!