Filtered VSM on mobile: which RT format to use?

I have successfully implemented a filtered Variance Shadow Map in Unity. The pipeline looks like this:

  • Create render texture ‘A’ and render texture ‘B’ with the same parameters

  • For each frame:

      • Render the depth moments from the light’s view into texture ‘A’ (2 x 16-bit float values per texel)

      • Render the horizontal blur of texture ‘A’ into texture ‘B’ (“Ping”)

      • Render the vertical blur of texture ‘B’ into texture ‘A’ (“Pong”)

      • Render objects to the frame buffer, using texture ‘A’ as the input for the depth comparison
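In the standard VSM formulation, the two moments written in the first step are the depth d and its square d². Blurring only averages them, so the filtered texel still gives the mean and variance of the depths under the kernel, which is what the final comparison needs. A minimal CPU-side Python sketch of that arithmetic (not shader code; the function names and depth values are hypothetical, for illustration only):

```python
def depth_moments(d):
    """Depth pass: the two values stored per texel (R = d, G = d^2)."""
    return (d, d * d)


def box_filter(moments):
    """A box blur simply averages each moment channel over the kernel footprint."""
    n = len(moments)
    return (sum(m[0] for m in moments) / n,
            sum(m[1] for m in moments) / n)


def mean_and_variance(m):
    """Final pass: recover the filtered depth distribution from the moments."""
    mu = m[0]
    # E[d^2] - E[d]^2, clamped because rounding can push it slightly below zero.
    return mu, max(m[1] - mu * mu, 0.0)


# Hypothetical depths covered by one filter footprint.
depths = [0.50, 0.51, 0.52, 0.70]
mu, var = mean_and_variance(box_filter([depth_moments(d) for d in depths]))
print(mu, var)
```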

The render textures are created with the RGHalf format (2 x 16-bit float channels). This works great in the Unity Editor, but problems arise on mobile platforms: RGHalf is often not supported on Android devices (at least, it isn’t on any of my 4 devices). There is not much choice:

ARGB1555: True
ARGB2101010: True
ARGB32: True
ARGB4444: True
ARGB64: False
ARGBFloat: False
ARGBHalf: False
ARGBInt: True
BGR101010_XR: False
BGRA10101010_XR: False
BGRA32: False
Depth: True
R8: True
RFloat: False
RG16: True
RG32: False
RGB111110Float: False
RGB565: True
RGBAUShort: True
RGFloat: False
RGHalf: False
RGInt: True
RHalf: False
RInt: True
Shadowmap: True

So it seems the only way forward is to use the ARGB32 format for maximum compatibility, with Encode/Decode functions to convert RGBA → float2 and vice versa. That could be a decent solution if I didn’t have to decode 9 samples per fragment in the blur shader (I’m using a 3x3 kernel) and then encode the final result.
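For reference, one common form of this packing splits each [0, 1] moment across two 8-bit channels, i.e. 16-bit fixed point in R,G and in B,A. Unity ships comparable single-value helpers (EncodeFloatRG / DecodeFloatRG in UnityCG.cginc), and shader versions usually rely on floor/frac arithmetic rather than the integer operations used here. A minimal Python sketch, assuming depth has already been normalized to [0, 1]; all function names are hypothetical:

```python
def encode_unorm16(v):
    """Split a value in [0, 1] into two 8-bit channels (high byte, low byte)."""
    v = min(max(v, 0.0), 1.0)      # 8-bit channels only hold unsigned normalized data
    q = round(v * 65535)           # 16-bit fixed point
    return (q >> 8, q & 0xFF)


def decode_unorm16(hi, lo):
    """Inverse of encode_unorm16."""
    return ((hi << 8) | lo) / 65535.0


def encode_moments(m1, m2):
    """Moments (d, d^2) -> the four 8-bit channels (R, G, B, A) of one texel."""
    return encode_unorm16(m1) + encode_unorm16(m2)


def decode_moments(rgba):
    """Four 8-bit channels -> moments (d, d^2)."""
    r, g, b, a = rgba
    return decode_unorm16(r, g), decode_unorm16(b, a)


depth = 0.4375
texel = encode_moments(depth, depth * depth)
m1, m2 = decode_moments(texel)
print(texel, m1, m2)
```

Each moment comes back within half a 16-bit step of its original value.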

Summarizing the penalties:

  • 1 x Encode in the depth shader

  • 9 x Decode + 1 x Encode in the horizontal blur shader

  • 9 x Decode + 1 x Encode in the vertical blur shader

  • 1 x Decode in the final depth comparison

Is there a better solution?

ARGB2101010 might get you something usable. The alternative would be RGInt, which would still require some encoding/decoding but might be cheaper (certainly less math, though the float-to-int conversion might end up more expensive overall).
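One way to weigh these options is by how many bits each moment ends up with: 10 per moment in ARGB2101010 (its RGB channels are 10-bit), 16 per moment when packing two 8-bit channels per moment into ARGB32, and more with RGInt. Precision matters because VSM reconstructs the variance as E[d²] - E[d]², which cancels badly when the moments are stored with few bits. A rough Python sketch that rounds the moments to N-bit unsigned normalized values; the 24-bit case is an assumed fixed-point scale for RGInt (not something stated in this thread), and the depths are hypothetical:

```python
def quantize(v, bits):
    """Store v as an unsigned normalized integer with `bits` bits, then read it back."""
    levels = (1 << bits) - 1
    return round(min(max(v, 0.0), 1.0) * levels) / levels


def moments(depths):
    """Box-filtered moments (E[d], E[d^2]) of a set of depths."""
    n = len(depths)
    return sum(depths) / n, sum(d * d for d in depths) / n


def variance(m1, m2):
    return m2 - m1 * m1


depths = [0.50, 0.51, 0.52]   # hypothetical, nearly flat patch of the shadow map
m1, m2 = moments(depths)
exact = variance(m1, m2)
v10 = variance(quantize(m1, 10), quantize(m2, 10))  # ARGB2101010, one channel per moment
v16 = variance(quantize(m1, 16), quantize(m2, 16))  # ARGB32, two 8-bit channels per moment
v24 = variance(quantize(m1, 24), quantize(m2, 24))  # RGInt, assumed 24-bit fixed point
for name, v in (("exact", exact), ("10-bit", v10), ("16-bit", v16), ("24-bit", v24)):
    print(f"{name:>6}: {v:.3e}")
```

With 10 bits the reconstructed variance for this patch comes out negative, which is one reason VSM implementations clamp the variance to a small minimum before the Chebyshev test.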

Encoding float values into multiple channels of an ARGB8888 texture is pretty darn common on mobile, though, and decoding isn’t really that expensive; obviously, doing it that many times will start to have an impact. You may need to reduce the amount of blurring you’re doing.

Btw, a 3x3 kernel should only be 3 taps for each of the horizontal and vertical passes. “9 x decode” makes me think you’re sampling the full 3x3 grid in each pass, which defeats the whole point of separable passes.
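To illustrate the separable point: a 3x3 kernel whose weights are the outer product of a 3-tap kernel gives the same result as a 3-tap horizontal pass followed by a 3-tap vertical pass, so each texel needs 3 + 3 = 6 fetches (and decodes) in total, versus 9 for a single non-separable 3x3 pass, or 18 if the full grid is sampled in both passes. A small Python check on one moment channel, assuming clamp-to-edge addressing and a hypothetical [1, 2, 1] / 4 kernel (a box kernel behaves the same way):

```python
W = (0.25, 0.5, 0.25)  # hypothetical 3-tap tent kernel; weights sum to 1


def clamp(i, n):
    return min(max(i, 0), n - 1)


def pass_1d(img, horizontal):
    """One separable pass: 3 taps along x or y with clamp-to-edge addressing."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, wt in zip((-1, 0, 1), W):
                sx = clamp(x + k, w) if horizontal else x
                sy = y if horizontal else clamp(y + k, h)
                acc += wt * img[sy][sx]
            out[y][x] = acc
    return out


def full_3x3(img):
    """Non-separable reference: 9 taps per texel with weights W[i] * W[j]."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, wy in zip((-1, 0, 1), W):
                for i, wx in zip((-1, 0, 1), W):
                    acc += wx * wy * img[clamp(y + j, h)][clamp(x + i, w)]
            out[y][x] = acc
    return out


# Hypothetical 4x4 block of first moments (depths) from the shadow map.
A = [[0.50, 0.52, 0.90, 0.91],
     [0.51, 0.53, 0.92, 0.90],
     [0.50, 0.55, 0.93, 0.94],
     [0.49, 0.54, 0.95, 0.96]]
B = pass_1d(A, horizontal=True)            # "ping": A -> B, 3 taps per texel
A_blurred = pass_1d(B, horizontal=False)   # "pong": B -> A, 3 taps per texel
reference = full_3x3(A)                    # kept separate so A is still available here
```

On the GPU the “pong” pass would write straight back into texture ‘A’; the sketch keeps a separate list only so the reference can be computed from the unblurred input.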

You are right, it’s 3 taps per pass; I typed it wrong :)

Downsampling the ping-pong RTs to 1/4 of the original size works pretty nicely, and the final result is practically the same: 60 fps on a ZTE A610. I got some light bleeding, though, so I need to rework the Chebyshev function a bit.
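On the light bleeding: a common starting point (described by Andrew Lauritzen in GPU Gems 3) is to clamp the variance to a small minimum and then remap the Chebyshev upper bound so that values below a threshold go to zero. This is not necessarily what was done in this thread. A minimal Python sketch of that test; `min_variance` and `bleed_reduction` are hypothetical tuning parameters, not values from this thread:

```python
def linstep(lo, hi, v):
    """Linear ramp from 0 at lo to 1 at hi, clamped to [0, 1]."""
    return min(max((v - lo) / (hi - lo), 0.0), 1.0)


def chebyshev_visibility(moments, receiver_depth,
                         min_variance=1e-5, bleed_reduction=0.3):
    """Fraction of light reaching a receiver, from filtered moments (E[d], E[d^2])."""
    mu, m2 = moments
    if receiver_depth <= mu:
        return 1.0                       # receiver in front of the mean occluder: fully lit
    variance = max(m2 - mu * mu, min_variance)
    dist = receiver_depth - mu
    p_max = variance / (variance + dist * dist)   # one-sided Chebyshev upper bound
    # Light-bleeding reduction: cut off the low tail of p_max and rescale the rest.
    return linstep(bleed_reduction, 1.0, p_max)


occluder = (0.5, 0.25 + 1e-4)   # hypothetical filtered moments: mean 0.5, variance 1e-4
lit = chebyshev_visibility(occluder, 0.45)
shadowed = chebyshev_visibility(occluder, 0.60)
penumbra = chebyshev_visibility(occluder, 0.505)
print(lit, shadowed, penumbra)
```

A higher `bleed_reduction` removes more bleeding but also darkens and narrows the penumbrae, so it usually needs tuning per scene.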

Thanks for the help, bgolud!
