Sampling data from several uv coordinates in same texture

Is there a pre-made function that does this, given user-defined offsets? As of now I use:

float sampledZ = tex2Dgrad(tex, centerUV + float2(xOffset, yOffset), float2(0, 0), float2(0, 0)).r;

Note that this is for a depth texture from a custom camera in orthographic mode, with frustum/near/far planes that exactly match the size of the stock Unity cube, so basically no transformations are needed (I checked, and so far it works as intended).

Since I’m taking several samples from nearby surrounding texels, I will repeat the above operation, so I’m wondering:

  • if there is a function that does this in a single shot (filling up some kind of user-defined array of floats)?
  • if I’m going to end up using tex2Dgrad() several times per fragment in the frag() part of the shader, what is the performance impact?

The answer is yes, and no.

Is there a function that you can pass an array of arbitrary positions or offsets and get back an array of values?
No.

Is there a function that you can get back more than one value from?
Yes! But…

HLSL has a function called Gather() which you can use to get back the value of 4 texels at once. But specifically it’s 4 neighboring texels, the four texels used for bilinear filtering at that specific UV position. The other two important bits of information about Gather() is it only returns a float4 with the red channel value of the four texels, and it only works with Direct3D 11 or better. For something like sampling a depth texture, getting only the red channel is fine, because that’s the only channel with data anyway. For the Direct3D 11 minimum requirement, that depends on your use case if that’s a problem or not.

If those caveats all work for you, then you’ll want your code to look a little like this:

// note, this is Texture2D and SamplerState, not a single sampler2D
Texture2D _CameraDepthTexture;
SamplerState sampler_CameraDepthTexture;
float4 _CameraDepthTexture_TexelSize;

// in the shader function
float2 uv = // sample position somewhere in the middle of 4 pixels
float4 depthValues = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv);

// direct3d gather() returns samples counter clockwise starting in the bottom left
// but unity renders direct3D upside down, so the order for what shows on screen is
// | x | y |
// | w | z |
// also note, it only ever samples the top mip

So let's say you want to sample 9 positions, a center texel and the 8 around it. You can do that with just four Gather() calls instead of nine tex2Dlod() calls, or two Gather() and two tex2Dlod(), though I honestly haven't seen any difference in performance between those two options.
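For the 3x3 case, here's a rough sketch of the four-Gather() version, using the _CameraDepthTexture names from the snippet above (not the linked gist's exact code). It assumes uv sits on the center texel's center; the half-texel offsets move each sample point onto one of the four corners between texels, so each Gather() footprint covers one 2x2 quadrant of the 3x3 block:

```hlsl
float2 t = _CameraDepthTexture_TexelSize.xy; // size of one texel in UV space

// four gathers at the corners around the center texel
float4 bl = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2(-0.5, -0.5) * t);
float4 br = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2( 0.5, -0.5) * t);
float4 tl = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2(-0.5,  0.5) * t);
float4 tr = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2( 0.5,  0.5) * t);

// note the four 2x2 footprints overlap on the middle row and column,
// so the center texel shows up in all four results and the edge
// texels in two; dedupe (or just average) accordingly
```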

Also, note, if you don’t use Gather(), you should probably be using tex2Dlod() instead of tex2Dgrad().
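For reference, here's the single-sample line from the question rewritten with tex2Dlod(), keeping the same hypothetical tex / centerUV / offset names. The w component of the float4 is the explicit mip level, so this always samples mip 0, same as the zero-derivative tex2Dgrad() call:

```hlsl
float sampledZ = tex2Dlod(tex, float4(centerUV + float2(xOffset, yOffset), 0, 0)).r;
```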

If you want to see an example implementation, see this shader:
https://gist.github.com/bgolus/c3bc079a81c5b43e9830b98a0d7c32d6


As for the last question:

It depends. If all the samples are very close together in the texture, like the example of a single texel and the 8 surrounding values, the difference in performance vs a single sample is certainly measurable, though the difference between sampling only 4 of the surrounding values (either on the axis or diagonally) and all 8 is almost unmeasurable on modern GPUs. On lower-end or mobile GPUs the impact may be more significant. Really the answer is … it depends on what GPU you're using, what else is happening in the scene / shader being rendered, how hot the device is, etc. The only way to answer that question is to try it and find out yourself.
