Hi there! I have a few questions regarding shaders, and I hope some people with more experience can help me out. So without further ado, here they are:
1. Branching
When using branching, as long as each 2x2 pixel block takes the same path, we are good.
- What happens if that 2x2 block is discarded?
- What happens if it is overlapped by another mesh, opaque or alpha-cut (like some grass)?
Do we lose that optimization? Are those pixels shaded anyway, even if overlapped?
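For reference, this is the kind of per-pixel branch I have in mind (a made-up sketch; `_SnowHeight` and the two textures are hypothetical):

```hlsl
// Data-dependent branch: supposedly cheap while every pixel in the
// 2x2 quad picks the same side, divergent otherwise.
float4 frag (v2f i) : SV_Target
{
    if (i.worldPos.y > _SnowHeight)   // varies per pixel
        return tex2D(_SnowTex, i.uv);
    else
        return tex2D(_RockTex, i.uv);
}
```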
2. Branching on high-ish-end mobile
In my tests, branching on uniform floats on mobile always seems to have a major impact on performance, so I have always avoided branching. Unity seems to do the same in BiRP/URP. Tested on a Xiaomi Mi Mix 2 (Snapdragon 835). Any ideas?
3. 2x2
Regarding branching, I got the 2x2 figure from Jason Booth’s Medium article “Branching on a GPU”. Some articles say 8x8, some 32x32, some 64x64. Is this related to the architecture? Until today I had only heard about 32 and 64.
4. Interpolators
In my shaders, I often do many calculations per vertex, but the way my shaders are set up, I use quite a few interpolators, maybe 5 to 7. Let’s consider some simple math, like calculating a scale and offset for UVs and then passing the result to the pixel shader. Can this interpolator be more expensive than the actual code?
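To illustrate, the pattern I mean looks roughly like this (names made up; `_Tiling` and `_Offset` are hypothetical material properties):

```hlsl
struct v2f
{
    float4 pos      : SV_POSITION;
    float2 uv       : TEXCOORD0;
    float2 uvScaled : TEXCOORD1; // extra interpolator just for this
};

v2f vert (appdata_base v)
{
    v2f o;
    o.pos = UnityObjectToClipPos(v.vertex);
    o.uv  = v.texcoord.xy;
    // Two cheap ops per vertex; is shipping the result through
    // TEXCOORD1 ever more expensive than redoing this per pixel?
    o.uvScaled = v.texcoord.xy * _Tiling.xy + _Offset.xy;
    return o;
}
```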
5. Samplers
Let’s say I have 6 textures used for blending and the texture limit is not exceeded. What would you suggest in terms of performance, and why:
a. Use 6 samplers, one per texture
b. Use 2 samplers, shared across the Albedo/Normal/Mask sets
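In plain HLSL terms, the two options I am comparing are roughly these (hypothetical texture names; Unity’s inline sampler-state naming convention assumed for the shared case):

```hlsl
// (a) Combined texture+sampler objects, one sampler per texture
sampler2D _Albedo0; sampler2D _Normal0; sampler2D _Mask0;
// ...same for _Albedo1 / _Normal1 / _Mask1

// (b) Separate texture objects sharing sampler states, so e.g.
//     two SamplerStates could cover all six textures
Texture2D _Albedo0, _Normal0, _Mask0, _Albedo1, _Normal1, _Mask1;
SamplerState my_linear_repeat_sampler;

float4 a0 = _Albedo0.Sample(my_linear_repeat_sampler, uv);
float4 n0 = _Normal0.Sample(my_linear_repeat_sampler, uv);
```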
6. Noise texture vs math
Classic question. Using an uncompressed 256x256 noise texture sampled in world space vs a procedural 3D noise (approx. 80 math instructions when checked with Unity’s “Compile and show code”), I get the same fps on mobile in a relatively complex scene. Some posts say sampling the texture is cheaper, but I don’t see any difference in fps. Is sampling a texture that expensive, or is the simple math used for noise that cheap?
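For context, the procedural version is along these lines (a generic 3D value-noise sketch, not my exact function):

```hlsl
// Cheap 3-in-1-out hash (Dave Hoskins style), no texture fetch.
float hash31(float3 p)
{
    p = frac(p * 0.1031);
    p += dot(p, p.zyx + 31.32);
    return frac((p.x + p.y) * p.z);
}

// Trilinearly interpolated value noise over a unit lattice.
float valueNoise3D(float3 p)
{
    float3 i = floor(p);
    float3 f = frac(p);
    f = f * f * (3.0 - 2.0 * f); // smoothstep fade

    float n000 = hash31(i);
    float n100 = hash31(i + float3(1, 0, 0));
    float n010 = hash31(i + float3(0, 1, 0));
    float n110 = hash31(i + float3(1, 1, 0));
    float n001 = hash31(i + float3(0, 0, 1));
    float n101 = hash31(i + float3(1, 0, 1));
    float n011 = hash31(i + float3(0, 1, 1));
    float n111 = hash31(i + float3(1, 1, 1));

    float nx00 = lerp(n000, n100, f.x);
    float nx10 = lerp(n010, n110, f.x);
    float nx01 = lerp(n001, n101, f.x);
    float nx11 = lerp(n011, n111, f.x);
    return lerp(lerp(nx00, nx10, f.y), lerp(nx01, nx11, f.y), f.z);
}
```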
7. Texture sampling size
Is sampling a 32x32 pixel region of a 32x32 texture as fast as sampling a 32x32 pixel region of a 4K texture?
8. Alpha Test
Why is clip() so expensive that HDRP has an option to bypass it in the Forward or Deferred pass, relying instead on the Depth pass plus early-Z optimization? The difference is big when it is not performed in the extra passes: I see up to a 10 fps increase with the bypass enabled. Or is this all about early-Z?
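For clarity, by alpha test I mean the usual cutout pattern (hypothetical fragment shader; `_Cutoff` is the material threshold):

```hlsl
float4 frag (v2f i) : SV_Target
{
    float4 col = tex2D(_MainTex, i.uv);
    clip(col.a - _Cutoff); // discards the pixel below the threshold;
                           // this is the call HDRP lets you bypass
    return col;
}
```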
999. Should I care that much?
In most cases, what I have noticed is that shaders are just super fast and GPUs just too complex. I can throw a bunch of features in there and usually don’t see much difference in performance. I have mostly optimized shaders by ear: avoiding too much texture sampling, using other textures for UV manipulation, trying to move as much as possible to the vertex shader, and generally just being careful with what I do and how I combine different features to get the most out of my shaders. What is your approach, and how do you debug performance?
Thanks. More will come for sure