The symptoms are the classic black-pixel cancer spreading across the screen, likely propagated by bloom or DoF. I'm not sure which: without a reliable repro case I haven't been able to verify which effect propagates it, or whether it's something else. The docs seem to presume that the HDRP post-processes don't generate NaNs. Maybe they're right, but I'm left wondering whether they could, given some erroneous inputs from a shader graph.
The project makes very extensive use of custom shaders, mostly in the form of shader graphs with some custom HLSL includes, plus extensive use of visual effect graphs. Around 95-99% of the shaders in the project are either pure shader graphs or graphs with a minor include here and there. The HDRP SRP is unedited and not included as editable. (No stock Unity shaders are present beyond UI canvas elements.)
But these are intentionally redefined in an include to override instancing vertex positions for an in-house indirect instancing rendering system; the override simply consists of a custom TRS matrix copied from Unity object transforms.
Figuring that I'm not going to find the cause of our NaNs, I enabled Stop NaNs in the frame settings and on the camera. This still does not prevent the issue: NaNs still pop up infrequently, seemingly at random, in a live standalone player, and are smeared across the screen by post-processing.
Considering that the Stop NaNs pass is rather expensive and doesn't even seem to work for its intended purpose, I'm at a loss. How do I find the originating NaN (preferably), or how do I ensure the Stop NaNs pass actually does its job and turns NaNs black?
The “NaN Tracker” fullscreen debug mode available in the Rendering tab of the HDRP Debug menu (Window → Analysis → Rendering Debugger) will help you track down NaNs in your scenes.
It sounds like you have a lot going on in a large project.
Dealing with NaNs starts at a very low level. For shaders in HDRP, some common good practices help, such as saturating occlusion. If you're using a lot of normal blending, it's also good to guard the normalize, since a zero-length blended normal normalizes to NaN. A tiny nudge before normalizing avoids that: mixedNormal.z += 1e-5f;
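To illustrate why the epsilon nudge helps, here is a small CPU-side sketch in Python (not shader code). It assumes GPU-like behaviour where `rsqrt(0)` gives +inf, so normalizing a zero-length blended normal yields NaN. The helper names `rsqrt`, `normalize` and `blend_normals` are hypothetical, and the simple additive blend is only an assumption about how the normals are mixed.

```python
import math

def rsqrt(x):
    # Mimic typical GPU behaviour: 1/sqrt(0) is +inf instead of an error.
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)

def normalize(v):
    # v * rsqrt(dot(v, v)); a zero vector gives 0 * inf = NaN per component.
    scale = rsqrt(sum(c * c for c in v))
    return tuple(c * scale for c in v)

def blend_normals(a, b, eps=0.0):
    # Hypothetical additive blend; eps nudges z so the length is never zero.
    mixed = [x + y for x, y in zip(a, b)]
    mixed[2] += eps
    return normalize(mixed)

a = (0.25, 0.0, -0.5)
b = (-0.25, 0.0, 0.5)  # exactly cancels a, so the blend is the zero vector

unsafe = blend_normals(a, b)            # every component is NaN
safe = blend_normals(a, b, eps=1e-5)    # a valid unit vector pointing along +z

print("unsafe has NaN:", any(math.isnan(c) for c in unsafe))
print("safe:", safe)
```

The nudge slightly biases the result toward +z, which is usually harmless for tangent-space normals but is worth keeping in mind if the vectors live in another space.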
In short, this sort of work needs to start at the beginning and be part of development all the way through to completion. Even then, you will find a few surprises.
In some cases, transforming a direction from world to tangent space can also produce NaNs, so it's good practice to guard the divisions and normalizations in complex transforms. Yes, it adds a little complexity, but it's way better than producing a NaN.
In your case, I suggest turning off chunks/blocks of the project until you can pin down what is influencing it, and using the debug tools as much as possible to make it less painful.
I did not mention it specifically, but that is what we have been trying to do. The problem is that it occurs infrequently: we get one or two instances a week, often in situations where getting a full capture I could debug is impossible.
We do, and there is a lot of good advice in your post. Let me detail our use cases a little more.
The shaders are essentially shader graph recreations of the HDRP shaders, with the features we want and stripped of everything we don't need, both for performance and to enforce as few shader variants as possible (for which HDRP/Lit is horrible).
In addition, we have a global effect where we shade portions of the world with a different set of textures. The normal blending for that is done exactly as you outline, and to my knowledge we don't have zero vectors in normals (I was aware of this when creating the effects). But our project obviously has a NaN issue somewhere, so I'll revisit that part just to be 110% sure.
The biggest problem is reproducibility, and the NaN tracker seems a tad inadequate for rare occurrences. Spending a full week doing nothing but walking around with it on doesn't seem like a very productive course of action, since there is no guarantee of success.
I guess what I'm really asking is: do better methods of finding NaN issues exist than the render pipeline debug view, which is not fit for debugging intermittent issues such as the one we're experiencing?
EDIT: And of course, the topic issue: why isn't Stop NaNs working? If I'm paying 0.3 ms for it on an average player's system, why even have it on if it doesn't work?
For future-proofing your pipeline, I would certainly consider running NaN tests immediately after implementing each custom feature, just to validate its integrity,
given that checking manually is hard right now and automation probably isn't easy for you to implement either.
I'd suggest looking into some of the common areas where NaNs crop up:
missing terrain textures, post FX (they absolutely can NaN), and custom shaders.
You don't have a good way of knowing when the NaN propagates,
so where possible, try to isolate these three situations, as that may give hints about where the NaN lives so you can home in on it.
As Slime73 mentioned, the NaN tracker is great for non-fullscreen NaNs. If it isn't turning anything up, however, that may well be a hint that the NaN comes from a fullscreen post effect or a custom pass shader, so again, where possible, try to isolate them to see if you still get the NaN.
A few months ago we had a screen-space NaN with Chromatic Aberration.
It produced a NaN all the time.
The ONLY way we found it was by setting the editor layout to default with the aspect set to Free Aspect, and then zooming with the scale slider to see it.
There was no other way we could see it.
I’ve attached some videos of us stumbling on this.
Thanks, that's some solid advice. From what you said, ours is more likely a fullscreen effect issue, and it certainly wouldn't hurt to try that too. Thank you for the examples!
I haven’t tried this myself yet so maybe it wouldn’t work out, but something I’ve thought about is having a (toggleable) script which inserts something before postprocess effects - or maybe even at multiple stages of the rendering pipeline - to do sync or async readbacks of the color buffer and analyze the contents on the CPU to look for NaNs, and perhaps save a screenshot or even take a RenderDoc capture as well if it finds one.
It’d be a bit complicated to set up but might help quite a bit when tracking down hard-to-repro NaN issues.
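The CPU-side half of that idea can be sketched roughly as follows. This is plain Python rather than Unity code, and it assumes the readback hands you a flat, row-major list of float channels; `find_bad_pixels` and the fake 4x2 frame are hypothetical stand-ins, not part of any Unity API.

```python
import math

def find_bad_pixels(pixels, width, height, channels=4):
    """Return (x, y, channel) for every NaN or Inf in a flat row-major buffer."""
    assert len(pixels) == width * height * channels
    bad = []
    for i, value in enumerate(pixels):
        if math.isnan(value) or math.isinf(value):
            pixel, channel = divmod(i, channels)
            y, x = divmod(pixel, width)
            bad.append((x, y, channel))
    return bad

# Fake 4x2 RGBA buffer standing in for the data a GPU readback would return.
width, height = 4, 2
frame = [0.5] * (width * height * 4)
frame[(1 * width + 2) * 4 + 0] = math.nan  # red channel of pixel (2, 1)

hits = find_bad_pixels(frame, width, height)
if hits:
    # In a real tool this is where a screenshot or capture would be triggered.
    print("NaN/Inf found at", hits)
```

Scanning at more than one stage and comparing the results would show whether the NaN already exists before post-processing or is introduced by it.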
One issue I've run into several times is that HDRP's use of smaller-than-the-RT viewports when rendering a camera can make it easy to accidentally sample outside the currently valid bounds within the RT. It could also be handy to 'initialize' RTs to all-NaNs when they're first created, to make that sort of thing way more obvious when it happens.
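A toy model of why NaN-initialized RTs make this kind of bug obvious, again as a CPU-side Python sketch. The 8x8 target, the 6x4 viewport and the helpers `make_rt`, `render_into_viewport` and `sample` are all made up for illustration; the UV scale factor stands in for whatever scaling the pipeline applies.

```python
import math

def make_rt(width, height, fill=math.nan):
    # Allocate the render target pre-filled with NaN instead of stale data.
    return [[fill] * width for _ in range(height)]

def render_into_viewport(rt, vp_w, vp_h, value=0.5):
    # Only the top-left vp_w x vp_h region is written this frame.
    for y in range(vp_h):
        for x in range(vp_w):
            rt[y][x] = value

def sample(rt, u, v):
    # Nearest-neighbour sample where UVs in [0, 1] span the whole RT.
    h, w = len(rt), len(rt[0])
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return rt[y][x]

rt = make_rt(8, 8)
render_into_viewport(rt, 6, 4)
scale = (6 / 8, 4 / 8)  # viewport size relative to RT size

inside = sample(rt, 0.9 * scale[0], 0.9 * scale[1])  # UV scaled to the viewport
outside = sample(rt, 0.9, 0.9)                       # forgot the scale: reads NaN

print("scaled UV:", inside)
print("unscaled UV is NaN:", math.isnan(outside))
```

With a zero-initialized target the unscaled sample would quietly return black, which is much easier to miss than a NaN.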