Unite 2012 Rendering Talk

Hi everyone,

Kuba and I gave a talk at Unite yesterday about getting rad fast performance with the Unity rendering pipeline. You should really check out the slides / project files.

The video will be uploaded at some stage (out of my hands), and when it is I’ll post it here.

Welp, there’s my weekend gone! Awesome guys, thanks :slight_smile:

Not much in-depth information but a nice overview :slight_smile:

And your pragma multi compile reminds me… http://forum.unity3d.com/threads/104060-Shader-defines
http://forum.unity3d.com/threads/148548-Built-in-Shader-Variables-Documentation

So, I know of multi_compile_fwdbase and now this “multi_compile AAA BBB”. Is there any more multi_compile stuff? If so, where could I have learned about it?

So internally multi_compile_fwdbase is just an optimisation. All it does is compile only the ‘sensible’ combinations of the base shader. That is: if you have lightmapping, don’t compile the directional light variant, etc. You could do this with standard multi_compile, but it would leave you with some invalid shader combinations from a pipeline perspective (they would compile and be executable… it just would not make a lot of sense to render with them). We just don’t compile them to reduce build size.
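To illustrate the basic mechanism, here is a minimal sketch (the keyword names and functions below are made up, not real pipeline keywords):

```hlsl
// Compiles two variants of this shader: one with FANCY_OFF
// defined, one with FANCY_ON.
#pragma multi_compile FANCY_OFF FANCY_ON

half4 frag (v2f i) : COLOR
{
#ifdef FANCY_ON
    // expensive path, only present in the FANCY_ON variant
    return ComputeFancy(i);
#else
    return ComputeCheap(i);
#endif
}
```

At runtime you pick a variant via Shader.EnableKeyword / DisableKeyword with the matching keyword name. Each extra multi_compile line multiplies the number of compiled variants, which is exactly why the _fwdbase-style pragmas prune the combinations that make no sense for the pipeline.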

Let’s have a look at what the others are:
multi_compile_prepassfinal
multi_compile_shadowcaster
multi_compile_shadowcollector
multi_compile_fwdbase
multi_compile_fwdadd
multi_compile_fwdadd_fullshadows
multi_compile_fwdbasealpha
multi_compile_lightpass
multi_compile_particles

I have not experimented with or looked into what each one does, but if you look at the pass type and the defines in a #pragma debug shader, it will get you part of the way towards figuring out which defines would be enabled / disabled for the passes. These defines will also be set for each pass of a #pragma debug shader, so you should be able to see what’s used :slight_smile:
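For anyone who wants to try that: adding #pragma debug to a surface shader tells Unity to keep the generated code (including the per-pass defines) readable in the compiled shader, which you can then inspect from the shader’s Inspector. A rough sketch (shader name and surf function are just placeholders):

```hlsl
Shader "Debug/ShowDefines" {
    SubShader {
        CGPROGRAM
        #pragma surface surf Lambert
        // keep the generated code and the per-pass defines
        // visible in the compiled shader output
        #pragma debug

        struct Input { float2 uv_MainTex; };
        sampler2D _MainTex;

        void surf (Input IN, inout SurfaceOutput o)
        {
            o.Albedo = tex2D(_MainTex, IN.uv_MainTex).rgb;
        }
        ENDCG
    }
}
```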

A bit off topic: does Unity 4 bring texture sampling in dynamic if blocks?

The exact same shader works with many engines out there, but I just can’t make it work with Unity because of this issue.

I don’t see how that would be possible. You can’t do a plain old tex2D inside a dynamic branch, period (in any engine).

What you can do (and that does not require Unity 4.0) is sample a texture with your own mip level, e.g. tex2Dlod. Just make sure to add “#pragma glsl” to your shader so it will compile for OpenGL properly (since the default profile does not have that instruction).
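A minimal sketch of that (the texture and the v2f layout are assumptions, not from the thread):

```hlsl
#pragma glsl  // so the OpenGL target gets a profile with tex2Dlod

sampler2D _MainTex;

struct v2f { float4 pos : POSITION; float2 uv : TEXCOORD0; };

half4 frag (v2f i) : COLOR
{
    half4 col = half4(1, 1, 1, 1);
    if (i.uv.x > 0.5)  // dynamic branch: condition varies per pixel
    {
        // explicit mip level (0 here), so no derivatives are needed
        col = tex2Dlod(_MainTex, float4(i.uv, 0, 0));
    }
    return col;
}
```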

Oh, my bad; I used HLSL on the other engines, obviously.

So, let me ask again then: does/will Unity 4 support HLSL-style flow control?

Read my answer again :wink: Whether it’s HLSL, Cg or GLSL is irrelevant. You cannot do a tex2D inside a dynamic branch, period. It’s an undefined operation (why? because the GPU won’t be able to compute derivatives for mipmapping).

The only kind of texture sampling you can do is when you indicate the texture mip level yourself.

As to “when will Unity use HLSL-style something”: Unity already uses HLSL (for DX11 and Xbox 360). As well as Cg (for D3D9, OpenGL and PS3). As well as HLSL2GLSL + GLSL Optimizer (for mobile) and so on. The shading language is irrelevant; you can’t do texture samples inside of dynamic branches.

Well, I am confused now. This MSDN page says:

What I am trying to do is apply the technique shown here in Unity; it works in the jMonkeyEngine and XNA versions.

The problematic portion of the shader is:

As you see, there is a lot of if/else going on there, and this guy samples a texture in each of those cases, which I can’t make work with Unity.

So, how to make this work with Unity?

OK, I’m lost. Such a restriction might exist on some GPUs but as far as I know there is no such restriction on most of today’s GPUs. But maybe I’m misunderstanding something. I made a small example in GLSL for what I consider “texture samples inside of dynamic branches” and the mipmapping appears to work fine:

Shader "GLSL shader with conditional texture lookup" {
   Properties {
      _MainTex ("Texture Image", 2D) = "white" {} 
      _Param ("Parameter", Range(0,1)) = 0.5
   }
   SubShader {
      Pass {    
         GLSLPROGRAM
 
         uniform sampler2D _MainTex;    
         uniform float _Param;
 
         varying vec4 textureCoordinates; 
 
         #ifdef VERTEX
 
         void main()
         {
            textureCoordinates = gl_MultiTexCoord0;
            gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
         }
 
         #endif
 
         #ifdef FRAGMENT
 
         void main()
         {
            if (_Param < 0.5)
            {
	            gl_FragColor = 
	               texture2D(_MainTex, vec2(textureCoordinates));   
            }
            else
            {
	            gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
            }

         }
 
         #endif
 
         ENDGLSL
      }
   }
}

As far as I understand it, this works because GPUs usually process fragments in small groups (minimum 4, up to something like 64 I guess) for which the fragment programs are processed in lock-step, i.e. all fragments of each group go through each conditional block together and the partial derivatives can be approximated for all variables using the neighboring fragments. If some fragments of such a group need to take one path and other fragments of the same group need to take another path, then all fragments take both paths. At least that is my understanding of how GPUs work.

I didn’t try it, but I’m sure it won’t work if you write it like this:

But it would work if you define the vec4 before the if/else, though.

Of course it doesn’t work: “vec4” is a reserved keyword in GLSL.
Of course, you have to define variables before you use them in GLSL.
What’s your point?

For reference, this works fine:

Shader "GLSL shader with conditional texture lookup" {
   Properties {
      _FirstTex ("Texture Image", 2D) = "white" {} 
      _SecondTex ("2nd Texture Image", 2D) = "white" {}
      _Param ("Parameter", Range(0,1)) = 0.5
   }
   SubShader {
      Pass {    
         GLSLPROGRAM
 
         uniform sampler2D _FirstTex;    
         uniform sampler2D _SecondTex;    
         uniform float _Param;
 
         varying vec4 texCoords; 
 
         #ifdef VERTEX
 
         void main()
         {
            texCoords = gl_MultiTexCoord0;
            gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
         }
 
         #endif
 
         #ifdef FRAGMENT
 
         void main()
         {
            vec4 vec;
            if (_Param < 0.5)
            {
               vec = texture2D(_FirstTex, vec2(texCoords));
            }
            else
            {
               vec = texture2D(_SecondTex, vec2(texCoords));
            }

            gl_FragColor = vec; 
         }
 
         #endif
 
         ENDGLSL
      }
   }
}

I meant that the code below wouldn’t work with Unity, as it doesn’t work for me:

But if you define it outside the if statement and change the variable inside the if statement, it works, just like your reference.

Your code wouldn’t work in any language with scoped variable definitions, including C, C++, Java, …

Below you can find the whole source code of the mentioned shader from the NGene engine, which defies every rule you are telling me. And it works.

// Water pixel shader
// Copyright (C) Wojciech Toman 2009

sampler heightMap: register(s0);
sampler backBufferMap: register(s1);
sampler positionMap: register(s2);
sampler normalMap: register(s3);
sampler foamMap: register(s4);
sampler reflectionMap: register(s5);

// We need this matrix to restore position in world space
float4x4 matViewInverse;

// Level at which water surface begins
float waterLevel = 0.0f;

// Position of the camera
float3 cameraPos;

// How fast colours fade out. You can also think of this
// value as how clear the water is. Therefore use smaller values (e.g. 0.05f)
// to get crystal clear water and bigger ones to achieve "muddy" water.
float fadeSpeed = 0.15f;

// Timer
float timer;

// Normals scaling factor
float normalScale = 1.0f;

// R0 is a constant related to the index of refraction (IOR).
// It should be computed on the CPU and passed to the shader.
float R0 = 0.5f;

// Maximum waves amplitude
float maxAmplitude = 1.0f;

// Direction of the light
float3 lightDir = {0.0f, 1.0f, 0.0f};

// Colour of the sun
float3 sunColor = {1.0f, 1.0f, 1.0f};

// The smaller this value is, the softer the transition between
// shore and water. If you want hard edges use a very big value.
// Default is 1.0f.
float shoreHardness = 1.0f;

// This value modifies current fresnel term. If you want to weaken
// reflections use a bigger value. If you want to emphasize them use
// a value smaller than 0. Default is 0.0f.
float refractionStrength = 0.0f;

// Modifies the 4 sampled normals. Increase the first values to get more
// small "waves" or the last ones to get more big "waves".
float4 normalModifier = {1.0f, 2.0f, 4.0f, 8.0f};

// Strength of displacement along normal.
float displace = 1.7f;

// Describes at what depth foam starts to fade out and
// at what it is completely invisible. The third value is at
// what height foam for waves appear (+ waterLevel).
float3 foamExistence = {0.65f, 1.35f, 0.5f};
// other nice values for the same thing are:
// float3 foamExistence = {0.35f, 0.65f, 0.5f};

float sunScale = 3.0f;

float4x4 matReflection =
{
	{0.5f, 0.0f, 0.0f, 0.5f},
	{0.0f, 0.5f, 0.0f, 0.5f},
	{0.0f, 0.0f, 0.0f, 0.5f},
	{0.0f, 0.0f, 0.0f, 1.0f}
};

float4x4 matViewProj;

float shininess = 0.7f;
float specular_intensity = 0.32;

// Colour of the water surface
float3 depthColour = {0.0078f, 0.5176f, 0.7f};
// Colour of the water depth
float3 bigDepthColour = {0.0039f, 0.00196f, 0.145f};
float3 extinction = {7.0f, 30.0f, 40.0f};			// Horizontal

// Water transparency along eye vector.
float visibility = 4.0f;

// Increase this value to have more smaller waves.
float2 scale = {0.005f, 0.005f};
float refractionScale = 0.005f;

// Wind force in x and z axes.
float2 wind = {-0.3f, 0.7f};


// VertexShader results
struct VertexOutput
{
	float4 position : POSITION0;
	float2 texCoord : TEXCOORD0;
};

struct PS_OUTPUT
{
	float4 diffuse: COLOR0;
	float4 normal: COLOR1;
	float4 position: COLOR2;
};


float3x3 compute_tangent_frame(float3 N, float3 P, float2 UV)
{
	float3 dp1 = ddx(P);
	float3 dp2 = ddy(P);
	float2 duv1 = ddx(UV);
	float2 duv2 = ddy(UV);
	
	float3x3 M = float3x3(dp1, dp2, cross(dp1, dp2));
	float2x3 inverseM = float2x3( cross( M[1], M[2] ), cross( M[2], M[0] ) );
	float3 T = mul(float2(duv1.x, duv2.x), inverseM);
	float3 B = mul(float2(duv1.y, duv2.y), inverseM);
	
	return float3x3(normalize(T), normalize(B), N);
}

// Function calculating fresnel term.
// - normal - normalized normal vector
// - eyeVec - normalized eye vector
float fresnelTerm(float3 normal, float3 eyeVec)
{
#ifdef SIMPLIFIED_FRESNEL
		// Simplified
		return R0 + (1.0f - R0) * pow(1.0f - dot(eyeVec, normal), 5.0f);
#else		
		float angle = 1.0f - saturate(dot(normal, eyeVec));
		float fresnel = angle * angle;
		fresnel = fresnel * fresnel;
		fresnel = fresnel * angle;
		return saturate(fresnel * (1.0f - saturate(R0)) + R0 - refractionStrength);
#endif
}

float4 main(VertexOutput IN): COLOR0
{
	float3 color2 = tex2D(backBufferMap, IN.texCoord).rgb;
	float3 color = color2;
	
	float3 position = mul(float4(tex2D(positionMap, IN.texCoord).xyz, 1.0f), matViewInverse).xyz;
	float level = waterLevel;
	float depth = 0.0f;

	
	// If we are underwater let's leave out complex computations
	if(level >= cameraPos.y)
	{
#ifdef USE_UNDERWATER

		depth = length(position - cameraPos);
		float depthN = depth * fadeSpeed;
		float depth2 = level - cameraPos.y;
		
		float3 waterCol = saturate(length(sunColor) / sunScale);
		waterCol = waterCol * lerp(depthColour, bigDepthColour, saturate(depth2 / extinction));
			
		if(position.y <= level)
		{
			color2 = color2 - color2 * saturate(depth2 / extinction);
			color = lerp(color2, waterCol, saturate(depthN / visibility));
		}
		else
		{
			float3 eyeVec = position - cameraPos;	
			float3 eyeVecNorm = normalize(eyeVec);
			float t = (level - cameraPos.y) / eyeVecNorm.y;
			float3 surfacePoint = cameraPos + eyeVecNorm * t;
			
			eyeVecNorm = normalize(eyeVecNorm);
			depth = length(surfacePoint - cameraPos);
			float depthN = depth * fadeSpeed;
			
			float depth2 = level - cameraPos.y;
			
			float2 texCoord = 0;
			texCoord = IN.texCoord.xy;
			texCoord.x += sin(timer * 0.002f + 3.0f * abs(position.y)) * (refractionScale);
			color2 = tex2D(backBufferMap, texCoord).rgb;
			
			color2 = color2 - color2 * saturate(depth2 / extinction);
			color = lerp(color2, waterCol, saturate(depthN / visibility));
			
			float3 myNormal = normalize(float3(0.0f, 1.0f, 0.0f));
		
			texCoord = surfacePoint.xz * 1.6 + wind * timer * 0.00016;
			float3x3 tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
			float3 normal0a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame).xyz);
	
			texCoord = surfacePoint.xz * 0.8 + wind * timer * 0.00008;
			tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
			float3 normal1a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame).xyz);
			
			texCoord = surfacePoint.xz * 0.4 + wind * timer * 0.00004;
			tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
			float3 normal2a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame).xyz);
			
			texCoord = surfacePoint.xz * 0.1 + wind * timer * 0.00002;
			tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
			float3 normal3a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame).xyz);
			
			float3 normal = normalize(normal0a * normalModifier.x + normal1a * normalModifier.y +
									  normal2a * normalModifier.z + normal3a * normalModifier.w);
									  
			float3 mirrorEye = (2.0f * dot(eyeVecNorm, normal) * normal - eyeVecNorm);
			float dotSpec = saturate(dot(mirrorEye.xyz, -lightDir) * 0.5f + 0.5f);
			float3 fresnel = 0;
			float3 specular = (1.0f - fresnel) * saturate(-lightDir.y) * ((pow(dotSpec, 512.0f)) * (shininess * 1.8f + 0.2f))* sunColor;
			specular += specular * 25 * saturate(shininess - 0.05f) * sunColor;
		}
		
		return float4(color, 1.0f);
#else
		return float4(color2, 1.0f);
#endif
	}
	
	if(position.y <= level + maxAmplitude)
	{
		float3 eyeVec = position - cameraPos;
		float diff = level - position.y;
		float cameraDepth = cameraPos.y - position.y;
		
		// Find intersection with water surface
		float3 eyeVecNorm = normalize(eyeVec);
		float t = (level - cameraPos.y) / eyeVecNorm.y;
		float3 surfacePoint = cameraPos + eyeVecNorm * t;
		
		eyeVecNorm = normalize(eyeVecNorm);
		
		float2 texCoord;

#ifdef USE_WAVES
		for(int i = 0; i < 10; ++i)
		{
			texCoord = (surfacePoint.xz + eyeVecNorm.xz * 0.1f) * scale + timer * 0.000005f * wind;
			
			float bias = tex2D(heightMap, texCoord).r;
	
			bias *= 0.1f;
			level += bias * maxAmplitude;
			t = (level - cameraPos.y) / eyeVecNorm.y;
			surfacePoint = cameraPos + eyeVecNorm * t;
		}
#endif
		
		depth = length(position - surfacePoint);
		float depth2 = surfacePoint.y - position.y;
		
		eyeVecNorm = normalize(cameraPos - surfacePoint);
		
#ifdef USE_WAVES
		float normal1 = tex2D(heightMap, (texCoord + float2(-1, 0) / 256)).r;
		float normal2 = tex2D(heightMap, (texCoord + float2(1, 0) / 256)).r;
		float normal3 = tex2D(heightMap, (texCoord + float2(0, -1) / 256)).r;
		float normal4 = tex2D(heightMap, (texCoord + float2(0, 1) / 256)).r;
		
		float3 myNormal = normalize(float3((normal1 - normal2) * maxAmplitude,
										   normalScale + 50 * normalScale * saturate(0.15f - dot(eyeVecNorm, float3(0.0f, 1.0f, 0.0f))),
										   (normal3 - normal4) * maxAmplitude));   
		
		texCoord = surfacePoint.xz * 1.6 + wind * timer * 0.00016;
		float3x3 tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
		float3 normal0a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame));

		texCoord = surfacePoint.xz * 0.8 + wind * timer * 0.00008;
		tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
		float3 normal1a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame));
		
		texCoord = surfacePoint.xz * 0.4 + wind * timer * 0.00004;
		tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
		float3 normal2a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame));
		
		texCoord = surfacePoint.xz * 0.1 + wind * timer * 0.00002;
		tangentFrame = compute_tangent_frame(myNormal, eyeVecNorm, texCoord);
		float3 normal3a = normalize(mul((2.0f * tex2D(normalMap, texCoord) - 1.0f).xyz, tangentFrame));
		
		float3 normal = normalize(normal0a * normalModifier.x + normal1a * normalModifier.y +
								  normal2a * normalModifier.z + normal3a * normalModifier.w);
#else
		float3 normal = float3(0.0f, 1.0f, 0.0f);
#endif
		
		texCoord = IN.texCoord.xy;
		texCoord.x += sin(timer * 0.002f + 3.0f * abs(position.y)) * (refractionScale * min(depth2, 1.0f));
		float3 refraction = tex2D(backBufferMap, texCoord).rgb;
		if(mul(float4(tex2D(positionMap, texCoord).xyz, 1.0f), matViewInverse).y > level)
			refraction = color2;

		float4x4 matTextureProj = mul(matViewProj, matReflection);
				
		float3 waterPosition = surfacePoint.xyz;
		waterPosition.y -= (level - waterLevel);
		float4 texCoordProj = mul(float4(waterPosition, 1.0f), matTextureProj);
		
#ifdef USE_WAVES
		float4 dPos;
		dPos.x = texCoordProj.x + displace * normal.x;
		dPos.z = texCoordProj.z + displace * normal.z;
		dPos.yw = texCoordProj.yw;
		texCoordProj = dPos;
#endif
		
		float3 reflect = tex2Dproj(reflectionMap, texCoordProj).xyz;
		

		float fresnel = fresnelTerm(normal, eyeVecNorm);

		
		float3 depthN = depth * fadeSpeed;
#ifdef NO_BIG_DEPTH
		float3 waterCol = depthColour - saturate(depthColour * depth2 / extinction);
		/// @todo check if / 3.0 below is not a better solution
		waterCol = saturate(saturate(length(sunColor) / 2.0f) * waterCol);
		refraction = lerp(refraction, waterCol, saturate(depthN / visibility));
#else
		float3 waterCol = lerp(depthColour, bigDepthColour, saturate(depth2 / extinction));
		/// @todo check if / 3.0 below is not a better solution
		waterCol = saturate(saturate(length(sunColor) / 2.0f) * waterCol);

		waterCol = saturate(length(sunColor) / sunScale);
		refraction = lerp(lerp(refraction, depthColour * waterCol, saturate(depthN / visibility)),
						  bigDepthColour * waterCol, saturate(depth2 / extinction));
#endif

		float foam = 0.0f;		
#ifdef USE_FOAM
		texCoord = (surfacePoint.xz + eyeVecNorm.xz * 0.1) * 0.05 + timer * 0.00001f * wind + sin(timer * 0.001 + position.x) * 0.005;
		float2 texCoord2 = (surfacePoint.xz + eyeVecNorm.xz * 0.1) * 0.05 + timer * 0.00002f * wind + sin(timer * 0.001 + position.z) * 0.005;
		
		if(depth2 < foamExistence.x)
			foam = (tex2D(foamMap, texCoord) + tex2D(foamMap, texCoord2)).x * 0.5f;
		else if(depth2 < foamExistence.y)
		{
			foam = (lerp((tex2D(foamMap, texCoord) + tex2D(foamMap, texCoord2)) * 0.5f, 0.0f,
						 (depth2 - foamExistence.x) / (foamExistence.y - foamExistence.x))).x;
			
		}
		
		if(maxAmplitude - foamExistence.z > 0.0001f)
		{
			foam += ((tex2D(foamMap, texCoord) + tex2D(foamMap, texCoord2)) * 0.5f * 
				saturate((level - (waterLevel + foamExistence.z)) / (maxAmplitude - foamExistence.z))).x;
		}
#endif


		float3 specular = 0.0f;
#ifdef SPECULAR_SIMPLIFIED
		float3 H = normalize(eyeVecNorm - lightDir);
		
		float e = shininess * 64.0f;
		float kD = saturate(dot(normal, -lightDir)); 
		specular = kD * specular_intensity * pow( saturate( dot( normal, H ) ), e ) * sqrt( ( e + 1 ) / 2 );
		specular *= sunColor;
#else
		// CryTek's way
		float3 mirrorEye = (2.0f * dot(eyeVecNorm, normal) * normal - eyeVecNorm);
		float dotSpec = saturate(dot(mirrorEye.xyz, -lightDir) * 0.5f + 0.5f);
		specular = (1.0f - fresnel) * saturate(-lightDir.y) * ((pow(dotSpec, 512.0f)) * (shininess * 1.8f + 0.2f))* sunColor;
		specular += specular * 25 * saturate(shininess - 0.05f) * sunColor;
#endif
		

		color = lerp(refraction, reflect, fresnel);
		color = saturate(color + max(specular, foam * sunColor));
		
		color = lerp(refraction, color, saturate(depth * shoreHardness));
	}
	
	if(position.y > level)
		color = color2;

	return float4(color, 1.0f);
}

@Aubergine: In which lines? After the if-block (i.e. in lines 82 to 85), it uses only the variables position, level, color and color2. These are all defined before the if-block (namely in lines 49 to 53).

May I suggest that you have a second look into how scoped definitions of variables work?

texCoord is defined inside the if/else block and used outside of it.

EDIT: You are always welcome to try to translate this shader for Unity if you can. I have been working on it for some time, but using the exact same flow as the above shader just doesn’t work for Unity and gives a lot of those “ddx/ddy/tex…etc can’t be used in dynamic blocks” errors.

It says you can use dynamic branches. Which is true and all fine. What you can’t do is sample textures inside a dynamic branch without providing your own mip level.

That said, for simple dynamic branches, the compiler might be able to remove the branch and pretend it did not happen. But since Unity uses many shader compilers, you can’t expect all of them to be able to do the same in all cases.

Either remove the dynamic branch: i.e. for each “if” block, remove the if, do all the calculations, and then add them to the result or not depending on the condition. Small example:

// with dynamic branch
float4 col = 1.0;
if (x > 0.5)
    col += tex2D(mytexture, myuv);

// without dynamic branch
float4 col = 1.0;
float4 tex = tex2D(mytexture, myuv);
if (x <= 0.5) // will be turned into a simple "compare and set" operation
    tex = 0.0;
col += tex;

Your code has “if (_Param < 0.5)”, which is not a dynamic branch. You’re branching on a uniform value, i.e. this is a “static branch”. All pixels of that draw call definitely take the same path.

In GLSL, texture sampling inside of a real dynamic branch is not an error, but an “undefined result”. Might work, might return garbage, might post your private emails on your public facebook wall.

In HLSL, the above is a compile error, unless the compiler is smart enough to remove your dynamic branch (i.e. rewrite your code as if the branch was not there). But you can’t count on that; so if you have that situation, it’s maybe better to just do it yourself.
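To sum up the distinction in code (a sketch; _MainTex and _Param stand in for whatever your shader actually uses):

```hlsl
sampler2D _MainTex;
float _Param;  // uniform: the same value for the whole draw call

half4 frag (v2f i) : COLOR
{
    // static branch: branching on a uniform, so every pixel takes
    // the same path and a plain tex2D inside it is fine
    if (_Param < 0.5)
        return tex2D(_MainTex, i.uv);

    // dynamic branch: the condition varies per pixel, so sample
    // with an explicit mip level instead of plain tex2D
    if (i.uv.x > 0.5)
        return tex2Dlod(_MainTex, float4(i.uv, 0, 0));

    return half4(1, 0, 0, 1);
}
```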