Fastest shader code on iOS: Cg or regular ShaderLab?

So I recently dove into shaders in Unity and since I’m only developing for the mobile platforms, I have a few questions about shader performance.

Now I am only interested in shader model 2.0 and above, so I’m not worried about 1st gen iPhones and such.

While looking into the Mobile shaders, I noticed that none of them use any Cg code, so I have been wondering if there’s a reason for that. Also, can you achieve more advanced shader effects, such as edge outlines and rim lighting, using only regular ShaderLab code and no Cg?

Thanks in advance for your time!
Stephane

Writing in GLSL with good use of precision modifiers is how you write the fastest shaders. Auto-translated Cg will be close (or better, if you don’t know what you’re doing). You need one of the two for edge effects. Fixed function is probably used so much because of compatibility, and because those effects are achievable without the programmable pipeline. You shouldn’t use fixed-function shaders if you can help it, though, because they are slower than GLSL.
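As a rough illustration of the kind of edge effect that needs the programmable pipeline, here is a minimal, untested rim-lighting sketch in Cg. The shader name and the `_RimColor` property are made up for the example; `appdata_base` and `ObjSpaceViewDir` come from UnityCG.cginc. The rim term is computed per vertex to keep the fragment program cheap, which is an assumption about what suits mobile, not a measured result.

```
Shader "Hypothetical/Rim Light" {

Properties {
	_MainTex ("Base (RGB)", 2D) = "white" {}
	_RimColor ("Rim Color", Color) = (1,1,1,1)	// hypothetical property
}

SubShader {Pass {
	CGPROGRAM
	#pragma vertex vert
	#pragma fragment frag
	#include "UnityCG.cginc"

	sampler2D _MainTex;
	fixed4 _RimColor;

	struct v2f {
		float4 pos : POSITION;
		half2 uv : TEXCOORD0;
		fixed rim : TEXCOORD1;	// 0 facing the camera, 1 at grazing angles
	};

	v2f vert (appdata_base v) {
		v2f o;
		o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
		o.uv = v.texcoord.xy;
		// Object-space direction from the vertex to the camera
		float3 viewDir = normalize(ObjSpaceViewDir(v.vertex));
		// Rim is strongest where the normal is perpendicular to the view direction
		o.rim = 1.0 - saturate(dot(normalize(v.normal), viewDir));
		return o;
	}

	fixed4 frag (v2f i) : COLOR {
		fixed4 col = tex2D(_MainTex, i.uv);
		col.rgb += _RimColor.rgb * i.rim;	// add the rim glow on top of the texture
		return col;
	}
	ENDCG
}}

}
```

Nothing in the fixed-function SetTexture combiners gives you the per-vertex view direction, which is why this kind of effect needs Cg or GLSL.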

On iOS, it’s been our experience that fixed function shaders under OGLES 1.1 are consistently faster than those same shaders under OGLES 2.0. I assume that this is because the 1.1 emulation at the driver/hardware level is faster than the automatic translation to GLSL by Aras.

Interesting. I’ve written GLSL shaders for people and they told me they were faster than the equivalent optimized ShaderLab. I’ve been learning a lot of hardware-specific tricks this year, though. This tool has also proven really helpful for teaching me a few undocumented tricks:

http://www.imgtec.com/powervr/insider/powervr-pvrunisco.asp

Ironically, it’s total garbage on OS X, but works great on Windows. :stuck_out_tongue:

I wonder if it’s possible to write in Cg and get good GLSL out of it, with enough specialized effort. I haven’t learned Cg yet, so can’t make a good effort.

Cg is at least cross platform, so if you can get decent speed out of it, it’s the way to go I guess.

Since Unity 3.2, the automatic GLSL translation has become significantly faster anyway, and precision modifiers have been supported for Cg → GLSL since about then too, so the major points that used to be troublesome are gone.

GLSL can still be faster, but only in maybe 3% of cases, and only if you are enough of a shader expert to pull it off; otherwise you are likely to write something slower.

But there is one thing Cg unfortunately leads people to wrongly assume: that using surface shaders and node-based shader editors is a smart idea. It definitely isn’t. Surface shaders are fine for the few special shaders with per-pixel light interaction that you might need on iPad 2, but for anything else, focus on normal Cg shaders.

Depends on time constraints. There’s no reason you can’t write a GLSL SubShader that comes before a Cg one. (Not that I believe in cross-platform apps. ;))
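A minimal, untested sketch of that layout, assuming Unity picks the first SubShader the current platform can run and skips the GLSL one where GLSL isn’t available. The shader name is made up, and both SubShaders just draw an unlit texture so the structure stays visible:

```
Shader "Hypothetical/GLSL First, Cg Fallback" {

Properties {
	_MainTex ("Base (RGB)", 2D) = "white" {}
}

// Tried first: hand-written GLSL for platforms that accept it.
SubShader {Pass {
	GLSLPROGRAM
	varying mediump vec2 uv;

	#ifdef VERTEX
	void main () {
		gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
		uv = gl_MultiTexCoord0.xy;
	}
	#endif

	#ifdef FRAGMENT
	uniform lowp sampler2D _MainTex;
	void main () {
		gl_FragColor = texture2D(_MainTex, uv);
	}
	#endif
	ENDGLSL
}}

// Fallback: the same effect in Cg for everything else.
SubShader {Pass {
	CGPROGRAM
	#pragma vertex vert
	#pragma fragment frag
	#include "UnityCG.cginc"

	sampler2D _MainTex;

	struct v2f {
		float4 pos : POSITION;
		half2 uv : TEXCOORD0;
	};

	v2f vert (appdata_base v) {
		v2f o;
		o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
		o.uv = v.texcoord.xy;
		return o;
	}

	fixed4 frag (v2f i) : COLOR {
		return tex2D(_MainTex, i.uv);
	}
	ENDCG
}}

}
```

The cost is that you maintain the effect twice, which is where the time constraints come in.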

Thanks a lot for all of your answers!

I just want to clarify something which I’m a little confused about. The following shader code is one of the Mobile shaders which comes with Unity:

Shader "Mobile/Transparent/Vertex Color" {
Properties {
	_Color ("Main Color", Color) = (1,1,1,1)
	_SpecColor ("Spec Color", Color) = (1,1,1,0)
	_Emission ("Emmisive Color", Color) = (0,0,0,0)
	_Shininess ("Shininess", Range (0.1, 1)) = 0.7
	_MainTex ("Base (RGB) Trans (A)", 2D) = "white" {}
}

Category {
	Tags {"Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent"}
	ZWrite Off
	Alphatest Greater 0
	Blend SrcAlpha OneMinusSrcAlpha 
	SubShader {
		Material {
			Diffuse [_Color]
			Ambient [_Color]
			Shininess [_Shininess]
			Specular [_SpecColor]
			Emission [_Emission]	
		}
		Pass {
			ColorMaterial AmbientAndDiffuse
			Fog { Mode Off }
			Lighting Off
			SeparateSpecular On
			SetTexture [_MainTex] {
				Combine texture * primary, texture * primary
			}
			SetTexture [_MainTex] {
				constantColor [_Color]
				Combine previous * constant DOUBLE, previous * constant
			}
		}
	} 
}
}

Is this considered a fixed function pipeline shader? There is no Cg at all in this one, and this is much easier to write for a noob. Now the following shader is one I put together with Cg:

Shader "SRShaders/UsePassSnippets/UnlitDiffuseLightmap" {

    // UNLIT shader, not affected by any scene lights
    SubShader {
        Pass 
        {
            Name "UNLIT"
            Tags { "LightMode" = "Always" }

            
            CGPROGRAM
			#include "UnityCG.cginc"
			
			#pragma vertex vertTrack
			#pragma fragment fragTrack			

			half4 _MainTex_ST;
			sampler2D _MainTex;
			half4 _LightMap_ST;
			sampler2D _LightMap;
			fixed4 _Color;
			
			struct appdataTrack 
			{
				float4 vertex : POSITION;
				float4 texcoord : TEXCOORD0;
				float4 texcoord1 : TEXCOORD1;
			};
			
			struct v2ftrack 
			{
				half4 pos : POSITION;
				half4 color : COLOR;
				half2 uv : TEXCOORD0;
				half2 uv2 : TEXCOORD1;
			};
			
			v2ftrack vertTrack ( appdataTrack v )
			{
				v2ftrack o;
				o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
				o.uv.xy = TRANSFORM_TEX(v.texcoord,_MainTex);
				o.uv2.xy = TRANSFORM_TEX(v.texcoord1,_LightMap);
				o.color = _Color;	
				
				return o; 
			}
		
            // Apply shader
            half4 fragTrack(v2ftrack i) :COLOR 
            { 
            	fixed4 tex = tex2D(_MainTex, i.uv.xy);
            	fixed4 tex2 = tex2D( _LightMap, i.uv2.xy );
            	return (i.color * 2) * (tex * (tex2 * tex2)); 
            }

            ENDCG
        }
    }
}

Obviously the code in the first shader example is much easier to write when you’re new to shaders, but if I understand correctly, it is slower than writing the same shader in Cg, for example…am I correct? It also has lots of limitations, I would think?

Last (dumb) question: does using a shader editor like SSE make shaders slower? If yes, is it really that much slower, or can I live with it when developing for iOS/Android?

Again, thanks a lot for all your help, and sorry for the dumb questions :wink:
Stephane

Yes the first one is fixed function :slight_smile:

The second one is normal Cg (not a surface shader).

And yes, using SSE will make it slower, because SSE can only generate surface shaders, and surface shaders automatically use lights etc., which adds a large number of instructions, whereas hand-coded shaders that are not surface shaders can choose whether to use lights at all.
Whether you can live with the slowdown depends on you. If you intend to target anything weaker than the iPad 2, I would think twice about it. (If you use SSE and surface shaders you normally also have pixel lights; if you don’t have them, it’s normally a totally different story.)

For Android I would avoid such tools altogether. Even the strongest Android device today (the SGS2) has such bad drivers that it barely reaches 30-40% of the iPad 2, if I recall correctly.

That first shader has a ton of cruft. The Material block and its associated variables don’t do anything, and Alphatest Greater 0 is ignored on iOS, because alpha testing is a performance hit on PowerVR GPUs. There’s also no point in using Category. Cleaning it up won’t improve performance, however; it just makes the shader easier to read:

Shader "Mobile/Transparent/Vertex Color" {
	
Properties {
	_Color ("Main Color  (A = Opacity)", Color) = (1,1,1,1)
	_MainTex ("Base (RGB) Trans (A)", 2D) = "white" {}
}

SubShader {
	Tags {"Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent"}
	ZWrite Off
	Blend SrcAlpha OneMinusSrcAlpha
	Fog {Mode Off}
	BindChannels {
		Bind "vertex", vertex
		Bind "color", color
		Bind "texcoord", texcoord	// needed so _MainTex is sampled with the mesh UVs
	}
	Pass {
		SetTexture[_MainTex] {Combine texture * primary}
		SetTexture[_MainTex] {
			ConstantColor[_Color]
			Combine previous * constant Double, previous * constant
		}
	}
}

}

Here’s what the GLSL converter spits out for your Cg (cleaned up a bit to compile as its own shader):

Shader "SRShaders/UsePassSnippets/UnlitDiffuseLightmap" {

SubShader {Pass {
	Name "UNLIT"
	GLSLPROGRAM
	varying mediump vec2 xlv_TEXCOORD1;
	varying mediump vec2 xlv_TEXCOORD0;
	varying mediump vec4 xlv_COLOR;

	#ifdef VERTEX
	uniform mediump vec4 _MainTex_ST;
	uniform mediump vec4 _LightMap_ST;
	uniform lowp vec4 _Color;
	void main () {
		mediump vec4 tmpvar_1;
		mediump vec4 tmpvar_2;
		mediump vec2 tmpvar_3;
		mediump vec2 tmpvar_4;
		highp vec4 tmpvar_5;
		tmpvar_5 = (gl_ModelViewProjectionMatrix * gl_Vertex);
		tmpvar_1 = tmpvar_5;
		highp vec2 tmpvar_6;
		tmpvar_6 = ((gl_MultiTexCoord0.xy * _MainTex_ST.xy) + _MainTex_ST.zw);
		tmpvar_3 = tmpvar_6;
		highp vec2 tmpvar_7;
		tmpvar_7 = ((gl_MultiTexCoord1.xy * _LightMap_ST.xy) + _LightMap_ST.zw);	// second UV set (texcoord1 in the Cg)
		tmpvar_4 = tmpvar_7;
		tmpvar_2 = _Color;
		gl_Position = tmpvar_1;
		xlv_COLOR = tmpvar_2;
		xlv_TEXCOORD0 = tmpvar_3;
		xlv_TEXCOORD1 = tmpvar_4;
	}
	#endif
	
	#ifdef FRAGMENT
	uniform sampler2D _MainTex;
	uniform sampler2D _LightMap;
	void main () {
		lowp vec4 tmpvar_1;
		tmpvar_1 = texture2D (_LightMap, xlv_TEXCOORD1);
		gl_FragColor = ((xlv_COLOR * 2.0) * (texture2D (_MainTex, xlv_TEXCOORD0) * (tmpvar_1 * tmpvar_1)));
	}
	#endif
	ENDGLSL
}}

}

Using that medium precision variable in the fragment shader is destroying your performance, for absolutely no visual improvement. Unfortunately, I don’t yet know how to fix your Cg, to account for this, but here’s a cleaned up GLSL shader for you. According to PVRUnisco, the fragment shader is down from 15 cycles to 3. The vertex shader is down from 18 to 13. However, that’s only because five of those had been wasted on the varying color, which is unnecessary. (You might think to multiply it in the vertex shader, but it can be doubled in the fragment shader at no cost, as long as you use order and/or parentheses to help the compiler.)

Why you’re squaring the lightmap, I have no idea. If instead you meant to double it, that can save you a further cycle.

Shader "Give/This A Reasonable Name" {
	
Properties {
	_Color ("Main Color", Color) = (1,1,1,1)
	_MainTex ("Base", 2D) = "" {}
	_LightMap ("Lightmap", 2D) = "" {}
}

SubShader {Pass {
	Name "UNLIT"
	GLSLPROGRAM
	varying mediump vec2 mainUV, lightmapUV;

	#ifdef VERTEX
	uniform mediump vec4 _MainTex_ST, _LightMap_ST;
	void main () {
		gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
		mainUV = gl_MultiTexCoord0.xy * _MainTex_ST.xy + _MainTex_ST.zw;
		lightmapUV = gl_MultiTexCoord1.xy * _LightMap_ST.xy + _LightMap_ST.zw;	// second UV set, matching texcoord1 in the Cg version
	}
	#endif
	
	#ifdef FRAGMENT
	uniform lowp sampler2D _MainTex, _LightMap;
	uniform lowp vec4 _Color;
	void main () {
		lowp vec4 lightmap = texture2D(_LightMap, lightmapUV);
		gl_FragColor = texture2D(_MainTex, mainUV) * lightmap * lightmap * (_Color * 2.);
	}
	#endif
	ENDGLSL
}}

}

Dreamora: Thanks a lot for the answers and tips, it really helps and I now have a better understanding of the whole ShaderLab/Cg stuff!

Jessy: I really appreciate the time you take to help out, thanks a lot! I am new at programming shaders and your skill level is way beyond mine, so I have a hard time understanding exactly what you mean in a few areas above. If you don’t mind spending more time helping me out, I do have a few questions for you, or anyone else willing to help of course:

  1. “Using that medium precision variable in the fragment shader is destroying your performance” - Are you talking about the “fixed4 tex” var? If yes, what’s the lowest precision one can use when defining Tex2D variables?

  2. “Why you’re squaring the lightmap, I have no idea” - What do you mean by “squaring”? Is this the part you’re referring to: return (i.color * 2) * (tex * (tex2 * tex2)); ?

  3. I started learning Cg, but looking at some of your GLSL code, it seems easier to write than Cg. Would learning GLSL instead be a better approach for a shader programming noob like myself?

That’s all folks! Again, I really appreciate the time you guys put into helping out new developers like myself…trust me, you’re making my life a lot easier!!!

Stephane

Like I said, I don’t know Cg. I tried to mess with your code for a minute without success. lowp is the default for sampler2D’s, and what I use for everything in fragment shaders. xlv_COLOR uses mediump, as you can see.
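For what it’s worth, the converter output earlier in the thread suggests a mapping of `fixed` to `lowp`, `half` to `mediump` and `float` to `highp`, so one possible Cg-side fix is to drop the `half4` color varying and read `_Color` directly in the fragment program. The following is an untested sketch under that assumption; the shader name and the Properties block are made up for the example, and I haven’t run it through PVRUniSCo, so no cycle counts are claimed:

```
Shader "Hypothetical/UnlitDiffuseLightmap LowPrecision" {

Properties {
	_Color ("Main Color", Color) = (1,1,1,1)
	_MainTex ("Base", 2D) = "white" {}
	_LightMap ("Lightmap", 2D) = "white" {}
}

SubShader {Pass {
	Name "UNLIT"
	CGPROGRAM
	#pragma vertex vertTrack
	#pragma fragment fragTrack
	#include "UnityCG.cginc"

	sampler2D _MainTex;
	sampler2D _LightMap;
	half4 _MainTex_ST;
	half4 _LightMap_ST;
	fixed4 _Color;

	struct appdataTrack {
		float4 vertex : POSITION;
		float4 texcoord : TEXCOORD0;
		float4 texcoord1 : TEXCOORD1;
	};

	// No color varying: _Color is a uniform, so the fragment program reads it directly.
	struct v2ftrack {
		float4 pos : POSITION;
		half2 uv : TEXCOORD0;
		half2 uv2 : TEXCOORD1;
	};

	v2ftrack vertTrack (appdataTrack v) {
		v2ftrack o;
		o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
		o.uv = TRANSFORM_TEX(v.texcoord, _MainTex);
		o.uv2 = TRANSFORM_TEX(v.texcoord1, _LightMap);
		return o;
	}

	// Every value here is fixed, which should come out as lowp if the mapping above holds.
	fixed4 fragTrack (v2ftrack i) : COLOR {
		fixed4 tex = tex2D(_MainTex, i.uv);
		fixed4 tex2 = tex2D(_LightMap, i.uv2);
		return tex * tex2 * tex2 * (_Color * 2);
	}
	ENDCG
}}

}
```

Check the translated GLSL to confirm that nothing in the fragment program ends up as `mediump` before trusting it.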

Sometimes, I need to brighten something more than 2x, but 2 is the maximum value low precision offers. In these cases, it’s better to multiply something by 2 again (and add it to itself, as it’s free to add after a multiply), than use higher precision.

When you multiply something by itself, that’s called squaring it. I can’t think of why you’d want to do that with a lightmap.

As for whether GLSL is the better place to start, I have no idea. It’s worked for me; I think it’s more readable than Cg, but my opinion might change if I use Cg more.

This would definitely be the place to start with GLSL: http://en.wikibooks.org/wiki/GLSL_Programming/Unity

This looks like it will be a good place for me to start learning Cg: http://forum.unity3d.com/threads/106793-Cg-tutorial-port-in-Unity.

The only platforms I care about at the moment are iOS and Xbox 360. As I can’t afford to develop for the 360 yet, it may be some time before I bother with Cg. :wink: If you care about more platforms, Cg is probably a better choice, as has been mentioned. It’s best to be able to work with both languages, though, so you can improve GLSL performance.

GLSL is definitely not easier. You will barely find any documentation or tutorials for it, whereas Cg has a host of documentation and is also ‘nearly HLSL’, which gives you another pile of information. To make it worse, GLSL is fragmented: there is desktop GLSL and GLSL for OpenGL ES, which have slight differences, both in the language and in what information you can find. It’s the way to get the very fastest shaders on mobile, yes. But if you were expert enough to write them, you wouldn’t need to ask questions about either anymore, and they would both be ‘easy’ :wink:

Also, GLSL only works on three platforms (Android, iOS and OS X), whereas Cg basically works everywhere.

Ah yes, got it :slight_smile: I’m not using the “lightmap” as a “lightmap”…not really…I’m just using it to multiply with the main texture to darken the shadows/AO areas, and I wanted them twice as dark, so I did that for testing purposes. I know, I know…don’t ask…lots of testing lately :stuck_out_tongue:

Thanks for the links and precision info!

Thanks for the info, dreamora. I thought I read somewhere that GLSL is definitely not for noobs…my skill level is not high enough to get into it, so I’ll stick to Cg for now!

You guys are great! Thanks a lot again for your time :sunglasses:
Stephane

Jessy/Dreamora, if you guys don’t mind, I have one last question about speed and optimized shaders…I used the code that Jessy gave me above to create the optimized transparency shader, and I also created a new shader which does the same thing but with Cg code instead. Here they are:

Simple shader:

Shader "SRShaders/Transparent/Simple Transparency" {

Properties {

	_Color ("Main Color", Color) = (1,1,1,1)
	_MainTex ("Base (RGB) Trans (A)", 2D) = "white" {}
}

SubShader {

	Tags {"Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent"}
	ZWrite Off
	Blend SrcAlpha OneMinusSrcAlpha
	Fog { Mode Off }
	
	BindChannels {
	
		Bind "texcoord", texcoord0
		Bind "vertex", vertex
		Bind "color", color
	}
	
	Pass {
	
    	SetTexture [_MainTex] { Combine texture * primary }
    	SetTexture [_MainTex] {
    	
	        ConstantColor [_Color]
	        Combine previous * constant DOUBLE, previous * constant
    	} 
    } 
} 
}

Now here’s my Cg version of the same shader:

Shader "SRShaders/Transparent/Cg Transparency"
{
	Properties 
	{
		_Color("Main Color", Color) = (0.5,0.5,0.5,1)
		_MainTex("Base (RGB) Gloss (A)", 2D) = "gray" {}
	}
	
	CGINCLUDE
	
		#include "UnityCG.cginc"
	
		half4 _MainTex_ST;
		sampler2D _MainTex;
		fixed4 _Color;
		
		struct v2f 
		{
			half4 pos : POSITION;
			half4 color : COLOR;
			half2 uv : TEXCOORD0;
		};
		
		v2f vert ( appdata_base v )
		{
			v2f o;
			o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
			o.uv.xy = TRANSFORM_TEX(v.texcoord,_MainTex);
			o.color = _Color;	
			
			return o; 
		}
	
	    // Apply shader
	    half4 frag(v2f i) : COLOR 
	    { 
	    	fixed4 tex = tex2D(_MainTex, i.uv.xy);
	    	return (i.color * 2) * tex; 
	    }
	    
    ENDCG
	
	SubShader 
	{
		Tags { "RenderType" = "Transparent" "Queue" = "Transparent+100"}
		Cull Off
		Lighting Off
		ZWrite Off
		Fog { Mode Off }
		Blend SrcAlpha OneMinusSrcAlpha
		
        Pass
        {
			CGPROGRAM
			#pragma vertex vert
			#pragma fragment frag
			#pragma fragmentoption ARB_precision_hint_fastest 
			
            ENDCG
        }
	}
}

On iOS, will the Cg shader be faster than the non-Cg version above? Thanks again for your time!!!
Stephane

You will have to test; there is no ‘theoretical’ answer. Normally the non-Cg version is faster, because the iOS drivers are extremely optimized. But the moment you move to Android it can be exactly the other way around, because the Android GPU drivers are a joke. Even Intel and ATI create better ones than most Android handset makers have the nerve to ship :frowning:

Got it, thanks for the help Dreamora, I will leave you guys alone now hahaha :wink: