Transparent GLSL Shader for Mobile - Improve more??

Hi,

So I have perused the forums, etc., and started to understand that fixed-function shaders are slower on mobile devices that use OpenGL ES 2.0. So here is my conundrum:

I am making a corn field, for example, with each corn plant being a texture with alpha. I always have it billboarding and it looks OK, but optimizing thousands of billboards is another story.

I was using the mobile vertex transparent shader, but standing in my cornfield with thousands of corn textures, each on its own plane, dropped me to 12 fps. Randomly trying out “Unlit/Premultiplied Colored” got me to 17-18 fps.

I found this GLSL shader online, and it is obviously very basic: it just discards any fragment whose alpha is below a threshold. The shader is below. As you can see, “if (gl_FragColor.a < 0.5) { discard; }” does the magic to make it transparent. However, this only bumped me up to 19-20 fps. If I comment out the if statement, it runs at 30+ fps, but obviously with black around the corn instead of transparency. It seems crazy to me that a single conditional check takes that much more time. Is there anything even simpler I can do to make it transparent?

It seems like the discard instruction can be kind of expensive; I found this quote:

This doesn’t make sense to me, that it would cause even worse performance… it is dropping that fragment!! Is there another method that is faster?

Thanks for any help you can give!

    Shader "Custom/MobileAlpha1" {
       Properties {
          _MainTex ("RGBA Texture Image", 2D) = "white" {}
          //_Cutoff ("Alpha Cutoff", Float) = 0.5
       }
       SubShader {
          Pass {    
             Cull Off // since the front is partially transparent,
                // we shouldn't cull the back
     
             GLSLPROGRAM
     
             uniform sampler2D _MainTex;    
             //uniform float _Cutoff;
     
             varying vec4 textureCoordinates;
     
             #ifdef VERTEX
     
             void main()
             {
                textureCoordinates = gl_MultiTexCoord0;
                gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
             }
     
             #endif
     
             #ifdef FRAGMENT
     
             void main()
             {
                gl_FragColor =
                   texture2D(_MainTex, textureCoordinates.xy);
                if (gl_FragColor.a < 0.5)
                   // alpha value less than the hard-coded 0.5 threshold?
                {
                   discard; // yes: discard this fragment
                }
             }
     
             #endif
     
             ENDGLSL
          }
       }
       // The definition of a fallback shader should be commented out
       // during development:
       // Fallback "Unlit/Transparent Cutout"
    }
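To see exactly what the `if (gl_FragColor.a < 0.5) discard;` line decides for each fragment, here is a CPU-side C++ sketch of the same test. It is only an illustration and says nothing about GPU cost; it assumes RGBA components in [0, 1], and `Rgba`, `passesAlphaTest` and `countSurvivors` are hypothetical names, with the 0.5 default mirroring the commented-out `_Cutoff` property.

```cpp
#include <cassert>
#include <vector>

// RGBA color with components in [0, 1], like a texture2D() result.
struct Rgba {
    float r, g, b, a;
};

// Mirrors the shader's test: a fragment survives only if its alpha is at
// least the cutoff; otherwise the shader would `discard` it.
bool passesAlphaTest(const Rgba& c, float cutoff = 0.5f) {
    assert(cutoff >= 0.0f && cutoff <= 1.0f);
    return !(c.a < cutoff);
}

// Counts how many fragments from a (hypothetical) row of texels survive.
int countSurvivors(const std::vector<Rgba>& texels, float cutoff = 0.5f) {
    int kept = 0;
    for (const Rgba& t : texels) {
        if (passesAlphaTest(t, cutoff)) {
            ++kept;
        }
    }
    return kept;
}
```

The test itself is trivial; what makes it expensive on the GPU is what the hardware has to give up to allow it, not the comparison.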

I’ve just started using ShaderLab, so there might still be room for optimisation.
Here is the GLSL code I’m using for a sprite shader:

Shader "Custom/SpriteShader3" {
	Properties {
		_MainTex ("Base (RGB) Trans (A)", 2D) = "white" {}
	}
	SubShader {
		// Queue, IgnoreProjector and RenderType are expected at SubShader level, not inside a Pass.
		Tags { "Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent" }
		Pass {
	//		LOD 100
			
			ZWrite Off
			Cull off
			Blend SrcAlpha OneMinusSrcAlpha
			Lighting Off
		
			GLSLPROGRAM
			
			#ifdef VERTEX

			varying vec2 _texUV;
	 
			void main()
			{
			  gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
			  _texUV = gl_MultiTexCoord0.st;
			}
			#endif
	 
			#ifdef FRAGMENT
			uniform sampler2D _MainTex;
	 
			varying vec2 _texUV;
	 
			void main()
			{
				gl_FragColor = texture2D(_MainTex, _texUV);
			}
			#endif
			
			ENDGLSL
		}
	} 
}

Also, I’m currently trying to figure out why it’s not generating the fragment shader and is automatically switching to “Fixed function”.
No idea if this is the expected ShaderLab behavior or not.

I agree that it doesn’t make intuitive sense, but the hardware is optimized for massively parallel, uniform operations. There’s a lot of info available on the specifics, but the gist of it is that discarding fragments and branching are slow. A shader that may discard can stop many GPUs from resolving visibility (depth) before running the fragment shader, so hidden fragments that would otherwise be skipped still get shaded. I don’t believe discarding can serve a useful purpose without branching, so that method is doubly bad. Sometimes it’s your only choice, but hopefully we can do better for your case.
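A toy model may help with the "but it is dropping that fragment!" intuition. The C++ sketch below counts fragment-shader invocations for several quads stacked over one pixel. It is a deliberate simplification of a tile-based GPU with hidden surface removal (such as PowerVR), not a description of any specific chip; `Fragment` and `shaderInvocations` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One fragment landing on a single pixel: its depth (smaller = closer)
// and whether its texel is transparent enough to be discarded.
struct Fragment {
    float depth;
    bool transparentTexel;
};

// Toy model of a tile-based GPU with hidden surface removal (HSR).
// canDiscard == false: visibility is resolved before shading, so only the
//   front-most fragment runs the fragment shader, whatever the draw order
//   (transparentTexel is ignored, since such a shader never discards).
// canDiscard == true: the GPU cannot know in advance whether a fragment really
//   covers the pixel, so every fragment that passes the depth test against
//   what is already written must be shaded, in submission order.
int shaderInvocations(const std::vector<Fragment>& submitted, bool canDiscard) {
    assert(!submitted.empty());
    if (!canDiscard) {
        return 1;  // opaque shader: only the visible fragment is shaded
    }
    int invocations = 0;
    float nearest = 1.0f;  // depth buffer cleared to the far plane
    for (const Fragment& f : submitted) {
        if (f.depth >= nearest) {
            continue;  // hidden behind an already-written fragment
        }
        ++invocations;  // shader must run to learn whether it discards
        if (!f.transparentTexel) {
            nearest = f.depth;  // fragment survives and writes depth
        }
    }
    return invocations;
}
```

Drawn back to front, four stacked opaque layers cost one shader run without discard but four with it. Sorting front to back helps the discard case, but a field of crossing quads can't be sorted perfectly, and blended quads need the opposite order.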

Without seeing your graphics, I can’t make a complete recommendation, but here are the steps I’d take to improve performance:
1. Blend instead of discard.
2. Use a mesh that matches the shape of your corn instead of a quad.
3. Use two submeshes, one opaque and one blended; maximize the area of the mesh that is opaque, without going overboard on vertex count.

This post has graphics to illustrate.
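A back-of-envelope way to compare these steps is to count how many fragments per plant go through the blended (expensive) path. The C++ sketch below does only that arithmetic: the coverage fractions are hypothetical inputs you would have to measure from your own corn texture, the names are made up for illustration, and vertex cost is ignored.

```cpp
#include <cassert>

// Fragment counts for one plant whose quad covers `quadPixels` pixels.
// Hypothetical inputs, not measurements:
//   silhouette: share of the quad the plant actually covers
//   opaqueCore: share of the quad that is fully opaque (<= silhouette)
struct FragmentEstimate {
    double blended;  // fragments going through the alpha-blended shader
    double opaque;   // fragments going through a cheap opaque shader
};

FragmentEstimate plainQuad(double quadPixels) {
    return {quadPixels, 0.0};  // every pixel of the quad is blended
}

FragmentEstimate tightMesh(double quadPixels, double silhouette) {
    assert(silhouette >= 0.0 && silhouette <= 1.0);
    return {quadPixels * silhouette, 0.0};  // mesh hugs the plant outline
}

FragmentEstimate twoSubmeshes(double quadPixels, double silhouette, double opaqueCore) {
    assert(opaqueCore >= 0.0 && opaqueCore <= silhouette && silhouette <= 1.0);
    // Opaque submesh for the solid interior, blended submesh for the fringe only.
    return {quadPixels * (silhouette - opaqueCore), quadPixels * opaqueCore};
}
```

For example, with a plant covering 40% of its quad and a solid core of 25%, `twoSubmeshes(1000, 0.4, 0.25)` leaves about 150 blended and 250 opaque fragments per 1000 quad pixels, against 1000 blended for the plain quad.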

In the meantime, you can use the Cg version, which actually compiles to the same GLSL code and does generate both vertex and fragment shaders.

Shader "Custom/SpriteShader2" {
	Properties {
		_MainTex ("Base (RGB) Trans (A)", 2D) = "white" {}
	}
	SubShader {
		// Queue, IgnoreProjector and RenderType are expected at SubShader level, not inside a Pass.
		Tags { "Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent" }
		Pass {
	//		LOD 100
			
			ZWrite Off
			Cull off
			Blend SrcAlpha OneMinusSrcAlpha
			Lighting Off
			
			CGPROGRAM
	
			#pragma vertex vert
			#pragma fragment frag
			
			#pragma glsl
			
			#include "UnityCG.cginc"
			
			sampler2D _MainTex;
			
			struct v2f {
			    float4  pos : SV_POSITION;
			    float2  uv : TEXCOORD0;
			};
			
			v2f vert (appdata_base v)
			{
			    v2f o;
			    o.pos = mul (UNITY_MATRIX_MVP, v.vertex);
			    o.uv = v.texcoord.xy; // texcoord is a float4; keep only the UV pair
			    return o;
			}
			
			half4 frag (v2f i) : COLOR
			{
			    half4 texcol = tex2D (_MainTex, i.uv);
			    return texcol;
			}
			ENDCG
		}
	}
}
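The sprite shaders above get their transparency from `Blend SrcAlpha OneMinusSrcAlpha` rather than from `discard`: the blend unit mixes each new fragment with the color already in the framebuffer. The C++ sketch below spells out that equation, plus the `Blend One OneMinusSrcAlpha` form that premultiplied-alpha shaders typically use (assumed here, not checked against the exact shaders named in this thread). `Rgba` and the function names are hypothetical.

```cpp
#include <cassert>

struct Rgba {
    float r, g, b, a;
};

// Blend SrcAlpha OneMinusSrcAlpha: result = src * src.a + dst * (1 - src.a),
// applied to all four channels.
Rgba blendStraightAlpha(const Rgba& src, const Rgba& dst) {
    float k = 1.0f - src.a;
    return {src.r * src.a + dst.r * k, src.g * src.a + dst.g * k,
            src.b * src.a + dst.b * k, src.a * src.a + dst.a * k};
}

// Blend One OneMinusSrcAlpha: expects src colors already multiplied by alpha.
Rgba blendPremultiplied(const Rgba& src, const Rgba& dst) {
    float k = 1.0f - src.a;
    return {src.r + dst.r * k, src.g + dst.g * k, src.b + dst.b * k, src.a + dst.a * k};
}

// Converts a straight-alpha color into premultiplied form.
Rgba premultiply(const Rgba& c) {
    assert(c.a >= 0.0f && c.a <= 1.0f);
    return {c.r * c.a, c.g * c.a, c.b * c.a, c.a};
}
```

With alpha 0 the framebuffer color is left unchanged, which is how blending hides the black around the corn without discarding anything. The catch is that every covered pixel is still shaded and read back, so blended quads are usually drawn with ZWrite Off and sorted back to front.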

That doesn’t do anything; don’t bother with it.

Too high of precision!

I have one quad, and I was going to swap out UV coordinates on the mesh to change “growth stages”.

I don’t know if I can really make a non-quad mesh for these… the corn plant would need at least 20-30 vertices, wouldn’t you say? I have corn fields with 5000+ plants.
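Whether 20-30 vertices per plant is affordable is partly arithmetic. Assuming 16-bit indices (the OpenGL ES 2.0 baseline unless the `OES_element_index_uint` extension is available), one index buffer can address at most 65,536 vertices, so large fields end up split into several batches. This C++ sketch, with hypothetical names, only counts vertices and batches; it does not predict frame rate, where fragment (fill-rate) cost is likely the bigger factor here.

```cpp
#include <cassert>
#include <cstdint>

// Largest vertex count addressable with 16-bit indices (0..65535).
constexpr std::uint32_t kMaxVerticesPer16BitBatch = 65536;

// Total vertices if every plant uses the same mesh.
std::uint32_t totalVertices(std::uint32_t plants, std::uint32_t vertsPerPlant) {
    return plants * vertsPerPlant;
}

// Minimum number of 16-bit-indexed batches, assuming a plant's vertices
// are never split across two batches.
std::uint32_t batchesNeeded(std::uint32_t plants, std::uint32_t vertsPerPlant) {
    assert(vertsPerPlant > 0 && vertsPerPlant <= kMaxVerticesPer16BitBatch);
    std::uint32_t plantsPerBatch = kMaxVerticesPer16BitBatch / vertsPerPlant;
    return (plants + plantsPerBatch - 1) / plantsPerBatch;  // ceiling division
}
```

`batchesNeeded(5000, 24)` gives 2 batches (120,000 vertices), while 5000 quads at 4 vertices each fit in a single batch of 20,000 vertices.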

Right now I have the quads as a child to a “field” object with this script:

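// Look up the main camera once instead of calling Camera.main
// (which searches by tag) several times per child.
Vector3 camPos = Camera.main.transform.position;
foreach (Transform child in transform) {
	// Skip quads that no camera is currently rendering.
	if (!child.renderer.isVisible) continue;
	// Viewing-angle check: camera position expressed in the quad's local space.
	if (child.InverseTransformPoint(camPos).z < 3) {
		// Rotate about Y only; using the quad's own height keeps it upright.
		child.LookAt(new Vector3(camPos.x, child.position.y, camPos.z));
	}
}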

It loops through all the children, checking whether it needs to rotate them based on a timer (not shown), whether they are rendered, and whether they are at a certain viewing angle. This actually works very well, even on mobile… there is just quite a bit of “popping”, as they are only rotated every 0.25-0.5 s.
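The script’s `LookAt` with the target’s height fixed is a Y-axis-locked billboard: each quad only turns about the vertical axis. The same rotation can be written as a single yaw angle, which is handy when reasoning about what a shader would have to compute. A C++ sketch, assuming Unity’s convention that yaw 0 faces +Z and yaw 90 faces +X; `Vec3` and `billboardYawDegrees` are hypothetical names, not Unity API.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// Yaw (rotation about +Y, in degrees) that turns a quad's local +Z axis
// toward the camera while ignoring height, i.e. a Y-axis-locked billboard.
float billboardYawDegrees(const Vec3& quad, const Vec3& camera) {
    float dx = camera.x - quad.x;
    float dz = camera.z - quad.z;
    assert(dx != 0.0f || dz != 0.0f);  // camera straight above: yaw undefined
    const float kRadToDeg = 57.29577951308232f;
    return std::atan2(dx, dz) * kRadToDeg;
}
```

Only the horizontal offset matters: a camera at (3, 1, 0) relative to the quad gives 90 degrees regardless of its height.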

So it seems like somehow I need to make a simple transparent shader with billboarding capabilities…?
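Moving the billboarding into the vertex shader usually means every vertex carries its plant’s pivot plus a corner offset, and the shader pushes the corner sideways along a horizontal “right” vector facing the camera. The sketch below is that per-vertex math written in C++ for clarity, not shader code. The data layout (pivot, corner offset, width, height) is an assumption, since the thread does not define one, and `billboardCorner` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// What a Y-axis-locked billboard vertex shader would compute for one corner.
// `center` is the plant's pivot; (cornerX, cornerY) is the corner's offset in
// quad units, e.g. (-0.5, 0) .. (0.5, 1); width/height are in world units.
// Assumes Unity's left-handed, Y-up convention, where (-dz, 0, dx) points to
// the viewer's right when the viewer looks at the quad.
Vec3 billboardCorner(const Vec3& center, const Vec3& camera,
                     float cornerX, float cornerY, float width, float height) {
    float dx = camera.x - center.x;
    float dz = camera.z - center.z;
    float len = std::sqrt(dx * dx + dz * dz);
    assert(len > 0.0f);  // camera straight above the pivot: direction undefined
    // Horizontal "right" vector, perpendicular to the pivot-to-camera direction.
    float rx = -dz / len;
    float rz = dx / len;
    return {center.x + rx * cornerX * width,
            center.y + cornerY * height,  // up is always world +Y
            center.z + rz * cornerX * width};
}
```

Because every vertex is oriented each frame on the GPU, nothing is rotated on the CPU, so the 0.25-0.5 s popping would go away; whether it is faster on a Tegra 2 is untested here.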

Maxime, how is your shader making the transparency work compared to the one I posted? Is it this line: “Blend SrcAlpha OneMinusSrcAlpha”?

I see it has #ifdef blocks for VERTEX and FRAGMENT. Which one will it use? Maybe I need to research that part more.

Thanks for the help!!

On mobile? Just forget it. Figure out a different technique. If you can make it out of 3D meshes with an opaque shader, it MIGHT run OK, or if not many are visible. But drawing 5k transparent quads would be slow on desktop, never mind mobile.

I have a test scene running on my Motorola Atrix (which is kind of old, actually), with a tractor and a corn field of 2250+ corn quads visible. Using the shader at the very top, I get 18-19 fps while almost the entire field and the tractor are in view. When I take out the discard instruction, it goes to 30+.

I was thinking this was going to work!! Especially if I can use a better transparency shader and move the billboarding into the shader as well…

What if I keep each field to around 2k crops and somehow never let them all be seen at once… I want a pretty large game world.
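Limiting how many crops are visible at once is commonly done by splitting the world into fields (chunks) and enabling only the ones near the camera, often with fog or terrain hiding the cut-off. A C++ sketch of just the selection step, assuming a purely distance-based policy; `Field` and `plantsWithinDistance` are hypothetical, and in Unity the result would drive enabling or disabling each field’s renderers, which is not shown.

```cpp
#include <cassert>
#include <vector>

struct Field {
    float centerX, centerZ;  // field center on the ground plane
    int plants;              // number of crop quads in the field
};

// Only fields whose center lies within `drawDistance` of the camera are drawn.
// Returns how many plants that would put on screen (at most).
int plantsWithinDistance(const std::vector<Field>& fields, float camX, float camZ,
                         float drawDistance) {
    assert(drawDistance >= 0.0f);
    int drawn = 0;
    float maxSq = drawDistance * drawDistance;
    for (const Field& f : fields) {
        float dx = f.centerX - camX;
        float dz = f.centerZ - camZ;
        if (dx * dx + dz * dz <= maxSq) {  // squared compare, no sqrt needed
            drawn += f.plants;
        }
    }
    return drawn;
}
```

With 2k-crop fields spaced out along a road, a draw distance that covers two fields caps the scene at roughly 4000 plants, however large the world is.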

Edit: My Atrix has a Tegra 2… will my results change, for better or worse, on iOS, or on an Android device with a PowerVR GPU (Galaxy S4??)?

I’m not optimistic about this, but I’d like to try and see if I can improve it. Provide us with a test package + scene that’s set up for easy framerate stress profiling and I’ll do that.

Will you test it on a mobile device or just on desktop?

So I read through all of this: GLSL Programming/Unity/Transparent Textures (Wikibooks)

Basically, I was trying the first approach (discarding); then there is alpha testing, but it seems to be just another way of discarding; and the last is blending. I should have tried all three last night.

Here is the Unity package to test. Right now I made a 2400-plant corn field, all visible in view right away.

https://www.dropbox.com/s/2ajvu1c7akg7ssu/corn_alpha_test.unitypackage

Thanks for the help!

PS: The field object has the child-corn rotation script.

It’s been a while since I’ve used GLSL or Cg, and I’ve never worked much on mobile / OpenGL ES.

I just read Optimizing Graphics Performance, about the lowp, mediump and highp qualifiers in the OpenGL ES Shading Language.

Using float4 or fixed4 in the CGPROGRAM actually generates the same code for the OpenGL ES platform.
I’m not sure lowp is actually supported by the version of the OpenGL ES Shading Language used by Unity3d?
Is there any way to change the SHADER_API_GLES version being used?

I’d be curious to know, for such a simple example, how the half4/mediump version really performs compared to the other one (it probably depends on the platform/OS and GPU).

Here is the OES GLSL generated code:

1. With half4:
#define SHADER_API_GLES 1
#define tex2D texture2D

#ifdef VERTEX
#define gl_ModelViewProjectionMatrix glstate_matrix_mvp
uniform mat4 glstate_matrix_mvp;

varying highp vec2 xlv_TEXCOORD0;

attribute vec4 _glesMultiTexCoord0;
attribute vec4 _glesVertex;
void main ()
{
  gl_Position = (gl_ModelViewProjectionMatrix * _glesVertex);
  xlv_TEXCOORD0 = _glesMultiTexCoord0.xy;
}

#endif
#ifdef FRAGMENT

varying highp vec2 xlv_TEXCOORD0;
uniform sampler2D _MainTex;
void main ()
{
  mediump vec4 texcol_1;
  lowp vec4 tmpvar_2;
  tmpvar_2 = texture2D (_MainTex, xlv_TEXCOORD0);
  texcol_1 = tmpvar_2;
  gl_FragData[0] = texcol_1;
}

#endif
2. With float4/fixed4:
#define SHADER_API_GLES 1
#define tex2D texture2D

#ifdef VERTEX
#define gl_ModelViewProjectionMatrix glstate_matrix_mvp
uniform mat4 glstate_matrix_mvp;

varying highp vec2 xlv_TEXCOORD0;

attribute vec4 _glesMultiTexCoord0;
attribute vec4 _glesVertex;
void main ()
{
  gl_Position = (gl_ModelViewProjectionMatrix * _glesVertex);
  xlv_TEXCOORD0 = _glesMultiTexCoord0.xy;
}

#endif
#ifdef FRAGMENT

varying highp vec2 xlv_TEXCOORD0;
uniform sampler2D _MainTex;
void main ()
{
  gl_FragData[0] = texture2D (_MainTex, xlv_TEXCOORD0);
}

#endif
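To get a feel for what `lowp` can cost in accuracy, the sketch below emulates the minimum guarantee the GLSL ES 1.00 specification gives for `lowp` floats: a range of (-2, 2) with an absolute precision of 2^-8. Actual hardware may provide more. `quantizeLowp` is a hypothetical helper, and treating lowp as fixed-point steps of 1/256 is an assumption for illustration.

```cpp
#include <cassert>
#include <cmath>

// Emulates the *minimum* lowp guarantee: values clamped to just inside
// (-2, 2) and rounded to multiples of 2^-8 = 1/256.
float quantizeLowp(float v) {
    assert(std::isfinite(v));
    const float kStep = 1.0f / 256.0f;
    const float kMax = 2.0f - kStep;  // largest representable value below 2
    if (v > kMax) v = kMax;
    if (v < -kMax) v = -kMax;
    return std::round(v / kStep) * kStep;
}
```

Colors in [0, 1] lose little at 1/256 steps, which fits the generated code keeping the sampled color in `lowp` while the texture coordinates stay `highp`: a UV rounded to 1/256 could be off by up to 1/512, about two texels on a 1024-wide texture.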

I suppose it’s all because of the _MainTex declaration specifying an alpha channel, and the pass tags that render it in the transparent queue:

Properties {
        _MainTex ("Base (RGB) Trans (A)", 2D) = "white" {}
    }

    SubShader {
        Pass {
            Tags { "Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent" }

Honestly, as mentioned, I’m new to ShaderLab, so I’m not entirely sure what exactly is required.

Both parts are used, but at different stages of the rendering pipeline.
The full rendering pipeline is described here: OpenGLInsights Pipeline
(Note the Vertex Shading block in the Vertex Processing stage and the Fragment Shading block in the Fragment Processing stage.)
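To make “both parts are used” concrete: Unity compiles the GLSLPROGRAM source twice, once with `VERTEX` defined and once with `FRAGMENT` defined. The vertex part then runs once per vertex and the fragment part once per covered pixel. The toy C++ count below (hypothetical names, one screen-aligned quad, no real rasterizer) shows how lopsided that is.

```cpp
#include <algorithm>
#include <cassert>

// A vertex after the vertex stage, already in pixel coordinates.
struct ScreenVertex {
    float x, y;
};

struct StageCounts {
    int vertexInvocations;
    int fragmentInvocations;
};

// Toy pipeline for one screen-aligned quad: the "vertex shader" runs once per
// corner, the "fragment shader" once per framebuffer pixel whose center falls
// inside the quad. Real GPUs rasterize two triangles; for this axis-aligned
// case the set of covered pixel centers is the same.
StageCounts drawAxisAlignedQuad(const ScreenVertex corners[4], int fbWidth, int fbHeight) {
    assert(fbWidth > 0 && fbHeight > 0);
    StageCounts counts{0, 0};
    float minX = corners[0].x, maxX = corners[0].x;
    float minY = corners[0].y, maxY = corners[0].y;
    for (int i = 0; i < 4; ++i) {
        ++counts.vertexInvocations;  // the #ifdef VERTEX code: once per vertex
        minX = std::min(minX, corners[i].x);
        maxX = std::max(maxX, corners[i].x);
        minY = std::min(minY, corners[i].y);
        maxY = std::max(maxY, corners[i].y);
    }
    for (int py = 0; py < fbHeight; ++py) {
        for (int px = 0; px < fbWidth; ++px) {
            float cx = px + 0.5f;  // pixel center
            float cy = py + 0.5f;
            if (cx >= minX && cx < maxX && cy >= minY && cy < maxY) {
                ++counts.fragmentInvocations;  // the #ifdef FRAGMENT code: once per covered pixel
            }
        }
    }
    return counts;
}
```

For the 2400-quad test field that is only 9,600 vertex invocations per frame, while the fragment count grows with every overlapping layer of corn on screen, which is why the discard/blend choice dominates.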

Doesn’t work. Please test your exports in a new project to make sure they work. Also, please export everything in a root folder for organization; I’ve got a template project I test with, which has folders for various forum threads.

I haven’t owned a desktop in seven years! My 3rd gen iPad is where I planned to test this.

Sorry about that… this should work fine now: (Dropbox link)

Will stuff perform quite a bit differently on a 3rd gen iPad than on my Tegra 2 Atrix? All I have is an iPod touch 4, though I haven’t set up a dev account etc. to start testing on it yet.

It doesn’t. It could be due to the Blender file; try with an FBX, and please, again, put the entire package in a folder.

You mean don’t do Assets > Export? Just give you the actual root Assets folder?

OR

Assets/Corn Alpha Test/all files

And then export “Corn Alpha Test” as unitypackage?

Yes, that please. Then I can keep template stuff organized as I want, in Assets, and not have anyone else’s assets mangle that structure.

OK, I removed the .blend file and swapped in an FBX. It should work this time; I don’t know why it didn’t before…

https://www.dropbox.com/s/2ajvu1c7akg7ssu/corn_alpha_test.unitypackage

I just tried (successfully) creating a 10,000-texture corn field with the Particle Renderer… read my post here: http://forum.unity3d.com/threads/185808-Billboarding-System-shader-Y-Axis-locked?p=1279342&viewfull=1#post1279342

I will have to see how well it performs on mobile when I get home. Is this a bad way of doing things? It all billboards veeery nicely (in the editor), etc…