So I have perused the forums and started to understand that fixed-function shaders are slower on mobile devices that use GL 2.0, etc. So here is my conundrum:
I am making a corn field, for example, with each corn plant being a texture with alpha. I always have it billboarding and it looks OK; optimizing thousands of billboards is another story.
I was using the mobile vertex transparent shader, but standing in my cornfield with thousands of corn textures each on their own plane dropped me to 12fps. Randomly trying out “Unlit/Premultiplied Colored” got me to 17-18fps.
I found this GLSL shader online. It is obviously very basic: it looks like it just discards anything with transparency. The shader is below. As you can see, “if (gl_FragColor.a < 0.5) { discard; }” does the magic to make it transparent. However, this still only bumped me up to 19-20fps. If I comment out the if statement, it runs at 30+ fps, but with black around the corn instead of transparency, obviously. It seems crazy to me that a single conditional check takes that much more time. Is there anything even simpler I can do to make it transparent?
It seems like the discard instruction can be kind of expensive, I found this quote:
This doesn’t make sense to me, that it would cause even worse performance… it is dropping that fragment!! Is there another method that is faster?
Thanks for any help you can give!
Shader "Custom/MobileAlpha1" {
Properties {
_MainTex ("RGBA Texture Image", 2D) = "white" {}
//_Cutoff ("Alpha Cutoff", Float) = 0.5
}
SubShader {
Pass {
Cull Off // since the front is partially transparent,
// we shouldn't cull the back
GLSLPROGRAM
uniform sampler2D _MainTex;
//uniform float _Cutoff;
varying vec4 textureCoordinates;
#ifdef VERTEX
void main()
{
textureCoordinates = gl_MultiTexCoord0;
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
#endif
#ifdef FRAGMENT
void main()
{
gl_FragColor = texture2D(_MainTex, vec2(textureCoordinates));
if (gl_FragColor.a < 0.5)
// alpha below the hard-coded 0.5 threshold (_Cutoff is commented out)?
{
discard; // yes: discard this fragment
}
}
#endif
ENDGLSL
}
}
// The definition of a fallback shader should be commented out
// during development:
// Fallback "Unlit/Transparent Cutout"
}
Also, I’m currently trying to figure out why it’s not generating the fragment shader and automatically switches to “Fixed function”.
No idea if this is the expected ShaderLab behavior or not.
I agree that it doesn’t make intuitive sense, but the hardware is optimized for massively parallel, similar operations. There’s a lot of info available on the specifics, but the gist of it is, discarding fragments and branching are slow, and I don’t believe discarding can fulfill a useful purpose without branching, so that method is doubly bad. Sometimes, it’s your only choice, but hopefully we can do better for your case.
Without seeing your graphics, I can’t make a complete recommendation. But, here are the steps I’d take to improve performance:
Blend instead of discard.
Use a mesh that matches the shape of your corn instead of a quad.
Use two submeshes; maximize the area of the mesh that is opaque, without going overboard on vert count. This post has graphics to illustrate.
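To make the first step concrete, here is a rough sketch of the shader from the top of the thread with the discard removed and blending switched on. The shader name Custom/MobileAlphaBlendSketch, the "Transparent" queue tag, and ZWrite Off are my assumptions, and I have not profiled this on the devices discussed here, so treat any frame rate change as something to measure rather than expect.

```glsl
// Sketch only: the shader name, queue tag, and ZWrite setting are assumptions.
Shader "Custom/MobileAlphaBlendSketch" {
   Properties {
      _MainTex ("RGBA Texture Image", 2D) = "white" {}
   }
   SubShader {
      // Draw after opaque geometry so the blend has something behind it.
      Tags { "Queue" = "Transparent" }
      Pass {
         Cull Off                          // both sides of the quad are visible
         ZWrite Off                        // blended pixels do not write depth
         Blend SrcAlpha OneMinusSrcAlpha   // standard alpha blending

         GLSLPROGRAM
         uniform sampler2D _MainTex;
         varying vec2 textureCoordinates;

         #ifdef VERTEX
         void main()
         {
            textureCoordinates = gl_MultiTexCoord0.xy;
            gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
         }
         #endif

         #ifdef FRAGMENT
         void main()
         {
            // No branch and no discard: the texture alpha drives the blend.
            gl_FragColor = texture2D(_MainTex, textureCoordinates);
         }
         #endif
         ENDGLSL
      }
   }
}
```

With depth writes off, overlapping quads rely on draw order, so plants that intersect or sit very close together may show sorting artifacts. Blending also still pays for every fully transparent pixel, which is why trimming the mesh to the shape of the corn matters as well.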
I have one quad, and I was going to swap out UV coordinates on the mesh to change “growth stages”.
I don’t know if I can really make a non-quad mesh for these… the corn plant would need at least 20-30 vertices wouldn’t you say? I have corn fields with 5000+ plants.
Right now I have the quads as a child to a “field” object with this script:
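// Read the camera position once instead of looking up Camera.main for every child.
Vector3 camPos = Camera.main.transform.position;
foreach (Transform child in transform) {
// Only re-aim quads that are being rendered and whose local-space camera z is below 3.
if (child.renderer.isVisible && child.InverseTransformPoint(camPos).z < 3) {
// Aim at the camera's x/z position with y forced to 0.
child.LookAt(new Vector3(camPos.x, 0, camPos.z));
}
}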
It loops through all the children, checking whether it needs to rotate them based on a timer (not shown), whether they are rendered, and whether they are at a certain viewing angle. This actually works very well, even on mobile… there is just quite a bit of “popping”, as they are only rotated every 0.25-0.5 secs.
So it seems like somehow I need to make a simple transparent shader with billboarding capabilities…?
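For what it is worth, here is a minimal sketch of doing the billboarding in the vertex stage, so no script has to rotate each quad. It rests on several assumptions of mine: the quad lies in its local XY plane and is centered on its pivot, each quad is drawn as its own unbatched object (batching would hand the shader vertices in a different space), and object scale is ignored. The shader name is made up. It also aligns the quad with the view plane rather than rotating it around the vertical axis, so plants will tilt when the camera looks down on them.

```glsl
// Sketch only: name, queue tag, and the mesh layout described above are assumptions.
Shader "Custom/MobileBillboardBlendSketch" {
   Properties {
      _MainTex ("RGBA Texture Image", 2D) = "white" {}
   }
   SubShader {
      Tags { "Queue" = "Transparent" }
      Pass {
         Cull Off
         ZWrite Off
         Blend SrcAlpha OneMinusSrcAlpha

         GLSLPROGRAM
         uniform sampler2D _MainTex;
         varying vec2 textureCoordinates;

         #ifdef VERTEX
         void main()
         {
            textureCoordinates = gl_MultiTexCoord0.xy;
            // Move the object's pivot into view space, then add the vertex's
            // local x/y as a view-space offset so the quad always faces the screen.
            vec4 viewCenter = gl_ModelViewMatrix * vec4(0.0, 0.0, 0.0, 1.0);
            vec4 viewOffset = vec4(gl_Vertex.x, gl_Vertex.y, 0.0, 0.0);
            gl_Position = gl_ProjectionMatrix * (viewCenter + viewOffset);
         }
         #endif

         #ifdef FRAGMENT
         void main()
         {
            gl_FragColor = texture2D(_MainTex, textureCoordinates);
         }
         #endif
         ENDGLSL
      }
   }
}
```

Because the rotation happens per vertex every frame, this would remove the “popping” from timed updates, but each unbatched quad is its own draw call, which may cost more than it saves with thousands of plants.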
Maxime, how is your shader making the transparency work compared to the one I posted? This? “Blend SrcAlpha OneMinusSrcAlpha”?
I see it has IFDEF for vertex and fragment. Which will it use? Maybe I need to research that part more.
On mobile? Just forget it. Figure out a different technique. Possibly, if you can make it out of 3D meshes with an opaque shader, it MIGHT run OK. Or if not many are visible. But drawing 5k transparent quads on desktop would be slow, never mind mobile.
I have a test scene running on my Motorola Atrix (which is kind of old actually) and it has a tractor visible, along with a corn field with 2250+ corn quads. Using the shader at the very top there, I can get 18-19 fps while almost the entire field and the tractor are in view. When I take out that discard instruction it goes to 30+.
I was thinking that this was going to work!! Especially if I can use a better transparency shader, and tack on the billboard inside shader as well…
What if I have all my fields be around 2k crops, and I somehow don’t let all be seen at once… I want a pretty large game world.
Edit: So my Atrix is a tegra2… will my results change by going to iOS, or even an Android with PowerVR (Galaxy S4??)… negative/positive?
I’m not optimistic about this, but I’d like to try and see if I can improve it. Provide us with a test package + scene that’s set up for easy framerate stress profiling and I’ll do that.
Basically I was trying to do the top one (discarding), and then there is alpha testing but it seems like it is just another way to do discarding, and the last is blending. I should have tried all three last night.
Here is the unity package to test. Right now I made a 2400 corn field, all visible in view right away.
Using float4 or fixed4 in the CGPROGRAM actually generates the same code for the OpenGL ES platform.
I’m not sure lowp is actually supported with the version of OpenGL ES Shading Language used by Unity3d?
Is there any way to change the SHADER_API_GLES version being used?
I’d be curious to know for such a simple example, how the half4/mediump version really performs compared to the other (probably depends on the platform/OS and GPU).
Honestly, as mentioned, I’m new to ShaderLab, so not entirely sure what’s exactly required.
Both parts are used, but at different stages of the rendering pipeline.
See the full rendering pipeline described: OpenGLInsights Pipeline
(Note the Vertex Shading block in Vertex Processing stage and Fragment Shading block in the Fragment Processing stage)
Doesn’t work. Please test your exports in a new project to ensure they do. Also, please export in a root folder for organization; I’ve got a template project I test with, which has folders for various forum threads.
I haven’t owned a desktop in seven years! My 3rd gen iPad is where I planned to test this.
Will stuff perform quite a bit differently on a 3rd gen iPad than on my tegra2 Atrix? All I have is an iPod touch 4, though I haven’t set up a dev account etc. to start testing on it yet.
I will have to see how well it performs on mobile when I get home. Is this a bad way of doing things? It all billboards veeery nicely (in editor), etc…