Circles

Hi all,

I’m trying to get pixel-perfect circles into a game, something like “Hundreds”. So far I’ve had difficulty getting any sort of decent performance on iPhone 4 (familiar story, right?).

What seemed to be my best bet has turned out to be the slowest of the lot: writing a custom vert/frag shader that shades the circle.

For some reason the shader beneath is super slow on the iPhone 4 (I’ve tried tweaking precision to fixed/half for as many of the variables as possible, but with no real gain). 80 circles using this shader take 20-30 ms to draw, which is obviously unacceptable.

Note: you can test this shader on a quad, although I use a circle-mesh script that draws a coarse circle. To show the actual texture, though, you’d need procedural geometry where the x/y vertex positions are also stored in uv2. The reason for this is that view-space shading like this doesn’t work when batching is on.

I think at least part of the problem must be that with 80 circles on screen, all alpha blended and all calculating a length per pixel, it’s just too much for the humble GPU.

Shader "Sprite/CircleClipLocalSpace"
{
    Properties
    {
        _MainTex ("Base (RGB), Alpha (A)", 2D) = "white" {}
        _Edge ("Edge", Float) = 2
        _Strength ("Strength", Range(0,1)) = 0.5
    }

    SubShader
    {

        Tags { "Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent" }
	
        Pass
        {
            Blend SrcAlpha OneMinusSrcAlpha
            Cull Off
            Lighting Off
            ZWrite Off

            Fog { Color (0,0,0,0) }

            CGPROGRAM
            #pragma exclude_renderers d3d11 xbox360
            #pragma vertex vert
            #pragma fragment frag
            #pragma target 3.0
            #pragma glsl
            #include "UnityCG.cginc"

            sampler2D _MainTex;
            float4 _MainTex_ST;
            float _Edge;
            float _Strength;

            struct v2f
            {
                float4 vertex : POSITION;   // clip-space position; keep full precision
                half2 texcoord : TEXCOORD0; // UV recentred on the circle centre
                fixed4 color : COLOR;
                half2 texUV : TEXCOORD2;    // texture lookup coordinates
            };

            struct appdata_t
            {
                float4 vertex : POSITION;
                float2 texcoord : TEXCOORD0;
                float2 savedVertices : TEXCOORD1; // x/y vertex positions stored in uv2
                float4 color : COLOR;
            };
			
            // Photoshop-style overlay: multiply where the base a <= 0.5, screen elsewhere.
            inline float4 Overlay (float4 a, float4 b)
            {
                return lerp(1 - 2 * (1 - a) * (1 - b), 2 * a * b, step(a, 0.5));
            }
			
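            v2f vert (appdata_t v)
            {
                v2f o;
                o.vertex = mul(UNITY_MATRIX_MVP, v.vertex);
                // Recentre the UVs so that length(texcoord) is the distance from the circle centre.
                o.texcoord = v.texcoord - half2(0.5, 0.5);
                o.color = v.color;
                // Apply the material's tiling (xy) and offset (zw) to the saved vertex positions.
                o.texUV = v.savedVertices * _MainTex_ST.xy + _MainTex_ST.zw;

                return o;
            }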
			
            half4 frag (v2f IN) : COLOR
            {
                fixed4 col = Overlay(IN.color, tex2D(_MainTex, IN.texUV) * _Strength);
                fixed4 transparent = fixed4(col.xyz, 0);
                // Distance from the circle centre; the edge sits near l = 0.5.
                float l = length(IN.texcoord);
                // Change in l per screen pixel, scaled by _Edge, sets the width of the soft edge.
                float thresholdWidth = length(float2(ddx(l), ddy(l))) * _Edge;

                float antialiasedCircle = saturate(((1.0 - (thresholdWidth * 0.25) - (l * 2)) / thresholdWidth) + 0.5);
                return lerp(transparent, col, antialiasedCircle);
            }
            ENDCG
        }
    }
}
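
To make the edge maths in frag easier to follow, here is a minimal Python sketch of the same expression. It assumes the quad is scaled uniformly on screen, so the ddx/ddy term reduces to _Edge divided by the circle's on-screen diameter in pixels; threshold_width and diameter_px are illustrative names, not part of the shader.

```python
def saturate(x):
    """Clamp x to [0, 1], like the Cg intrinsic of the same name."""
    return max(0.0, min(1.0, x))


def threshold_width(diameter_px, edge=2.0):
    """Stand-in for length(float2(ddx(l), ddy(l))) * _Edge.

    Assumption: the quad is scaled uniformly on screen, so the UV distance l
    changes by 1 / diameter_px per screen pixel.
    """
    return edge / diameter_px


def antialiased_circle(l, tw):
    """Same expression as the fragment shader: 1 inside, 0 outside."""
    return saturate((1.0 - tw * 0.25 - l * 2.0) / tw + 0.5)


tw = threshold_width(diameter_px=100, edge=2.0)
for l in (0.0, 0.49, 0.4975, 0.5, 0.51):
    print(f"l={l:.4f} coverage={antialiased_circle(l, tw):.3f}")
```

Under that assumption the ramp from 1 to 0 covers tw / 2 in UV units, which is _Edge / 2 screen pixels, so the default _Edge of 2 gives a roughly one-pixel soft edge whatever the circle size.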

Other approaches I’ve tried are:
-Large 2048x2048 circle sprite. This was okay-ish, but showed jagged or blurred edges at high zoom
-High-poly circle mesh with 2x MSAA (too slow on iPhone 4 / iPad 1)
-Optimising this shader by removing the overlay (just multiplying by IN.color instead, which helped a bit) and by removing the color component entirely (i.e. baking the color into the textures)

Does anyone have anything else to suggest? I’m struggling with ideas on how to do this. My last hope is to attempt to shade the interior of the circle using an opaque shader, and then attempt to shade the edges with this blend shader, using a separate mesh, but I worry that that’s going to create its own problems. Am I missing something obvious?

Thanks!
Mike

Does it not need to be sqrt(dot(…)) to produce the length (it seems to do this when converting to GLSL)? I couldn’t work out how to refactor this to use length ^ 2 directly, to avoid the sqrt.
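
One possible way to work with the squared length, sketched in Python rather than Cg: near the edge, r*r - dot(p, p) equals (r - l) * (r + l), which is close to (r - l) * 2r, so a ramp built on the squared distance approximates the ramp built on the true length. This is an approximation, not the expression from the shader above, and it assumes the edge width w is already known (the shader derives it from ddx/ddy). Whether it is actually faster on the device is untested.

```python
def saturate(x):
    return max(0.0, min(1.0, x))


def coverage_sqrt(x, y, r, w):
    """Reference: linear ramp of width w centred on the true distance r."""
    l = (x * x + y * y) ** 0.5          # length(p) = sqrt(dot(p, p))
    return saturate((r - l) / w + 0.5)


def coverage_squared(x, y, r, w):
    """Approximation without a square root.

    Near the edge, r*r - dot(p, p) = (r - l) * (r + l) ~= (r - l) * 2r,
    so dividing by 2 * r * w gives nearly the same ramp.
    """
    d2 = x * x + y * y                  # dot(p, p), no sqrt
    return saturate((r * r - d2) / (2.0 * r * w) + 0.5)


r, w = 0.5, 0.01                        # radius in UV units; w is a hypothetical edge width
worst = max(
    abs(coverage_sqrt(x / 1000.0, 0.0, r, w) - coverage_squared(x / 1000.0, 0.0, r, w))
    for x in range(0, 701)
)
print(f"largest difference over the sampled radii: {worst:.5f}")
```

The two ramps only differ inside the soft edge, and the gap shrinks as w gets small relative to r.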

Thanks - every little bit helps!

Are there any reasonable strategies to make less of the circle alphablended?
Is it possible to have two passes, where one is alphablended and the other isn’t?

The reason I ask is that this particular case (beneath) is murdering my framerate.

Which part is alphablended, just the outside of the circle? If so then you really need two meshes/materials: a blended one to draw a smooth outline (a ring) and a non-blending one to fill the interior.

Yup, that looked like the best option for me. In fact, I’ve just discovered the wonder of submeshes, which seem to be the most efficient way of doing this.

I created circle geometry with an extra layer of triangles for the edge, and created two submeshes, one for the interior and one for the edge.
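
For anyone wanting to try the same layout, here is a rough Python sketch of how such geometry could be generated: a triangle fan for the interior plus one ring of quads for the edge, with a separate index list per submesh. The function name and radii are made up for illustration; in Unity the two lists would go to Mesh.SetTriangles with submesh index 0 and 1. Winding order is ignored here because the shader above uses Cull Off.

```python
import math


def build_circle(segments, inner_radius, outer_radius):
    """Return (vertices, interior_tris, edge_tris) for a two-submesh circle.

    Vertex 0 is the centre, 1..segments lie on the inner ring and
    segments+1..2*segments on the outer ring.
    """
    verts = [(0.0, 0.0)]
    for radius in (inner_radius, outer_radius):
        for i in range(segments):
            a = 2.0 * math.pi * i / segments
            verts.append((radius * math.cos(a), radius * math.sin(a)))

    interior, edge = [], []
    for i in range(segments):
        j = (i + 1) % segments
        in_i, in_j = 1 + i, 1 + j
        out_i, out_j = 1 + segments + i, 1 + segments + j
        interior.append((0, in_i, in_j))     # opaque fan triangle
        edge.append((in_i, out_i, out_j))    # blended ring, first half of the quad
        edge.append((in_i, out_j, in_j))     # blended ring, second half of the quad
    return verts, interior, edge


verts, interior, edge = build_circle(segments=24, inner_radius=0.45, outer_radius=0.52)
print(len(verts), len(interior), len(edge))
```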

I’ve now got two versions of my shader above: one that just passes through the UV and shades with colour, with no blending, and one that does the clean edges trick above. I haven’t tested it on the device yet, but I have high hopes! It’s taking about 0.1 ms in the editor to shade 254 overlapping circles and, most importantly, is batching nicely.

Thanks for the help all!

Bear in mind that most mobile GPUs don’t handle overdraw that well. 254 overlapping circles sounds like quite a lot, and that could well be hammering your framerate.

I’d try and make your “inner” circle shader an opaque shader and have it write to the ZBuffer, then the GPU can at least cull the fragments that are hidden behind it.
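
As a rough illustration of why that helps, here is a toy Python model that counts how many fragments get shaded for a stack of overlapping discs, with and without depth rejection. It assumes the discs are drawn nearest first and ignores everything else a real GPU does; the scene and grid size are made up.

```python
def count_fragments(circles, size, depth_test):
    """Count fragment-shader runs for discs drawn front to back on a size x size grid.

    circles: list of (cx, cy, radius), nearest first.
    depth_test=False models blended discs: every covered pixel is shaded.
    depth_test=True models opaque discs with ZWrite on: a pixel already
    covered by a nearer disc is rejected before shading.
    """
    covered = [[False] * size for _ in range(size)]
    shaded = 0
    for cx, cy, radius in circles:
        for y in range(size):
            for x in range(size):
                if (x - cx) ** 2 + (y - cy) ** 2 > radius * radius:
                    continue
                if depth_test and covered[y][x]:
                    continue            # hidden behind a nearer opaque disc
                covered[y][x] = True
                shaded += 1
    return shaded


# Hypothetical scene: five discs stacked around the same spot.
scene = [(32 + i, 32, 20) for i in range(5)]
blended = count_fragments(scene, 64, depth_test=False)
opaque = count_fragments(scene, 64, depth_test=True)
print(f"blended: {blended} fragments, opaque with depth test: {opaque}")
```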

Are you using depth writes/testing to cut down on overdraw for the interior parts, too?

Not sure how to add depth testing to do that to be honest…

I’ve just tried using submeshes, which definitely works (interior polys without blending and with ZWrite on, edges with my circle shader), but… it breaks batching. These circles are honestly driving me crazy!

The scene jumps up to 35 draw calls (from 6/7). I’ll be testing on the device, because I suspect even with that many draw calls, it will be faster than the massive amount of overdraw that existed before…

And… solved it! A complete n00b mistake: I was setting the materials array for the submeshes using “.materials” rather than “.sharedMaterials” (“.materials” works with per-renderer material instances, so the circles no longer shared a material). Everything is now working: 13 draw calls, almost no overdraw, and 165 draw calls saved. Thanks all for your help!

Why bother with maths?
Just use a gradient texture and blend it through a grayscale circles texture?

Isn’t the math here simpler, faster and more precise than textures? Textures won’t help with overdraw either.

For speed.
Everything you put inside a shader gets executed many many times per frame.
Therefore, if you can achieve the results with a “static” texture instead of a procedurally created one, like the above approach, you gain in speed (and lose in memory).
Interesting thread btw.

The use case is similar to “Hundreds”. The circles need to be anything from 10 pixels through to twice the size of the screen, while always maintaining perfect edges. I also wanted to be able to texture them in view space, while maintaining that edge.

The only issue I have now is that if the circles overlap, there’s z-fighting. Other than that, this approach gives me pixel perfect edges at any size, and no overdraw, and the shader complexity only happens for about 1% of the circle pixels.

The benefit of this approach is that with sensible geometry and submeshes, I can use the maths-heavy, blended circle shader only for an edge fringe of pixels, and then use an opaque, static texture shader for the interior.

I’ve included a reference picture beneath to demonstrate.

[Attached image: Screen Shot 2013-09-25 at 21.07.39.png]

Note that the darker grey circles are textured with a triangular pattern (those aren’t polygons), but the edges are calculated by the pixel shader and are pixel-perfect at any resolution, circle size or camera size thanks to the ddx/ddy screen-space derivatives. I make the geometry slightly larger than the intended circle size so I can get away with much simpler geometry (see image beneath), but I’ll be experimenting with this, since I have CPU to spare. Obviously, I could increase the number of vertices in the circle, which would let me shrink the edge triangles, which in turn would mean less work for the pixel shader. On top of all that, I need the circles to batch! I’ve got it mostly working, apart from the z-fighting issue I mentioned earlier.

[Attached image: RingRadius.png]
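
The trade-off between vertex count and edge size can be estimated with a little geometry. The Python sketch below is one way to do it, under assumptions not taken from the posts above: the inner ring vertices sit pad inside the true radius, and the outer ring vertices are pushed out far enough that even the midpoints of the outer polygon's sides stay pad outside it. All names and numbers are illustrative.

```python
import math


def ring_radii(radius, segments, pad):
    """Inner and outer vertex radii so the two polygons bracket the circle.

    pad is the margin (same units as radius) kept on both sides of the true
    edge for the antialiasing ramp.
    """
    inner = radius - pad
    outer = (radius + pad) / math.cos(math.pi / segments)
    return inner, outer


def fringe_fraction(radius, segments, pad):
    """Share of the mesh area covered by the blended edge ring."""
    inner, outer = ring_radii(radius, segments, pad)
    # Regular polygons with the same side count: area scales with radius squared.
    return 1.0 - (inner / outer) ** 2


for n in (12, 24, 48, 96):
    print(n, round(fringe_fraction(radius=100.0, segments=n, pad=1.0), 4))
```

More segments mean a thinner ring and fewer pixels running the expensive shader, at the cost of more vertices per circle.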

Incidentally, the submesh approach where you mix alphablended and regular texture shaders can apparently work wonders in 2D games too: these guys did some stellar work implementing a similar strategy with 2D sprites at Unite 2013 (just out of interest). http://bit.ly/1eJ6Q5T

That would be true if texture lookups were free. They are heavily limited by bandwidth, which is improving much more slowly on new hardware than ALU throughput. Look at the so-called normalization cubemaps: with pre-2005 hardware, it was faster to normalize vectors with a cubemap texture than with the normalize() function. Now it’s the other way around. Modern GPUs can do math extremely fast.

Does anyone know why interlayered materials refuse to batch? I’ve read that it breaks dynamic batching- would just like to understand why.

In the specific scenario that the OP described, do you believe that a texture lookup is heavier than the above shader?

runonthespot: try this shader too http://forum.unity3d.com/threads/105130-Hardware-accelerated-quadratic-bezier-curve?p=698896&viewfull=1#post698896

Btw, out of curiosity: can you describe the expected results in detail? It is clear that

  • the circles should be pixel perfect
  • the circles change size and
  • the max circle size is double the screen res.

Do they move based on user input? Do they change colour? If so, do they change colour gradually or abruptly? What is the maximum number of circles on screen simultaneously? Are they solid coloured, or do they have some sort of pattern / texture?

Thanks Ippokratis,

Your shader looks quite similar to mine. The main difference is that I’m texturing in view space (usually with a tiled texture, or a round pattern with a plain coloured border + clamp). I think mine is slower mainly because I’m using ddx/ddy to determine the equivalent of your _lineWidth dynamically, because I’m using vertex colors to color the circles, and because I’m using the texture as a Photoshop-style overlay on the plain color.

I’ve created a variant of my shader above that just does this in the frag:

return Overlay(IN.color, tex2D(_MainTex, IN.texUV) * _Strength);

i.e. it ignores the circle aspect, and I’m using that to shade all the interior polys. That seems to be working well.
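
For reference, the Overlay call can be mirrored per channel in Python as below. This is only a sketch of the blend maths for a single pixel; overlay_rgba and the sample colours are illustrative and not part of the shader.

```python
def overlay(a, b):
    """Per-channel overlay of blend value b onto base a, both in [0, 1].

    Mirrors lerp(screen, multiply, step(a, 0.5)): step(a, 0.5) is 1 when
    a <= 0.5, which selects the multiply branch.
    """
    if a <= 0.5:
        return 2.0 * a * b                      # darken: multiply
    return 1.0 - 2.0 * (1.0 - a) * (1.0 - b)    # lighten: screen


def overlay_rgba(base, tex, strength):
    """Equivalent of Overlay(IN.color, tex2D(...) * _Strength) for one pixel."""
    return tuple(overlay(a, t * strength) for a, t in zip(base, tex))


print(overlay_rgba((0.2, 0.5, 0.8, 1.0), (1.0, 1.0, 1.0, 1.0), 0.5))
```

A blend value of 0.5 leaves the base unchanged in both branches, so with the default _Strength of 0.5 a white texel passes the vertex colour straight through.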

The next problem is that I want that tiled effect to work with different textures. If I have individual circles, I can create 3 materials (well, 6, considering that I have to create a variant that uses the interior shader and one that uses the edge shader), each with a different texture, and for, say, 50 circles it batches down to 3 (/6) draw calls. The moment I layer them though, like in my screenshot above, it breaks batching if they’re different materials. I think I might just decide to live with a single material for those cases.
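
One way to picture the layering problem is the simplified model below. It is an assumption about how draw order constrains batching, not Unity's actual batcher: if objects have to be drawn in depth order, only consecutive draws that share a material can be merged. The material names are hypothetical.

```python
def count_draw_calls(materials_in_draw_order):
    """Count batches when only consecutive draws with the same material merge.

    Simplified model (an assumption, not Unity's actual batcher): objects
    are drawn in depth order, so two objects sharing a material cannot merge
    if an object with another material sits between them.
    """
    calls = 0
    previous = None
    for material in materials_in_draw_order:
        if material != previous:
            calls += 1
            previous = material
    return calls


# Hypothetical material names, listed back to front.
side_by_side = ["A"] * 4 + ["B"] * 4 + ["C"] * 4      # grouped by material
layered = ["A", "B", "C"] * 4                          # interleaved in depth
print(count_draw_calls(side_by_side), count_draw_calls(layered))
```

In this model the same twelve circles cost three draw calls when grouped by material and twelve when the materials alternate through the layers.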

Well… I put it through ShaderAnalyzer and it reported that the shader is already texture-fetch bound for more than half of the cards in the list.
Although that might be different for mobile GPUs… I’m not sure how well they handle texture fetches compared to ALU ops.