Will this cause bad overdraw and other performance issues?

Hi, just curious if these billboards are bad practice. We need every bit of performance possible. Should the billboard quad fit to the tree like the red outline I drew? This is a cutout shader. Does it still cause overdraw on the completely transparent pixels of the large quad used for the billboard?
If it helps, we are targeting new iPhones running Metal and new Androids running Vulkan.

Cheers!

Yes.

In my experience, cutout shaders ran very slowly on mobile (TBDR) hardware. I'm not sure whether today's mobile GPUs have changed in this regard.

You could try switching the material to a regular alpha-blended shader for testing, then use Xcode Instruments to measure whether it makes a difference in GPU performance.

This document could be interesting (there is probably a Metal equivalent online too):
https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/Performance/Performance.html

Thanks for the reply. Unfortunately, from my testing, alpha blending breaks batching when the billboards overlap with other alpha-blended materials, i.e. different tree billboards overlapping. Also, cutout seems fine on modern mobile GPUs in my experience.

Using the discard instruction in a shader can disable some early-Z optimizations, because whether or not Z must be written is unknown before the pixel shader runs.

So clearly, the fewer pixels you cover, the better for performance.

The best solution would be to compute some kind of hull around your trees at the expense of a few additional vertices. But looking at your trees, a tight quad around them would already be quite good.
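To get a rough feel for how much a tight quad could save, here is a minimal sketch that measures what fraction of the full quad a tight bounding rectangle would cover. It assumes the billboard's alpha channel is available as a 2D list of values in 0..1 and that the cutout threshold is 0.5. The function names and the threshold are made up for illustration and are not from any engine API.

```python
def tight_bounds(alpha, cutoff=0.5):
    """Return (min_x, min_y, max_x, max_y) of texels that pass the alpha test.

    `alpha` is a hypothetical 2D list of alpha values (rows of texels).
    Returns None when no texel passes the cutoff.
    """
    xs, ys = [], []
    for y, row in enumerate(alpha):
        for x, a in enumerate(row):
            if a >= cutoff:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def covered_fraction(alpha, cutoff=0.5):
    """Fraction of the full quad's area that a tight rectangle would cover."""
    bounds = tight_bounds(alpha, cutoff)
    if bounds is None:
        return 0.0
    min_x, min_y, max_x, max_y = bounds
    tight_area = (max_x - min_x + 1) * (max_y - min_y + 1)
    full_area = len(alpha) * len(alpha[0])
    return tight_area / full_area


# Toy 4x4 mask: only the two middle columns are opaque.
mask = [[0.0, 1.0, 1.0, 0.0] for _ in range(4)]
fraction = covered_fraction(mask)
```

For this toy mask the tight rectangle covers half of the full quad, so roughly half of the alpha-tested fragments would never be rasterized. A real tree texture would give a different number, and a proper hull would cut it further.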


Thanks for the info!

I should mention that we currently have 10,000 trees on the map, and all the billboards are combined into a single drawcall. If we were to do a hull shape around each billboard, we would just be using regular instanced billboards, so at best we are looking at about 60+ drawcalls to render the tree billboards, plus an additional 2-3 vertices per billboard.

This may be a tough question: if tight quads around the mesh, as I drew in the first image, are not an option for us, which of the following do you think would be more beneficial for performance?

  1. Additional vertices to shape the billboard around the tree, at the cost of rendering at least 60 extra drawcalls + probably around 20k extra vertices.

or

  2. All tree billboards in a single drawcall + fewer vertices, at the cost of overdraw due to the 1:1 quad shape as shown in the first image.

I’ll mention again that we are targeting high-end mobile devices running Metal and Vulkan only. The goal is to keep everything on the level under 200 batches and 200k tris, absolute max, as well as keeping shaders as simple as possible, but I have little knowledge of how to properly profile shaders.

Of course the best answer is to probably just test it & profile, but unfortunately I can’t do any mobile testing for a little while, and I want to try and make the best decision moving forward for now.
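For what it's worth, the geometry side of the trade-off can be put into numbers with a small back-of-the-envelope sketch. It uses the figures from this post (10,000 trees, a 200k tri budget) and assumes, purely for illustration, that every billboard is visible at once and that each outline is a convex polygon triangulated as a fan, so n vertices give n - 2 triangles. The function name is hypothetical.

```python
TREE_COUNT = 10_000   # from the post
TRI_BUDGET = 200_000  # from the post


def billboard_triangles(tree_count, verts_per_billboard):
    """Triangles for convex billboards drawn as fans: n vertices -> n - 2 tris."""
    return tree_count * (verts_per_billboard - 2)


quad_tris = billboard_triangles(TREE_COUNT, 4)    # plain quads
shaped_tris = billboard_triangles(TREE_COUNT, 6)  # quad + 2 extra vertices

quad_share = quad_tris / TRI_BUDGET
shaped_share = shaped_tris / TRI_BUDGET
```

Under those assumptions the plain quads use 20,000 tris (10% of the budget) and the six-vertex shapes use 40,000 (20%). That only covers the vertex cost. It says nothing about the extra drawcalls or the fill rate saved, which still need profiling on a device.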

You could use two billboard LODs: a single (tightly fitted) quad for distant trees, to keep the vertex count under control if you have lots of distant trees, and a more complex mesh for a much tighter fit on tree billboards that are larger on-screen. Both would look visually identical.

Managing the batching of the LODs would become trickier, though.

Definitely crop your imposters as much as possible (i.e. not a quad, but a shape that actually fits your tree silhouette). Even for low-ish end devices, 200k tris is nothing, and reducing the number of transparent pixels should be your #1 priority on mobile. If you find yourself GPU-limited, I would recommend that you go as far as drawing the central part of the tree as opaque geometry, even if it means doubling the number of draw calls. I don’t know the current state of Unity’s Metal and Vulkan implementations, but draw calls (even with terrible batching) should be a lot cheaper on these APIs.

Thanks for the information, very interesting, I didn’t realize that at all, I thought 100k tris was still high even for newer mobiles. I will do a bunch of testing and post my findings.

Hmm interesting idea, I think a middle ground between tight fit & low vertex count might be best.

3 years ago, the rule of thumb for mobile VR was stay under 100k tris on the Galaxy S6.

200k tris on modern phones shouldn’t be a problem, especially if you’re not worried about keeping 60 fps. However, fill rate (pixel count, overdraw, etc.) continues to be a limitation, especially at the sometimes insane resolutions of modern phone displays.


Thanks for the info, I suppose I am a bit outdated then! We are actually targeting 60 fps, but capping at 30 to combat heating, etc. We plan to enable 60 FPS down the road for very high end devices.

This is now the shape of our billboards. On any device with a resolution over x width/height, we take the render resolution down to 70% of native.
[Image: upload_2019-1-10_15-48-5.png]
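As a quick sanity check on what a 70% render scale buys, here is a small sketch of the arithmetic. The 2960x1440 display size is just an example I picked, not one of our actual target devices, and the function names are made up.

```python
def scaled_resolution(width, height, scale):
    """Render-target size after applying a per-axis resolution scale."""
    return round(width * scale), round(height * scale)


def pixel_ratio(width, height, scale):
    """Fraction of native pixels that still get shaded at the given scale."""
    scaled_w, scaled_h = scaled_resolution(width, height, scale)
    return (scaled_w * scaled_h) / (width * height)


# Hypothetical 2960x1440 display rendered at 70% of native.
size = scaled_resolution(2960, 1440, 0.7)
ratio = pixel_ratio(2960, 1440, 0.7)
```

Because the scale applies to both axes, 70% of native works out to 2072x1008 here, which is about 49% of the native pixel count. So the fill-rate cost roughly halves, not just drops by 30%.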

Hi everyone! Another similar question, to which I assume the answer will be yes. I have a very large water plane, the size of my terrain (2048x2048), which uses a custom shader with pre-baked transparency for the shoreline. This water plane also extends under the map, where you cannot see it. Would it be good practice to remove the mesh under the map, cut out the transparent part of the water mesh, and use 2 separate materials: 1 material without transparency for the distant water, and 1 material with transparency used only for the shore? Check the screenshots, where I explain this.

Of course optimizing everything is ideal; I'm just curious if this is “worth it”, as it would be a bit of extra work. Would it not be a big issue because the water only really overlaps with the terrain? It is not like trees, which constantly overlap each other. Or would having all those extra transparent pixels still be very costly for performance?

Generally not drawing where not necessary is probably the fastest option in most cases. However, many performance related things are project and hardware specific.

For example, if the water plane is rendered before the terrain, it can cause more work than if it were rendered after the terrain. If it renders after the terrain, many pixels should be rejected early, but this isn’t free either. How much it costs depends on the hardware.

I would fire up a profiler and test both cases; it seems like a quick change. Once you've done that a few times, you also get a better idea of how much each thing affects performance on which platform/device.

On mobile, you can expect overlapping opaque geometry to be essentially free, with each pixel being shaded exactly once regardless of how many layers you have, because the triangles get sorted by depth and only the nearest one makes it to the fragment shader stage. Although alpha-tested geometry is either completely opaque or completely transparent, there is no way to know which bits will be opaque before reaching the fragment shader, so you can’t get any of the benefits of that geometry sort.
That being said, depth rejection can be useful to limit the overdraw of alpha-tested geometry, so it is generally best to render it front to back if possible, and as late as possible in the render.
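As a rough illustration of that ordering, here is a minimal engine-agnostic sketch: opaque objects go first, then alpha-tested objects sorted nearest to farthest from the camera. The dictionary layout and names are hypothetical, and a real engine would normally handle this through its render queue and sorting settings rather than a hand-written sort.

```python
import math


def draw_order(opaque, alpha_tested, camera):
    """Opaque objects first, then alpha-tested objects front to back.

    Each object is a hypothetical dict with "name" and "position" keys;
    `camera` is the camera position as an (x, y, z) tuple.
    """
    def distance(obj):
        return math.dist(obj["position"], camera)

    # Drawing alpha-tested geometry late, nearest first, lets the depth
    # already written by closer surfaces reject the fragments behind them.
    return list(opaque) + sorted(alpha_tested, key=distance)


camera = (0.0, 0.0, 0.0)
terrain = [{"name": "terrain", "position": (0.0, -1.0, 50.0)}]
trees = [
    {"name": "tree_far", "position": (0.0, 0.0, 10.0)},
    {"name": "tree_near", "position": (0.0, 0.0, 2.0)},
    {"name": "tree_mid", "position": (0.0, 0.0, 5.0)},
]
order = [obj["name"] for obj in draw_order(terrain, trees, camera)]
```

Sorting per object like this is only an approximation for large or intersecting meshes, but for small billboards it is usually close enough.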


Thanks for all the great help on this thread guys! Really cleared things up.