After about 2 weeks of trying to get everything working properly, I’m out of ideas. Has anyone managed to use SpeedTree (ST) without extremely low performance? I disabled wind and smooth LOD, and I even modified the prefab. For the sake of testing I deleted everything but the billboard, as most of the trees in my scene are billboards anyway, so I only have a parent with a LODGroup component and a child with transform, billboard renderer, tree, and material components. I even tried placing them as plain game objects, which got rid of the LODGroup component and the parent. My CPU usage is through the roof: 22 ms, with rendering at about 2.7-3 ms, without any kind of game logic. Basically I have a camera and 11.2k billboards, and they are batched together. I made a build but didn’t notice much of a performance improvement. I have no idea what’s going on. Why is the billboard renderer so slow? Am I missing something?
We can’t help much without more info. Things that would help:
- The device you’re testing on. For in-Editor tests, the specs of the computer.
- A screenshot of the scene.
- Version of Unity.
- Any relevant options/settings you’ve tried.
@angrypenguin
Sure, sorry. I’m only testing on PC (FX-6300, GTX 660, 16 GB). I tried to take a screenshot, but nothing can really be seen: as I wrote in my original post, I made a scene just for testing billboard performance, so there is nothing but 11k trees and a single camera pretty far away from them (far clipping plane set to 5000). I’m using 5.1, but I had the same issue with earlier releases as well. It may also be worth mentioning that I tried it on another machine (FX-6100, AMD 4830, 4 GB) and the performance was pretty much the same.
I have a similar problem with SpeedTree trees eating up the draw call count. I’m far from 11k trees, but it seems to me that they are poorly batched. I don’t know if it makes a difference, but I’m using the deferred rendering path.
As far as I know, what you’re experiencing is just a limitation of dynamic batching (static batching doesn’t work on SpeedTree trees). My main concern is billboard rendering, although I also get a few hundred draw calls when I’m using SpeedTree properly (not just the billboards) and the camera is close to the trees. In the old system I can easily place something like 150k trees without any sort of performance issue (actually a lot more, but I don’t see the point of filling every inch with trees). That’s why I’m surprised by SpeedTree’s billboard performance, as the billboards are batched properly: in the scene mentioned above the statistics show 11,199 saved by batching, and that seems correct. When I first tried it with default settings (smooth LOD on) I got far worse fps.
I was having major performance hits because I was using Water4. If you’re using Water4 ANYWHERE in your scene, try unticking it; once I did, all my SpeedTree trees and everything else went back to normal. Heck, Water4 was giving my 8-core CPU 200 ms!
It worked both ways, though: I could keep the water and disable the SpeedTree trees and it would run flawlessly, or I could disable the water and it would again run flawlessly.
So I guess SpeedTree and Water4 don’t mix.
(Not sure if you’re using it; if so, try disabling it and see what happens.)
Well, 11K trees is a ridiculous amount. You need to stream them in at a certain viewable distance and at least offer some sort of occlusion culling, which is Umbra in Unity…
Yes, of course you can have a million trees “in a scene”, but not in view…! Even in AAA games, you’d never have a tenth of that in view at any one time…
Did you try adjusting the LODs?
Deferred rendering can have an impact with Transparent / AC materials also…
No wonder…!
@N1warhead_1
There’s nothing in the scene, just trees. I guess Water4 uses real-time reflection, which might explain your performance troubles.
@
Well, I do wonder. In the old system I have this when I place 123k trees (please note that the drawing distance for terrain trees is still locked at 2000, and some of them are offscreen here, but still, that’s at least 10 times more trees with 10 times less CPU usage, not to mention the 16 terrains below them).
And I have this when SpeedTree trees are placed as individual game objects. It’s even worse when I place them on a terrain, as then I have to add a LODGroup component to them, which of course needs pretty significant CPU power because all of them can be seen in this scene and, as you said, 11k is a lot.
You can clearly see the difference. Something must be going on with the new SpeedTree billboard system (the trees on their own perform well, I think, but when I have a lot of them, no matter what I do they use more resources than they should, IMO).
Btw look at my “tree” setup, it’s just a billboard, nothing more.
I tried every rendering path from legacy vertex lit to deferred; it gained maybe 2-3 fps, which I wouldn’t call a solution.
In my opinion we should be able to place trees like this without eating up all the resources. But we can’t, not even when all of them are just billboards, and that is a problem (unless I’m doing something completely wrong, in which case please tell me).
Some things about SpeedTree. Firstly, the poly / tris count is higher, as the trees generally look much better. That’s a massive factor: 11K x 5,500 without LODs is around 60 million tris, and even at a low LOD (2.5K) it’s still 27 and a half million tris.
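For anyone who wants to redo that estimate with their own numbers, here is a minimal sketch of the arithmetic. It assumes every tree is in view and drawn at the same LOD level, which is a worst case; the tree count and per-tree triangle counts are the figures quoted in the post above, and the function name is just illustrative.

```python
# Rough triangle budget for a forest where every tree is drawn at one LOD level.
# Assumption: all trees are visible and use the same mesh (worst case, no billboards).

def total_triangles(tree_count: int, tris_per_tree: int) -> int:
    """Triangles submitted per frame if every tree is visible at the same LOD."""
    return tree_count * tris_per_tree

TREES = 11_000
full_detail = total_triangles(TREES, 5_500)  # highest-detail mesh, figure from the post
low_lod = total_triangles(TREES, 2_500)      # lower LOD level, figure from the post

print(f"full detail: {full_detail / 1e6:.1f}M tris")
print(f"low LOD:     {low_lod / 1e6:.1f}M tris")
```

Swapping in the triangle count of a billboard instead of a full mesh shows how quickly the total drops once distant trees stop using real geometry.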
But of course not all the trees are going to be LOD3 or higher; they’ll be billboards at long distance. Now, firstly you’re using a different shader, and secondly you’ve not included the profiler output showing what’s causing the difference.
What types of batching are you using? Have you used Umbra?
One thing I noticed about the LOD system is that there’s a culling overhead applied for LOD transitions that can get a bit hairy when there’s a mass of geometry to transition. That could also factor in…
@
This is the built-in ST billboard shader. It’s the European White Birch Desktop from the Desktop Tree Package; I changed nothing except getting rid of everything but the billboard, so in this scene there is no LOD switching going on at all, as nothing has a LODGroup component.
I marked them as static, but as far as I know they can’t be static batched (and I didn’t see any difference in performance either). If the statistics window is correct, I have far more tris when using the old tree system (211k vs 44.5k) but still 9 times the fps, so I don’t think the problem is the tris count. I might be wrong though; I’ve used Unity mostly for 2D, so if I should look elsewhere please tell me.
Old tree system
vs speedtree
As I said, most of your issues aren’t down to SpeedTree; according to your profiler, they seem to be down to culling and lighting…
I’d need to see deeper down to find out what…
Have you tried running Umbra yet?
@
I ran the exact same test but replaced the billboards with cubes (transform, mesh filter, mesh renderer with everything disabled, and the default material). Now I have 17 ms CPU and 6.5 ms GPU usage. To me this clearly says something is wrong with the SpeedTree billboards, since billboards should be a lot faster than anything else. Also, if I mark the cubes as static they are even faster (CPU 14.5 ms, GPU 1.1 ms). This is actually quite amazing; I didn’t expect to see that.
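To put those timings side by side, here is a small sketch that turns the CPU frame times reported in this thread into a rough fps ceiling. It assumes the CPU is the bottleneck in every case, which matches the numbers posted so far but is still an assumption; the helper name is made up for illustration.

```python
# Convert the CPU frame times reported in this thread into an fps ceiling.
# Assumption: the frame is CPU-bound, so fps cannot exceed 1000 / cpu_ms.

def fps_ceiling(cpu_ms: float) -> float:
    """Upper bound on frames per second for a given CPU time per frame."""
    return 1000.0 / cpu_ms

cpu_times_ms = {
    "SpeedTree billboards": 22.0,  # from the first post
    "cubes": 17.0,                 # from the cube test
    "static cubes": 14.5,          # cubes marked as static
}

for label, ms in cpu_times_ms.items():
    print(f"{label:22s} {ms:5.1f} ms -> at most {fps_ceiling(ms):.0f} fps")
```

The actual frame rate can be lower than this ceiling whenever the GPU or anything else takes longer than the CPU portion of the frame.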
My original post is about why SpeedTree billboards are so slow compared to the old tree billboards or the particle system’s billboards, and what I can do about it. Occlusion culling wouldn’t help in this case, as all of the trees can be seen (in fact it performs worse; even the manual mentions you shouldn’t use OC in a forest scene because it would use far too much memory, or at least the trees shouldn’t be occluders).
I made a package if you would like to take a look (8.2k billboard trees, using the tree that comes free with Unity). I would be more than happy to hear your thoughts about it. Please note I put it together in a hurry, so there might be some settings I forgot to set which could lead to obvious performance issues.
https://www.dropbox.com/s/87yzapsisvpxvqb/SpeedTree%20Billboard%20Test.unitypackage?dl=0
Part of the reason SpeedTree billboards are slower than the normal Unity tree billboards is the billboard fade feature, which has a real detrimental impact on performance… If you go to each individual tree’s settings you can set the “Fade Out Width” to 0, which should help some with the final performance of the billboards, especially since you have so many.
Of course, now they will simply pop in and pop out once they reach the visible threshold. You may also want to experiment with the “culled” position, raising it up to maybe 3 or 4%. By messing with the settings you can definitely eke out some extra performance over the defaults.
SpeedTree is broken on Unity, but I can recommend using it on Unreal4. Performance is much better and the trees also look a lot nicer.
hi there,
you may try to get rid of the “tree” component on your billboard and see if this speeds up rendering.
I made another test with about the same number of trees, but instead of placing one at a time I changed the Conifer_Desktop in the modeler so that it now places 10 trees. I left everything at default (so smooth LOD is enabled). Look at the result.
…but of course this way they can’t be batched, so we have traded one problem for another, and the tris count in this scene can climb to almost 3M. This “solution” might still be good for background stuff, as the billboards have the same triangle count, so they can be batched, and even batch a lot better, since one billboard can cover 10 trees. It’s also harder to work with, but the performance gain is huge. Maybe the best workaround for open-world games would be to use the cluster version for far-away terrains and the normal one for the closer ones.
…but damn, why do we have to do something like this? I was so hyped for SpeedTree; the technology is simply amazing, and I really hope they will give it more love in the future (but until then this solution looks okay most of the time, unless you want to chop those trees, or placing them on )
@chingwa
In the scene above the billboards aren’t using the fade effect at all. Unless I missed something essential: can you apply the fade effect to billboards without using the LODGroup component? If so, where can I find those settings? I really did nothing in the past 2 weeks but try every possible setting I could come up with, and I didn’t find anything related to fading besides the LODGroup.
@larsbertram1
Thanks for the tip, I will!
Edit:
It made a pretty big difference! But it’s still too slow for large landscapes, unfortunately (of course you won’t need such a dense forest, but a terrain of a few km can use up this number of trees pretty quickly, and sometimes they all can be, or at least should be, seen).
Here are the results:
With the Tree component
Without it:
Just to let you know, sometimes billboards with a transparent alpha texture and bump mapping are slower than building the actual geometry and using a lit shader without bump or transparency. This has been shown with grass, where the mesh is not so complex.