Splitting one big grass mesh into several small ones is definitely a better idea. Umbra can completely occlude grass models hidden, for example, behind hills, buildings, etc. As for the shader itself, the most important factor is the platform. On PC, grass always uses the optimized version of the shader. On Macs, the grass system uses the optimized shader only on higher-end nVidia GPUs. On other Mac GPUs an alternative version of the shader is used. The visual result is exactly the same, but that shader doesn’t benefit from many of the optimizations.
The major optimization available is far distance. For distant pixels (the distance is adjusted in the shader properties; in my demos it’s about 20 meters) the shader skips most of its calculations. Only 1-3 texture lookups (depending on settings) are performed to determine the pixel color, and then the fragment program exits. So if you make one huge grass mesh on your 2000x2000 terrain, you won’t notice any performance hit from the fragment program in distant areas of grass (with the optimized shader, that is on PCs and nVidia Macs). The original terrain shader also does at least a few (4) texture lookups, so for distant pixels the grass shader renders as fast as the terrain shader. The only slowing factor is then the complexity of the mesh (number of polys). I guess that for a huge terrain with many hills, the mesh that has to be built for grass would consist of several tens of thousands of triangles. The upper limit in the engine is 40000 vertices/polys. You can’t build a more complex grass mesh (I set this limit because it’s safe for the triangulator, which could crash beyond it). I believe the optimal number of polys per grass mesh is about 5000-10000 for larger areas. The example meadow scene has 8000 tris of grass mesh, but it could be much smaller if we adjusted the build settings, compromising grass mesh smoothness on hills.
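As a rough illustration of the triangle budget described above, here is a minimal Python sketch. It is not part of VolumeGrass; the function name `pieces_needed` and the idea of an even split are assumptions made only for this example. The two constants come from the figures quoted in the post (40000 as the hard limit, 10000 as the upper end of the suggested range).

```python
import math

HARD_LIMIT = 40000        # upper limit per grass mesh quoted in the post
RECOMMENDED_MAX = 10000   # upper end of the suggested 5000-10000 range


def pieces_needed(total_tris, target_tris=RECOMMENDED_MAX):
    """Smallest number of meshes so that an even split keeps each
    mesh at or below target_tris (hypothetical helper)."""
    if not 0 < target_tris <= HARD_LIMIT:
        raise ValueError("target_tris must be between 1 and HARD_LIMIT")
    if total_tris <= 0:
        return 0
    return math.ceil(total_tris / target_tris)


if __name__ == "__main__":
    # A terrain whose grass would need 35000 triangles in total.
    print(pieces_needed(35000))        # meshes at the 10000-tri target
    print(pieces_needed(35000, 5000))  # meshes at the 5000-tri target
```

In practice you would split along hills, buildings and other occluders rather than evenly, so treat the result only as a lower bound on the number of pieces.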
Again, splitting into small pieces will be better in terms of occlusion opportunities (grass is always a static object, isn’t it?). Don’t hesitate to split your grass into meshes of even a few polygons, because the bottleneck here is not the vertex count (unless you try to render more than 10000-20000 polys), but the fragment program, which is quite expensive.
LODs in the mesh editor work on a per-distance basis. You can prepare up to 4 LODs (grass meshes) with a decreasing number of polys and decide which mesh will be used at a specified distance. This works similarly to Unity’s terrain, which reduces the number of polys in far areas. I put it into the system to give users the best opportunities available, but in fact, when you use occlusion wisely, you probably won’t have to use that many LODs. First, as I said, the performance bottleneck lies in the fragment program, not in the vertex program. Furthermore, using 4 LODs with, for example, 10000, 8000, 6000 and 4000 tris will increase the size of your player considerably. In practice I believe that 2, at most 3, LODs are enough.
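The per-distance LOD choice and the size trade-off can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `select_lod` and `total_stored_tris` are hypothetical names, the switch distances are made up, and the real mesh editor may resolve boundary cases differently.

```python
from bisect import bisect_right

MAX_LODS = 4  # the post says up to 4 LODs can be prepared


def select_lod(distance, switch_distances):
    """Return the LOD index (0 = most detailed) for a camera distance.

    switch_distances must be sorted ascending; N distances give N + 1 LODs.
    """
    if len(switch_distances) > MAX_LODS - 1:
        raise ValueError("at most 3 switch distances (4 LODs)")
    if list(switch_distances) != sorted(switch_distances):
        raise ValueError("switch distances must be ascending")
    return bisect_right(switch_distances, distance)


def total_stored_tris(lod_tri_counts):
    """Every LOD mesh is shipped with the build, so the counts add up."""
    return sum(lod_tri_counts)


if __name__ == "__main__":
    switches = [50, 120]              # assumed distances, in meters
    print(select_lod(10, switches))   # close to the camera
    print(select_lod(200, switches))  # far away
    print(total_stored_tris([10000, 8000, 6000, 4000]))  # 4 LODs
    print(total_stored_tris([10000, 4000]))              # 2 LODs
```

With the triangle counts from the example above, four LODs store 28000 triangles against 14000 for two, which is the extra build size the post warns about.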
The next optimization is that you can specify a completely different shader (material) to be used at very far distances that you choose, at which the user probably won’t see any difference between a “Vertex Lit” shader and the grass shader. So if you split the grass areas on your huge terrain into smaller pieces and specify that at very far distances they should use a different, simple material, it will work as fast as possible.
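A minimal Python sketch of that per-piece decision, assuming each grass piece is represented by the (x, z) position of its center. The function `pick_material`, the material labels and the 400-unit threshold are all hypothetical and only illustrate the idea; they are not VolumeGrass settings.

```python
import math


def pick_material(piece_center, camera_pos, far_distance):
    """Return "simple" for pieces at or beyond far_distance, else "grass".

    piece_center and camera_pos are (x, z) pairs on the terrain plane.
    """
    dx = piece_center[0] - camera_pos[0]
    dz = piece_center[1] - camera_pos[1]
    return "simple" if math.hypot(dx, dz) >= far_distance else "grass"


if __name__ == "__main__":
    camera = (0.0, 0.0)
    pieces = {"near_meadow": (30.0, 40.0), "far_hill": (300.0, 400.0)}
    for name, center in pieces.items():
        # 400 is an assumed switch distance for this example
        print(name, pick_material(center, camera, 400.0))
```

The smaller the pieces, the more precisely the switch follows the camera, which is one more reason to split a huge grass area.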
The last optimization (which works on every platform) is that when you build a grass mesh and cut several bare “holes” (for example paths) with the grass coverage brush tool, triangles that are wholly cleared of grass are skipped at once. That is, an instant clip at the beginning of the fragment program skips everything, and those areas cost NOTHING.
As you may notice, when preparing VolumeGrass I paid a lot of attention to performance. I made literally hundreds of tests and tweaks to push shader performance to the limit. Believe it or not, it’s not easy to optimize shader programs with a complex structure and conditional instructions inside, because every single decision on how and where to place some calculations can have a dramatic impact on performance since, unlike CPUs, GPUs are designed to work in parallel. Hopefully, GPUs at today’s low-to-mid range level, like the ATI HD 4650 (available at prices below $50), will be history in a year. That’s about the time you’ll be polishing your game, and by then most users will be able to benefit from our nice-looking grass. As an example, I tested the meadow scene on such an ATI HD 4650 and it ran easily at 60+ fps (webplayer, 800x600).
A final word about performance. With a shader as expensive as the VolumeGrass shader, try to use forward rendering. If you use only one dynamic light and the rest is lightmapped, everything will be rendered in ONE pass. In deferred there are always TWO passes, which have to be taken into account for the grass shader… Of course, if you want better shadows and many dynamic lights in the scene, use deferred. In that case it’s worth using even if the grass shader has to be rendered in two passes.
I hope this thorough post clears up most of your possible doubts.
ATB, Tom
P.S. Thanks for the nice words about my documentation. Honestly, as I’m not a native speaker, I was anxious that Unity might refuse my submission because of my English, but it turned out it’s not that bad…