Actually, for what you show in your image, I think Unity’s terrain system would render the scene at or above the refresh rate of most monitors and “be just fine”.
The problem
I think the problem people have with Unity’s terrain system is that they set the camera’s far clipping plane to 8k meters, add 125,000 high resolution trees, 50,000 high resolution bushes, 35,000 high resolution rocks, 65,000 high resolution flowers and 1 million blades of grass, all with realtime lighting and shadows in a huge open world environment, and then become disappointed with the result.
Also, this “Open World” problem is not a systemic problem with Unity (think of all the places where you do not use a terrain), even though your solution treats it as such.
Your solution
You have essentially re-invented a scenegraph. I couldn’t find the exact data structure you are using, but a scenegraph is actually not the right solution to this problem, as evidenced by the fact that you are having to hack things like shadows.
A better solution would be to reframe the problem as a distance problem (using something like a kd-tree). In other words, “show me the closest 500 trees, show me the closest 100 bushes, show me the closest 200 clumps of grass”. That would scale to millions of yet-to-be-instanced objects… and then your fancy pooling, culling, LOD and impostor systems could go to work.
Again, don’t think of this as a generic solution for fixing Unity. There is no problem with Unity. This is a way to address the “Open World” problem discussed above.
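To make the “closest N” idea a bit more concrete, here is a minimal sketch in plain Python (not Unity code). It assumes object positions are 2D points on the ground plane and that the tree is built once up front; `build_kdtree` and `k_nearest` are names I made up for illustration, not part of any engine API.

```python
import heapq
import random

def build_kdtree(points, depth=0):
    """Build a 2D kd-tree as nested tuples: (point, left, right)."""
    if not points:
        return None
    axis = depth % 2  # alternate between x and z (indices 0 and 1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def k_nearest(node, target, k, depth=0, heap=None):
    """Return up to k points closest to target, nearest first."""
    if k <= 0:
        return []
    top = heap is None
    if top:
        heap = []  # max-heap of (-squared_distance, point)
    if node is not None:
        point, left, right = node
        axis = depth % 2
        d2 = (point[0] - target[0]) ** 2 + (point[1] - target[1]) ** 2
        if len(heap) < k:
            heapq.heappush(heap, (-d2, point))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, point))
        # Search the side containing the target first.
        diff = target[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        k_nearest(near, target, k, depth + 1, heap)
        # Only cross the splitting plane if a closer point could be there.
        if len(heap) < k or diff * diff < -heap[0][0]:
            k_nearest(far, target, k, depth + 1, heap)
    if top:
        return [p for _, p in sorted(heap, reverse=True)]
    return heap

# Hypothetical usage: 100,000 tree positions, camera in the middle of the map.
random.seed(0)
trees = [(random.uniform(0, 8000), random.uniform(0, 8000)) for _ in range(100_000)]
tree_index = build_kdtree(trees)
closest_trees = k_nearest(tree_index, (4000.0, 4000.0), 500)
```

Only the positions in `closest_trees` would need to be handed to the pooling, culling and LOD systems each time the camera moves far enough to warrant a new query; you would keep one such index per object type (trees, bushes, grass clumps).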
Why am I posting this?
Software development (with other people) is about sharing ideas. You asked for my opinion about a piece of technology and I gave it. It is nothing personal and I think that you are a good person.
My particular experience with Unity has led me to these conclusions.
Your mileage may vary.
Thanks!

Brian