Thanks for the replies.
RC-1290 - Oh how I wish I had the freedom of DX11! However, we are shipping on iPhone 4 hardware, and we are squeezing out every last drop of performance we can. Like many others, we have found that dynamic batching doesn’t really help. It’s like pushing down a lump in the carpet. With dynamic batching off, you get a CPU hit from the draw calls. Turning batching on means the driver does less work, but the same CPU cost is just shifted over to the batching system. (In our experience, dynamic batching only helps when you have a handful of objects with far fewer vertices than the cutoff points set by Unity.)
Having said all that, we hoped to push some of the rendering work off to the GPU. NVIDIA laid out a clever way of doing geometry instancing without specific hardware support: you duplicate your mesh data, but insert an extra per-vertex attribute that is an instance identifier. You render that entire mesh, and in your vertex shader you use that instance identifier as an index into an array of matrices, or, in our case, simply a position vector. The end result is a single draw call, and no CPU time spent in the dynamic batching system. However, the whole thing breaks down if you can’t declare array uniforms in a vertex shader.
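For anyone who hasn't read the chapter, here is a rough CPU-side sketch of the idea in Python. It is only an illustration, not our Unity or shader code: the names build_instanced_mesh and vertex_stage are made up, and vertex_stage just stands in for what the vertex shader would do with a uniform array of per-instance positions.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vertex = Tuple[Vec3, int]  # (position, instance id)


def build_instanced_mesh(
    base_vertices: List[Vec3], base_indices: List[int], instance_count: int
) -> Tuple[List[Vertex], List[int]]:
    """Duplicate a mesh instance_count times, tagging each vertex with its instance id."""
    vertices: List[Vertex] = []
    indices: List[int] = []
    for instance_id in range(instance_count):
        first = instance_id * len(base_vertices)  # index offset of this copy
        vertices.extend((p, instance_id) for p in base_vertices)
        indices.extend(first + i for i in base_indices)
    return vertices, indices


def vertex_stage(vertex: Vertex, instance_offsets: List[Vec3]) -> Vec3:
    """CPU stand-in for the vertex shader: index the 'uniform array' by instance id."""
    (x, y, z), instance_id = vertex
    ox, oy, oz = instance_offsets[instance_id]
    return (x + ox, y + oy, z + oz)


# One triangle, three instances, all submitted as a single combined mesh.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
vertices, indices = build_instanced_mesh(triangle, [0, 1, 2], 3)

# Per-instance positions; on the GPU this would be the array uniform.
offsets = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
transformed = [vertex_stage(v, offsets) for v in vertices]
```

The point is that the combined mesh is built once, and only the small offsets array changes per frame, which is why the approach depends on being able to set an array uniform.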
I was hoping that we could use bone weights as a halfway measure, but Unity does not give access to the bone weights and indices in the shaders. (Why, I have no idea, and it is really disappointing.)
I did look into doing vertex texture fetching, and wrote up a solution. In general, I don’t like locking GPU resources every frame, and 10,000 texture lookups in the vertex stage give me pause, especially when arrays are a rather fundamental and officially supported data type in most other shader frameworks. They’re there for a purpose.
Since writing my original post, I discovered a means of actually setting a real array in the vertex shader. Because of this, the instancing technique works, and we went from 44 FPS to 52 FPS 10 minutes after I discovered this maneuver.
It doesn’t require a plugin or anything special. I don’t think this is a documented way of handling arrays; I reverse-engineered how to get it done. Before posting the solution publicly and promoting a potentially unintended means of accomplishing shader arrays, I’d like to hear a Unity person weigh in. Otherwise, I’ll post the solution later today. (Someone else has probably discovered this too, or it’s published somewhere deep in the internet that I failed to find in hours of googling.)
BTW: Here is a link to the aforementioned instancing technique (it’s under “Vertex Constants Instancing”): http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html