I am working on a native Metal rendering & compute plugin, and I am having trouble with Metal performance on iOS.
Even in an empty project, profiling with Instruments' "Metal System Trace" shows that the fragment shader takes ~14 ms/frame. Adding some rendering doesn't change that: as long as it's not too much, it always takes its 14 ms.
This would not be a problem by itself, but when I also run compute kernels (Metal's compute shaders), they cannot run in parallel with the fragment shader. So if my kernel takes 4 ms, the frame already takes 18 ms.
This trace was taken with the 5.4 beta, but 5.3.4 shows the same pattern. Quality settings are on "Fastest", so no anti-aliasing is used; I also verified this by debugging the Xcode project.
When I run a custom iOS Metal project, the system trace shows no such behaviour. For testing, I am rendering a small number of particles.
It turns out that disabling depth and stencil solves the issue. Unfortunately this doesn't work via "Player Settings -> Resolution and Presentation -> Disable Depth and Stencil", which seems to be ignored by the Metal renderer. Fortunately, it is possible to prevent the creation of the depth and stencil buffers in MetalHelper.mm by deleting the body of extern "C" void CreateSharedDepthbufferMTL(UnityDisplaySurfaceMTL* surface).
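For reference, a minimal sketch of what the patched function looks like. Note that the UnityDisplaySurfaceMTL stub and its field names below are placeholders for illustration, not Unity's actual definition (the real struct lives in the generated Xcode trampoline sources):

```cpp
#include <cstddef>

// Hypothetical stand-in for Unity's UnityDisplaySurfaceMTL struct; the field
// names are illustrative only. In the real code these would be
// id<MTLTexture> handles for the depth and stencil render buffers.
struct UnityDisplaySurfaceMTL {
    void* depthRB   = nullptr;
    void* stencilRB = nullptr;
};

// In the generated Xcode project's MetalHelper.mm, deleting the body of this
// function prevents Unity from ever allocating the depth/stencil textures.
extern "C" void CreateSharedDepthbufferMTL(UnityDisplaySurfaceMTL* surface)
{
    // Original allocation code removed: the surface keeps null depth/stencil
    // buffers, so the render pass is encoded without those attachments.
    (void)surface;
}
```

With the body emptied, the surface's depth/stencil pointers stay null and the render encoder is set up without those attachments.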
I guess just having the Metal encoders handle those buffers, even though they are not actually used for rendering, causes this huge overhead. Without them, fragment time is down to 2.5 ms (iPad mini 2) for an empty renderer, which seems to be the normal overhead for a render encoder.