I need some help figuring out a rendering issue that appears only on iOS, not in PC or Mac builds. I’ll work backwards: first the odd profiler capture, then more detail about what I’m doing.
The following image is the profile of my game running on a new iPad with an A9.
Unity: 2017.2
API: Metal
Rendering path: Deferred
I disabled multi-threaded rendering because it was hiding information from the profiler.
Vsync is turned off.
As you can see, that Graphics.Blit operation is killing my frame rate. Unfortunately, I don’t know where that call originates. I never call Graphics.Blit in my code, so I assume it is one of the hidden blits Unity performs internally during rendering. I also don’t do anything in OnRenderImage, and I’m not using any image effects. I do have two command buffers that run in AfterImageEffects, but the profiler records them as taking 0.02 ms.
I have an effect that draws a window, a kind of portal. It uses three cameras (which may correspond to the three Camera.ImageEffects entries in the profile), and they render in this order:
1. The first camera draws its scene (a single quad) into a RenderTexture.
2. The second camera does not use that RenderTexture. It creates a z-stencil (a cutout based on a shape that uses min z to prevent unwanted fragments from rendering) and then renders a scene (a simple character standing on a cube). This camera has a command buffer in AfterImageEffects that uses the shape (the quad from the first scene) to composite a portion of the rendered scene into the first camera’s RenderTexture.
3. The third camera is a dummy that renders nothing. It has a command buffer in AfterImageEffects that draws that RenderTexture to the framebuffer.
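To make the data flow easier to follow, here is a toy CPU model of the three-camera pipeline in Python. It is not Unity code and makes no claims about Unity internals: images are tiny 2D lists of ints, the z-stencil step is not modelled separately (the quad mask stands in for it), and every name here (`render_frame`, `quad_mask`, `clear_rt`) is hypothetical. It also illustrates one general property relevant to the manual clear I ask about further down: a buffer that persists across frames keeps whatever was written into it last frame unless something clears it.

```python
"""Toy CPU model of the three-camera portal pipeline (not Unity code).

Hypothetical assumptions: an image is a tiny 2D list of ints, EMPTY (0)
means nothing drawn, QUAD (1) is the window quad, SCENE (2) is the
character scene rendered by the second camera.
"""

W, H = 4, 2
EMPTY, QUAD, SCENE = 0, 1, 2


def blank():
    """A fresh, fully empty RenderTexture stand-in."""
    return [[EMPTY] * W for _ in range(H)]


def quad_mask(x0, x1):
    """Pixels covered by the window quad: columns x0..x1-1 on every row."""
    return {(x, y) for y in range(H) for x in range(x0, x1)}


def render_frame(rt, mask, clear_rt):
    """Run one frame; rt persists between calls like a RenderTexture."""
    if clear_rt:
        # Manual clear of the accumulation buffer at the start of the frame.
        for row in rt:
            row[:] = [EMPTY] * W
    # Camera 1: draw the quad into the RenderTexture.
    for x, y in mask:
        rt[y][x] = QUAD
    # Camera 2: render the character scene into its own target ...
    scene = [[SCENE] * W for _ in range(H)]
    # ... then its AfterImageEffects command buffer composites the part
    # of that scene covered by the quad into camera 1's RenderTexture.
    for x, y in mask:
        rt[y][x] = scene[y][x]
    # Camera 3 draws nothing itself; its command buffer copies the
    # RenderTexture to the "framebuffer", modelled here as a copy.
    return [row[:] for row in rt]


if __name__ == "__main__":
    rt = blank()
    print(render_frame(rt, quad_mask(0, 2), clear_rt=True))
    # [[2, 2, 0, 0], [2, 2, 0, 0]]
    # The window moves right; without a clear, last frame's pixels remain.
    print(render_frame(rt, quad_mask(2, 4), clear_rt=False))
    # [[2, 2, 2, 2], [2, 2, 2, 2]]
```

In the second frame the window has moved to the right half, but because the persistent buffer was not cleared, the pixels composited in the first frame are still visible on the left.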
I tried to keep this first post brief so that you’re not confronted with a huge wall of text. I will respond to requests for more detail as quickly as I can.
I created an example project with an extremely simplified implementation of the effect. You’ll need to install it on an iOS device running Metal to see it crap out; on other platforms it appears to work fine. I need to know why it’s so slow on iOS. I’m a bit new to Unity rendering, but I’m learning.
There’s only one scene and two script files.
I also have a second question: why do I need to clear my accumulation buffer manually at the start of each frame? I don’t understand why the camera it’s attached to doesn’t clear it at the beginning of the render.
I made a small change to the above project: I switched the “DrawToScreen Camera” to Forward. As a reminder, its clear flag was “Don’t Clear” and its culling mask was “Nothing”. Simply switching it from Use Graphics Settings (Deferred) to Forward reduced the draw time by about 50 ms. So using deferred seems to incur a massive penalty, even when the camera draws nothing.
Another thread pointed me to a table on this page that seems to contradict the table at the bottom of this page (as of 2017.2). Deferred technically works, but perhaps the overhead of even a single deferred pass is too high? Is there any way to reduce this overhead, or to pay it only once?
You can attach the Xcode Instruments profiler to your iOS build and see where this Graphics.Blit call comes from. It will most likely show up as the most expensive call, so you’ll be able to pick it out among the others.
Thanks for the suggestion, @Kumo-Kairo! I should have mentioned that I did capture a GPU frame in Xcode, and nothing in it indicated what the bottleneck might be. The three shaders I use take roughly 1.5 ms, 2.5 ms, and 3.5 ms, which is certainly nothing that would explain the 140 ms indicated by the Unity profiler. The debugger does, however, flag several Unity-internal calls as having issues, but I’m new to this debugger and haven’t had time to dig in and see what they are.
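As a rough sanity check on those numbers (assuming the three shader timings are additive and come from the same frame), the shader work covers only a small fraction of the time the Unity profiler reports:

```latex
t_{\text{shaders}} \approx 1.5 + 2.5 + 3.5 = 7.5\ \text{ms}, \qquad
\frac{t_{\text{shaders}}}{t_{\text{profiler}}} \approx \frac{7.5}{140} \approx 0.054
```

That leaves roughly 140 - 7.5 = 132.5 ms that the shaders themselves do not account for.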
I’ve run the Xcode profiler again and have attached a screenshot.
Can you try measuring the CPU side? The IL2CPP target allows very deep CPU profiling, and it has saved us a lot of time and trouble. I believe the instrument is called “Time Profiler” or something like that; it has a stopwatch icon. Maybe something will show up there.
I’ll take a look out of curiosity and report back. As you can see from the GPU times, the setup cost of deferred is so high, even on high-end mobile hardware, that I will need to switch to forward. Part of me suspected that I’d need to, but I’m still a bit disappointed.