SteamVR running start is 2ms in GPUView, but that's still not enough for 1.5ms of rendering in the Profiler?

Intel Core i5-7500, GTX 1060, Unity 5.5.3f1, SteamVR built May 18, 2017, Win10 Pro 1703

My project now clearly suffers from a GPU bubble, even with running start:

So, although SteamVR claims WaitGetPoses returns 3ms before vsync, Unity in fact begins rendering only 2ms before vsync. I don’t know whether it’s Valve’s side not returning poses on time or Unity not starting rendering on time, but either way my running start time drops from 3ms to 2ms.

Whatever the reason, the stranger thing is that in the Unity Profiler the rendering takes only ~1.5ms:


The work before the depth texture render command is submitted took only about 1.2ms, so in theory 2ms should be enough, but according to GPUView’s stats it isn’t.

Could any VR specialist help me figure out what’s going on? Thanks.

And one more question: if the CPU waits for running start only to get the latest headset poses, why do cloth and skinning need to wait for it too? I think both are independent of the camera, so shouldn’t they be moved before the long wait so that they don’t take up precious running start time?

After checking the D3D markers in GPUView in further tests, I found the reason the GPU stays idle after vsync: the CPU is unable to submit the rendering commands within the 2ms of running start. More specifically, the rendering thread by default flushes commands to the GPU only when the whole frame’s work is done. If too many commands accumulate before present, Unity seems to flush everything queued so far and then continue with the rest of the rendering work.

So I tried invoking GL.Flush() manually in OnPreCull, OnPreRender and OnPostRender. It does reduce the GPU idle time, but the timing is still not good enough. What I need now is a way to flush commands at more meaningful points: after depth texture rendering, after shadowmap rendering, after opaque rendering and after alpha rendering. It’s a pity that, although command buffers offer a variety of good injection points, there’s no way to insert a GL.Flush into one.

Also, if Unity exposed a parameter for setting the GL.Flush interval, it would help a lot when the CPU spends a long time on one specific rendering stage. For example, opaque rendering often takes 1-2ms, so flushing at 0.5ms, 1ms and 1.5ms would be a good strategy in VR.
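
For anyone who wants to try the same workaround, the manual flush in the camera callbacks looks roughly like this. It's only a minimal sketch: the class name ManualFlush is made up for illustration, and it assumes the script sits on the same GameObject as the VR camera, because Unity only sends these three messages to scripts attached to a camera. How much it helps will depend on the project.

```csharp
using UnityEngine;

// Hypothetical helper (the name is not from Unity or SteamVR).
// Attach it to the GameObject that holds the Camera component:
// OnPreCull, OnPreRender and OnPostRender are only sent to camera scripts.
public class ManualFlush : MonoBehaviour
{
    // Called before the camera culls the scene.
    void OnPreCull()
    {
        // Ask Unity to send the commands queued so far to the GPU.
        GL.Flush();
    }

    // Called before the camera starts rendering the scene.
    void OnPreRender()
    {
        GL.Flush();
    }

    // Called after the camera has finished rendering the scene.
    void OnPostRender()
    {
        GL.Flush();
    }
}
```

These are the only per-camera hooks used here, so the flushes land before culling, before rendering and after rendering, not between the depth, shadowmap, opaque and alpha stages.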