Hi All,
I’m looking at the practicalities of developing a low-latency Unity application for a time-critical purpose. My normal approach would be to implement something in C++/OpenGL, which provides a lot of control, but I am now considering whether I can do something equivalent using Unity.
In C++/OpenGL, to avoid tearing, VSync must be on, and to avoid frames building up between the CPU and GPU I would call glFinish() immediately after glutSwapBuffers(). This gives a low-latency application.
So, how would I do this in Unity? I note that there is a GL.Flush() command, but this won’t do the same as glFinish(). Therefore, my question is: how do I suspend the rendering of frames to avoid graphics command build-up between the CPU and GPU in Unity? Is there a function that will block the calling thread until the current frame has been rendered?
I appreciate that I can set maxQueuedFrames to 1 but this is not sufficient.
Yeah, that has always been my point of view too. Unity is a brilliant tool for lots of stuff but I am minded to think that when it comes to this type of application, it doesn’t really have the right hooks or right level of control. Someone please tell me I’m wrong though!
Please define what you mean by “low latency”.
I have never heard of that graphics command build-up that you mention. The engine and its user generate these commands for a single frame only, then they are executed and the next frame starts with a blank slate. Unless of course command generation and rendering can be on separate threads. But even so, the GPU or the user cannot simply decide at any one point “it’s enough, render now!” (exceptions may exist in the form of adaptive quality).
Like, if an app has to run at 120 or even 240 fps, and thus has 1/120th or 1/240th of a second to perform all tasks for any given frame, then that is absolutely possible with Unity and most other game engines, for that matter. It really only boils down to optimization and to making informed decisions about how much time can be spent on which parts of the app. Generally speaking: the fewer polys you push, the more of them you can render.
If you mean player input latency, you should look into the New Input Manager to see if they mention anything regarding latency and what you can do, if anything, to improve it or ensure it’s as fast as possible.
Hi. Thanks for your message. You are right that ‘latency’ can be defined in many ways and sometimes is difficult to quantify/measure. My application takes data from a serial input device and uses this to compose a view within a Unity app. The speed with which the software is able to present an updated view in response to data from this input serial device is critical.
In my experience, most rendering systems consist of a CPU that issues graphics commands into a buffer, and a GPU that reads and executes those commands to produce a video frame, which is subsequently displayed on your chosen display/monitor. One serious source of latency is the undesirable build-up of commands in the queue between the CPU and GPU. This typically happens when the CPU can run faster than the GPU can execute the commands. In situations where you need vertical sync enabled (let’s say for a 60 Hz display device), the rate of the GPU is limited (to 60 Hz) and it is therefore very easy for this backlog of commands to develop. As a result, the data used to generate the currently displayed frame can be several frames old, and this exhibits itself as latency in the application.
The best way to fix this is to have the GPU block the CPU until the GPU completes execution of the current frame’s commands. This way, you can guarantee that the data used to create the current view is relatively new. In a native application with, say, C++ and OpenGL, you can use glFinish() to achieve this. So, my question is: how do you achieve this with Unity?
Thanks for the clarification. Realtime hardware monitoring essentially.
From your description, that does sound an awful lot like normal operation for a game engine, however. Or, the other way around: only a flawed architecture (or one not built for the purpose) would allow this build-up of GPU commands to happen.
Or maybe you mean the issue where the app is supposed to render 60 fps but the GPU can’t render everything in that time, and therefore the framerate drops to 30 or 20? But that can only be prevented by ensuring the code is well optimized and the GPU never needs to do anything close to its limits for any given frame.
Naturally - the CPU wouldn’t simply continue to create or provide more data/commands for the GPU while it is still rendering. At least, this never happened in any realtime 3D game engine I’ve worked with. The CPU doesn’t simply start creating new commands with every VSync completion regardless of whether the GPU has finished drawing or not. It may have these commands at the ready for the next time the GPU is ready to receive commands (said queuing), but it should never add more commands for frames even further ahead to the queue while it’s waiting for the GPU. That would either kill the framerate or cause graphics glitches, because you can’t just draw the same object in multiple locations in the same frame.
So I’m really not sure if maybe I’m missing the point or am forgetting to consider something due to my lack of this kind of realtime hardware programming. But it really sounds like a non-issue.
Except… maybe you haven’t considered that Unity is practically main-thread only (setting DOTS aside for the moment). That means if you receive serial input data on a second thread at 500 Hz, you couldn’t just change the position of a GameObject from that thread. You can prepare that data and have it ready for the game-view update script to process it and position objects. If the Update loop runs at 120 Hz, you would have the most current data presented visually, but you’d also have skipped over some data updates, as the data comes in faster than the rendering occurs. Which isn’t really an issue. You could get a 240 Hz monitor, or disable VSync, and you may even get to 500 Hz of visual updates in theory; except that even on the 240 Hz monitor it will only show about half of these updates, because there is simply no visual refresh by the monitor for roughly every other serial data update (500 vs 240 Hz).
You do have options to multithread Unity, including object positions, using DOTS (Entities).
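To make that handoff concrete, here is a minimal sketch of the pattern described above, assuming the device data boils down to a position. The class name and ReadSampleFromSerialPort() are placeholders of mine, not anything Unity provides:

```csharp
using System.Threading;
using UnityEngine;

// Sketch: a background thread keeps the latest serial sample,
// and the main thread applies it once per rendered frame.
public class SerialDrivenObject : MonoBehaviour
{
    readonly object _lock = new object();
    Vector3 _latestPosition;          // most recent value from the device
    volatile bool _running = true;
    Thread _reader;

    void Start()
    {
        _reader = new Thread(ReadLoop) { IsBackground = true };
        _reader.Start();
    }

    void ReadLoop()
    {
        while (_running)
        {
            // Placeholder for however the ~500 Hz device is actually read.
            Vector3 sample = ReadSampleFromSerialPort();
            lock (_lock) { _latestPosition = sample; }
        }
    }

    void Update()
    {
        // The Update loop runs at the display rate; it always picks up the
        // newest sample and silently skips any that arrived in between.
        Vector3 p;
        lock (_lock) { p = _latestPosition; }
        transform.position = p;
    }

    void OnDestroy()
    {
        _running = false;
        _reader?.Join();
    }

    Vector3 ReadSampleFromSerialPort()
    {
        // Hypothetical blocking read; replace with real serial I/O.
        Thread.Sleep(2); // roughly 500 Hz
        return Vector3.zero;
    }
}
```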
Thanks for the post. However, there are a few assumptions in your response which I believe not to be true.
Yes, I believe it would. That is exactly the way it works. And it would do so until the CPU-GPU buffer was full and then block feeding the buffer. The CPU and GPU run in parallel exchanging data - they are not synchronized.
This is why there is a QualitySettings.maxQueuedFrames parameter to help control the magnitude of it. Graphics drivers can queue up frames to be rendered. When the CPU has much less work to do than the graphics card, it is possible for this queue to become quite large. In those cases, the user’s input will “lag behind” what is on the screen.
Yes, it would. That’s exactly what would happen, until the buffer was full.
Obviously, if anyone disagrees with this view, please let me know.
Generally, if the GPU is rendering or displaying frame #100, then the CPU is already working on frame #101. Queue of 1. With 2 it would be able to work on frame #102, increasing the input lag like you said. But a queue length of 1 is always assumed; the CPU is always working ahead of time.
I think what you’re looking for is essentially a queue length of 0 or a synchronized CPU=>GPU pipeline? Is that it?
I’m not sure this is possible even with custom engines, and in any case I wonder whether this really decreases the latency with VSync enabled.
Let’s say a Unity app runs at a stable 240 Hz. The input lag will be 1/240th at most.
If a custom engine does the same, but the CPU processes the data in 1/500th of a second and instantly passes the commands to the GPU, which itself finishes super fast in 1/10000th of a second, the GPU would still have to wait for the VSync before it can flip framebuffers and present the newly rendered frame. Therefore the input lag would still be 1/240th for the visual updates (the processing however happens faster, and that may be meaningful in ways I cannot comprehend).
That is exactly what the mentioned API, QualitySettings.maxQueuedFrames, does. Setting it to 1 should achieve what you mention in the original post: it will wait for the frame to be fully displayed before starting the next one (the wait happens before sampling input). Can you explain why that is not sufficient?
Since QualitySettings.maxQueuedFrames is only implemented on DX11 and DX12, you can also wait on the result of an AsyncGPUReadback request submitted in the previous frame to achieve the same thing on other platforms.
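For anyone trying this, a rough sketch combining both suggestions might look like the following. The component name, the tiny dummy readback texture and the end-of-frame coroutine are my own choices, and it assumes SystemInfo.supportsAsyncGPUReadback is true on the target platform:

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: cap the driver's frame queue where supported, and additionally
// block at the start of each frame until the previous frame's GPU work
// (as observed by a readback submitted after rendering) has completed.
public class GpuSyncExample : MonoBehaviour
{
    AsyncGPUReadbackRequest _previousFrameFence;
    bool _hasFence;
    Texture2D _fenceTexture;   // tiny dummy readback target (my own choice)

    void Start()
    {
        // Only honoured on DX11/DX12 according to the docs.
        QualitySettings.maxQueuedFrames = 1;

        _fenceTexture = new Texture2D(1, 1, TextureFormat.RGBA32, false);
        StartCoroutine(EndOfFrameFence());
    }

    void Update()
    {
        // Stall the CPU here until the GPU has reached the readback that was
        // queued after last frame's rendering commands were issued.
        if (_hasFence)
        {
            _previousFrameFence.WaitForCompletion();
            _hasFence = false;
        }
    }

    IEnumerator EndOfFrameFence()
    {
        var wait = new WaitForEndOfFrame();
        while (true)
        {
            yield return wait;  // all rendering for this frame has been issued
            _previousFrameFence = AsyncGPUReadback.Request(_fenceTexture);
            _hasFence = true;
        }
    }
}
```

Whether stalling at the top of Update is the right place depends on where the serial data is consumed; the point is only that WaitForCompletion() blocks the main thread until the GPU has processed everything queued before that readback.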
By the way, here is a very good explanation about system latency:
Hi! Did it work in the end?
I also want to make sure I can get the result of the current tick, but I am wondering whether QualitySettings.maxQueuedFrames just waits for the newly rendered frame to be displayed, rather than for the commands of this tick to finish executing. :(
Thank you!
Hi. I wanted to find something that can stall the CPU until the GPU finishes rendering the results of this tick’s commands. Now I use ReadPixels on the render target of my camera to achieve this (I don’t know if this really works, though). Thank you for the information, I will check!
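In case it helps, a minimal sketch of what that ReadPixels stall could look like (my own reconstruction, not verified to be the best approach). It assumes the camera renders into a RenderTexture and relies on ReadPixels blocking until the GPU has actually produced the requested pixel:

```csharp
using System.Collections;
using UnityEngine;

// Sketch of the ReadPixels-based stall: reading even a single pixel back
// from the camera's render target cannot complete until the GPU has
// finished rendering into it, so it acts as a crude glFinish().
public class ReadPixelsSync : MonoBehaviour
{
    public Camera targetCamera;   // camera assumed to render into a RenderTexture
    Texture2D _readback;

    void Start()
    {
        _readback = new Texture2D(1, 1, TextureFormat.RGBA32, false);
        StartCoroutine(SyncEachFrame());
    }

    IEnumerator SyncEachFrame()
    {
        var wait = new WaitForEndOfFrame();
        while (true)
        {
            yield return wait;   // rendering commands for this frame are issued

            var previous = RenderTexture.active;
            // If targetTexture is null, this reads from the screen instead.
            RenderTexture.active = targetCamera.targetTexture;
            // Blocks until the GPU has produced the pixel being read back.
            _readback.ReadPixels(new Rect(0, 0, 1, 1), 0, 0);
            RenderTexture.active = previous;
        }
    }
}
```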