Update for Frame Timing Manager

FrameTimingManager

The FrameTimingManager enables you to capture and access frame timing data for multiple frames. Frame timing data includes timestamps for different phases of the frame and the duration of work done on the main and render threads. You can use this information to make adjustments to your application where performance is below your target level.

The FrameTimingManager isn’t enabled by default. To enable this feature, go to Edit > Project Settings > Player and enable the Frame Timing Stats checkbox.

Note: Enabling Frame Timing Manager has a noticeable performance impact.
Note that the FrameTimingManager is part of the Dynamic Resolution feature and must be enabled for Dynamic Resolution to work.

What’s changed

  • Added support for DirectX 11
  • GPU time measurements now work in Editor too for most platforms
  • Added Main Thread Central Processing Unit (CPU) Frame time
  • Added Main Thread CPU Present Wait time
  • Added Render Thread CPU Frame time
  • Added First Submit timestamp
  • Added Frame Start timestamp
  • Exposed “CPU Total Frame Time” profiler counter
  • Exposed “CPU Main Thread Frame Time” profiler counter
  • Exposed “CPU Render Thread Frame Time” profiler counter
  • Exposed “Graphics Processing Unit (GPU) Frame Time” profiler counter

More details on each added field are provided in the “Measurements” section below.

Measurements
The FrameTimingManager measures a set of important time metrics of a frame, including the following:

  • cpuFrameTime - The total CPU frame time, calculated as the time between the ends of two consecutive frames, including all waiting time and overheads, in ms.
  • cpuMainThreadFrameTime - The total time between the start of the frame and the time when the Main Thread finished its work, in ms.
  • cpuRenderThreadFrameTime - The time between the start of the work on the Render Thread and when the Present() function was called, in ms.
  • cpuMainThreadPresentWaitTime - The CPU time spent waiting for Present() during the last frame, in ms.
  • gpuFrameTime - The GPU time for a given frame, in ms.
  • frameStartTimestamp - The CPU clock time when the frame started.
  • firstSubmitTimestamp - The CPU clock time when the first job was submitted to the GPU.
  • cpuTimePresentCalled - The CPU clock time at the point Present() was called for the current frame.
  • cpuTimeFrameComplete - The CPU clock time at the point when the GPU finished rendering the frame and interrupted the CPU.
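
As a minimal sketch, here is how these values can be read back through the FrameTimingManager C# API (assuming the feature is enabled; the class name is illustrative):

using UnityEngine;

public class FrameTimingReadback : MonoBehaviour
{
    FrameTiming[] frameTimings = new FrameTiming[1];

    void Update()
    {
        // Ask the FrameTimingManager to collect the timing data it has gathered so far.
        FrameTimingManager.CaptureFrameTimings();

        // The return value is the number of frames actually available.
        if (FrameTimingManager.GetLatestTimings(1, frameTimings) < 1)
            return; // Results arrive with a delay, so the first few frames return nothing.

        Debug.Log($"CPU: {frameTimings[0].cpuFrameTime:F2} ms, " +
                  $"Main Thread: {frameTimings[0].cpuMainThreadFrameTime:F2} ms, " +
                  $"Render Thread: {frameTimings[0].cpuRenderThreadFrameTime:F2} ms, " +
                  $"GPU: {frameTimings[0].gpuFrameTime:F2} ms");
    }
}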

The following diagram explains how each measured time and timestamp maps to a frame.

Important notes:

  • The FrameTimingManager produces results with a fixed delay of four frames because GPU timing results aren’t available immediately.
  • Frame Timing Manager doesn’t guarantee that the GPU has time available. The GPU might fail to return results in time, or might fail to return any results at all. In these cases, the gpuFrameTime is reported as zero.
  • On platforms that don’t allow GPU timestamping, Unity computes the FrameCompleteTime value rather than measuring it. Unity computes FrameCompleteTime as FirstSubmitTimestamp + GPU Time. If the GPU fails to provide a GPU time, FrameCompleteTime is set equal to the Present timestamp.
  • On GPUs that use a tile-based deferred rendering architecture (such as mobile GPUs), results are less precise because GPU execution is deferred and rendering phases might be executed separately. The FrameTimingManager can only measure the overall duration.

Profiler Counters
Instead of the FrameTimingManager C# API, you can read FrameTimingManager values using the ProfilerRecorder API. The benefit of the ProfilerRecorder API is that FrameTimingManager measurements are only taken while a recorder is attached to the counter, which gives you control over the potential overhead.

using Unity.Profiling;
using UnityEngine;

public class ExampleScript : MonoBehaviour
{
    string statsText;
    ProfilerRecorder mainThreadTimeRecorder;

    void OnEnable()
    {
        mainThreadTimeRecorder = ProfilerRecorder.StartNew(ProfilerCategory.Rendering, "CPU Main Thread Frame Time");
    }

    void OnDisable()
    {
        mainThreadTimeRecorder.Dispose();
    }

    void Update()
    {
        // The recorder reports this counter in nanoseconds; multiply by 1e-6 to convert to milliseconds.
        var frameTime = mainThreadTimeRecorder.LastValue;
        // Your code logic here
    }
}

Setup
FrameTimingManager is available for Development and Release Unity players. In the Development player, the FrameTimingManager is always enabled.

In a Release player, you must enable the FrameTimingManager in the Player Settings. To do this, go to Edit > Project Settings > Player and enable the Frame Timing Stats checkbox.

Platform support

Metal
In some cases, under heavy load or GPU pipeline saturation, the reported GPU time might be larger than the reported frame time when using the Metal API. For example, the reported GPU time might be 40 ms at a stable 50 fps, where you would expect the GPU time to be no higher than 20 ms.

Why does this happen? The Metal API lets us measure time at the beginning and end of command buffer execution, which in Unity’s case roughly matches frame boundaries. Because tile-based deferred rendering GPUs execute rendering in phases (jobs) rather than immediately, there might be a gap between the execution of different phases depending on GPU resource availability. For example, if the GPU is under high load, there might be a gap when a job is passed from the vertex queue to the fragment queue inside the GPU. As a result, the jobs’ active time (which defines the frame rate) and the total measured wall-clock time can differ significantly.

17 Likes

sold!

love the amount of low-level detail here but, since you know the internals so well, what I’d like to see in the future is pre-chewed metrics in terms of pressure. and ideally the total of these numbers = 1
.cpuPressure
.bandwidthPressure
.gpuFillratePressure
.gpuGeometryPressure

no point in scaling down the frame buffer if .gpuGeometryPressure is higher than .gpuFillratePressure
no point in throttling math heavy stuff if .bandwidthPressure is higher than .cpuPressure

6 Likes
  • “Add FrameTimingManager support to the Advanced FPS Counter” issue AFPS-62 created *
2 Likes

Hey @antonk-unity , thanks for bringing such a powerful tooling to our hands!
I’m having a few questions about it.

Could you please share the plans for releasing these, and which Unity versions are going to get them? I’m not seeing 'em in the current public alpha yet ^^

Another question is about those new counters’ availability in release builds: do we need to have the Frame Timing Stats option enabled + an active Camera with Dynamic Resolution enabled in the scene in order to read those counters in release players?

And the last one: is “CPU Main Thread Frame Time” or “CPU Total Frame Time” going to replace the existing “Main Thread” ProfilerRecorder in the Internal category, or are they calculated differently?

1 Like

Yes, unfortunately, it hasn’t made it to 2022.1.0a13 for some reason.
Any public alpha after that should have it available.

Enabling the “Frame Timing Stats” option is enough. It’s recommended to use Frame Timing for the control logic of Dynamic Resolution, but both features can be used independently.

“Main Thread” is a marker and you’re free to continue using it.
The downside of the “Main Thread” marker is that it might include the gfx present wait time (if the gfx sync point is set to AfterScriptUpdate) and editor overheads when measured in the Editor. The FrameTimingManager main thread time tries to exclude most of the sync waits to get a value closer to the actual execution time.
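
For a rough side-by-side, here’s a minimal sketch that logs the “Main Thread” marker against the FTM main thread time (the class name is illustrative; FTM values lag by a few frames, so compare trends rather than exact frames):

using Unity.Profiling;
using UnityEngine;

public class MainThreadTimeComparison : MonoBehaviour
{
    ProfilerRecorder mainThreadMarkerRecorder;
    FrameTiming[] frameTimings = new FrameTiming[1];

    void OnEnable()
    {
        // "Main Thread" marker: may include gfx present wait and Editor overheads.
        mainThreadMarkerRecorder = ProfilerRecorder.StartNew(ProfilerCategory.Internal, "Main Thread");
    }

    void OnDisable()
    {
        mainThreadMarkerRecorder.Dispose();
    }

    void Update()
    {
        FrameTimingManager.CaptureFrameTimings();
        if (FrameTimingManager.GetLatestTimings(1, frameTimings) < 1)
            return;

        // The marker value is in nanoseconds; cpuMainThreadFrameTime is already in milliseconds.
        Debug.Log($"\"Main Thread\" marker: {mainThreadMarkerRecorder.LastValue * 1e-6:F2} ms, " +
                  $"FTM cpuMainThreadFrameTime: {frameTimings[0].cpuMainThreadFrameTime:F2} ms");
    }
}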

Slightly unrelated to Frame Timing Manager, just FYI. If you’re using ProfilerRecorder with “Render Thread” marker, we might deprecate & remove it. It was kind of faked and produces wrong results sometimes. More in the ticket - https://issuetracker.unity3d.com/product/unity/issues/guid/1339387/
In this case, using FrameTimingManager is advised.

2 Likes

Thank you for the detailed reply, @antonk-unity !

That’s great to know!
Do you have any plans to backport it to the 2021 LTS, by chance?

Thanks, that’s what I hoped to hear ^^

Got it, now I see the difference.

Regarding the “Render Thread” marker - good to know; I’ve only seen it in dev builds, and the new CPU Render Thread counter looks much more interesting to me as it allows gathering data in release builds.

One more possibly confusing marker I’ve seen in Release builds is Camera.Render from the Render Category.
It’s available in Release builds starting from 2021.1.0f1, and now I wonder if it’s something we could use as a fallback instead of the Render Thread marker or the upcoming “CPU Render Thread Frame Time” counter if the user didn’t enable the Frame Timing Stats option.

That’s very unlikely. Our release managers’ view is that we backport bugfixes only, as any new feature is a potential source of instability that we don’t want to introduce into a stable build.

Camera.Render is a tricky marker as it doesn’t measure anything on URP/HDRP. URP/HDRP have their own markers for cameras & render loop. So I don’t think you can rely on it much.

Currently, the FrameTimingManager allows you to read data without enabling it in the Player settings. If you attach a recorder to “CPU Render Thread Frame Time”, that will force-enable the FrameTimingManager. Although, that’s the part I have mixed feelings about. On one hand, it gives you better control over the feature state; on the other, it means it can be silently enabled by someone and you’ll be paying the overhead without realizing it. It would be interesting to hear community feedback on this.

1 Like

Totally understandable, but as a plugin dev I’m sad about this %)

Got it, thanks!

I’m glad we can enable it at runtime even if it was disabled in Player Settings, and I’d prefer to keep it that way, as this reduces friction and allows straightforward usage scenarios.

Although, to make it clearer for everyone how and when it’s used, I’d be glad to discuss these points:

  1. An explicit API to activate the FrameTimingManager from code (the API would just duplicate the setting in Player Settings).
  2. Prevent the FrameTimingManager from working if it wasn’t explicitly activated either in Player Settings or through the API (i.e. you won’t get a valid recorder, or will always get zero values for “CPU Render Thread Frame Time” and other FrameTimingManager counters, unless you activate it through Player Settings or the API call).
  3. Add a log entry on FrameTimingManager activation (both when a build starts with the toggle activated in Player Settings and when it’s explicitly activated through the API) explaining that this system is now active and does introduce overhead.
  4. Keep all FrameTimingManager counters always visible in Sampler.GetNames() and ProfilerRecorderHandle.GetAvailable() regardless of activation state (see the enumeration sketch after this list).
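
As a minimal sketch of point 4, enumerating those counters with the existing ProfilerRecorderHandle API could look roughly like this (the “Frame Time” name filter and class name are illustrative):

using System.Collections.Generic;
using Unity.Profiling.LowLevel.Unsafe;
using UnityEngine;

public class AvailableCountersDump : MonoBehaviour
{
    void Start()
    {
        // Enumerate everything the Profiler exposes and log the frame-time related entries.
        var handles = new List<ProfilerRecorderHandle>();
        ProfilerRecorderHandle.GetAvailable(handles);

        foreach (var handle in handles)
        {
            var description = ProfilerRecorderHandle.GetDescription(handle);
            if (description.Name.Contains("Frame Time"))
                Debug.Log($"{description.Category}: {description.Name}");
        }
    }
}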

Two tiny questions about this:

  • will it require passing a ProfilerRecorderOptions.GpuRecorder option when starting or creating the ProfilerRecorder for this counter?

  • will it have a MarkerFlags.SampleGpu set in the ProfilerRecorderDescription.flags?

Cheers!

No for both. It isn’t a GPU recorder; these are reported by the FTM itself.
There is a “FrameTime.GPU” marker, which is a GPU recorder used to measure GPU time on some platforms. However, it’s better not to subscribe to it, as not all platforms use it. For performance reasons, some platforms have custom implementations that exploit native extensions where available - for example, we use kEGL_ANDROID_get_frame_timestamps because it’s much faster than the timer queries used by GPU recorders.

Thanks, that’s clear now.

Now I see how FrameTimingManager is going to make our lives easier =)
Thanks for the warning about FrameTime.GPU; it seems to be something new in a yet-unreleased alpha/beta, as I’m not finding it in 2021.2.2 or 2022.1.0a13 among the other available markers (even after a 5+ frame delay).

From how I understand the concept, it seems ProfilerCounters are more stable and user-friendly than ProfilerMarkers, and recording markers is somewhat more risky, kind of “low-level, only when you really know what you’re doing”.

It’s here!

2022.1.0a15:

  • Profiler: FrameTimingManager platform reach and frame timing information expanded.
1 Like

Hey @antonk-unity ,

I’ve just given the FrameTimingManager update a spin and here is my feedback so far:

  1. Yay, new juicy counters are available now in Editor, Dev and Release builds!

[GIF attachment: 2021.11.22.gif]

  2. New counters work in release builds on my hardware, and I like the fact that FTM initializes at runtime even with the Frame Timing Stats option disabled. I hope this “on-demand” initialization will stay, or will be replaced with an explicit API call, to keep it available!

  3. While tinkering a bit with the existing FrameTimingManager C# APIs I noticed it captures all possible stats and there is no way to capture only a few of them. Hence, I wonder if recording only a few FTM counters using ProfilerRecorders still captures all possible stats under the hood, or does it capture only those which are recorded?

  4. One more kind of confusing part: the FTM C# APIs do not allow figuring out how many frames can be captured in the given environment.
    From the docs for the GetLatestTimings() numFrames argument: “This should be equal to or less than the maximum FrameTimings the platform can capture.” But how do you figure out how much the platform can capture?
    Without knowing the maximum number of frames that can be captured, it seems one of the common scenarios would be just passing a 1-element array to get the latest frame, so it could be worth adding a GetLatestTiming() API to get only the last timing capture.

  5. I’m seeing strangely high numbers when trying to record “CPU Total Frame Time” and “CPU Main Thread Frame Time” with the ProfilerRecorderOptions.CollectOnlyOnCurrentThread option, like 150ms instead of the usual 4-5. That seemed strange to me as I expected values from a CollectOnlyOnCurrentThread recorder to be lower.

  6. Just a minor possible confusion I’ve spotted: in my test scene, the “CPU Main Thread Frame Time” and “CPU Render Thread Frame Time” counters do not match the Game View Stats panel’s “CPU main” and “render thread” timings, and are even showing kind of the opposite =D

[Screenshot attachment: upload_2021-11-23_15-29-55.png]

I’m guessing that’s because they are calculated quite differently, but the similar wording used for the counters and the Stats panel could confuse at first glance.

2 Likes

Currently, all of them. We’ll separate GPU and CPU counter activation in the future, as GPU is the most expensive part.

The original idea of the C# API is that you pass an N-sized array and the function returns how many samples you actually got. This assumes that if you want to do some kind of filtering, you know how many results you want to handle, and the API tells you how many it can actually return :slight_smile:

Looks like a bug. Could you please report it as a ticket with the example of code that causes this?

FTM counters return results with a 4-frame delay to accommodate the GPU results delay and provide CPU and GPU timing for the same frame. The Game View Stats panel shows immediate results. However, in the future, we plan to switch everything to use the new FTM.

Thanks for making this clear, looking forward to any news on counters activation separation!

Thank you for the explanation, but it still sounds a bit clunky…

Following your comments, to capture averages from, say, 100 frames I should:

  1. Allocate an array of the desired size (100) and pass it along with the desired numFrames (100, again).
  2. Take a look at how many array slots were actually filled and cap the initial 100 value if necessary.
  3. Resize the initial array if 100 was capped to a lower value.

e.g.:

FrameTiming[] reusableFramesData; // class field

void InitFramesData()
{
    var desiredFrames = 100;
    reusableFramesData = new FrameTiming[desiredFrames];

    FrameTimingManager.CaptureFrameTimings();
    // The return value is the number of array slots that were actually filled.
    var actualFrames = (int)FrameTimingManager.GetLatestTimings((uint)desiredFrames, reusableFramesData);
    if (actualFrames < desiredFrames)
        System.Array.Resize(ref reusableFramesData, actualFrames);

    // can finally use / reuse reusableFramesData
}

What would be great to see is an API that could just return numFrames for the current platform, so this could be reduced to:

FrameTiming[] reusableFramesData; // class field

void InitFramesData()
{
    var actualFrames = FrameTimingManager.GetMaxNumFrames();
    reusableFramesData = new FrameTiming[actualFrames];
    FrameTimingManager.GetLatestTimings(reusableFramesData); // overload with 1 argument, uses passed array size
  
    // can finally use / reuse reusableFramesData
}

Please let me know if I misunderstood something here %)

Sorry that was my bad as I passed ProfilerRecorderOptions.CollectOnlyOnCurrentThread instead of
ProfilerRecorderOptions.CollectOnlyOnCurrentThread | ProfilerRecorderOptions.Default :sweat_smile:
So it works just fine now!

Understood, but that opposite difference from the screenshot is constant, i.e. I always see a higher value in FTM’s “CPU Render Thread Frame Time” counter compared to “render thread” from the Stats panel while running my test scene.

That’s great to hear, it will eliminate any possible confusion.

Hi,
I am trying to take timestamps of frames being displayed and I thought about using FrameTiming.cpuTimeFrameComplete. It works fine on Windows but, on Oculus Quest (1 and 2), it returns 0. I would like to know if it is a problem with all Oculus headsets or with my code, and if it will be fixed.

  • The build for the Oculus is an Android build.

The code below is attached to the main camera.

using System.Collections;
using System.Collections.Generic;
using TMPro;
using UnityEngine;
using System.Text;

public class Timing : MonoBehaviour
{
    public TMP_Text screenText;

    FrameTiming[] frameTimings = new FrameTiming[1];

    uint m_frameCount = 0;

    const uint kNumFrameTimings = 2;
    // Update is called once per frame
    void Update()
    {
        ++m_frameCount;
        if (m_frameCount <= kNumFrameTimings)
        {
            return;
        }
        FrameTimingManager.CaptureFrameTimings();
        uint res = FrameTimingManager.GetLatestTimings(1, frameTimings);
        if (res < 1)
        {
            UnityEngine.Debug.LogErrorFormat("Skipping frame {0}, didn't get enough frame timings.",
                m_frameCount);
            screenText.text = string.Format("error frame {0}",
            m_frameCount);

            return;
        }


        screenText.text = string.Format("cpu frame time: {0}\ncpu time frame complete: {1}\ncpu time present called: {2}\ngpu frame time: {3}\nheight scale: {4}\nsync interval: {5} \nwidth scale: {6}\n{7}\n{8}\n{9}\n{10}\n{11}",
            frameTimings[0].cpuFrameTime,
            frameTimings[0].cpuTimeFrameComplete,
            frameTimings[0].cpuTimePresentCalled,
            frameTimings[0].gpuFrameTime,
            frameTimings[0].heightScale,
            frameTimings[0].syncInterval,
            frameTimings[0].widthScale,
            frameTimings[0].cpuMainThreadFrameTime,
            frameTimings[0].cpuRenderThreadFrameTime,
            frameTimings[0].cpuMainThreadPresentWaitTime,
            frameTimings[0].frameStartTimestamp,
            frameTimings[0].firstSubmitTimestamp);
    }
}

In addition, is there a way to access the CPU clock time to compare the FrameTiming.cpuTimeFrameComplete with the current time?

The code looks fine; could you please file a ticket? We’ll take a look.
Please include information about the exact model and firmware of the Oculus device. Also, if you could include logcat logs captured while running the app, that would help a lot.

I would rather store the number of frames captured separately instead of resizing the array.
The number of frames returned is not the maximum number, but the currently available number of frames. So it might be different the next time you capture. It can vary for different reasons - for example, something might have failed during the capturing process (this can happen whenever FTM relies on a platform-specific API).
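
A minimal sketch of that pattern, in the same fragment style as the snippets above (field and method names are illustrative):

FrameTiming[] capturedTimings = new FrameTiming[100]; // class fields, allocated once
uint validFrames;

void CaptureTimings()
{
    FrameTimingManager.CaptureFrameTimings();
    // Keep the returned count instead of resizing; it can be different on every capture.
    validFrames = FrameTimingManager.GetLatestTimings((uint)capturedTimings.Length, capturedTimings);
    // Only capturedTimings[0 .. validFrames - 1] hold data from this capture.
}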

Thanks, I’ve noted to take a look. Editor’s gfx part is a bit wonky :slight_smile:

1 Like

Ah, that’s clear now, thanks!

Still, if it’s a low-hanging fruit under the hood (and already calculated or stored somewhere), it would be great to somehow figure out “the maximum FrameTimings the platform can capture”, so we don’t allocate a FrameTiming[] larger than the platform can actually fill ^^

I am a simple man. I want to know the time taken to do work on the CPU (main + render thread) and the time spent by the GPU to present the frame. The end goal is to have a normalized quantity that tells whether the CPU or the GPU is performing better. How do I get this?

It has been said time and again, but the docs don’t actually provide a good example of how to scale dynamic buffers with FrameTimingManager.

3 Likes