It’d be nice if the new API could record CPU time per thread, GPU time, etc. in ms.
That would help with running speed regression tests overnight and getting more detail than average fps.
Hi Laurent,
Currently, GPU timings can only be recorded with the old Recorder via the gpuElapsedNanoseconds API that was added in 2020.1. Unfortunately, as explained over here too, we built ourselves into a bit of a corner with this old API, which we’ll sadly have to abandon once ProfilerRecorder covers all the use cases that the old Recorder covered so far.
That means that: Yes, we are working towards supporting GPU timings to be recorded with ProfilerRecorder as well.
ProfilerRecorder already allows you to record ms timings from any thread so I’d like to know what exactly you’d like to do here.
By default, ProfilerRecorder will record data for a sample across all threads. You can limit it to only record the data for the current thread, i.e. the thread you created it on. That would mostly allow you to specifically record data on the Main Thread, a custom C# thread or a Job Thread (with admittedly little control over which Worker thread this would happen on, because you can’t specify that for the Job that would start the Recorder). It would not cover the Render Thread (unless you want to modify code in an SRP, I guess) or any other Unity threads like the background and loading threads…
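To illustrate the difference between the two modes, here is a minimal sketch rather than an official sample. The marker name "Example.Work", the class name and the busy-loop are made up for illustration. Since the marker is only hit on the Main Thread here, both recorders should report roughly the same value; they would diverge for a marker that is also hit on other threads.

```csharp
using Unity.Profiling;
using UnityEngine;

// Sketch: records one marker twice, once across all threads and once only on
// the thread that created the recorder (here: the Main Thread).
public class ThreadScopedRecorders : MonoBehaviour
{
    // Hypothetical custom marker, used so that name and category are known.
    static readonly ProfilerMarker s_Work =
        new ProfilerMarker(ProfilerCategory.Scripts, "Example.Work");

    ProfilerRecorder allThreads;
    ProfilerRecorder currentThreadOnly;

    void OnEnable()
    {
        // Default options: collect the sample on every thread it occurs on.
        allThreads = ProfilerRecorder.StartNew(ProfilerCategory.Scripts, "Example.Work");

        // OnEnable runs on the Main Thread, so this recorder is limited to it.
        currentThreadOnly = ProfilerRecorder.StartNew(
            ProfilerCategory.Scripts, "Example.Work", 1,
            ProfilerRecorderOptions.Default | ProfilerRecorderOptions.CollectOnlyOnCurrentThread);
    }

    void OnDisable()
    {
        allThreads.Dispose();
        currentThreadOnly.Dispose();
    }

    void Update()
    {
        using (s_Work.Auto())
        {
            // Placeholder work so the marker has something to measure.
            float sum = 0f;
            for (int i = 0; i < 10000; i++) sum += Mathf.Sqrt(i);
        }

        // Time markers report nanoseconds; LastValue is the last completed frame.
        double allMs = allThreads.LastValue * 1e-6;
        double thisThreadMs = currentThreadOnly.LastValue * 1e-6;
        Debug.Log($"Example.Work: all threads {allMs:F3} ms, this thread {thisThreadMs:F3} ms");
    }
}
```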
However, I have a feeling that, what you are interested in could possibly be gathered in a different way. How exactly would you like to Profile the different threads? Just the time they were busy for? What if there are more threads than cores (or even virtual, hyper threaded cores)? Are you more interested in how active the different cores were or how active the different threads were?
What is possible right now:
Main Thread:
You can Record “Main Thread”, “PlayerLoop”, “EditorLoop”…
Render Thread:
You can Record “Render Thread” but it might be more meaningful to record some of the other root level samples such as the Render Main Camera samples (check the correct name depending on the render pipeline you are using in the Profiler). However, the Render Main Camera sample usually also occurs on the Main Thread so you’ll likely want to have 2 ProfilerRecorders recording that sample: one recording all threads, one recording the Main Thread only. Then you can use the measurements from the second one to calculate the Render Thread’s share, by subtracting them from the first.
You can also potentially subtract (or track separately) “Gfx.Present”
Job Worker Threads:
Now besides starting a Recorder in a Job, you CAN record “Job.Worker 0” “Job.Worker 1” … but, really, what that is gonna give you is only how long each worker thread was around for until it got flipped to the next frame, roughly in sync with “Main Thread”, so really, this will be the frame time, give or take a bit. E.g. if there was a job at the end of the frame, it might have lapsed more into the next frame. The same Worker thread would then just have less time in the following frame since it flipped over later.
You can use ProfilerRecorderHandle.GetAvailable() to get all of these names too and with that, check which threads are actually around and under which name.
Once you know how many threads there were in total, (through the Recorder handles or other means), you can record “Idle” across all threads, divide it by the thread count and thereby get the inverse: how busy the threads were on average.
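As a rough sketch of the handle enumeration mentioned above (not an official sample): this lists the available markers and counts those whose name starts with "Job.Worker". That prefix is taken from the names quoted above and is an assumption, so check the actual names on your platform. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Unity.Profiling;
using Unity.Profiling.LowLevel.Unsafe;
using UnityEngine;

// Sketch: enumerate everything a ProfilerRecorder could record and count
// the markers that look like Job Worker thread roots.
public class ListWorkerMarkers : MonoBehaviour
{
    void Start()
    {
        var handles = new List<ProfilerRecorderHandle>();
        ProfilerRecorderHandle.GetAvailable(handles);

        int workerCount = 0;
        foreach (ProfilerRecorderHandle handle in handles)
        {
            ProfilerRecorderDescription desc = ProfilerRecorderHandle.GetDescription(handle);

            // Assumption: worker root markers are named "Job.Worker <n>".
            if (desc.Name.StartsWith("Job.Worker", StringComparison.Ordinal))
            {
                workerCount++;
                Debug.Log($"{desc.Name} (category: {desc.Category})");
            }
        }

        Debug.Log($"Job Worker root markers found: {workerCount}");
    }
}
```

Markers can register late, so running this a few frames into the session (rather than at startup) may give a more complete list.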
Thanks Martin.
Now for the cliff notes
Can you point me to a code sample? I want to display in the game (and record averages to file):
GPU usage%
Main thread usage%
Each core usage%
Hi Laurent,
There is indeed not a single comprehensive sample script that would encompass all of what I’ve outlined above. The documentation pages I linked to and related pages do contain some samples though and we’ll look at extending those.
Also, we do not have any hardware counters yet, so we can’t give percentages of how much Unity is utilizing the maximum possible computing power of all the cores and the GPU in the way a system monitor would show them. I was mostly trying to gauge what info exactly you wanted to see here and get a better understanding of the underlying problem you’d like to solve.
What I described would give you a rather imperfect metric as you wouldn’t know which cores are doing what. E.g. little and big cores on mobile as well as frequency scaling can muddy the waters and you have no info on core affinity.
So with what I outlined you could see total frame time, whether you were CPU, Render Thread or GPU bound and how well you utilized multithreading within that CPU time. I guess that’s better than nothing though. To get anything more detailed you’d currently need a platform specific profiler, which would then not be something you can easily get from within a build or for your overnight testing CI suite…
you have FrameTiming, just not for PC yet
Yeah that’s not a hardware counter either, just a time measurement, and one you can get with Recorder/ProfilerRecorder set to record Main Thread, Render Thread or GPU samples. We’re working on getting GPU timings on more platforms too.
Got it, I didn’t know the difference.
So after having cobbled together my own version of what I want I now understand what you asked and can answer somewhat, at least from the angle of workflow.
What I’m after: display in-game a graph of each core’s usage and GPU usage. I don’t need more info at this stage because the questions I’m trying to answer are “with Unity’s own multithreading do I even have enough wiggle room to jobify my stuff” and “is the game really CPU bound”.
Why in-game? So that when a play tester feels the framerate dip, all that needs to be done is press the snapshot button on the console.
Why not profiler? Profiler connection currently eats up considerable resource on console.
Ah ok, that makes sense. Thanks for sharing that, I’ll add that to our feedback tracker.
So just to elaborate a bit on the hardware counters: those would be stats obtained from low level, platform specific APIs regarding stuff like the current load per core, GPU or CPU load, battery status, clock frequencies, on core vs off core time, that kinda stuff. Basically what the OS would see as performance stats for the whole machine. We don’t have that yet and can therefore not give you that info via Profiler APIs. That’s usually where you’ll have to start using platform specific profilers that collect these and have access to them (on some hardware, access to these is pretty restrictive).
I think there should be enough data currently obtainable for your purposes though, at least to get a rough idea. GPU data is the biggest question mark in here, depending on the platforms and graphics API.
SystemInfo.processorCount will give you the logical core count (on Android this is the count of active cores).
Find out the active Worker Threads used per device, e.g. obtained via the Profiler or, as discussed, via ProfilerRecorderHandle.GetAvailable() to get all the marker names and count only the Job Worker thread root marker names.
Then you set a recorder to record Idle across all threads and one recording Semaphore.WaitForSignal, also recorded across all threads. This will give you the idle time. Set one recorder to record Main Thread, that’ll get you the frame time.
now here’s the math:
frame time x core count = available time
Mathf.Max(thread count - core count, 0) x frame time = off core idle time
(fuzzy guesswork but, basically if you have more threads than cores, some threads will be off core and their idle time relatively meaningless)
(idle time - off core idle time) / core count = idle time per core
1 - (idle time per core / frame time) = core load (multiply by 100 for %)
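The arithmetic above can be sketched as a small helper. This is only an illustration under the same fuzzy assumptions: the class, method and parameter names are made up, all times are in the same unit (e.g. ms), and the load is taken as the inverse of the idle share per core.

```csharp
using System;

// Pure arithmetic sketch of the estimate above. No Unity APIs involved;
// feed it the values gathered by your recorders.
public static class CoreLoadEstimate
{
    // frameTime:   "Main Thread" time for the frame
    // idleTime:    summed idle time across all threads for the frame
    // threadCount: number of threads that were recorded
    // coreCount:   e.g. SystemInfo.processorCount
    // Returns the estimated average core load in the range 0..1.
    public static double Estimate(double frameTime, double idleTime, int threadCount, int coreCount)
    {
        if (frameTime <= 0 || coreCount <= 0)
            return 0;

        // Threads beyond the core count are assumed to sit off core all frame.
        double offCoreIdleTime = Math.Max(threadCount - coreCount, 0) * frameTime;

        double idleTimePerCore = (idleTime - offCoreIdleTime) / coreCount;
        double load = 1.0 - idleTimePerCore / frameTime;

        // The inputs are fuzzy, so keep the result in a sane range.
        return Math.Min(Math.Max(load, 0.0), 1.0);
    }
}
```

For example, with a 16 ms frame, 4 cores, 6 threads and 64 ms of summed idle time: off core idle time is 2 x 16 = 32 ms, idle time per core is (64 - 32) / 4 = 8 ms, so the estimated load is 1 - 8 / 16 = 0.5, i.e. 50%.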
Now, as said, this is all a bit fuzzy and I’ve ignored loading threads and the like, and marker gaps in the Render thread (that might also depend on the render pipeline and graphics API) and WaitForJobGroupID occurrences (recorders can’t tell you if this resulted in Job Stealing or just inefficient waiting time) but that’s roughly it for core load. Since you want to know more about the Rendering times, you might want to untangle the Render Thread timings a bit more anyways.
For this, you’d also want to know if you are using multi-threaded rendering, e.g. via SystemInfo.renderingThreadingMode.
If you are using multithreaded rendering, then use some recorders set to samples exclusively used by the Render Thread (or, if they are shared with the Main Thread, use two recorders, one on all threads, one only on main, then subtract main thread time). If there is no multithreading going on, just record these samples on the main thread, but be aware that you shouldn’t count that time in addition to the frame time.
All of this would get you rendering time, which you might want to use to calculate the core load caused by the Render thread in a more accurate fashion than just via recording Semaphore.WaitForSignal.
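A hedged sketch of the two-recorder subtraction described above, not an official sample. The marker name and its category are placeholders (look up the real ones for your render pipeline in the Profiler), the class name is made up, and only the MultiThreaded mode is treated as "has a Render Thread", which is a simplifying assumption.

```csharp
using System;
using Unity.Profiling;
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: splits one rendering marker into "on Main Thread" and "elsewhere".
public class RenderMarkerSplit : MonoBehaviour
{
    // Placeholder name and category; verify both in the Profiler.
    const string MarkerName = "Render Main Camera";
    static readonly ProfilerCategory Category = ProfilerCategory.Render;

    ProfilerRecorder allThreads;
    ProfilerRecorder mainOnly;
    bool hasRenderThread;

    void OnEnable()
    {
        // Assumption: other threading modes are not handled in this sketch.
        hasRenderThread = SystemInfo.renderingThreadingMode == RenderingThreadingMode.MultiThreaded;

        allThreads = ProfilerRecorder.StartNew(Category, MarkerName);
        // Created on the Main Thread, so limited to the Main Thread.
        mainOnly = ProfilerRecorder.StartNew(Category, MarkerName, 1,
            ProfilerRecorderOptions.Default | ProfilerRecorderOptions.CollectOnlyOnCurrentThread);
    }

    void OnDisable()
    {
        allThreads.Dispose();
        mainOnly.Dispose();
    }

    void Update()
    {
        // Recorders are invalid if the name/category pair does not exist.
        if (!allThreads.Valid || !mainOnly.Valid)
            return;

        long totalNs = allThreads.LastValue;
        long mainNs = mainOnly.LastValue;

        // Without a Render Thread, the marker time is already part of the
        // Main Thread's frame time and must not be counted a second time.
        long offMainNs = hasRenderThread ? Math.Max(totalNs - mainNs, 0L) : 0L;

        Debug.Log($"{MarkerName}: main {mainNs * 1e-6:F2} ms, other threads {offMainNs * 1e-6:F2} ms");
    }
}
```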
If you can’t get GPU samples, recording Gfx.PresentFrame and Gfx.WaitForPresentOnGfxThread, together with rendering times and the main thread’s frame time can give you at least some indication as to GPU vs CPU load, ignoring that some of that time might be down to VSync, which will be harder to calculate out of this.
@MartinTilo hey Martin, I’m trying to calculate the CPU/GPU usage as a percentage and I couldn’t put together a good calculation function from what you suggested.
Can you share a code sample with me?
Not really, as the stop-gap solution I’ve outlined here is very project, render pipeline and platform specific, and requires profiling the build to get an overview of what to measure. The abstract version above is what I could reasonably come up with on the spot.
Also, as a small update, GPU time can now be recorded with the ProfilerRecorder API, and if it’s GPU bound vs CPU bound vs Render Thread bound high level stats that you’re interested in, this update on FrameTimingManager stats might be relevant to you.
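For the high level stats, a minimal sketch of reading FrameTimingManager could look like the following. It assumes frame timing stats are enabled in the Player settings and supported on the target platform. The class name and the "whichever is larger" comparison are purely illustrative heuristics, not something the API provides.

```csharp
using UnityEngine;

// Sketch: reads the latest CPU and GPU frame times and logs which is larger.
public class FrameTimingProbe : MonoBehaviour
{
    readonly FrameTiming[] timings = new FrameTiming[1];

    void Update()
    {
        FrameTimingManager.CaptureFrameTimings();
        uint count = FrameTimingManager.GetLatestTimings(1, timings);

        // No data yet (results lag a few frames) or not supported here.
        if (count == 0)
            return;

        double cpuMs = timings[0].cpuFrameTime;
        double gpuMs = timings[0].gpuFrameTime;

        // Hypothetical heuristic: call the larger of the two the bottleneck.
        string bound = gpuMs > cpuMs ? "GPU" : "CPU";
        Debug.Log($"CPU {cpuMs:F2} ms, GPU {gpuMs:F2} ms, likely {bound} bound");
    }
}
```

These are still time measurements rather than hardware counters, so VSync waits and the like can skew the comparison.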