What’s the best way to find out which calls are taking the most time inside of a Bursted job?
I’m not sure how to use the Unity profiler to do this.
Thank you for any advice.
Using the profiler, you can’t really see “inside” a job or how long individual commands are taking.
Instead, you can cut a job up into multiple smaller jobs. If you’re using Entities.ForEach or Jobs.WithCode, you can also name the job produced by that lambda function using .WithName("name").
Then open the Profiler, turn it on, and scroll down to “Job”.
Click the expand arrow next to “Job” to see the individual threads; your job durations will be shown there.
I didn’t name mine, so they’re all job0, job1, job2, etc., but their durations are displayed in parentheses to the right of the name and the (Burst) label.
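For what it’s worth, a minimal sketch of naming a lambda job this way (the system, component types, and field names here are invented for illustration; this assumes the 2020-era Entities SystemBase API):

```csharp
using Unity.Entities;
using Unity.Transforms;

public class MoveSystem : SystemBase
{
    protected override void OnUpdate()
    {
        float dt = Time.DeltaTime;
        Entities
            .WithName("MoveForward") // shows up as "MoveForward (Burst)" under Job in the profiler
            .ForEach((ref Translation t) =>
            {
                t.Value.z += dt; // hypothetical workload
            })
            .ScheduleParallel();
    }
}
```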
You don’t have to split your jobs for this. You can instrument sections of your job using ProfilerMarkers. In 2020.1 you can also add metadata (to provide additional info from within the job to the Profiler) using the ProfilerUnsafeUtility.BeginSampleWithMetadata API.
We’re working on adding a more user-friendly and strippable API layer on top of ProfilerUnsafeUtility, but you can already use this API until that is ready. (Note: this API works in jobs. I’m currently not 100% sure that this works with Burst, as I’m not at my computer, but I’ll check next week.)
Such great news, thank you!
Would you mind shedding some light on how to use ProfilerUnsafeUtility.BeginSampleWithMetadata in a job? The example on that confluence page stores a pointer to some unknown piece of data that’s passed into the DoWork() function:
metadata[0].Ptr = UnsafeUtility.AddressOf(ref num);
I’m unsure what pointer I should assign in the case of a job.
Ty!
That’s a pointer to the metadata to report with the marker. The type is defined on creation of the marker:
ProfilerUnsafeUtility.CreateMarker’s last parameter defines the number of metadata parameters, and
ProfilerUnsafeUtility.SetMarkerMetadata defines the first parameter’s type as ProfilerMarkerDataType.Int32, i.e. int.
So you can pass in any data type outlined by the enum ProfilerMarkerDataType, though Blob8 can’t be visualized in the selection tooltip of Timeline view, where you’d see the metadata for all other types. With this you can, e.g., report the index or count of the data you’re operating on.
But using the metadata APIs is just an additional flourish here. To see which part of the job took how long, the ProfilerMarker.Begin/End or ProfilerUnsafeUtility.BeginSample/EndSample calls should already offer a good starting point.
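To make the metadata flow concrete, here’s a hedged sketch of the CreateMarker / SetMarkerMetadata / BeginSampleWithMetadata sequence described above. The marker name, category, and the chunkIndex parameter are my own inventions, and the signatures are from the 2020.1-era ProfilerUnsafeUtility API, so double-check them against your Unity version:

```csharp
using System;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Profiling;
using Unity.Profiling.LowLevel;
using Unity.Profiling.LowLevel.Unsafe;

static class MetadataSampleExample
{
    // Last argument: number of metadata parameters this marker carries.
    static readonly IntPtr marker = ProfilerUnsafeUtility.CreateMarker(
        "Process chunk", ProfilerUnsafeUtility.CategoryScripts,
        MarkerFlags.Default, 1);

    static MetadataSampleExample()
    {
        // Declare parameter 0 as an Int32 so the profiler knows how to display it.
        ProfilerUnsafeUtility.SetMarkerMetadata(
            marker, 0, "Chunk index",
            (byte)ProfilerMarkerDataType.Int32,
            (byte)ProfilerMarkerDataUnit.Count);
    }

    public static unsafe void Process(int chunkIndex)
    {
        var metadata = new ProfilerMarkerData
        {
            Type = (byte)ProfilerMarkerDataType.Int32,
            Size = (uint)sizeof(int),
            Ptr  = UnsafeUtility.AddressOf(ref chunkIndex), // pointer to the value to report
        };
        ProfilerUnsafeUtility.BeginSampleWithMetadata(marker, 1, &metadata);
        // ... workload ...
        ProfilerUnsafeUtility.EndSample(marker);
    }
}
```

The selection tooltip in Timeline view would then show the chunk index alongside each sample.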
Here’s an example of how to use ProfilerMarker with Burst jobs:
class Example
{
    static readonly ProfilerMarker myMarker = new ProfilerMarker("Inner job workload");

    static void ScheduleJob()
    {
        var job = new TestJob
        {
            Marker = myMarker,
        };
        job.Schedule();
    }

    public struct TestJob : IJob
    {
        public ProfilerMarker Marker;

        void IJob.Execute()
        {
            Marker.Begin();
            // Workload
            Marker.End();
        }
    }
}
I have 3 questions:
[EDIT]
2) does using the ProfilerMarker have any performance cost?
3) if yes to (2), is it possible to disable the ProfilerMarker in release builds?
Answer: the Begin and End methods are marked with ConditionalAttribute. They are conditionally compiled away and thus have zero overhead in non-Development (Release) builds.
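For reference, that stripping works via System.Diagnostics.ConditionalAttribute: when the named symbol isn’t defined where the caller is compiled, the compiler removes the call entirely. A minimal stand-alone sketch (ENABLE_PROFILER is the symbol Unity’s ProfilerMarker uses; the class and counter here are made up for illustration):

```csharp
using System;
using System.Diagnostics;

static class ConditionalDemo
{
    public static int Calls;

    // When ENABLE_PROFILER is not defined in the caller's compilation,
    // the C# compiler removes calls to this method entirely.
    [Conditional("ENABLE_PROFILER")]
    public static void BeginSample() => Calls++;

    public static void Run()
    {
        BeginSample(); // stripped in builds without ENABLE_PROFILER
        Console.WriteLine(Calls);
    }
}
```

Compiled without ENABLE_PROFILER defined, the call site vanishes, so there is no argument evaluation or call overhead at all.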
Thanks!
Burst doesn’t support static init of string literals yet. It is not the best for usability here - we are planning to fix that for profiler markup.
(Necro but right on topic)
Using Profiler markers, can I see the total time for job execution somewhere? If I go to Hierarchy, I can see that I can select specific threads to get to the timing data for the markers:
But I’d be most interested in seeing how much time in total it’s spending on a given marker rather than how much time it does so on each individual worker (which for me is already 23 threads; someone on a ThreadRipper must be looking at a hopeless situation :)).
PS: Hey Alexey
This is Rene.
Timeline view shows this. Just click on a rect and it will show you instance, thread total, and frame total timings. The last one is what you want.
Sorry to necro but this is an ongoing issue.
I don’t need frame timings of jobs, because my jobs run longer than one frame. The whole profiler / recorder setup seems to only output per-frame-information, but what I actually want is to simply have a debug log at the end of my heavy computation that tells me something like:
PreparationJob 15ms
DataCollectionJob 10ms
ComputationPhaseOneJob 50ms
ComputationPhaseTwoJob 60ms
FinalizationJob 20ms
I’ve just spent an hour searching the profiler for the frame where my computation might have occurred and I wasn’t able to find out how long each specific job takes. Also, everything is frame data, but these jobs run over multiple frames in the background and don’t affect my FPS at all, so I don’t quite understand why these two things (frame timings and job timings) are even combined in the same profiler.
I would kill for an in-job-accessible Time.realtimeSinceStartupAsDouble.
For now I manually write to a custom static time field from multiple update loops on the main thread (sync and async using Awaitables), but it’s far from ideal because it’s so imprecise.
Edit: And of course these jobs are combined with a job-handle which I don’t Complete() between each step. Calling Complete() would change the way this operates I think.
You can use QueryPerformanceCounter via a C++ plugin and a function pointer. I was able to get this to work (after trying many things) with Burst enabled on the job by adding BurstCompile to the DllImport, even though it warns that it is not a known Burst entry point.
In your case of timing job dependencies, I’m thinking you could record just the job start timestamps, then compare the differences between job start timestamps, plus the final Complete() of the last job.
Not tested on a parallel job (regarding thread locking) as I was only testing single jobs in my code, but I tried this to check that it doesn’t have any errors. I think it will work in parallel?
HighPrecisionTimerPlugin.cpp (compile to a DLL and place it in Assets/Plugins):
#include <stdint.h>
#include <intrin.h>
#include <Windows.h>
extern "C" __declspec(dllexport) long long GetCurrentQPC()
{
LARGE_INTEGER li;
QueryPerformanceCounter(&li);
return li.QuadPart;
}
Global (to access from managed/unmanaged - sure there’s a better way though)
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate long GetCurrentQPCDelegate();
Managed side (call CompileFP to create the delegate function pointer)
[BurstCompile, DllImport("HighPrecisionTimerPlugin", CallingConvention = CallingConvention.Cdecl)]
private static extern long GetCurrentQPC();
private FunctionPointer<GetCurrentQPCDelegate> GetCurrentQPC_fptr;
public void CompileFP()
{
GetCurrentQPC_fptr = BurstCompiler.CompileFunctionPointer<GetCurrentQPCDelegate>(GetCurrentQPC);
}
Pass GetCurrentQPC_fptr to the job when scheduling. (Edited: tested working in parallel, using NativeReference)
// requires: using System.Threading; using Unity.Burst; using Unity.Collections;
//           using Unity.Collections.LowLevel.Unsafe; using Unity.Jobs;
public unsafe struct TimestampJob4 : IJob
{
    public FunctionPointer<GetCurrentQPCDelegate> GetCurrentQPC_fptr;
    public NativeReference<int> firstThread; // input as 1
    public NativeArray<long> timeStamps;     // to hold our timestamp chain

    public void Execute()
    {
        // a few threads might see it as 1, but this avoids interlocking on every thread
        if (firstThread.Value == 1)
        {
            // only the first thread wins the exchange, flips it to 0 and records the timestamp
            if (Interlocked.CompareExchange(ref *firstThread.GetUnsafePtr(), 0, 1) == 1)
            {
                timeStamps[0] = GetCurrentQPC_fptr.Invoke();
            }
        }
        // Do Stuff
    }
}
Managed side: to convert the ticks to milliseconds later on, you can get the frequency via kernel32.dll (unless you expose it from your C++ plugin):
[DllImport("kernel32.dll")]
static extern bool QueryPerformanceFrequency(out long lpFrequency);
long ticks = timeStamps[1] - timeStamps[0];
QueryPerformanceFrequency(out long freq);
double timeInMilliseconds = 1000d * ticks / freq;
Here’s a trimmed down version of what I use:
public readonly struct Benchmark : IDisposable
{
#if ENABLE_PROFILER
readonly FixedString128Bytes label;
readonly long timestamp;
Benchmark(in FixedString128Bytes label, long timestamp)
=> (this.label, this.timestamp) = (label, timestamp);
/// <summary> Start a new benchmark with given label. </summary>
public static Benchmark Start(in FixedString128Bytes label)
=> new(label, ProfilerUnsafeUtility.Timestamp);
public void Dispose()
{
if (timestamp is 0)
return;
float elapsedTicks = ProfilerUnsafeUtility.Timestamp - timestamp;
var ratio = ProfilerUnsafeUtility.TimestampToNanosecondsConversionRatio;
float elapsed = (float)(elapsedTicks * (double)ratio.Numerator / ratio.Denominator / 1_000_000.0);
if (IsBurstCompiled)
Log(elapsed);
else
LogManaged(elapsed);
}
void Log(float elapsed)
=> Debug.Log($"[Benchmark] {label}: {elapsed}ms (Burst)");
[BurstDiscard]
void LogManaged(float elapsed)
=> Debug.Log($"[Benchmark] {label}: {elapsed:0.00}ms");
static bool IsBurstCompiled
=> Unity.Burst.CompilerServices.Constant.IsConstantExpression(1);
#else
public static Benchmark Start(in FixedString128Bytes label)
=> default;
void IDisposable.Dispose() { }
#endif
}
Usage:
using (Benchmark.Start("DoWork"))
{
DoWork();
}
Alternatively:
void DoWork()
{
using var benchmark = Benchmark.Start(nameof(DoWork));
...
}
ProfilerUnsafeUtility.Timestamp is exactly what I was looking for.
The values I am now getting seem very accurate - many thanks, you helped me a lot.