I’m currently having the problem that the Unity Profiler (allegedly) has so much overhead, that the profiling results are not accurate. I’m not speaking from profiling inside the editor, I’m speaking of profiling a player on the target hardware.
I found that updating UI texts is expensive ( link ). According to [mention|S8VzkRzouqfhbBjleSJfmg==], it only appears to be expensive, because of the profiling overhead and the player being a development mode build (not using script debugging though). Stephan kindly went into more detail here .
I would like to know how I can get reliable profiling data that accurately represent what’s going on in my game? I need to understand where performance is being spent.
The discrepancy of what I measure 1.5ms+ and the suggested 0.1ms real cost is huge and I would like to see/measure it in the profiler and not rely so much on opinions.
I thought the Unity Profiler is capable for doing that?!
Unity Profiler for sure impacts game performance and affects timings. It is instrumentation profiler which works via automatic or manual injection of a code which measures time and writes it out somewhere.
Overhead of a single instrumentation call is very much platform dependent and mostly corresponds to timer latency. On Android it is usually 0.1us-2us per begin/end pair depending on the device. 1000 measurements contribute to the 0.1-2ms on the device. PC/Consoles are around 2-10ns per sample.
Deep profiling increases significantly amount of measurements as every single C# method get profiled. That usually skews data significantly exaggerating timings of methods which have a lot of calls.
As it was mentioned in other forums Deep Profiling usecase is more for understanding what is going on inside some perf critical code and if there is an opportunity to inline some methods or reduce logic complexity.
Do you have an example of this outside of deepprofiling usecase? Are you talking about 1.5ms+ cumulative difference per frame or per call?
Would it be helpful to display an estimated overhead of profiler measurements in the profiler window ui?
I’m aware of the deep-profiling overhead, that’s not the issue I was trying to describe. I was doing a “non-deep” profiling from an Android device and the 1.5 ms are per call, or two calls actually as highlighted in the screenshot below.
Screenshot
In the linked post I added two screenshots. The first one a non-deep profile from the target device, which is the one that is important to me. I want to understand why it should be 0.2ms (as mentioned by Unity dev), but is showing almost 1.5ms (1.28+0.16) actually.
The second image is with deep profile and was only meant to show that the cost comes from that particular subsystem.
I don’t know. How good and reliable would be the estimated value?
Thanks Peter - this is very helpful.
I can see a single Layout call which takes 1.10ms - the profiler overhead of that measurement would be around 0.5-1us. I guesstimate that because of the fact that CPU time is 17ms and typical profiler samples count is 1000-2000 for a frame. This can be verified by adding an empty Update or similar method and checking it’s duration in Profiler.
My feeling is that there is an optimization opportunity for the Layout method. Thanks for the project - it explains the red rectangles for me and I think could be used for the investigation.
If I understand this correctly, the Profiler overhead of a single BeginSample/EndSample pair, such as the Layout call in above screenshot, can’t be used to explain a performance difference of 1ms with/without profiling.
BeginSample/EndSample pairs can sum of to a significant cost if called (very) often. This is when the profiler overhead can distort the actual results, such as when deep-profiling.
If you’re able to remove/subtract the overhead caused by profiling from the actual profiled results, that sounds very useful. This is meant for each BeginSample/EndSample and not only a single global “Profiler Overhead 1ms” label I assume?
We’d have to figure out how to display that. If we’d just hide the time from the samples, it’d not add up correctly to the frame time anymore and in timeline view, timings between threads would be confusing. Maybe on selection of a sample in timeline view, it’d show the time with, without, and just the instrumention overhead?
In hierarchy maybe via a separate column?
Anyways one thing I wanted to check was if you have compared:
Dev build with Deep Profiling support but Deep Profiling turned off
With profiler
With Recorder/ProfilerRecorder just recording that layout sample
Dev build without deep profiling support
With profiler
With Recorder/ProfilerRecorder just recording that layout sample