Profiler: More info on the "GameObjects in Scene" count?

In the Profiler, under Memory, how is the value for “GameObjects in Scene” obtained, and can I get it myself in a script?

And is there a way I can get more information on what these objects might be and where they are?

UPDATE: More important question! Why does this value (and other values in this list) switch between two different values depending on whether the main Unity window or the Profiler window has focus??

7877650--1001746--upload_2022-2-8_13-32-53.png

I’m a contractor that’s been brought into a project to help optimise the game, and I noticed this value is going up, and up, and up, and never stops.

I’m still experimenting, disabling parts of the code and re-enabling, to find out what might be causing this value to increase. They have a pooling system that appears to be working, entities are being reused, but this count still goes up every time something new happens in the game.

Hello

For “GameObjects in Scene” via
Resources.FindObjectsOfTypeAll&lt;GameObject&gt;().Length
For “Total Objects in Scene” via
Resources.FindObjectsOfTypeAll&lt;GameObject&gt;().Length + Resources.FindObjectsOfTypeAll&lt;Component&gt;().Length
and “Assets” via
Resources.FindObjectsOfTypeAll&lt;Object&gt;().Length - (Resources.FindObjectsOfTypeAll&lt;GameObject&gt;().Length + Resources.FindObjectsOfTypeAll&lt;Component&gt;().Length)
(… OK, that might be a minor approximation only, but close enough in most instances)
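Put together, a minimal sketch of reproducing these counts yourself (assuming, as approximated above, that they are derived from the set of all loaded objects, which is why prefab contents and other loaded assets can skew them):

```csharp
using UnityEngine;

// Rough reproduction of the Memory module's "Simple" counts.
// Assumption: the Profiler derives these from all loaded objects,
// so this is an approximation, not the Profiler's exact logic.
public static class SceneObjectCounts
{
    public static void Log()
    {
        int gameObjects = Resources.FindObjectsOfTypeAll<GameObject>().Length;
        int components  = Resources.FindObjectsOfTypeAll<Component>().Length;
        int allObjects  = Resources.FindObjectsOfTypeAll<Object>().Length;

        Debug.Log($"GameObjects in Scene: {gameObjects}");
        Debug.Log($"Total Objects in Scene: {gameObjects + components}");
        Debug.Log($"Assets (approx.): {allObjects - (gameObjects + components)}");
    }
}
```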

In 2020.3 or newer, you could also use the ProfilerRecorder API and the ProfilerCounter names listed on the Memory Profiler Module documentation page to get these stats. They are also explained there.
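For example, a ProfilerRecorder reading that counter could look roughly like this (the counter name "GameObject Count" is assumed from the Memory Profiler module docs; check that page for the exact names in your version):

```csharp
using Unity.Profiling;
using UnityEngine;

// Reads the Memory module's GameObject counter each frame.
// Counter name assumed from the Memory Profiler module documentation;
// verify it against the docs for your Unity version.
public class GameObjectCountReader : MonoBehaviour
{
    ProfilerRecorder _recorder;

    void OnEnable()  => _recorder = ProfilerRecorder.StartNew(ProfilerCategory.Memory, "GameObject Count");
    void OnDisable() => _recorder.Dispose();

    void Update()
    {
        if (_recorder.Valid)
            Debug.Log($"GameObjects in Scene: {_recorder.LastValue}");
    }
}
```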

Excellent question. If you’re profiling the Editor, it could erroneously catch prefabs open in an Inspector, or it might add the values from two frames together. That somewhat sounds like a bug, or a limitation of profiling the Editor, because memory behaves weirdly in the Editor.

You can swap from Simple to Detailed and take a capture. In the detailed capture, these objects should be listed under Scene Memory.

You might also want to check out the Memory Profiler package. We just updated it to 0.5, which should make it easier to investigate why these Objects are held in memory. Since you describe that you are apparently leaking these, just use the package, take two snapshots while their number is growing, open both in compare mode, group or filter (“Match”) the Diff column by clicking the column header, and check out all “New” entries.

Thank you so much! This is exactly what I wanted to see.

Well, after snapshotting the memory on the AndroidPlayer, I’m not seeing anything obvious that stands out. Nothing seems to be eating up all the RAM, and apart from (what seems to me to be) a relatively high object count, there aren’t any obvious object types that stand out as being more numerous than others.

Although it makes it hard to tell when I can’t sort the grouped objects in order of how many are in the group. Sorting by Ascending order just sorts them alphabetically by the name of the group.
Although I think I found a way - sorting Descending by “Referenced” seems to give something close to that;

Managed Objects:

Native Objects:

I am not sure if these numbers are significant. The number of “GameObjects” does seem high for what’s on the screen, but I have no point of reference to know what’s a high amount.

For reference, this is a typical scene in our game;

Yet the GameObject count still continues to increase over the life of the game. The memory snapshot was taken when things were slowing down. Mostly when this happens, the category taking up the most time in the CPU profiler is “Rendering”, followed by “Other”, which, while it doesn’t take up as much as Rendering, has some dramatic and frequent spikes, and the most dramatic pauses in the game are accompanied by a spike in the red GC allocations graph.

The specs for the Android phone I’m currently testing on, for reference; ZTE Blade A7 - Full phone specifications

Without anything more obvious to focus on, my current approach is to just work with the other programmer on the team and clean up the way things are being done, starting with the things that happen most in the game. I noticed some bad-practice coding that would lead to lots of garbage collection of temporary objects like Vector3s, lists, etc., Instantiate calls without Destroy, and a bad attempt at pooling that doesn’t really solve the problem.
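As an illustration of the kind of cleanup meant here (a sketch with made-up names, not the project's actual code): reusing collections instead of allocating them per frame removes one steady source of garbage.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical example: avoid per-frame allocations by reusing buffers.
public class EnemyScanner : MonoBehaviour
{
    // Bad practice would be "var results = new List<Transform>();"
    // inside Update(): a fresh list every frame, feeding the GC.
    readonly List<Transform> _results = new List<Transform>(64); // reused buffer

    void Update()
    {
        _results.Clear();               // reuse instead of reallocate
        CollectVisibleEnemies(_results);
        // ... work with _results ...
    }

    void CollectVisibleEnemies(List<Transform> buffer)
    {
        // placeholder for the real query; fills the caller's buffer
    }
}
```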

But any insight on where else I could look to tackle the biggest problems first would be most welcome.

To that end, you might want to un-hide the “Length” column.

My advice from before still stands on this topic:

You might want to turn on Allocation Call Stacks (this might need to be turned on in the Profiler as you build &amp; run if you aren’t using the latest patch release of whichever major Unity version you’re on, as there was a bug we fixed where a later state change of that setting wouldn’t be communicated to the player) or Deep Profiling to find these GC sources and eliminate them. “Other” in a build is often down to Debug.Log messages. Those don’t get compiled out of release builds either, so you might want to look at those. That shouldn’t increase the object count, though.
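One common way to keep Debug.Log out of release builds (a sketch; the wrapper name is made up) is a [Conditional] logging shim, since the compiler removes the entire call, arguments included, when none of the listed symbols are defined:

```csharp
using System.Diagnostics;

// Hypothetical wrapper: calls to DevLog.Log() are compiled out entirely
// unless DEVELOPMENT_BUILD or UNITY_EDITOR is defined, so release
// players pay no cost for these log lines.
public static class DevLog
{
    [Conditional("DEVELOPMENT_BUILD"), Conditional("UNITY_EDITOR")]
    public static void Log(string message)
    {
        UnityEngine.Debug.Log(message);
    }
}

// Usage: DevLog.Log($"spawned wave {waveIndex}");
// In a release player the call (and the string interpolation) is omitted.
```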

Also, just selecting things in the details of the CPU profiler highlights the contribution of the selected sample stack to each of the categories in the chart so you might want to use that to identify the source of that “Other” usage.

And I’m happy to see you found and presumably fixed your Texture2D leak :slight_smile:

Thanks man - actually I fixed the Texture2D leak by not profiling the Editor and profiling on the actual Android build :smile: I started paying attention to all the warnings to “profile on the build, because memory behaves weirdly in the Editor!”

And for some reason, sorting by the Length column didn’t work as well as sorting by the References column. :eyes:

Even with Deep Profiling enabled, I can’t exactly see where the work is going in the breakdown of the CPU usage; the percentages of the children don’t add up to the percentage of the parent, so clearly there’s some overhead in each parent that isn’t being broken down in the hierarchy. Here’s a screenshot of one of the spikes - what would you take away from this?

The Call Stacks button was a single button, not a dropdown as referred to by the page you linked to, so I guess the version of Unity they are using for this project doesn’t have that feature. It’s 2019.4.34f1

The memory snapshot diff just shows a tonne of different objects that I know we could manage better, and the profiler shows some garbage collection spikes, so for the time being, I think an easy win for this project is to comb through and just reduce the garbage collection and the non-destroyed instances I see being created, so I’ll get onto that. It’ll probably improve performance overall anyway.

We added more options for Call Stacks to be added to other samples than just GC.Alloc samples. I.e. on 2019.4 that toggle is what activates the feature I linked to.

That is what the “Self Time” column shows. That time is there and accounted for, e.g. you’re spending 13 + 6 = 19 ms on culling. But what you focused on in this screenshot is too deep in the details to show the actual distribution of time spent. The shown section only accounts for ~20% of the entire frame.

Ah… and the hover text for that button has a lot of useful info too :slight_smile:

Ah thank you, that makes sense now.

Yeah I guess the issue is the performance is lost over many areas in small amounts. That 20% item is just at the top of the list. But I should be right now, it’s all starting to make more sense.

My next burning question is, does it seem odd that “vibrate” would cause such a performance spike? I suspected it may be some quirk of Deep Profiling, since this never showed up before I enabled it. I actually ended up turning off vibrate in the game so it stopped spiking and I can more clearly see things I can actually fix. (Like that prior spike, that was them instantiating a bunch of objects, which I can fix).

This seems to be deep inside the Android SDK so I’m not sure I can do something about this performance spike specifically.

7892176--1004635--upload_2022-2-13_11-15-32.png

That’s a bit too deep in Android specifics for me to comment, maybe something for the mobile subforum?

But: Deep Profiling comes with a small overhead per managed method call, so the more calls, the bigger the overhead.

If incremental GC is on, write barriers are in place to inform it if it needs to redo part of its time-sliced mark and sweep phase. That can add up to 1ms spread out over all managed code executing in a frame. But that would be affecting all frames, though the more managed code executes and the more references and allocations change, the worse the impact.

GC.Alloc samples aren’t timed. That would add too much overhead to them. They just have an alibi time attributed to them, but could, and normally would, easily use more time than the profiler shows.

A phone has a tiny power budget to deal with that it needs to dole out to CPU, GPU, Memory, I/O, Network, or other hardware. It usually can’t give all of these the same priority so, using more of one system can lead to the OS down clocking the others.

So higher memory access and vibrate could just eat up some of that budget. Transmitting the data for that over ADB or WiFi to the Profiler takes a bit out of that too, btw, so it might be worth measuring the frame time and those samples with UnityEngine.Profiling.Recorder, without the Profiler and Deep Profiling attached, just to be sure it’s actually a problem.
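A Recorder-based measurement could look roughly like this (the sampler name "Camera.Render" is just an example placeholder; substitute whichever sample you want to track):

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Measures a specific profiler sample in a player without the Profiler
// attached. "Camera.Render" is an example sampler name, not a recommendation.
public class SampleTimer : MonoBehaviour
{
    Recorder _recorder;

    void Start()
    {
        _recorder = Recorder.Get("Camera.Render");
        _recorder.enabled = true;
    }

    void Update()
    {
        if (_recorder.isValid)
            Debug.Log($"Camera.Render: {_recorder.elapsedNanoseconds / 1_000_000f} ms " +
                      $"over {_recorder.sampleBlockCount} sample(s) last frame");
    }
}
```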

If you want to dive deeper on the android specifics and hardware metrics, maybe a Native Profiler like Arm Mobile Studio might get you further on that.

Thanks man, geez, you really know your stuff. It’s great to know there’s a path to go down if I need to explore this issue further. From what you’re saying I get the impression it’s the kind of thing I shouldn’t chase until I really need to, especially since I haven’t noticed an actual game performance issue during vibration events.

Everything you’ve said has been super helpful and I’m pretty much on my way now, so thank you for your help.


Thank you and you’re welcome. Happy to help :slight_smile:

Hi again, just looking for confirmation on these two issues:

This “Other” category in Rendering, the muddy green section at the bottom of the CPU Usage. That’s basically our own code, right? Time to use Profiler.BeginSample() and Profiler.EndSample() to expose where those spikes are coming from?
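For reference, manual instrumentation with those calls looks like this (class and method names are hypothetical):

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Hypothetical example of wrapping suspect code in a named sample.
public class WaveSpawner : MonoBehaviour
{
    void Update()
    {
        Profiler.BeginSample("WaveSpawner.SpawnLogic");
        SpawnLogic(); // shows up under this name in the CPU Hierarchy view
        Profiler.EndSample();
    }

    void SpawnLogic() { /* ... */ }
}
```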

And these spikes on Gfx.WaitForPresentOnGfxThread, this means the CPU is waiting for the GPU? (That’s what I get from this thread Gfx.WaitForPresentOnGfxThread )


Not sure where to start looking into this one, but I will get to it. The weird part about those last kinds of spikes is they happen when just standing around, nothing interesting is really happening in the game.

So, it’s not scripting code directly, so it’s not your C# code (that’d be blue). But it could be related to calls you make into the native backend, and those calls executing something that doesn’t fit into the other categories, like logging Debug messages.

It can also relate to some work which you might be indirectly affecting with the content or settings, and in the Editor it’s usually all of the EditorLoop if you’re targeting Play Mode.

Adding profiler markers is not going to change the color, but if those are placed around the calls to the native backend via the API, then it could help attribute the “Other” time. You’re better off starting at the top, though: just select things in the Hierarchy or Timeline view and check how the selection filters the chart colors. If what you selected contains time attributed to Other in itself (the self time of brownish-green markers in Timeline view, for example) or within the self time of its child profiler samples, then the chart will show you how much of that is categorized as “Other”. This way you can narrow it down.

Beyond the above, please use Unity.Profiling.ProfilerMarker instead. They have way less overhead, don’t clutter string memory, are Burst compatible, and are for all of those reasons mostly OK to leave in your code, even as it goes into release. (Begin and End calls are entirely compiled out in release; Auto() returns a null object.)
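A minimal ProfilerMarker usage sketch (marker and class names are made up):

```csharp
using Unity.Profiling;
using UnityEngine;

// Hypothetical example of the lower-overhead ProfilerMarker API.
public class EnemyAI : MonoBehaviour
{
    // Created once; cheap to begin/end every frame.
    static readonly ProfilerMarker s_UpdateMarker = new ProfilerMarker("EnemyAI.Update");

    void Update()
    {
        using (s_UpdateMarker.Auto()) // Begin/End are compiled out in release
        {
            Think(); // appears as "EnemyAI.Update" in the Profiler
        }
    }

    void Think() { /* ... */ }
}
```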

Please don’t take your information on that sample from the forums, but from the Common Marker documentation page.

The forum threads discussing these are full of stab-in-the-dark, random Unity bug or performance regression related issues and confusion, mingled together under the most surface-level expression of an issue that is almost always distinct for each project hitting it. In fact, these threads are the reason that documentation page exists in the first place.

Case in point: you are not GPU bound, and the marker doesn’t mean that either. The “GfxThread” it is waiting on is the Render Thread, which is running on the CPU. In your screenshot you can clearly see that the last frame’s Render Thread work overlaps onto this frame’s WaitForPresentOnGfxThread. So the GPU hasn’t even finished getting all the commands from the Render Thread, but as soon as it has, Gfx.Present actually seems to be done with this frame (or the last one) within 4.5 ms.

So you’ll likely need to reduce your draw calls, batching costs, or similar. What exactly, I can’t tell you based on screenshots alone, as it’s rather project-specific and requires deeper analysis. The Frame Debugger is going to be your friend and guide through some of that, as is the Profiler Timeline view, with a focus on the Render thread and rendering work happening on the main thread.

Ok so script is blue, so it’s not that. Thanks for the heads up!

Ah I edited the post so many times I forgot to mention - NOTHING I select ever highlights that damn “Other” category, which is why I was asking if I need to start profiling our code. Does this sound odd to you? Should SOMETHING in the hierarchy view be contributing to the “Other” category or is it possible it’s not tracked?

I understand the concept of reducing draw calls by encouraging batching. But by “batching costs” are you saying that the act of batching can negatively affect performance? Eg. gathering up the objects to be batched. How would we address this?

I think our biggest issue though is these sudden drops in performance, looks like it’s caused by excessive garbage for the GC to collect.

This profiler screenshot was taken after a period of a few seconds in the game where the framerate dropped noticeably and the game became jerky and sluggish to respond. To me, this looks like the Garbage Collector is overloaded, and had to pause the game to do a lot of clean-up. That’s what I take from the spikes in the red GC lines, and the accompanying spikes in the CPU Usage graph. Is my interpretation on the right track?

7922239--1011079--upload_2022-2-24_12-23-16.png

I’m not super certain how to best hit this nail on the head though - I thought to keep periodically taking memory snapshots, and then when the performance drop happens, take another snapshot afterwards, and then compare it to the very previous snapshot, to see what’s “Deleted in B” to see what the GC had to clean up. But this seems like a very laborious and inaccurate way to find the issue.

Actually there’s the call stacks tracking you mentioned way earlier, here’s what I can see. I’m off the clock now for work but at a glance it looks like the DoTween is calling our SupersonicWisdom plugin, which is creating a lot of garbage, is that right?? (Below I have selected a spike in the “GC Allocated” graph, it’s hard to see)

7922239--1011127--upload_2022-2-24_17-10-30.png

7922239--1011130--upload_2022-2-24_17-13-28.png

7922239--1011133--upload_2022-2-24_17-13-51.png

Not even Player loop or …DirectorUpdateAnimationB… or Physics fixed update? I can see those attributed to Other in your screenshot of the Timeline view. The chart is basically the cross-section of the main-thread timeline colors as seen from below, looking up, by a 1-dimensional observer.

Alternatively, please make sure you’re on the latest patch release for your major Unity version. There were some bugs fixed about this quite a while ago.

You should be seeing a spike in the GC Collect category on the CPU Chart then, but you aren’t, so that’s not what’s happening.

The 3KB allocated in that frame isn’t that much either. It’s more indicative of something happening that is creating new things, which could affect performance in different ways. And it goes back to power budgets, i.e. touching multiple hardware components simultaneously and thereby reducing how much power each gets.

Performance slowing down after a while, with no other change, would sound like heat throttling, i.e. downclocking because your phone is getting too hot and is only passively cooled. It does seem like your Rendering stats are changing, though, so I guess it’s not a static situation, and it doesn’t sound like a super clear case to me.

The time cost for GC.Alloc on the CPU isn’t measured, to reduce overhead; they only emit a one-off sample and not a begin + end pair like all the others. The samples are then given a minimal duration by the profiler deserializer. Allocating memory can be slow and take longer than that minimal time, but 33 allocs at 3 KB don’t explain a 33 ms rendering spike. Just check out what the CPU view tells you about what is going on.

It might be easier to catch the new allocations as they happen by following the GC Alloc Column values on the CPU profiler module. Maybe with Call Stacks data collection turned on (ideally from the start of recording). But yeah, this could work in a pinch.
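If it helps, the per-frame allocation counter can also be polled in code (the counter name "GC Allocated In Frame" is assumed from the Profiler counter documentation; verify it for your version):

```csharp
using Unity.Profiling;
using UnityEngine;

// Logs frames that allocate more managed memory than a threshold.
// Counter name assumed from the ProfilerCounter documentation.
public class AllocationWatcher : MonoBehaviour
{
    ProfilerRecorder _gcAlloc;

    void OnEnable()  => _gcAlloc = ProfilerRecorder.StartNew(ProfilerCategory.Memory, "GC Allocated In Frame");
    void OnDisable() => _gcAlloc.Dispose();

    void Update()
    {
        if (_gcAlloc.Valid && _gcAlloc.LastValue > 16 * 1024) // arbitrary 16 KB threshold
            Debug.Log($"Frame allocated {_gcAlloc.LastValue / 1024f:F1} KB");
    }
}
```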

Yep, lowering draw calls helps the CPU a bit because it needs to send fewer commands, but it mostly helps the GPU because there are fewer context switches. Processing the data into batches to be sent to the GPU, however, comes with a CPU performance cost. It ends up being a balancing act: are you GPU bound? Optimize what work the GPU needs to do by preparing stuff better on the CPU side. Are you Render Thread bound but have time left on the GPU? Let it do a couple of things that are costly to process, in a brute-force kind of way, and see if that runs better overall.

But just drawing less in general helps both, though culling and some other factors also have an effect on the CPU side of things. Just, please, look at what the Render thread samples are telling you is taking the time, and then see if you can find out how to attack that.

Ok, so I’m getting the message loud and clear that it’s not garbage collection after all. And that the high representation of Gfx.WaitForPresentOnGfxThread is actually the GPU waiting for the CPU to still be done with the work from the last frame.

Regarding the “Other” category, I was interpreting it wrong, I now understand what I’m looking at.

The next issue is that tracking down what is causing the spikes in “Other” is quite cumbersome, and if I could zoom into the graph or something, it would be a lot easier to see the highlight of the contributing spike. I recorded a video to show you what I mean. Is there a better way I could be going about this?

Here’s another one of those sudden drops in performance. It seems to be a number of things as I scroll the bar across. I’ve been suspicious of “Shadows” because we have them turned on for every little thing, which is beyond unnecessary, but “Shadows” only seems to pop up semi-frequently, while Batch.DrawInstanced takes its place most of the time. But I wonder if having so many shadows causes knock-on effects like batching?


I went and disabled shadows and receive shadows on all the Mesh Renderers, and now the performance is a lot better. There are still these regular hitches showing up on the graph but it’s not actually noticeable during game play. They are being attributed to RenderForward.RenderLoopJob mostly, it seems.

I wonder if it’s that power drop thing caused by using several components at once? A question about that specifically - that’s not something we can do anything about directly other than reduce overall workload?

Thank you, that video illustrates an issue we do know about, with the labels and the cursor potentially getting in the way of things. We have some rough ideas of how to change the UX for that to something more useful and more readable, because yeah, the Other time is also hard to distinguish just by the color of the time label. We’re also looking into clearer colors, but maybe if you switch to color-blind mode using the three-dots menu in the window toolbar, it becomes easier.

Also you can rearrange the order of the categories in the legend of the chart, maybe that can help too? Otherwise, timeline view can help because it uses roughly the same color scheme. I say roughly because every color present on the chart maps to Timeline, but Timeline has more colors and all remainders map to Other. Except for grey ones if they have a parent sample, then they take on their parent’s category and color.

I’ve taken a note on that feedback, but for 2022.2 the main (/near only) focus is on the Memory Profiler so it might take us some time to get to that.

I’m not sure. I think there was something about shadows and batching; please use the Frame Debugger or RenderDoc for details. But yes, it will probably put a bit of a strain on the Render thread and the GPU. You seem to still be looking at Hierarchy view, though, so you’re missing most of the rendering work from the whole picture that you should be looking at. You can switch Hierarchy view to show the Render thread, but some rendering work is spread across the two threads, and you might want to see the timing between them, and the totals for these samples added up across the threads, in Timeline view’s selection “tooltip”.

On mobile devices you’re also always forced into vSync, so a slight delay on the CPU in sending the work to the GPU could mean that you miss the vBlank and need to wait for 16.66 ms on the GPU. That’s also an effect that spans across frames, so Hierarchy view really gives you too narrow a window into what’s actually happening when you’re analyzing rendering and GPU work.

Reducing overall workload or being careful about what system you use (downloading stuff from the web, memory and I/O work, GPS, …) is the most you can do about the power and thermal budget. The more time the CPU and GPU have for idle time, the less heat and the less power they need. A rule of thumb there is to leave about 15-20% of idle time on mobile.

To analyze whether that is the problem, you’ll need to use a native profiler like Xcode Instruments, Arm Mobile Studio, Android Studio, Perfetto or similar, and hope that some indicator for these is exposed for your device.

Also, since you’re looking at a phone with a Mali GPU, Arm Mobile Studio and the System Metrics Mali package might help shed some light on the GPU work for you.

Just want to thank you for all the input so far. For the time being we’re switching from performance fixing to other features, since the game is running better now that I solved a lot of the memory leaks. A lot of the stuff I was asking you about was for a low-end phone, and the performance issues that low-end phone is having seem to be too numerous to really reflect the actual problems the majority of phones are having.

The only obvious performance problem the majority of phones are having (which we decided to leave for now, since it only happens on some devices and our game only takes up about 300 MB of RAM) is an in-game frame-rate stutter after watching a video ad. I reckon it’s just devices without enough RAM, and Unity having to fire up some of the game’s systems again. I saw a lot of “BatchRenderer.Flush” and “Shadowmap” stuff behind the spikes that happen right after the ad, and while I don’t know for sure, it makes me think Unity is just re-initialising stuff because some of it was removed from RAM to make room for the video ad.

Does that sound like it might be the case? Here’s some examples;