Poor performance on Android TV

Bringing my discussion over from Esoteric Software’s Spine Forums in the hopes of getting some resident Unity experts’ opinions on the poor performance we are getting on Android TV devices.

You can read through my previous posts on this issue here, but the Spine developers seem to be pinning the blame for this poor performance (sub-30 FPS with only 3 skeletons on screen) on the Unity renderer itself.

These skeletons are quite heavy in terms of bone and vertex counts, but ran at a flawless 60 FPS on the same Android TV device using the same exact SkeletonAnimations in the Cocos2d-x game engine. In Unity we are lucky to hit 30 FPS with a simpler scene setup and the same exact skeletons.

What could be causing Unity to take so long to render these skeletons? How do we improve the performance without drastically altering the artists’ workflow and techniques (which worked for another game engine)?

For some extra context: the skeletons will never really be moving much (if at all) outside of their animations, we aren’t using HDRP or URP, and we started with the 2D Mobile project template. We aren’t doing any crazy shaders or vertex manipulations on the skeletons, and there is no IK, masking, or separators. It really is quite a simple setup and shouldn’t be performing so poorly on an Android TV device (regardless of how bad the hardware is in those TVs). Our entire game Election Year Knockout used this same general setup for the GameScene for all opponents, with the same number of skeletons and even two of the exact same skeletons (at least until the artists make replacements for those for our sequel Knockout 2).

I can answer any questions and provide more details if they will help to solve this problem. Thank you for any help you can provide.

Make a development build and attach the Profiler to get started.

Alternatively, perhaps the Frame Debugger can give you some rough insights.

If you’re unable or unwilling to do that, then start paring things down using the bisection method until you find what is consuming your resources.
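In case "bisection" is unfamiliar, here is a minimal sketch of the idea in plain C#, not Unity-specific. `Bisector`, `FindCulprit` and the `reproduces` callback are hypothetical names made up for illustration, and the sketch assumes there is exactly one culprit and that the problem reproduces reliably. In a real project the "callback" is you: enable only half of the scene objects, run a build, and see whether the frame rate still tanks.

```csharp
using System;
using System.Collections.Generic;

public static class Bisector
{
    // Narrows a list of suspects down to the one item that triggers the problem.
    // Assumption: exactly one culprit, and `reproduces` gives a consistent answer.
    public static T FindCulprit<T>(
        IReadOnlyList<T> items,
        Func<IReadOnlyList<T>, bool> reproduces)
    {
        if (items.Count == 0)
        {
            throw new ArgumentException("No candidates to bisect.", nameof(items));
        }

        var candidates = new List<T>(items);
        while (candidates.Count > 1)
        {
            int half = candidates.Count / 2;
            List<T> firstHalf = candidates.GetRange(0, half);

            // Test with only the first half enabled. If the problem still shows,
            // the culprit is in that half; otherwise it is in the remainder.
            candidates = reproduces(firstHalf)
                ? firstHalf
                : candidates.GetRange(half, candidates.Count - half);
        }
        return candidates[0];
    }
}
```

Each round halves the number of suspects, so even a scene with a few dozen objects only needs a handful of test builds.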

Unity’s built-in renderer receives almost no attention aside from bug fixes, and according to the official developers their own internal testing shows that it’s slower than URP. So I would start there by migrating a copy of the project to URP (or creating a quick test project with it) to see what happens.

https://discussions.unity.com/t/895589/3

Hey, thanks for the response! I’ve already done that a bit, with some of my findings posted on the Spine forums.

Here are some of my main takeaways, copied and pasted for brevity:

Upon hooking up the [Deep] Profiler to this Android TV device I’ve been testing on, I found that most of the slow-down appears to be coming directly from Spine, specifically:
SkeletonAnimation.LateUpdate
SkeletonRenderer.LateUpdate
SkeletonRenderer.LateUpdateMesh

Together these take nearly 55 ms to complete each frame!

Here is a screenshot of the profiler showing these results:

This is with three skeletons rendered to the scene.

After profiling without Deep Profiling (as I had done for my very first post), I do indeed see the issue is likely GPU bound, though I’m struggling to see how Unity’s renderer is so much less efficient than Cocos2d-x’s OpenGL renderer. I know that Unity’s renderer is heavy but I still assumed far better performance than this on TV. […] The profiler now is indicating that it is indeed likely GPU bound since the biggest CPU call is now Gfx.WaitForPresentOnGfxThread […]. Why this might be the case I’m not sure, since without any skeletons these scenes easily hit a stable 60 FPS.

That is a bit disappointing and disheartening to hear, but I suppose it makes sense if they are trying to push everyone to use their new rendering pipelines. I’ll go ahead and make a test using URP instead of the built-in renderer, and then report back with my findings (whether they are worse or better). I guess I had just assumed the built-in renderer was more performant on lower-end devices and that URP was more useful for modern hardware and post-processing stacks.

After doing some more testing these are the results I’ve been seeing, with performance actually being significantly worse using URP compared to the default renderer.

Here are the results, copied and pasted from my Spine Forums post for brevity:

Using URP with the default Spine Skeleton shaders only gives us about 15 FPS now, whereas the default renderer with default shaders got us around 30 FPS (albeit unstably). Using URP with the Universal Render Pipeline/2D/Spine/Sprite shader gives us a whopping 6 FPS :hushed: Then finally, using URP with the Universal Render Pipeline/2D/Spine/SkeletonLit shader gives us 10 FPS. GPU Instancing doesn’t seem to make a difference here either. :sweat_smile:

In all three of these tests there were no normal maps, emission maps, fragment shaders (outside of any you’ve used for your shaders in Spine’s Unity Package), no light sources, etc. It was quite literally three skeletons, a camera, a Global Light 2D, and a single canvas with a single TextMeshPro on it to display the FPS averaged over the last 60 frames. About as simple a scene as we could make for this test.

Edit: I’d also like to add that I just attempted to disable all post-processing, shadows, and lighting in the URP Asset data, and only get about 20 FPS with that setup (though it’s still unstable with spikes down to 15 FPS).

Back when I tried running the Lost Crypt Demo (in editor, GTX 1060 3 GB), I also got a massive performance loss because of the 2D skeleton on the character (300 FPS without the skeleton, <200 FPS with one). I haven’t tried the Dragon Crushers demo (quite a recent demo which has multiple 2D skeletons running at the same time), so maybe check out those projects and see how they run?

Hey, thanks for the suggestion. In the Title Scene for Dragon Crashers (the example I assume you meant, since I can’t find ‘Dragon Crushers’) I get a solid 30 FPS. The moment I “Tap to Play” and make it to the Game Scene the frames drop to an abysmal 6 FPS (average of 150 ms PlayerLoop time, ~120 ms of that being Gfx.WaitForPresentOnGfxThread).

So it definitely does not run well at all. It’s just so strange because a virtually identical setup using Cocos2d-x runs at a smooth 60 FPS, but here I struggle to break 30 even with the bare minimum setup… :frowning:

You should allow this thought to gently fall out of your brain because it is meaningless.

It’s like saying “I have two different cars, one works fine and the other makes a different noise and feels different… what is going on here?”

Cocos2D and Unity are not even comparable in any way shape or form. Look to the Frame Debugger for each to see what is actually being shoveled at the GPU. That’s what counts if you are GPU-bound.

Another tell for a gross “goes to 6 frames per second” change is that you’re doing too much in FixedUpdate(). It is a common mistake to put game logic and all kinds of crud in the FixedUpdate() loop thinking it will make your game faster. It won’t. Put ONLY stuff related directly to the Physics / Physics2D system in FixedUpdate() or you will be very sad. Use Update() for all your game logic, and Time.deltaTime to make it framerate consistent.
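To illustrate the Time.deltaTime point, here is a small plain C# simulation (no Unity required). `DeltaTimeDemo` and `Simulate` are hypothetical names for this sketch, and the fixed `1f / fps` step stands in for what `Time.deltaTime` would report inside `Update()` on a device running steadily at that frame rate.

```csharp
using System;

public static class DeltaTimeDemo
{
    // Moves a position at `speed` units per second for `seconds`,
    // stepping once per frame at the given frame rate.
    public static float Simulate(float speed, float seconds, int fps)
    {
        float deltaTime = 1f / fps;                  // stand-in for Time.deltaTime
        int frames = (int)Math.Round(seconds * fps); // number of Update() calls
        float position = 0f;

        for (int i = 0; i < frames; i++)
        {
            // Scaling by deltaTime makes the per-frame step smaller at higher
            // frame rates, so distance covered per second stays the same.
            position += speed * deltaTime;
        }
        return position;
    }
}
```

Running `Simulate(100f, 1f, 30)` and `Simulate(100f, 1f, 60)` should both land at roughly 100 units (give or take floating-point error), whereas adding a fixed amount per frame would cover twice the distance at 60 FPS.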

I appreciate your concern but I’m not using any physics nor FixedUpdate anywhere in the code.

My minimum reproduction testing environment has three SkeletonAnimations, one Canvas Object, and one TextMeshPro object on it with a single simple script that takes the frame average from the previous 60 frames and displays that to the TextMeshPro (using Update, not FixedUpdate). I did mention all of this in my post just before the one that you quoted from me as well.

If you’re really curious, this is that single script:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using TMPro;

namespace ExNull.GUI
{
[RequireComponent(typeof(TextMeshProUGUI))]
public class FpsCounter : MonoBehaviour
{
    public int MaxFrames = 60;  //maximum frames to average over

    public static FpsCounter Instance { get; private set; }

    private static float lastFPSCalculated = 0f;
    private List<float> frameTimes = new List<float>();

    private TextMeshProUGUI _textRenderer = null;

    void Awake ()
    {
        Application.targetFrameRate = 60;

        // If our Singleton GameObject was already created (or this object has no parent), ignore any future ones:
        if (Instance != null || transform.parent == null)
        {
            Destroy(gameObject);
            return;
        }

        Instance = this;
        if (transform.parent != null) {
            DontDestroyOnLoad(transform.parent);
        }

        if (_textRenderer == null) {
            _textRenderer = GetComponent<TextMeshProUGUI>();
        }

        lastFPSCalculated = 0f;
        frameTimes.Clear();
    }

    void Update () {
        AddFpsFrame();
        lastFPSCalculated = CalculateFPS();

        _textRenderer.text = "Current FPS: " + lastFPSCalculated.ToString("0.00") + " / " + Application.targetFrameRate;
    }

    private void AddFpsFrame()
    {
        frameTimes.Add(Time.unscaledDeltaTime);
        if (frameTimes.Count > MaxFrames)
        {
            frameTimes.RemoveAt(0);
        }
    }

    private float CalculateFPS()
    {
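        float totalTimeOfAllFrames = 0f;
        foreach (float frame in frameTimes)
        {
            totalTimeOfAllFrames += frame;
        }

        // Guard against dividing by zero (e.g. a zero delta on the very first frame).
        if (totalTimeOfAllFrames <= 0f)
        {
            return 0f;
        }

        // Average FPS = number of frames sampled / total seconds they took.
        return frameTimes.Count / totalTimeOfAllFrames;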
    }

    public static float GetCurrentFPS()
    {
        return lastFPSCalculated;
    }
}
}

Not the most graceful, I’ll admit, as it was slapped together quickly to measure FPS in release builds. But it definitely shouldn’t be the cause of the tanking FPS (and indeed, testing without the Canvas and its TextMeshPro shows that it is not the cause of the poor performance).

So to be completely honest with you here, I have no idea how any of what you said is helpful to me or relevant to my situation.

As to this point:

According to your analogy, we should just switch back to Cocos2d-x then, yes? Say they are both cars, where Cocos2d-x is a cheap, beat-up, sloppy jalopy and Unity is an expensive, sleek and shiny convertible. Cocos2d-x doesn’t look as nice, and might not drive smoothly, but is fast as all hell and quite reliable. Unity looks a lot nicer and drives a lot smoother, but is slower than grandma’s Cadillac and can be just as reliable or less so than the alternative. Why then would I choose the shiny convertible, if the sloppy jalopy does just as well? Just for the looks and prestige of driving a nicer car?

Don’t get me wrong, I don’t think that way at all because I don’t personally find that to be a good or useful analogy. They are both completely different beasts, and I get that. I personally vastly prefer development in Unity compared to Cocos2d-x, and like working with C# far more than C++. But they both use OpenGL or Vulkan under the hood, so why would one engine rendering fewer vertices have such a significant decrease in performance compared to another engine rendering more vertices?

What do I need to strip away from URP or the built-in rendering pipeline, and/or what settings do I need to change, so that I can get performance closer to native engines like Cocos2d-x? Or is this an impossible goal, and SkeletonAnimation meshes in Unity just can’t be rendered efficiently on Android TV devices at all, and to attempt it is to be Sisyphus?

Could you give me more specifics of what to look for? Nothing seemed extraordinarily out of place in the Frame Debugger as far as I could tell, and it didn’t seem like there was an insane amount of overdraw (especially considering the same rendering back-end could render more vertices faster than we see in Unity).

Edit: Here is the output from the GPU Usage in the profiler, showing that Render.Mesh is the call that is taking most of the time in the PlayerLoop: [attached screenshot: Screenshot 2023-04-28 at 12.15.15 PM.png]

So I’m almost certain that it’s either the Spine Runtimes or the Unity Renderer taking so long on the Android TV device, and not our code specifically. I don’t know which it is, but the developers of Spine believe it to be a performance issue related to Unity and not their runtimes.

The GPU ms times it’s giving me don’t really make any sense though, since it’s almost an entire second, but we are getting more than 1 FPS still according to CPU Usage. Oddly the same device with a different project says “GPU Profiling is not available on this device” so the GPU Usage could be bugged out.

Edit 2: In case it’s useful to others to help me debug this issue, here is a screenshot of the Profiler with that very basic scene I was referring to (three Spine SkeletonAnimations, one Canvas, one TextMeshPro, with that FpsCounter script):
[attached screenshot: Screenshot 2023-04-28 at 1.31.45 PM.png]

Something I don’t understand here though, is why the Render Thread is waiting so long in the frame before it starts any of the rendering tasks there. I assume it is simply because the rendering tasks in the previous frame were still being computed?

Have you researched this? From my understanding this happens when the CPU is waiting for the GPU to complete its task. There are threads discussing this where people complain about how this wasn’t happening on previous versions of Unity. I haven’t looked at them in detail, though.

Yes, from what I can tell through my experiments and research, this is just a method where the CPU waits for the GPU to finish rendering its frame before moving on to the next update frame. I’m not sure what your question is or refers to, however. My question isn’t what Gfx.WaitForPresentOnGfxThread is, but rather why Gfx.WaitForPresentOnGfxThread is taking so long on an Android TV device with such a simple scene. And the answer to that is still unclear to me :face_with_spiral_eyes:

If Cocos2d is running well and Unity is running badly, why wouldn’t you switch to Cocos2d? Is there something in Unity that you need that is not in Cocos2d?

What happens if you drop a Capsule in a blank scene and rotate it smoothly along the Z?

float angle = Time.time * 100; // degrees per second
capsuleTransform.rotation = Quaternion.Euler( 0, 0, angle);

If that’s not smooth then you can begin looking elsewhere such as trying a different version of Unity.

In an empty scene with just a Capsule, a Camera and a Global Light 2D, using URP, we do get around 60 FPS on average while rotating the Capsule with your script.

However it does have spikes in the profiler above 60 FPS, and the Timeline shows that Gfx.PresentFrame is still taking about 10.5 ms on average to render, with just a single capsule and nothing else :frowning:

[attached screenshot: Screenshot 2023-05-01 at 10.32.55 AM.png]
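To put the numbers from this thread side by side, here is a tiny plain C# helper for converting between frame times and frame rates. `FrameMath`, `BudgetMs`, `FpsFromMs` and `ShareOfBudget` are hypothetical names for this sketch; nothing here comes from Unity or Spine.

```csharp
using System;

public static class FrameMath
{
    // Milliseconds available per frame at a given target frame rate.
    public static double BudgetMs(double targetFps) => 1000.0 / targetFps;

    // Frame rate implied by a given per-frame time in milliseconds.
    public static double FpsFromMs(double frameMs) => 1000.0 / frameMs;

    // Fraction of the frame budget consumed by one measured cost.
    public static double ShareOfBudget(double costMs, double targetFps) =>
        costMs / BudgetMs(targetFps);
}
```

With these, `BudgetMs(60)` is about 16.67 ms, so the 10.5 ms Gfx.PresentFrame for a single capsule already uses roughly 63% of a 60 FPS frame. `FpsFromMs(150)` is about 6.7, which lines up with the ~6 FPS seen in the Dragon Crashers Game Scene, and the ~55 ms measured under Deep Profiling would cap things at about 18 FPS (keeping in mind Deep Profiling itself adds overhead).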

Was there any solution found for this problem?