UI Optimisation Challenge?

OK I know I’m challenge mad, but what if Unity put out an open challenge to developers to improve and enhance the new UI system?

They could provide a trial Pro code to enable people to use the profiler and improve/enhance the UI system.

With tempting rewards or prizes for the best optimisations/improvements, free Pro versions of Unity 5 maybe?

And it could give the UI team a bit of a rest, at least until the deadline hits and they have to rate the entries!

2 Likes

+1 I think this is a great idea

  • Good idea.
  • Good idea. I’ve already made a lot of optimizations and new controls (ComboBox, TabPanel…).

This sounds like a cool idea; it might give people some motivation to develop new controls, which might then get adopted into the engine?

1 Like

Haha
People need to give up on the idea of Unity giving away Unity 5 Pro.
Seems to be a lot of it lately

Makes you wonder why

I thought the whole point of open source is exactly for things like this? It allows other people to add to and improve the code… Therefore I doubt they’d give away any licenses, and how much work would be needed to earn a $1,500 Pro license? The idea is great, but there’s no incentive.

With great reward comes great motivation.

1 Like

Actually I think there could be a problem with optimising the UI with the code base provided.

If you run a performance test using text with shadows and dig down in the profiler, I found that 15.5% of the time was being taken up by calls to a List getting an item (in this case a UIVertex), triggering a String.memcpy(???). This class does not appear to be in the open classes???

But we could still do the following optimisations.

protected void ApplyShadow(List<UIVertex> verts, Color32 color, int start, int end, float x, float y)
{
    UIVertex vt;

    var neededCpacity = verts.Count * 2;
    if (verts.Capacity < neededCpacity)
        verts.Capacity = neededCpacity;

    for (int i = start; i < end; ++i)
    {
        vt = verts[i];
        verts.Add(vt);

        Vector3 v = vt.position; // Declaration could be hoisted outside the loop
        v.x += x;
        v.y += y;
        vt.position = v;
        var newColor = color; // ditto
        if (m_UseGraphicAlpha) // Check could be hoisted outside the loop to avoid branching every iteration
            newColor.a = (byte)((newColor.a * verts[i].color.a) / 255);
        vt.color = newColor;
        verts[i] = vt;
    }
}

So that would give us this …

protected void ApplyShadow(List<UIVertex> verts, Color32 color, int start, int end, float x, float y)
{
    UIVertex vt;

    var neededCpacity = verts.Count * 2;
    if (verts.Capacity < neededCpacity)
        verts.Capacity = neededCpacity;

    Vector3 v;

    if (m_UseGraphicAlpha) // Branch hoisted outside the loops
    {
        Color32 newColor;

        for (int i = start; i < end; ++i)
        {
            vt = verts[i];
            verts.Add(vt);

            v = vt.position;
            v.x += x;
            v.y += y;
            vt.position = v;

            newColor = color;
            newColor.a = (byte)((newColor.a * vt.color.a) / 255);

            vt.color = newColor;
            verts[i] = vt;
        }
    }
    else
    {
        for (int i = start; i < end; ++i)
        {
            vt = verts[i];
            verts.Add(vt);

            v = vt.position;
            v.x += x;
            v.y += y;
            vt.position = v;

            vt.color = color;
            verts[i] = vt;
        }
    }
}

It should be a bit faster; I don’t have things set up to test it though.

And we could reverse the loop, as apparently counting down in C# is slightly faster than counting up, according to dotnetperls.
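Something like this for the non-alpha branch, purely as a sketch; note that counting down also appends the shadow copies in reverse order, which may matter for how the quads are built, so it would need testing before relying on it:

for (int i = end - 1; i >= start; --i) // reversed loop, same work per iteration
{
    UIVertex vt = verts[i];
    verts.Add(vt); // shadow copies are now appended in reverse order

    Vector3 v = vt.position;
    v.x += x;
    v.y += y;
    vt.position = v;

    vt.color = color;
    verts[i] = vt;
}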

The other big performance hit appearing in my benchmark is Text.OnFillVBO().

Digging down, it’s the same issue with UIVertex and String.memcpy???, then Vector3.op_Multiply() [which can be replaced by unrolling the multiplication into each float element].
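Roughly what I mean by unrolling, as a minimal sketch (v and scale are placeholder names, not the actual OnFillVBO variables):

// Before: Vector3 * float goes through the operator, which shows up as
// Vector3.op_Multiply() in the profiler.
Vector3 Scale(Vector3 v, float scale)
{
    return v * scale;
}

// After: the multiplication unrolled into each float element, avoiding the
// operator call entirely.
Vector3 ScaleUnrolled(Vector3 v, float scale)
{
    Vector3 result;
    result.x = v.x * scale;
    result.y = v.y * scale;
    result.z = v.z * scale;
    return result;
}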

Text.GenerationSettings() → get_pixelsPerUnit() is a bit of a hog: for 53 calls to the former we end up with 212 calls to the latter, when the value could be cached.
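Something along these lines is what I mean by caching, as a rough sketch; the field and property names here are made up, not the actual Text internals:

// Compute pixelsPerUnit once and reuse it, instead of walking up to the
// Canvas on every get_pixelsPerUnit call during a rebuild.
private float m_CachedPixelsPerUnit = 1f;
private bool m_PixelsPerUnitDirty = true;

private float CachedPixelsPerUnit
{
    get
    {
        if (m_PixelsPerUnitDirty)
        {
            m_CachedPixelsPerUnit = pixelsPerUnit; // the expensive property, hit once
            m_PixelsPerUnitDirty = false;
        }
        return m_CachedPixelsPerUnit;
    }
}

// Would need to be called whenever the canvas or scaling changes.
private void InvalidatePixelsPerUnitCache()
{
    m_PixelsPerUnitDirty = true;
}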

A bit more digging and the CanvasUpdateRegistry.InternalRegisterCanvasElementForGraphicRebuild() call does a linear search of all elements in the list; this could be improved with a Dictionary or an array-based index-ID system for ICanvasElements. (Note this is only 1.5% of the performance issue.)
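A rough sketch of the kind of structure I mean; IndexedSet is a made-up helper, not an actual Unity type, and the real registry internals may differ:

using System.Collections.Generic;

// Keeps a List for ordered iteration plus a Dictionary for O(1) membership
// checks, so registering an ICanvasElement no longer needs a linear search.
public class IndexedSet<T>
{
    private readonly List<T> m_List = new List<T>();
    private readonly Dictionary<T, int> m_Indices = new Dictionary<T, int>();

    // Returns false if the element was already registered.
    public bool AddUnique(T element)
    {
        if (m_Indices.ContainsKey(element))
            return false;

        m_Indices.Add(element, m_List.Count);
        m_List.Add(element);
        return true;
    }

    public bool Contains(T element)
    {
        return m_Indices.ContainsKey(element);
    }

    public int Count { get { return m_List.Count; } }

    public T this[int index] { get { return m_List[index]; } }
}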

2 Likes

I used something similar in my Cloud System. A few hundred or thousand particles need to be recalculated each frame, and the loop contained an “if” which could be removed. Getting rid of it showed almost no performance benefit, even though the code was executed a few hundred or thousand times per frame. It didn’t even pay off on mobile devices, which was pretty surprising.

Personally, I would never make that kind of optimization. “Slightly faster” in that context means you need to measure it carefully, and that has to be done with enormous numbers, otherwise you won’t be able to measure any difference.
Often when you have a loop, you are also accessing array elements. They completely ignored any kind of array access in their performance considerations, which certainly takes longer than the comparison or the increment/decrement, and as such is far more relevant when it comes to improving the performance.

As the UI team will create a new text rendering implementation, it is not unlikely that they will go through it later on, or that it will even need to be replaced.

Edit: Just realized that I only have negative comments. Be assured it is not my intention to stop your efforts at all!

I’ve found what is triggering the String.memcpy: it appears to be the Vector4 struct used as the tangent in UIVertex, although I’m not sure which element of Vector4 is causing it?

Correction: nope, it appears to kick in when a struct’s memory footprint increases, as soon as you have a couple of Vectors??
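A rough way to check where the footprint grows; the test structs are made up and the sizes in the comments are what I’d expect, not measured here:

using System.Runtime.InteropServices;
using UnityEngine;

public static class StructSizeCheck
{
    struct OneVector  { public Vector3 position; }                          // expect 12 bytes
    struct TwoVectors { public Vector3 position; public Vector4 tangent; }  // expect 28 bytes

    public static void Log()
    {
        Debug.Log(Marshal.SizeOf(typeof(OneVector)));
        Debug.Log(Marshal.SizeOf(typeof(TwoVectors)));
        Debug.Log(Marshal.SizeOf(typeof(UIVertex))); // the full UI vertex footprint
    }
}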

OK, odd one, but I just tried changing a UIVertex struct I was testing to a class, and the String.memcpy memory profile hits vanish. In deep profile mode the tests show the following:

UIVertex as Struct - 698ms (called 100 times).
UIVertex as Class - 171ms (ditto).

But in non-deep profile mode it reverses! LOL

UIVertex as Struct - 0.02 seconds (using realTimeSinceStartup as timer)
UIVertex as Class - 0.04 seconds.

Could the memcpy be caused by said deep profile mode?
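For reference, the non-deep-profile numbers come from a harness roughly like this, where ProcessVertices is just a placeholder for whichever struct/class variant is being tested:

void RunBenchmark()
{
    // Times 100 calls with Time.realtimeSinceStartup, matching the numbers above.
    float startTime = Time.realtimeSinceStartup;

    for (int run = 0; run < 100; ++run)
        ProcessVertices(); // placeholder: the struct or class variant under test

    float elapsed = Time.realtimeSinceStartup - startTime;
    Debug.Log("100 runs: " + elapsed + " seconds");
}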

1 Like

@Dantus Well, in theory it depends on what Mono/.NET does to the loop; it could turn a loop with an inner condition that doesn’t depend on a variable changed within the loop into a branch and two loops. But as far as the CPU is concerned, it has to load up the next set of instructions, and a branch within a loop is a potential performance hiccup.

80/20 rule: 80% of your code will not need optimisation as it’s not called often enough, but the 20% that is tucked away in inner loops can provide significant improvements in performance.

Of course the UI team have access to both the open and closed source, so they should be able to make significant improvements in performance.

Good point, the String.memcpy() entries do have very high Self ms values, but why do they only appear on structs and not on classes?

In theory it doesn’t matter what Mono/.Net does, because making optimizations to please a specific compiler is usually not a good idea, especially not if the whole architecture will be migrated to il2cpp.
Of course, if something is highly performance sensitive, it needs to be optimized even like that. But in my opinion this is not one of those cases.

Well, my optimisation targets were derived from a stress-testing program aimed at highlighting areas ripe for performance improvements in high-usage scenarios.

The optimisations I made are generic: all CPUs can suffer when performing branch prediction; if they get it wrong it can stall their instruction pipeline, slowing down performance.

IL2CPP just takes the intermediate language, converts it to C++ and then compiles it. Ideally the C++ compiler will make optimisations of its own, depending on the build parameters you give it. But that is a Unity 5 technology and I’m talking about 4.x builds.

Just to clarify:

100 developers all respond to this challenge and each spends 20 hours designing new controls. One control is chosen and added to the new Unity.

Thereby money lost by developers, assuming $20 per hour: 99 × 20 × 20 = $39,600. (Money which could otherwise have been spent in the Asset Store???)
Money saved by Unity: 1 × 20 × 20 = $400.

Total net loss: $39,600 − $400 = $39,200

Better idea: Unity hires 5 top developers and designers for 1 month and creates lots of nice optimized controls. Everyone’s happy.