Actually I think there could be a problem with optimising the UI with the code base provided.
If you run a performance test using text with shadows and dig down in the profiler, 15.5% of the time is taken up by calls to the List item getter (in this case returning a UIVertex), which triggers a string.memcpy(???). That class does not appear to be among the open classes???
But we could still do the following optimisations.
protected void ApplyShadow(List<UIVertex> verts, Color32 color, int start, int end, float x, float y)
{
    UIVertex vt;
    var neededCapacity = verts.Count * 2;
    if (verts.Capacity < neededCapacity)
        verts.Capacity = neededCapacity;
    for (int i = start; i < end; ++i)
    {
        vt = verts[i];
        verts.Add(vt);
        Vector3 v = vt.position; // Should be outside of loop
        v.x += x;
        v.y += y;
        vt.position = v;
        var newColor = color; // ditto
        if (m_UseGraphicAlpha) // Should be outside of loop; prevents branching
            newColor.a = (byte)((newColor.a * verts[i].color.a) / 255);
        vt.color = newColor;
        verts[i] = vt;
    }
}
So that would give us this …
protected void ApplyShadow(List<UIVertex> verts, Color32 color, int start, int end, float x, float y)
{
    UIVertex vt;
    var neededCapacity = verts.Count * 2;
    if (verts.Capacity < neededCapacity)
        verts.Capacity = neededCapacity;
    Vector3 v;
    if (m_UseGraphicAlpha) // Branch hoisted out of the loop
    {
        Color32 newColor; // 'var' cannot be used without an initializer
        for (int i = start; i < end; ++i)
        {
            vt = verts[i];
            verts.Add(vt); // append an untouched copy of the vertex
            v = vt.position;
            v.x += x;
            v.y += y;
            vt.position = v;
            newColor = color;
            // vt.color is still the original vertex colour at this point
            newColor.a = (byte)((newColor.a * vt.color.a) / 255);
            vt.color = newColor;
            verts[i] = vt; // the offset, recoloured vertex replaces the original slot
        }
    }
    else
    {
        for (int i = start; i < end; ++i)
        {
            vt = verts[i];
            verts.Add(vt);
            v = vt.position;
            v.x += x;
            v.y += y;
            vt.position = v;
            vt.color = color;
            verts[i] = vt;
        }
    }
}
It should be a bit faster, though I don't have things set up to test it.
We could also reverse the loop, since counting down in C# is apparently slightly faster than counting up, according to dotnetperls.
The other big performance hit appearing in my benchmark is Text.OnFillVBO().
Digging down, it is the same issue with UIVertex and string memcpy???, followed by Vector3.op_Multiply() [which can be replaced by unrolling the multiplication over each float element].
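To show what I mean by unrolling the multiplication, here is a rough sketch. Vec3 is a hypothetical stand-in for Vector3 so it compiles without Unity, and ScaleSketch and its method names are made up for illustration. Whether the unrolled form is actually faster would need to be confirmed in the profiler.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles without Unity.
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Operator form: every call is a method call that builds and returns a new struct.
    public static Vec3 operator *(Vec3 v, float s)
    {
        return new Vec3(v.x * s, v.y * s, v.z * s);
    }
}

public static class ScaleSketch
{
    // Operator version, analogous to "position *= scale".
    public static void ScaleWithOperator(Vec3[] positions, float scale)
    {
        for (int i = 0; i < positions.Length; ++i)
            positions[i] = positions[i] * scale;
    }

    // Unrolled version: multiply each float element in place, no operator call.
    public static void ScaleUnrolled(Vec3[] positions, float scale)
    {
        for (int i = 0; i < positions.Length; ++i)
        {
            positions[i].x *= scale;
            positions[i].y *= scale;
            positions[i].z *= scale;
        }
    }
}
```

Both methods produce the same values; the second just skips the op_Multiply call and the temporary struct.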
Text.GenerationSettings() → get_pixelsPerUnit() is a bit of a hog: for 53 calls to GenerationSettings() we end up with 212 calls to the getter, when the value could be cached.
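A minimal sketch of the caching idea, assuming the getter is read four times per call (that matches the 212 / 53 ratio, but where the real reads happen is my assumption). TextSketch, BuildUncached and BuildCached are hypothetical names, and the getter body is a placeholder rather than Unity's implementation.

```csharp
using System;

// Hypothetical stand-in for a component with a costly property getter.
public class TextSketch
{
    public int GetterCalls; // counts how often the getter runs

    public float pixelsPerUnit
    {
        get
        {
            GetterCalls++;
            return 100f; // placeholder value; the real getter does more work
        }
    }

    // Uncached: the getter runs four times per call.
    public float[] BuildUncached(float x, float y, float w, float h)
    {
        return new[]
        {
            x / pixelsPerUnit,
            y / pixelsPerUnit,
            w / pixelsPerUnit,
            h / pixelsPerUnit
        };
    }

    // Cached: read the property once into a local and reuse it.
    public float[] BuildCached(float x, float y, float w, float h)
    {
        float ppu = pixelsPerUnit;
        return new[] { x / ppu, y / ppu, w / ppu, h / ppu };
    }
}
```

With this shape, 53 calls would hit the getter 53 times instead of 212.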
A bit more digging: the CanvasUpdateRegistry.InternalRegisterCanvasElementForGraphicRebuild() call does a linear search of all elements in the list. This could be improved with a Dictionary or an array-based index ID system for ICanvasElements. (Note this is only 1.5% of the performance issue.)
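Something along these lines could replace the linear search. This is only a sketch: it uses a HashSet next to the list (a simpler variant of the Dictionary idea), and IElementSketch, ElementSketch and RegistrySketch are hypothetical stand-ins, not the real Unity types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ICanvasElement.
public interface IElementSketch { }

// Trivial element type so the sketch can be exercised.
public sealed class ElementSketch : IElementSketch { }

public class RegistrySketch
{
    // The list keeps registration order for the rebuild pass.
    private readonly List<IElementSketch> m_Queue = new List<IElementSketch>();

    // The set answers "already registered?" without scanning the list.
    private readonly HashSet<IElementSketch> m_Queued = new HashSet<IElementSketch>();

    public int Count
    {
        get { return m_Queue.Count; }
    }

    // Returns false if the element was already registered.
    public bool Register(IElementSketch element)
    {
        // HashSet.Add returns false when the item is already present.
        if (!m_Queued.Add(element))
            return false;
        m_Queue.Add(element);
        return true;
    }

    // Call after the rebuild pass to reset both collections together.
    public void Clear()
    {
        m_Queue.Clear();
        m_Queued.Clear();
    }
}
```

The cost is a second collection to keep in sync, which may not be worth it for only 1.5% of the time.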