Avoiding duplicated Gfx.UploadTexture when using Texture2D.LoadRawTextureData

I have a simple test script that generates a Texture2D every few frames and fills the texture’s data using LoadRawTextureData. In the profiler, I’m seeing two texture uploads on the render thread for every texture created. Clearly, one is from the Texture2D constructor. The other is caused by LoadRawTextureData/Apply.

Granted, I don’t know exactly what “Gfx.UploadTexture” means. Presumably it’s related to creating a D3D11 texture and/or a call to UpdateSubresource or Map/Unmap. If someone can shed some more light on what exactly is going on behind the covers of Gfx.UploadTexture, that would be great.

Regardless, the first Gfx.UploadTexture seems redundant. If there is an inefficiency here, as there seems to be, I’m very interested to find a way to work around it.

Setup:

  • 2018.2.5f1
  • standalone Windows

Test script:

#define ENABLE_PROFILER

using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Profiling;

public class LoadRawTextureDataTestScript : MonoBehaviour
{
    private readonly byte[] rawData = new byte[2048 * 2048 * 4];

    private readonly List<Texture> textures = new List<Texture>();

    void Start()
    {
    }

    private int textureId = 0;
    void Update()
    {
        // Every N frames create a new texture.
        if (Time.frameCount > 0 && Time.frameCount % 3 == 0 && Time.frameCount < 301)
        {
            textures.Add(ToTexture(textureId++ + "", rawData, 2048, 2048, 1, TextureFormat.RGBA32, false, 0, false));
        }
    }

    /// <summary>
    /// Convert byte[] to Unity Texture2D.
    /// </summary>
    public static Texture2D ToTexture(
        string name,
        byte[] textureData,
        int width,
        int height,
        int mipLevelCount,
        TextureFormat textureFormat,
        bool linear,
        int anisoLevel = 0,
        bool isReadable = false)
    {
        Profiler.BeginSample("Texture2D.ctor");
        var texture = new Texture2D(width, height, textureFormat, mipLevelCount > 1, linear)
        {
            filterMode = mipLevelCount > 1 ? FilterMode.Trilinear : FilterMode.Bilinear,
            wrapMode = TextureWrapMode.Clamp,
            name = name,
            anisoLevel = anisoLevel
        };
        Profiler.EndSample();

        Profiler.BeginSample("Texture2D.LoadRawTextureData");
        texture.LoadRawTextureData(textureData);
        Profiler.EndSample();

        Profiler.BeginSample("Texture2D.Apply");
        texture.Apply(false /* updateMipMaps */, !isReadable);
        Profiler.EndSample();

        return texture;
    }
}

Do you need to create a new texture every time, or can you reuse the existing one?

I think what you’re getting at is pooling textures, potentially allocating them upfront. That would sidestep one of the uploads in an amortized sense.

It could help and is probably worth trying.

The script I wrote is a super contrived test case, but it gets to the root of a performance issue that I’m trying to overcome. The real scenario here is streaming textures (of varying sizes) from a server. The varying size complicates the pooling; still seems like a viable mitigation though.
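To make the pooling idea concrete, here’s a rough sketch of a size-bucketed pool for the streaming scenario. This is purely illustrative: the names TexturePool, Rent and Return are made up, and it assumes incoming textures can be grouped by width/height/format.

```csharp
using System.Collections.Generic;
using UnityEngine;

public class TexturePool
{
    private readonly Dictionary<string, Stack<Texture2D>> free =
        new Dictionary<string, Stack<Texture2D>>();

    private static string Key(int w, int h, TextureFormat f)
    {
        return w + "x" + h + ":" + f;
    }

    // Reusing a pooled texture skips the constructor's alloc/memset and the
    // first redundant GPU upload; only the LoadRawTextureData/Apply upload remains.
    public Texture2D Rent(int width, int height, TextureFormat format)
    {
        Stack<Texture2D> stack;
        if (free.TryGetValue(Key(width, height, format), out stack) && stack.Count > 0)
            return stack.Pop();
        return new Texture2D(width, height, format, false);
    }

    // Pooled textures must stay readable (don't Apply with
    // makeNoLongerReadable: true), or they can't be refilled later.
    public void Return(Texture2D texture)
    {
        string key = Key(texture.width, texture.height, texture.format);
        Stack<Texture2D> stack;
        if (!free.TryGetValue(key, out stack))
            free[key] = stack = new Stack<Texture2D>();
        stack.Push(texture);
    }
}
```

The string key is just the simplest thing that works; a struct key would avoid the per-lookup string allocation.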

Ideally, if what is really happening here is an unnecessary texture upload, there should be an API to construct the texture from the byte[] directly. I’m pretty sure that doesn’t exist though?

After some more digging, I found that there are a couple of seemingly major inefficiencies caused by creating a Texture2D in a script.

For starters, this is the sort of code I’m talking about:

var texture = new Texture2D(width, height, format...);
texture.LoadRawTextureData(textureData);
texture.Apply(false, true);

Time-wise, this can be quite slow for a 2048x2048 32bpp texture: something like 15 ms to create a single texture, as measured on my machine. That’s time spent blocking Unity’s main thread. Not good!

More details below…

Issue 1: Redundant memory operations on main thread

The Texture2D constructor allocates a backing byte array, sized to the specified width, height and format. The malloc is followed by a memset, presumably all to 0’s. That’s all sort of expensive, and somewhat redundant if the next thing you are going to do is…

LoadRawTextureData(): From the API perspective, it seems like this method should just be a memcpy from the specified bytes into the texture data, but that isn’t the case. LoadRawTextureData causes another allocation (again sized to width x height x format), and then presumably the input bytes are written to this new buffer (and presumably the now-unused buffer has to be freed). Either way, LoadRawTextureData triggers this big additional memory allocation, and it’s slow! :frowning:

Edit: The mitigation I described below doesn’t actually work. You cannot modify the byte[] array returned by GetRawTextureData. Even after calling Apply(), any changes to this byte array are not reflected in the texture. In the future, using the NativeArray overload of GetRawTextureData may be faster. Right now, this is hamstrung by the fact that NativeArray.CopyFrom is extremely slow. It sounds like improvements are planned for that in 2018.3 though…

To mitigate this, GetRawTextureData can be used instead. This API returns the texture’s raw byte array. Calling it doesn’t allocate, but I do see a memcpy, presumably copying the bytes from the underlying native buffer into the managed array. That’s not great, but the memcpy is relatively fast. Anyways, once you have the byte[] array, use Array.Copy to copy your byte data into the texture.

The GetRawTextureData approach is faster than LoadRawTextureData, at least for 2048x2048 textures on my setup: about 20% faster, presumably because it saves a malloc. At the very least, it’s more memory friendly.

// Struck out in the original post: this mitigation doesn't actually work (see the edit above).
var texture = new Texture2D(width, height, format...);
var rawTextureData = texture.GetRawTextureData();
Array.Copy(sourceTextureData, rawTextureData, sourceTextureData.Length);
texture.Apply(false, true);

When you call Apply() and specify markNoLongerReadable = true, Unity frees the managed memory (as advertised). free’ing isn’t free though, at least in terms of time. It does show up in profiling. This seems reasonably unavoidable though.

Observations and issues:

  • Texture2D should have a way to construct with the byte data directly, as well as a flag that specifies the texture data is not readable (rather than having to specify that in Apply). This sort of API should result in just one malloc and one memcpy: no memsets, no unnecessary memcpys, and no frees (the GC will clean up the input byte[] if possible)… The profiling I’ve done indicates that memory management is where the time is being sunk, so reducing the mallocs and mem functions would be a big perf win.

  • What is the purpose of LoadRawTextureData? Maybe this is a vestige of older Unity, but it’s slower than just using GetRawTextureData and copying the bytes yourself. Not sure why one would (or should) ever use it.
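For concreteness, the sort of constructor I’m asking for might look something like this. This is purely hypothetical; no such overload exists in Unity today:

```csharp
// Hypothetical API sketch; this Texture2D constructor does not exist.
// It would let the engine do a single malloc + memcpy, skip the memset,
// and never retain a managed copy of the pixels.
var texture = new Texture2D(width, height, TextureFormat.RGBA32,
    mipChain: false, linear: false,
    initialData: textureData,   // byte[] consumed directly
    readable: false);           // equivalent to Apply(..., makeNoLongerReadable: true)
```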

Issue 2: Redundant texture operations on render thread

According to NVIDIA’s Nsight tool, this is how Unity is uploading textures in the D3D11 renderer.

  • D3D11CreateTexture2D for “Texture A”

// Create temporary texture, fill with initial bytes from Texture2D ctor, copy temp texture to A

  • D3D11CreateTexture2D for “temp texture”
  • Map/memcpy/Unmap initial pixels of texture
  • CopySubresourceRegion “temp texture” to “Texture A”

// Create temporary texture, fill with bytes from LoadRawTextureData or Apply, copy temp texture to A

  • D3D11CreateTexture2D for “temp texture”
  • Map/memcpy/Unmap bytes from LoadRawTextureData
  • CopySubresourceRegion “temp texture” to “Texture A”

These steps happen sequentially according to NSight. It isn’t spread out over multiple frames.

Issues with this:

  • For starters, the first temporary texture creation and subsequent copy is completely unnecessary. That data is going to be overwritten in the next set of commands…

  • Second of all, I don’t know why a temporary texture needs to be created at all. Maybe this is some sort of trick, but why not Map/Unmap the pixels directly into “Texture A”? That has got to be faster than creating a temporary texture and doing a copy via D3D…

Takeaways
It seems like there are some fairly low-hanging optimizations that could really improve this scripted texture-loading scenario in Unity!

Without those sorts of changes to Unity though, here are some things that would help speed up the scripting side of things…

  • Smaller textures → faster script execution. Memory management is the bottleneck.
  • Use compressed textures if possible.
  • Favor RGB24 over RGBA32 to reduce size.

Edit: For transferring texture data in the script, I just tried the NativeArray overload of GetRawTextureData() + CopyFrom() instead of GetRawTextureData() + Array.Copy(). As it turns out, the NativeArray approach is wayyyyy slower. It took about 10x longer. GetRawTextureData is the clear winner here.
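For reference, the NativeArray path I tried looks roughly like this (sketch only; requires the Unity.Collections namespace, and the generic GetRawTextureData overload available in 2018.2):

```csharp
using Unity.Collections;
// ...
// Getting the NativeArray view is fast; it's CopyFrom that's slow here.
NativeArray<byte> native = texture.GetRawTextureData<byte>();
native.CopyFrom(sourceTextureData);
texture.Apply(false, true);
```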

Edit Edit: The GetRawTextureData() + Array.Copy() was never viable anyways. See previous post.

I think this is unfortunately the key problem. They don’t know that. Someone could create a texture2D and use it immediately. They shouldn’t but they also don’t want it to crash, so I suspect they fill in the texture data so it’s not junk. The real solution seems like there needs to be an option on the Texture2D constructor to pass in some bytes, or to tell it not to create the D3D object immediately.


Yeah, agreed. That would solve the inefficiency on the renderer side and should help simplify the memory operations performed by the script.

Thanks for detailing your experiments.

I was going to suggest GetRawTextureData, as it seemed ideal for your situation. So disappointing to learn that it doesn’t seem to work as advertised, and that the NativeArray path is so much slower. What really surprises me is that LoadRawTextureData appears to go through exactly the same process as creating a texture, despite the fact that, as far as I remember, you have to call it with data that exactly matches the texture’s layout/size.

Might be worth logging the ‘new Texture2D()’ path as a bug; the whole temp-texture-and-copy dance seems superfluous. Perhaps it’s some old legacy code, or perhaps it’s needed for a specific platform and could be optimized for others. Either way, unless for some odd reason it’s more optimal doing it that way, this might just be some poor coding on Unity’s part that could be fixed.

Sadly none of this fixes your issue. As bgolus says, the best we can hope for is a dedicated method like CreateTextureFromRawData() or something. Though honestly, I always assumed LoadRawTextureData was meant to do exactly that, or even be a more optimal way of doing it.

So it looks like you are back to texture pooling to at least spread some of the costs (e.g. initialization at texture creation, then later frames for uploading new texture data). So a couple of thoughts:

  • What about using a large mega texture to copy smaller ones into and out of?

  • Graphics.CopyTexture, but still stuck with creating textures.

  • What about using computeShader and computeBuffers to upload the texture data?

  • Check out Texture2D.LoadImage - only works with PNG or JPEG, and probably goes through the same steps you found before.

Unfortunately I can’t really see how any of these would actually be an improvement or even how it might be used, without resorting to using renderTextures.
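For the mega-texture idea, the GPU-side copy would look something like this. A sketch only: tile is assumed to be an already-uploaded small texture, and dstX/dstY come from whatever packing scheme you use.

```csharp
// GPU-side copy of a streamed tile into a region of a big atlas texture.
// No CPU readback and no managed allocation, but the source and destination
// formats must be compatible for CopyTexture to work.
var atlas = new Texture2D(8192, 8192, TextureFormat.RGBA32, false);
// ... upload the small incoming "tile" texture as usual, then:
Graphics.CopyTexture(
    tile, 0, 0, 0, 0, tile.width, tile.height,  // src, element, mip, x, y, w, h
    atlas, 0, 0, dstX, dstY);                   // dst, element, mip, x, y
```

Of course, this still pays the full upload cost for each tile texture first, so it only helps amortize the atlas side.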

Obviously you could write a plugin and do the texture upload yourself, but well, yuk, not something I’d like to do.

I wonder how the megaTexture plugins on the Asset Store work? I’m sure things like LoadRawTextureData were introduced to help with that. Though from your results I can’t see how.

I filed a bug today with these findings. Attached a sample project for them to inspect and NSight results to show the D3D11 inefficiencies. Hopefully that gets some traction.

Yes, if you read between the lines in their docs, it seems like the GetRawTextureData() that returns byte[] isn’t intended for updating the texture; it’s just for taking the bytes out of the texture to load into another (or to serialize for later use). The GetRawTextureData() that returns a NativeArray does seem like it should work for updating the texture, as that’s the exact sample they have in the docs. Unfortunately, NativeArray copying is way too slow to be of use in this scenario. (Getting the NativeArray via GetRawTextureData is really fast; the CopyFrom is what ends up being crazy slow.) I suppose you could do the copy in a job instead of on the main thread, if you’re willing to wait and bog down the system! Either way, I hope it’s fixed in 2018.3.

My conclusion was that the native plugin route is currently the only way to do texture streaming efficiently (and maybe that’s how the mega texture assets work as well)… Anyways, “yuk” perfectly sums up how I feel about having to go down this route. Oh well!


Let me guess: this was never addressed and Unity is still a trash turtle at this?