After some more digging, I found a couple of seemingly major inefficiencies in how a Texture2D gets created from a script.
For starters, this is the sort of code I’m talking about:
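```csharp
var texture = new Texture2D(width, height, format...); // allocates and clears a CPU-side buffer
texture.LoadRawTextureData(textureData);               // copies the raw bytes into the texture
texture.Apply(false, true);                            // uploads to the GPU; makeNoLongerReadable = true frees the CPU copy
```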
This can be quite slow for a 2048x2048 32bpp texture, which is 2048 × 2048 × 4 bytes = 16 MiB of pixel data. On my machine I measured something like 15 ms to create a single texture, all of it spent blocking Unity’s main thread. Not good!!
More details below…
Issue 1: Redundant memory operations on the main thread
The Texture2D constructor allocates a backing byte array sized for the specified width, height, and format. The malloc is followed by a memset, presumably to all zeros. That is fairly expensive, and redundant if the next thing you are going to do is…
LoadRawTextureData(): From the API’s perspective, this method should just be a memcpy from the given bytes into the texture data, but that isn’t the case. LoadRawTextureData causes another allocation (again sized to width × height × format), and then presumably the input bytes are written into this new buffer (and the now-unused original buffer has to be freed). Either way, LoadRawTextureData causes a large additional memory allocation, and it’s slow!
Edit: The mitigation I described below doesn’t actually work. Modifying the byte[] array returned by GetRawTextureData has no effect: even after calling Apply(), changes to this array are not reflected in the texture (it appears to be a copy of the texture data). In the future, the NativeArray overload of GetRawTextureData may be faster. Right now, though, it is hamstrung by NativeArray.CopyFrom being extremely slow. It sounds like improvements to that are planned for 2018.3…
To mitigate this, GetRawTextureData can be used instead. This API returns the texture’s raw byte array. When you call it, there isn’t an alloc, but I do see a memcpy, presumably copying the bytes from the underlying native array to the managed array. That’s not great, but the memcpy seems to be relatively fast. Once you get back the byte[] array, use Array.Copy to copy the byte data into the texture.
For 2048x2048 textures on my setup, the GetRawTextureData approach is about 20% faster than LoadRawTextureData. Again, it seems to save a malloc, so at the very least it is more memory-friendly.
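```csharp
// Does NOT work (see the edit above): writes to the array returned by
// GetRawTextureData() are not reflected in the texture, even after Apply().
var texture = new Texture2D(width, height, format...);
var rawTextureData = texture.GetRawTextureData();
Array.Copy(sourceTextureData, rawTextureData, sourceTextureData.Length);
texture.Apply(false, true);
```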
When you call Apply() with makeNoLongerReadable = true, Unity frees the texture’s system-memory copy, as advertised. Freeing isn’t free in terms of time, though; it shows up in profiling. This seems reasonably unavoidable.
Observations and issues:
- Texture2D should have a constructor that takes the byte data directly, along with a flag specifying that the texture data is not readable (rather than having to specify that in Apply()). Such an API should need only one malloc and one memcpy: no memsets, no unnecessary memcpys, and no frees (the GC will clean up the input byte[] when it can). My profiling indicates that memory management is where the time goes, so cutting down the mallocs, memsets, and memcpys would be a big performance win.
- What is the purpose of LoadRawTextureData? Maybe it’s a vestige of older Unity versions, but it’s slower than just using GetRawTextureData and copying the bytes yourself. I’m not sure why one would (or should) ever use it.
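To make the proposed constructor concrete, here is a back-of-the-envelope tally in plain C# (no Unity APIs) of the main-thread memory work for one 2048x2048 RGBA32 texture. The operation counts are assumptions taken from the profiling described above, including the presumed free inside LoadRawTextureData; they are not read out of Unity’s source, and the variable names are just for illustration.

```csharp
using System;

// Hypothetical tally; the operation counts mirror what the profiling in this post suggests.
const long Width = 2048, Height = 2048, BytesPerPixel = 4; // RGBA32, "32bpp"
const long MiB = 1 << 20;
long n = Width * Height * BytesPerPixel;                   // 16 MiB per texture

// Current path (assumed):
//   new Texture2D(...)       -> malloc(n) + memset(n)
//   LoadRawTextureData(...)  -> malloc(n) + copy(n), plus a presumed free of the old buffer
//   Apply(false, true)       -> free of the CPU-side copy
long currentAllocated = 2 * n, currentWritten = 2 * n;
int currentFrees = 2;

// Proposed constructor that takes the bytes plus a "not readable" flag:
//   malloc(n) + memcpy(n); no memset, no frees (the GC reclaims the input byte[]).
long proposedAllocated = n, proposedWritten = n;
int proposedFrees = 0;

Console.WriteLine($"current:  {currentAllocated / MiB} MiB allocated, {currentWritten / MiB} MiB written, {currentFrees} frees");
Console.WriteLine($"proposed: {proposedAllocated / MiB} MiB allocated, {proposedWritten / MiB} MiB written, {proposedFrees} frees");
```

Under those assumptions, the proposed constructor halves both the bytes allocated and the bytes written (32 MiB to 16 MiB each) and drops both frees.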
Issue 2: Redundant texture operations on the render thread
According to the NVIDIA NSight tool, this is how Unity uploads textures in the D3D11 renderer:
- D3D11CreateTexture2D for “Texture A”
// Create temporary texture, fill with initial bytes from Texture2D ctor, copy temp texture to A
- D3D11CreateTexture2D for “temp texture”
- Map/memcpy/Unmap initial pixels of texture
- CopySubResource “temp texture” to “Texture A”
// Create temporary texture, fill with bytes from LoadRawTextureData or Apply, copy temp texture to A
- D3D11CreateTexture2D for “temp texture”
- Map/memcpy/Unmap bytes from LoadRawTextureData
- CopySubResource “temp texture” to “Texture A”
According to NSight, these steps happen sequentially within the same frame; they aren’t spread out over multiple frames.
Issues with this:
- For starters, the first temporary texture creation and the subsequent copy are completely unnecessary, since that data is overwritten by the very next set of commands.
- Second, I don’t know why a temporary texture needs to be created at all. Maybe this is some sort of trick, but why not map/unmap the pixels directly into Texture A? That has got to be faster than creating a temporary texture and doing a copy via D3D…
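A similar tally for the render-thread side, per 2048x2048 RGBA32 texture, again as a plain C# sketch with counts taken from the NSight sequence above. The “direct upload” row is hypothetical. One caveat: in D3D11 a texture created with D3D11_USAGE_DEFAULT cannot be mapped at all, which may be why Unity goes through a temporary texture; a direct path would more likely mean passing the bytes as D3D11_SUBRESOURCE_DATA to CreateTexture2D, or calling UpdateSubresource on Texture A.

```csharp
using System;

const long n = 2048L * 2048 * 4; // one 2048x2048 RGBA32 mip level = 16 MiB
const long MiB = 1 << 20;

// (textures created, bytes written from the CPU, bytes copied texture-to-texture on the GPU)
var observed = (Creates: 3, CpuWritten: 2 * n, GpuCopied: 2 * n); // Texture A + two temporaries, as seen in NSight
var noInitialPass = (Creates: 2, CpuWritten: n, GpuCopied: n);    // skip the redundant initial-pixels pass
var directUpload = (Creates: 1, CpuWritten: n, GpuCopied: 0L);    // hypothetical: bytes go straight into Texture A

foreach (var (label, s) in new[] { ("observed", observed), ("no initial pass", noInitialPass), ("direct upload", directUpload) })
{
    Console.WriteLine($"{label,-16} creates={s.Creates}  cpuWritten={s.CpuWritten / MiB} MiB  gpuCopied={s.GpuCopied / MiB} MiB");
}
```

Dropping just the first pass already halves the CPU writes and the GPU copies (32 MiB to 16 MiB each).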
Takeaways
It seems like there are some fairly low-hanging optimizations that would really improve this scripted texture-loading scenario in Unity!
Until Unity makes those sorts of changes, here are some things that help speed up the scripting side:
- Smaller textures → faster script execution, since memory management is the bottleneck.
- Use compressed textures if possible.
- Favor RGB24 over RGBA32 to reduce size.
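For a sense of scale, here are rough sizes for the top mip level of a 2048x2048 texture in a few formats, as a plain C# sketch. The bits-per-pixel figures are standard (DXT1/BC1 stores each 4x4 block in 8 bytes, DXT5/BC3 in 16 bytes) and the names match Unity’s TextureFormat values, but the SizeBytes helper is just for illustration.

```csharp
using System;

// Bytes for one mip level: width * height * bits-per-pixel / 8.
long SizeBytes(long width, long height, int bitsPerPixel) => width * height * bitsPerPixel / 8;

// Bits per pixel; DXT1 (BC1) packs a 4x4 block into 8 bytes, DXT5 (BC3) into 16 bytes.
var formats = new (string Name, int BitsPerPixel)[]
{
    ("RGBA32", 32),
    ("RGB24", 24),
    ("DXT5", 8),
    ("DXT1", 4),
};

foreach (var (name, bpp) in formats)
{
    long bytes = SizeBytes(2048, 2048, bpp); // top mip level only; a full mip chain adds about a third
    Console.WriteLine($"{name,-7}{bytes,12:N0} bytes ({bytes >> 20} MiB)");
}
```

That works out to 16, 12, 4, and 2 MiB respectively. One caveat: DXGI has no 24-bit RGB format, so on D3D11 an RGB24 texture presumably gets expanded to 32 bits somewhere before it reaches the GPU; the RGB24 savings would then be mostly on the script/main-thread side, which is where the bottleneck is anyway.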