(Case 1192489) Building Asset Bundles decompresses textures

I’ve been investigating some slow build issues and one thing I’ve been coming across for a while is console spam about decompressing unsupported PVRTC textures when targeting iOS.

Now, for a while I assumed that to be harmless but I just profiled the Unity process to find 50% of its time spent in DecompressPVRTC during asset bundle builds.

I guess I can only summarize this question as: Why?

These textures are already in the AssetDatabase with the appropriate target format for the build. Why would the build process have to decompress and recompress them?

Function Name    Total CPU [unit, %]    Self CPU [unit, %]    Module
 + Unity.exe (PID: 14548)    178079 (100.00%)    0 (0.00%)    Multiple modules
| + ntdll.dll!0x007ff9ef16a271    177280 (99.55%)    0 (0.00%)    ntdll.dll
|| + kernel32.dll!0x007ff9ee267974    177280 (99.55%)    0 (0.00%)    kernel32.dll
||| + Thread::RunThreadWrapper    113653 (63.82%)    0 (0.00%)    Unity.exe
|||| + GfxDeviceWorker::RunGfxDeviceWorker    113217 (63.58%)    0 (0.00%)    Unity.exe
||||| + GfxDeviceWorker::RunExt    113217 (63.58%)    0 (0.00%)    Unity.exe
|||||| + GfxDeviceWorker::RunCommand    113217 (63.58%)    2 (0.00%)    Unity.exe
||||||| + GfxDeviceD3D11Base::UploadTexture2D    113109 (63.52%)    1 (0.00%)    Unity.exe
|||||||| + TexturesD3D11Base::UploadTexture2D    113106 (63.51%)    5 (0.00%)    Unity.exe
||||||||| + TexturesD3D11Base::UploadAll2DData    112619 (63.24%)    0 (0.00%)    Unity.exe
|||||||||| + TexturesD3D11Base::Upload2DData    112615 (63.24%)    5 (0.00%)    Unity.exe
||||||||||| + ConvertCompressedTextureUpload    92416 (51.90%)    3 (0.00%)    Unity.exe
|||||||||||| + DecompressNativeTextureFormatWithMipLevel    92347 (51.86%)    0 (0.00%)    Unity.exe
||||||||||||| + DecompressNativeTextureFormat    92347 (51.86%)    3 (0.00%)    Unity.exe
|||||||||||||| - DecompressPVRTC<0,1>    83299 (46.78%)    50252 (28.22%)    Unity.exe
|||||||||||||| - DecompressETC2_RGBA8_RGBA8888    9034 (5.07%)    0 (0.00%)    Unity.exe

I don’t really get what business the build process for an unrelated platform has uploading all textures to the local GPU, let alone using the lossy native format to do so.

Hm. I may be wrong but I think I found what is going on. Further time profiling came up with a significant amount of time being spent here:

Function Name    Total CPU [unit, %]    Self CPU [unit, %]    Module
|||||||||||||||||||||||||||||||||| + ContentBuildInterface_CUSTOM_WriteSerializedFileAssetBundle_Injected    37947 (63.83%)    0 (0.00%)    Unity.exe
||||||||||||||||||||||||||||||||||| + BuildPipeline::WriteSerializedFile    37919 (63.78%)    0 (0.00%)    Unity.exe
|||||||||||||||||||||||||||||||||||| + BuildPipeline::BuildReferenceMap::ConvertToInstanceIDToBuildAsset    26731 (44.96%)    14 (0.02%)    Unity.exe
||||||||||||||||||||||||||||||||||||| + AddBuildAssetInfo    26484 (44.55%)    4 (0.01%)    Unity.exe
|||||||||||||||||||||||||||||||||||||| - CalculateSortIndex    19384 (32.61%)    2 (0.00%)    Unity.exe
||||||||||||||||||||||||||||||||||||||| + PPtr<Object>::operator Object * __ptr64    19382 (32.60%)    2 (0.00%)    Unity.exe
|||||||||||||||||||||||||||||||||||||||| + PersistentManager::ReadObject    19380 (32.60%)    0 (0.00%)    Unity.exe
||||||||||||||||||||||||||||||||||||||||| - PersistentManager::ReadObjectThreaded    12791 (21.52%)    0 (0.00%)    Unity.exe
||||||||||||||||||||||||||||||||||||||||| - PersistentManager::LoadAndIntegrateAllPreallocatedObjects    6585 (11.08%)    1 (0.00%)    Unity.exe
||||||||||||||||||||||||||||||||||||||||| - PersistentManager::RegisterPartiallyLoadedObjectInternal    2 (0.00%)    1 (0.00%)    Unity.exe
||||||||||||||||||||||||||||||||||||||||| + PersistentManager::Lock    1 (0.00%)    1 (0.00%)    Unity.exe
||||||||||||||||||||||||||||||||||||||||| - PersistentManager::Unlock    1 (0.00%)    0 (0.00%)    Unity.exe
|||||||||||||||||||||||||||||||||||||| - GetTypeWithoutLoadingObject    7037 (11.84%)    10 (0.02%)    Unity.exe

Now, I can only guess what CalculateSortIndex does from its name. Is it supposed to dereference the object PPtr, which results in the object being loaded into memory? It would explain why Unity would be spending 50% of its time decompressing assets it has no business decompressing.

Any ideas, @Ryanc_unity ?

This could very well be the cause of this .

Any updates on this, @Ryanc_unity? The affected project is now available to you to test (ask @unity_bill for it); I cannot really reference that in an actual bug report. Looking at AddBuildAssetInfo, it seems to go to the trouble of calling GetTypeWithoutLoadingObject() only to call CalculateSortIndex() immediately afterwards, which loads the object anyway, so this does not seem intentional, to me at least.

Ok, I have now traced this. The code path in CalculateSortIndex() that triggers the PPtr load of the object is taken when the object is a ScriptableObject/MonoBehaviour. Don’t ask how I found out. With the new information, I created a repro project that triggers the offending code path and submitted a bug report (Case 1192489). The annoying thing is that the repro project is so small that the build does not take very long. Profiling it, however, clearly shows the textures being decompressed from PVRTC into GPU memory.

Wow, I did not see this ping. Sorry about that.

So, PVRTC: that format only has hardware support on iOS devices themselves, so the editor has to use a slower software fallback to load and save it. Hence the message every time we load one of those textures from disk for a build.

For a build, we have to have the object data loaded so it can be written to the final output location. Though in this case it sounds like it’s doing it excessively in SBP, so this will need to be checked and fixed if this is true. I also just checked the source on latest trunk for CalculateSortIndex and don’t see any reason it should be triggering an object load for a texture at this point, so will need to do a bit of debugging there.


I suspect I’m missing some internal implementation detail here, but shouldn’t data that the editor has already prepared for the target platform be loaded as is, rather than fully deserialized? Those assets are fully loaded into GPU memory, that is what causes the PVRTC decompression, which doesn’t make much sense to me, especially considering it ends up spending quite literally the vast majority of the time doing just that rather than useful work.

I dug into it with a native debugger and the disassembly, and it triggers a load if the object is a MonoBehaviour/ScriptableObject. In this case, the textures are dependencies of those objects (specifically, our ScriptableObjects are UMA material overlays) and hence get loaded. Because of this, I had a build fail with an out-of-memory error after spending 27 hours mostly in that method.

Texture data, and other renderable data, goes over to the GPU thread and is uploaded to the GPU during any load operation. This includes loading for a build, as there isn’t a special loading path for this case.

Ah, that makes a bit more sense then. MonoBehaviour/ScriptableObject don’t have full type information in native code unless they are loaded, as they are just a representation of a scripting object type. So in this case the scripting object loads, and loads its references recursively.

If the scripting object in question that is being loaded is being used as a mapping or lookup table (For example, contains an array of textures you might swap out depending on some runtime constraint) I would suggest switching those direct references out for a weak reference type such as the AssetReference type in the Addressables package.
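As a rough sketch of that suggestion, assuming the Addressables package is installed and the textures are marked as addressable, a lookup-table object could hold AssetReferenceTexture2D entries instead of direct Texture2D references. The class and member names below (OverlayLookup, textures, LoadTexture) are hypothetical and only there for illustration.

```csharp
using UnityEngine;
using UnityEngine.AddressableAssets;
using UnityEngine.ResourceManagement.AsyncOperations;

// Hypothetical lookup-table asset; names are illustrative only.
[CreateAssetMenu(menuName = "Example/Overlay Lookup")]
public class OverlayLookup : ScriptableObject
{
    // Weak references: loading this ScriptableObject does not load the
    // textures, so they are not pulled in as direct dependencies.
    public AssetReferenceTexture2D[] textures;

    // Loading is asynchronous. The caller has to wait for the handle to
    // complete and release it (e.g. Addressables.Release(handle)) when done.
    public AsyncOperationHandle<Texture2D> LoadTexture(int index)
    {
        return textures[index].LoadAssetAsync<Texture2D>();
    }
}
```

The trade-off is that the texture is no longer available synchronously when the object is deserialized, and the referenced assets have to be managed as addressable entries in their own right.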

Ugh. Surely that cannot possibly scale to huge projects? Plus, it is exceedingly ugly by any measure.

Sure, but why does CalculateSortIndex() need to load the object at all? Obviously, since Unity does not export the relevant symbols, I cannot tell exactly what it is that it is reading from that type, just a raw structure offset, but that does seem to be quite an excessive operation to carry out at that point.

That would mean:

  1. Loss of automatic dependency bundling.
  2. Loss of automatic loading of dependencies.
  3. Async loading where async loading is at the very least inconvenient.

In short, at that point we would lose all the niceness of having Unity’s dependency handling at all. I suspect this is also what is causing our Fast Mode addressables catalogue to take more than a minute and a half to build and allocate astronomical (>16GB) amounts of memory.

If you are curious about the layout of our project, your team should have access to it, all of the legal stuff has been sorted out as far as I am aware.

At least for our CI, I’m currently testing building our bundles with -nographics. My theory is that if there is no GPU thread, GPU texture uploads and hence decompression from PVRTC won’t occur.
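For anyone wanting to try the same thing, here is a minimal sketch of a batch-mode entry point, assuming a plain BuildPipeline.BuildAssetBundles build rather than our actual SBP/Addressables setup. The class name, method name and output path are hypothetical. Whether -nographics really avoids the decompression is still only a theory at this point.

```csharp
using System.IO;
using UnityEditor;

// Hypothetical CI entry point; place it in an Editor folder.
// Invoked with something like:
//   Unity.exe -batchmode -nographics -quit -projectPath <project>
//       -executeMethod CiBundleBuild.BuildIos -logFile <log>
public static class CiBundleBuild
{
    public static void BuildIos()
    {
        // Illustrative output location, relative to the project root.
        const string outputPath = "Build/AssetBundles/iOS";
        Directory.CreateDirectory(outputPath);

        AssetBundleManifest manifest = BuildPipeline.BuildAssetBundles(
            outputPath, BuildAssetBundleOptions.None, BuildTarget.iOS);

        // A null manifest is treated as a failed build here, so the
        // process exits with a non-zero code for the CI to pick up.
        if (manifest == null)
        {
            EditorApplication.Exit(1);
        }
    }
}
```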

@AlkisFortuneFish It took me a bit to hunt this down as I couldn’t remember the class name. Not 100% sure if this will work for your situation, however there is this UnityEditor type added in 2019.3: LazyLoadReference<T>.

This was added specifically so scripting types in the Editor did not have to immediately load the reference in that field. However, it still works just like a normal object reference for all other systems, and falls back at runtime to normal object reference behavior. Basically it was added to solve a very similar problem to what you have, but in the asset import pipeline. So if you are on 2019.3, try changing the Texture2D references in your scriptable object to LazyLoadReference<Texture2D> instead and see if that reduces the decompression at build time as a result.
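A minimal sketch of what that change might look like, assuming Unity 2019.3 or newer. The OverlayData class and its diffuse field are hypothetical stand-ins, not taken from UMA, and the type is used here from the UnityEngine namespace. Whether values already serialized in a Texture2D field survive the type change is worth verifying in your own project.

```csharp
using UnityEngine;

// Hypothetical ScriptableObject standing in for a material overlay asset.
[CreateAssetMenu(menuName = "Example/Overlay Data")]
public class OverlayData : ScriptableObject
{
    // Before: a direct reference, loaded together with this object.
    // public Texture2D diffuse;

    // After: still serialized as an object reference, but the texture is
    // only loaded when .asset is accessed.
    public LazyLoadReference<Texture2D> diffuse;

    public Texture2D GetDiffuse()
    {
        // isSet reports whether a target is assigned without loading it;
        // reading .asset is what actually triggers the load.
        return diffuse.isSet ? diffuse.asset : null;
    }
}
```

Code that previously read the field directly would go through .asset (or a helper like GetDiffuse above), so the load happens at the point of use rather than when the ScriptableObject itself is loaded.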

I just ran a quick test in the repro project and it very much looks like this would work. Unlike AssetReference and friends, this ticks all the boxes: it is detected as a dependency in the AssetDatabase and is synchronously available for both editor and runtime deserialization use.

I cannot test it on our actual project just this moment, since our 19.3 port branch is out of date and our mainline is currently 19.2, but I am optimistic this is going to improve both build times and, more importantly, our absurdly long times to enter play mode in Fast Mode.

So, @Ryanc_unity @unity_bill , now we’ve had our release I’ve had the time to upgrade the project to 19.3 and addressables 1.6.0, reverting all my addressables customisation minus the PackTogetherByPath mode. As @Ryanc_unity suggested, I modified our UMA dependency references to use LazyLoadReference<T>. This has resulted in sub-10s Fast Mode enter play mode times, with domain reloads now being the largest cost, rather than addressable catalog generation.

There is currently an issue where the Groups window slows play mode times by 50s by recalculating its tree view as the catalog is being generated, which I would treat as a bug, but it’s a massive QoL improvement already.

Glad this hit the mark. As for the second issue you mention, the Groups window slowdown, I think @unity_bill and co know about it and have fixes either getting ready to go out or in the works. I’ll poke them about this thread and have them follow up.

Yes, we know about it. We have an improvement coming in the next release (this week-ish, 1.7.something) that will cause the tree to only build the visible nodes. After that (1.8.? 1.7.more?) we’ve got some plans to expose options that can make it even faster.

@AlkisFortuneFish Hi! Can you help me with the native debugger? What do you use to profile?
I may have the same problem, but I want to check it.

One related question: is it possible to bypass or accelerate the texture compression procedures? Every time I build for iOS on a Mac, there are loads of

WARNING: ASTC texture format is not supported, decompressing texture

showing in the log. Although this log is harmless and the images will be properly compressed, it takes a lot of time. What I want to do is to bypass this or force it to use a faster compression method when I build test app packages. Unity used to invoke PVRTexTool for texture compression, but now (2018.4.x) it doesn’t! Does it still call some external tool (exe) which I can hack a little bit?

The reason you are getting those is probably what Ryan said above. The engine has already compressed the assets to the target texture format, so when it loads them in order to write them into the asset bundles, it goes through the same code path that loads assets in general and uploads them to VRAM, having to decompress them to do so. Try a batch build with -nographics and see if it helps.

From the Unity 2020.2 changelog:

  • Build Pipeline: Added: Added ContentBuildInterface.GetPlayerAssetRepresentations API to return the asset representations without triggering a load of the asset itself. Improving performance for certain build cases.

From the SBP 1.8.4 changelog:

  • Updated CalculateAssetDependencyData to use a new fast path API for working with Asset Representations in 2020.2 and onward.

Is this what I think it is, @Ryanc_unity ?

@AlkisFortuneFish maybe? The short version is that the new API allows us to gather the asset representations a LOT faster without triggering asset loads in most cases. On a 40GB project with 1204 FBX files (the most notorious asset for large asset representation counts), gathering the representations on a just-opened project took 5378ms with the old approach, and 135ms with this new API.

This still doesn’t resolve having to write out a bunch of (imo mostly useless) entries into asset bundles for those representations. I do have some other ideas for that which are on my list after the current loading performance improvements being worked on.