Runtime texture compression using compute shader

I am doing some runtime texture generation, and I’ve been trying to think about how I can cut down memory costs (I’m doing virtual terrain texturing). This is for mobile, btw. What I’m thinking is: if I could somehow compress the render textures to ETC2 in a compute shader, I could keep everything on the GPU and save on memory. Has anyone had any experience with this, or anything related? Just trying to think of solutions to save memory, as it would probably save us at least 50 MB if we could compress to ETC2 at runtime.

ETC2 is fairly slow to compress: several seconds to minutes per texture, and that’s on desktop GPUs. The fastest compute-shader-based compressors for it are faster, but can still take seconds per texture, again on desktop GPUs.

There was a paper released just a few months ago that proposed a method to bring ETC2 compression down to potentially less than a millisecond: QuickETC2. It’s plausible you could implement that in C# yourself.

Or try implementing the techniques there in a compute shader yourself.

Alternatively there are existing ETC1 real time compression assets.


I’m playing with compute shader compression myself right now (though with the BC7 format, not ETC2), and so far it’s in “not really that much faster than doing it on the CPU, but a world of pain with shader compilers, precision mismatches, GPU driver issues, etc.” territory.

Runtime compression to ETC/ETC2 is actually there in the latest Unity versions (via Texture2D.Compress), but only starting with 2021.1. Underneath, that runtime compression is the same compressor that bgolus linked to above (“etcpak”, just not the QuickETC2 branch yet).


I was actually looking at Betsy. It’s something like 5 passes just for ETC2; I think ETC2 has the most passes. Mostly I want to do ETC2 because it is supported on both iOS and Android, and getting one type of compression to work would be enough headache on its own for me. I actually hadn’t seen QuickETC2, I will definitely take a look at that. Yeah, I saw the ETC1 asset, but I would need ETC2, unless I wanted to use 2 textures instead of 3. I am packing color, normals, and roughness into 2 textures right now.

Ideally I can just get texture size in memory down at least a bit. I am using 50 512x512 tiles as well as a 4K basemap for the terrain in a quadtree right now, at 2 textures per tile and basemap. I’m thinking that’s about the minimum I’m going to get away with, and I will definitely need a bit more than that: maybe for additional tiles, definitely for more terrain textures. Uncompressed, together with the other textures I am using for the terrain, that puts me at about 200 MB of runtime memory :confused: Especially for bandwidth-limited mobile devices it isn’t great, although I can probably make it happen if I make sacrifices elsewhere. It’s too bad that even on 3GB iOS devices, only about 1.2 GB of that is usable in game.

Ahh interesting!! I remember Texture2D.Compress() used to only work with DXT. So are there additional options now, or how does that work? And do you have any speed benchmarks or notes there? The only downside is having to pull the texture from the GPU to the CPU, compress, then send it back. And I imagine the texture compression is thread-blocking?


EDIT: Oh wow, QuickETC2 is realllly new!

Are Texture2D.LoadImage() and ImageConversion.LoadImage() both blocking? Since this is for the terrain texture updating, I really need to offload as much as possible off the main thread.

Also, terrain virtual texturing for reference. Still some bugs to fix there…

When I said a few months, I really did mean a few months. September is when I think it was first published, and it’s not yet been merged into the main branch of etcpak or any other ETC2 encoders that I know of. I’m assuming the quality isn’t great compared to more traditional ETC2 encoders, similar to how the real-time DXT1/5 encoder is nowhere near the quality of the editor encoder. Plus you’re going to be recompressing already compressed assets, so that’s a whole extra level of badness.

The QuickETC2 etcpak branch achieves two things:

  1. Vanilla etcpak performs ETC2 compression by encoding each 4x4 block using both ETC1 and planar compression. The error metrics are then compared and the better one is used. QuickETC2 introduces a heuristic selector, so that only one selected encoding is performed for each block. This makes the encoding ~2x quicker at generally the same image quality. Obviously, there will be cases where the heuristic is wrong.

  2. QuickETC2 also introduces the missing T and H block encoding modes, which greatly improve the resulting image quality, at generally the same speed as vanilla etcpak ETC2 mode (often faster).

The image quality comparison is presented in figures and tables at [PDF] QuickETC2 | Semantic Scholar

The changes on the branch are provided as a reference implementation by the paper’s author and will eventually be used as a base for mainline scalar/SSE/AVX/NEON implementations.


@Aras ?

Really appreciate the detailed response here. Very helpful.
Do you have a rough idea of how etcpak for ETC2 with NEON would compare to QuickETC2 with plain C++, given that QuickETC2 doesn’t have a NEON implementation yet?

I am looking for a solution that I could implement asap, and trying to figure out what makes the most sense right now.

I don’t actually know; I just know that runtime compression recently got ETC support, since I remember seeing that work being done.

This all sounds pretty cool. My reply here is mostly to remind myself to investigate this further (especially to test the 2021 feature) and keep up with the thread.

Sadly I can’t offer anything more in terms of implementing this on the GPU, but it’s something I would be interested in, since I have an app that saves user designs and has to create a thumbnail (512x512) for each of them, which can easily eat up memory when uncompressed once the user is viewing all of their saved designs.

For that app, many years ago, I wrote (or rather converted) a PVRTC compression algorithm in C#. The source was particularly badly implemented (maybe for ease of learning), taking hundreds of MB and tens of seconds to convert just one 512-square texture. I was able to massively reduce both, but it was still too slow for use, as it had to save the design and thumbnail when the user switched apps, for which the OS gives you limited time. In the end it was mostly a ‘for fun’ project, since its use would be limited to iOS, so I never did anything more with it.

Your post reminded me of this, and made me wonder whether, with the advancements in mobile devices, a newer common format would now be viable. From my quick research, ETC on iOS is supported from the A8 chip and up, and I assume on Android it’s any OpenGL ES 3.0 device. So that should cover most devices from the last 4 years at least, which is around my cut-off for supporting the apps I make for clients, so a good fit.

I see this in the documentation now, here: Unity - Scripting API: Texture2D.Compress

Are there any plans to backport the iOS and Android part of this to 2019.4 LTS?

We don’t have any plans to backport this.

Are you sure? I just tried running Compress on a texture in OpenGL mode, and it compressed to DXT1, making the textures all white in OpenGL… How am I supposed to choose which compression mode is used? There is only a bool quality parameter given to Compress.

Edit: Seems it is working when running in WebGL, but not when running in OpenGL mode in the editor. Probably just a bug.

All OpenGL devices support DXT1, it’s required by the spec. It’s one of the few things OpenGL actually requires. If your textures are showing white on desktop OpenGL, that’s a bug you should report. But it should be compressing to DXT1 in that case.

OpenGLES doesn’t require DXT1, but does require ETC. You can’t run GLES on most desktop GPUs natively, so if you’re in the editor running an Android or iOS project, the editor is still running the game using Direct3D. If that’s showing white textures, that too is a bug that you should report as either it should still be compressing to DXT1 in the editor, or it should be compressing to ETC and immediately decompressing into an uncompressed texture for display on desktop hardware since that hardware is unlikely to support ETC.

Basically it’s unnecessary to have a switch because it’s rare that a platform supports both. Usually it’s explicitly dictated by the API, or at least the GPU being used (as is the case for WebGL, which doesn’t explicitly require either).

@TOES
I ended up making a plugin out of the C++ code in QuickETC (not the ETC2 branch). It works just barely fast enough for my needs. However, a few days ago I stumbled upon this writeup: https://zhuanlan.zhihu.com/p/327045410 , which was quite the revelation to find.

I will probably look into this soon and switch over to using some solution similar to theirs, as it was my initial goal to have it done on the GPU rather than CPU, for a few reasons.


It’s actually not required by the OpenGL spec; EXT_texture_compression_s3tc is a really old extension, but it’s not required by, or part of, OpenGL core (mostly for IP/patent reasons).
