Texture2DArray bad performance on Oculus Quest

Hello,
For some time, I have been working on an impostor system for Oculus Quest that uses GPU instancing and Texture2DArrays (compressed with ASTC 6x6) to populate the scene with planes carrying an animated texture.

Instancing works like a charm, managing to put thousands of impostors on screen without much effort, which is impressive for a standalone headset like Quest.
The main issue we are running into is that, for whatever reason, texture arrays are very expensive to access on Quest.
We use the arrays as a standard spritesheet to select which frame of the animation we want (everything in a shader). We initially had two arrays, one for albedo and another for normals, and it was only after dropping the normal array that we realized how expensive they are.

My question is: why is that? Aren't texture arrays supposed to avoid context changes on the GPU? Would it be better to have one gigantic texture instead of many little ones?

Is there an alternative way to send the GPU multiple frames of an animation without using arrays? I have heard of Texture3D, but could it be less performant on Quest?

Sprites are packed into an atlas in Unity, not a texture array. That would be the more optimal way to go about this: pack your frames into an atlas/sprite sheet and offset the UVs to sample different frames.
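For illustration, a minimal sketch of that UV remapping, assuming a square grid of equally sized frames (frameIndex and framesPerRow are placeholder names, nothing Unity-specific):

// Remap a quad's 0-1 UV into the cell of a given frame inside an atlas/sprite sheet.
// Assumes a square grid of framesPerRow x framesPerRow equally sized frames.
float2 AtlasUV(float2 uv, float frameIndex, float framesPerRow)
{
    float col  = fmod(frameIndex, framesPerRow);
    float row  = floor(frameIndex / framesPerRow);
    float cell = 1.0 / framesPerRow;            // size of one frame in UV space
    return (float2(col, row) + uv) * cell;      // shift into the frame's cell, then scale down
}

Depending on how the atlas is packed, you may need to count rows from the top instead (framesPerRow - 1 - row).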

Also, the performance of arrays will partly depend on how the value that determines which element of the array to sample is calculated. Are you computing that in the fragment or vertex shader? Or, most optimally, is the index value being set as a shader property each frame?

The problem I found with this approach is that the array we use is quite large (from 30 to 60 slices, 500x500 pixels each). If we try an atlas, I fear it will exceed Unity's texture size limits.

That's what we are doing; we send it from the CPU via C#.

Something we thought about: we are using SampleTextureArray in a surface shader, and we were planning to move it to the vertex function, sample the texture there (since the mesh is a quad, it would only happen four times, far fewer than once per pixel) and pass the result to the surface function.
However, we have found no way to extract an entire slice from the array. The only function associated with texture arrays is Sample, which returns a single color.
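For context, the closest realizable version of that plan seems to be sampling the colour once per vertex and passing it through the Input struct; a rough sketch with placeholder names, assuming a surface shader compiled with #pragma surface surf Lambert vertex:vert and #pragma target 3.5 (the vertex stage has no derivatives, so the explicit-LOD macro is needed):

UNITY_DECLARE_TEX2DARRAY(_AlbedoArray);
float _FrameIndex;            // which slice to show, set as a material property

struct Input
{
    float4 vertColor;         // colour sampled once per vertex
};

void vert (inout appdata_full v, out Input o)
{
    UNITY_INITIALIZE_OUTPUT(Input, o);
    // Explicit LOD 0: the vertex stage cannot use derivative-based mip selection.
    o.vertColor = UNITY_SAMPLE_TEX2DARRAY_LOD(
        _AlbedoArray, float3(v.texcoord.xy, _FrameIndex), 0);
}

void surf (Input IN, inout SurfaceOutput o)
{
    o.Albedo = IN.vertColor.rgb;
    o.Alpha  = IN.vertColor.a;
}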

Unity supports 8k, and now even 16k in 2020.1. The Adreno 540 in the Quest also handles at least 16k.

A single 8K texture can hold 256 of your 500x500 textures. Even a 4096px texture would be good enough for your purposes; it will hold 64 of your slices.

You are using UNITY_SAMPLE_TEX2DARRAY(array, uv), correct?

And no, you can't pass a reference to the texture grabbed from the array from the vertex program to the fragment program, as there's no reference binding in the per-vertex struct; if anything it would just slow the system down, creating more data to push through the pipeline. There isn't really a cost to accessing the texture at a specific index, as long as that index is already precomputed.
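In other words, the usual fragment-stage form with a precomputed index looks roughly like this (a minimal sketch with assumed names; the third component of the coordinate is the slice index):

UNITY_DECLARE_TEX2DARRAY(_AlbedoArray);
float _FrameIndex;    // slice index, precomputed on the CPU and set from C# each frame

void surf (Input IN, inout SurfaceOutput o)
{
    // IN.uv_AlbedoArray is a float2 declared in the Input struct (usual uv_<TextureName> convention).
    // The z component of the sampling coordinate selects the array slice.
    fixed4 c = UNITY_SAMPLE_TEX2DARRAY(_AlbedoArray, float3(IN.uv_AlbedoArray, _FrameIndex));
    o.Albedo = c.rgb;
    o.Alpha  = c.a;
}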

One thing I was thinking is that perhaps you're hitting bandwidth and/or VRAM size limits if you have multiple sets of these 30-60 slice, 500px textures. But I can't find any hard numbers on the VRAM in the Quest or the Adreno 540, just that the Snapdragon SoC has 4GB of system RAM. If you're exceeding that limit, it means much slower storage reads to stream in new slices that can't fit in memory. Are you able to profile the Quest's hardware from within Unity to test that?

It turns out the arrays had nothing to do with it; what was killing performance was the discard operation we were doing.
