Efficient way of reading from a RWTexture2D in Compute Shader

I’m trying to find an efficient way of reading a RWTexture2D back from the GPU to the CPU. I have a relatively small 1024x1024 RenderTexture that is assigned to the compute shader.

On the CPU side, when I try to read it back with ReadPixels, it is really slow.

Texture2D destTex = new Texture2D(widthInt, heightInt, TextureFormat.ARGB32, false, true);

RenderTexture.active = rTex;
// Really slow: about 0.5 seconds
destTex.ReadPixels(new Rect(0, 0, widthInt, heightInt), 0, 0);

// process results
Color[] pixels = destTex.GetPixels(0);

// do other stuff here

How can I efficiently read the Texture back from the GPU?

Request the relevant data directly instead of copying it into the CPU side of a texture with ReadPixels. You can use an async method (AsyncGPUReadback) that issues the request and then processes the result once it is ready, instead of stalling the main thread until the request is done.
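Here is a minimal sketch of that approach with AsyncGPUReadback. It assumes rt is the RenderTexture bound to the compute shader’s RWTexture2D, and the class and field names are hypothetical. Requesting TextureFormat.RGBA32 asks Unity to deliver the pixels in the same byte order as Color32.

```csharp
using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical component: reads a RenderTexture back without stalling the main thread.
public class RenderTextureReadback : MonoBehaviour
{
    public RenderTexture rt; // assumed: the texture written by the compute shader

    public void RequestReadback()
    {
        // Mip 0, converted to RGBA32 so each pixel maps onto one Color32.
        AsyncGPUReadback.Request(rt, 0, TextureFormat.RGBA32, OnReadbackComplete);
    }

    private void OnReadbackComplete(AsyncGPUReadbackRequest request)
    {
        if (request.hasError)
        {
            Debug.LogError("GPU readback failed.");
            return;
        }

        // The array belongs to the request; copy it (e.g. ToArray()) if you need it after this callback.
        NativeArray<Color32> pixels = request.GetData<Color32>();
        Debug.Log($"Read back {pixels.Length} pixels");
    }
}
```

The callback runs a few frames after the request, on the main thread, so the rest of the frame keeps going in the meantime.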

I did see this, though that example code is inline with a whole bunch of dependencies, and my code has to run multiple times in a loop. I have no problem with stalling the main thread (it is not rendering in real time, so frame rate doesn’t matter).

Are there any samples of how this method can be used inside, say, a for loop?
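Here is a sketch of the blocking form inside a loop. It assumes a compute shader with a kernel named CSMain, [numthreads(8,8,1)], and an RWStructuredBuffer<int> named Result. These names, and the Iteration parameter, are hypothetical. WaitForCompletion() stalls the main thread until the data arrives, which fits the case above where stalling is acceptable.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical component: runs a compute shader several times and reads each result back synchronously.
public class ComputeReadbackLoop : MonoBehaviour
{
    public ComputeShader shader; // assumed: kernel "CSMain", [numthreads(8,8,1)], RWStructuredBuffer<int> Result
    public int iterations = 4;   // hypothetical iteration count

    private const int Width = 1024;
    private const int Height = 1024;

    private void Start()
    {
        int kernel = shader.FindKernel("CSMain");
        var buffer = new ComputeBuffer(Width * Height, sizeof(int));
        var results = new int[Width * Height];

        try
        {
            shader.SetBuffer(kernel, "Result", buffer);
            for (int i = 0; i < iterations; i++)
            {
                shader.SetInt("Iteration", i); // hypothetical shader parameter
                shader.Dispatch(kernel, Width / 8, Height / 8, 1);

                // Issue the request, then block the main thread until the data is on the CPU.
                AsyncGPUReadbackRequest request = AsyncGPUReadback.Request(buffer);
                request.WaitForCompletion();
                if (request.hasError)
                {
                    Debug.LogError($"Readback failed on iteration {i}.");
                    continue;
                }

                request.GetData<int>().CopyTo(results);
                // ... process results for iteration i here ...
            }
        }
        finally
        {
            buffer.Release(); // ComputeBuffers must be released explicitly
        }
    }
}
```

Because this blocks anyway, it behaves much like calling buffer.GetData(results) after each Dispatch. AsyncGPUReadback mainly pays off when the request is left to complete in the background while other work continues.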

Actually, it looks like AsyncGPUReadback is slower than buffer.GetData(…), which I’m currently using. I had hoped a RenderTexture might be faster, but it seems to suffer from an equally slow ReadPixels.

Yeah, if your texture is fairly large, ReadPixels is going to be a little slow. That’s one of the unfortunate downsides of transferring data from the GPU to the CPU: it’s one of the slowest things you can do.

You can speed up your code a bit after the copy, though. Instead of GetPixels(), you can use the newer tex.GetRawTextureData<Color32>(), which gives you direct access to the texture’s pixel data as a NativeArray to iterate over. This assumes an RGBA32 texture, whose byte layout matches Color32.

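In fact, as long as you don’t recreate your Texture2D object, this collection reference should keep pointing at the current CPU-side texture data. So you could set up the reference on startup, do your ReadPixels, and then iterate over the already assigned collection. Be aware, though, that Unity’s documentation warns the returned array can be invalidated when the texture is modified. Calling GetRawTextureData again after each ReadPixels is the safer option.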

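// Requires: using Unity.Collections; using UnityEngine;
private static NativeArray<Color32> CopyRT(RenderTexture rt, ref Texture2D toTexture)
{
    // RGBA32 has the same byte layout as Color32, and no mip chain means the raw data is exactly one image
    if (toTexture == null) { toTexture = new Texture2D(rt.width, rt.height, TextureFormat.RGBA32, false); }
    // Resize was renamed Reinitialize in Unity 2021.2
    else if (toTexture.width != rt.width || toTexture.height != rt.height) { toTexture.Resize(rt.width, rt.height); }

    // ReadPixels copies from the active RenderTexture, so make rt active and restore the previous one afterwards
    RenderTexture previous = RenderTexture.active;
    RenderTexture.active = rt;
    toTexture.ReadPixels(new Rect(0, 0, rt.width, rt.height), 0, 0);
    RenderTexture.active = previous;

    // Direct view of the texture's CPU-side pixel data (no extra copy)
    return toTexture.GetRawTextureData<Color32>();
}

private static void Example(NativeArray<Color32> nativeTex)
{
    foreach (var col in nativeTex)
    {
        Debug.Log(col);
    }
}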

But the real question is: do you even need this data on the CPU side? You can use Graphics.CopyTexture() to copy data between textures on the GPU very quickly instead of bringing it back to the CPU.
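Here is a minimal sketch of a GPU-side copy. It assumes both textures have the same dimensions and compatible formats, and the helper name is hypothetical. No pixel data passes through the CPU.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper: copies one texture into another entirely on the GPU.
public static class GpuCopyExample
{
    // Assumes src and dst have the same size and compatible formats.
    public static bool CopyOnGpu(Texture src, Texture dst)
    {
        if (SystemInfo.copyTextureSupport == CopyTextureSupport.None)
        {
            Debug.LogWarning("Graphics.CopyTexture is not supported on this platform.");
            return false;
        }
        if (src.width != dst.width || src.height != dst.height)
        {
            Debug.LogError("A whole-texture CopyTexture needs matching sizes.");
            return false;
        }
        Graphics.CopyTexture(src, dst); // GPU-to-GPU; nothing is read back to the CPU
        return true;
    }
}
```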

Yeah, I’m updating terrain detail data. The slowest part is the very lengthy 0.5 seconds it takes to copy either a buffer or an RT back to the CPU. It would be nice if we could just get Unity to push it into the terrain detail data on the GPU. The NativeArray approach with GetRawTextureData could be useful in the future.

What about terrainData.CopyActiveRenderTextureToTexture and CopyActiveRenderTextureToHeightmap?
Or getting references to the terrain data textures you want to overwrite and using Graphics.CopyTexture?
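Here is a sketch of both methods. It assumes the RenderTextures were written by a compute shader, that heightRT uses Terrain.heightmapRenderTextureFormat, and that both fit within the terrain’s heightmap and alphamap resolutions. The helper names are hypothetical. Both methods copy from RenderTexture.active, so the helper sets it and restores it afterwards.

```csharp
using UnityEngine;

// Hypothetical helper: writes GPU-generated RenderTextures straight into terrain data.
public static class TerrainGpuWrite
{
    // Assumes heightRT uses Terrain.heightmapRenderTextureFormat and is no larger than the heightmap.
    public static void WriteHeightmap(TerrainData terrainData, RenderTexture heightRT)
    {
        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = heightRT; // these methods read from the active RenderTexture
        terrainData.CopyActiveRenderTextureToHeightmap(
            new RectInt(0, 0, heightRT.width, heightRT.height),
            Vector2Int.zero,
            TerrainHeightmapSyncControl.HeightAndLod);
        RenderTexture.active = previous;
    }

    // Assumes splatRT is an 8-bit RGBA RenderTexture no larger than alphamapResolution.
    public static void WriteFirstSplatmap(TerrainData terrainData, RenderTexture splatRT)
    {
        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = splatRT;
        terrainData.CopyActiveRenderTextureToTexture(
            TerrainData.AlphamapTextureName, 0,
            new RectInt(0, 0, splatRT.width, splatRT.height),
            Vector2Int.zero,
            true); // allowDelayedCPUSync: defer the CPU-side copy
        RenderTexture.active = previous;
        terrainData.SyncTexture(TerrainData.AlphamapTextureName); // perform the deferred sync now
    }
}
```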

So yes, those are available for splatmaps and heightmaps, but not for details (grass).

Is it the texture of the grass objects themselves you want to modify? Or the layout of the grass?

The grass patches. I’ve had working compute shaders for ages, but the biggest bottleneck now is reading the data back from the compute shader to the CPU and then updating the terrain detail layers. Copying the data from the GPU is very slow, and I wonder why.
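Detail layers have no equivalent GPU copy method, so this path goes through the CPU. Here is a minimal sketch of it. It assumes the compute shader writes one int density per detail cell, row by row, into a ComputeBuffer sized detailWidth * detailHeight. It also assumes SetDetailLayer expects its array indexed [y, x], the same order Unity uses for heightmap arrays. The helper names are hypothetical.

```csharp
using UnityEngine;

// Hypothetical helper: pushes compute-shader output into a terrain detail (grass) layer.
public static class DetailLayerWriter
{
    // Converts a flat row-major buffer (index = y * width + x) into a [y, x] array.
    // Assumption: SetDetailLayer uses [y, x] indexing.
    public static int[,] ToDetailArray(int[] flat, int width, int height)
    {
        var details = new int[height, width];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                details[y, x] = flat[y * width + x];
        return details;
    }

    // Assumes `densities` holds detailWidth * detailHeight ints written by the compute shader.
    public static void WriteLayer(TerrainData terrainData, ComputeBuffer densities, int layer)
    {
        int width = terrainData.detailWidth;
        int height = terrainData.detailHeight;
        var flat = new int[width * height];
        densities.GetData(flat); // synchronous GPU-to-CPU copy
        terrainData.SetDetailLayer(0, 0, layer, ToDetailArray(flat, width, height));
    }
}
```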

I’m happy to take, say, a 50 ms or even 100 ms hit per 1K (1024x1024) texture, but 500 ms seems weird.
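One way to check where the time goes, sketched under assumptions. For scale, a 1024x1024 texture at 4 bytes per pixel is 1024 × 1024 × 4 = 4,194,304 bytes (4 MiB). A synchronous readback has to wait for the dispatch that writes the data to finish, so a 500 ms measurement may include the compute shader’s own run time and not just the copy. The sketch forces the dispatch to finish with a tiny 4-byte readback before it times the full copy. The kernel and buffer names are hypothetical.

```csharp
using System.Diagnostics;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical diagnostic: separates GPU compute time from the GPU-to-CPU transfer time.
public class ReadbackTiming : MonoBehaviour
{
    public ComputeShader shader; // assumed: kernel "CSMain", [numthreads(8,8,1)], RWStructuredBuffer<int> Result

    private void Start()
    {
        const int width = 1024, height = 1024;
        int kernel = shader.FindKernel("CSMain");
        var buffer = new ComputeBuffer(width * height, sizeof(int));
        var results = new int[width * height];
        shader.SetBuffer(kernel, "Result", buffer);

        // Run this more than once and ignore the first result, which can include warm-up costs.
        var timer = Stopwatch.StartNew();
        shader.Dispatch(kernel, width / 8, height / 8, 1);
        // Reading 4 bytes of the output makes the CPU wait until the dispatch has finished.
        AsyncGPUReadback.Request(buffer, sizeof(int), 0, null).WaitForCompletion();
        long computeMs = timer.ElapsedMilliseconds;

        timer.Restart();
        buffer.GetData(results); // the dispatch is already done, so this is mostly the copy itself
        long copyMs = timer.ElapsedMilliseconds;

        UnityEngine.Debug.Log($"dispatch + wait: {computeMs} ms, GetData: {copyMs} ms");
        buffer.Release();
    }
}
```

If most of the time lands in the first measurement, the compute shader itself is the bottleneck rather than the transfer.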