ReadPixels stalls the pipeline in Unity 5

I’m trying to read pixels from a RenderTexture at random frames in my game for terrain generation. No matter what I try, it causes a serious slowdown, and it doesn’t matter whether I read a 1x1 texture or a 1024x1024 one: the pause in execution is pretty much the same (~40 ms).

If, just for testing, I put the ReadPixels call in a for loop (0…100), only the first call blocks execution. Later ones are extremely fast (~0.1 ms), so the first one is definitely stalling the GPU pipeline. It should not be caused by my own rendering, as I render to the source texture a few frames earlier.

I’ve tried running it in Update, FixedUpdate, LateUpdate, coroutines with WaitForEndOfFrame etc., with only minimal differences. The only exception is the first frame of the game: it’s fast there.

I remember reading in the Unity 5 change log that rendering is now more parallelized, so possibly that’s the problem? I used this method in Unity 4 and didn’t have such problems.


Have any of you used ReadPixels in Unity 5 and profiled the performance?

Any help is much appreciated. Thanks.

I’ve just tried doing the same work with compute buffers and while they are faster in simple use cases, in complex scenes they block even more.

Both methods seem to be unusable in real-world scenarios. Unity would benefit from asynchronous versions of these, as both DX and OpenGL have such APIs.

I just encountered the same problem. Any solutions for this yet?

I would think ReadPixels is equivalent to DX11 CopyResource, i.e., the GPU does the copy. You can verify this with a frame capture tool like Intel GPA.
I’m curious what you do with the RenderTexture/Texture after the ReadPixels call. If you edit the values you read and send them back to the GPU, I’d expect a stall because the resource update has a dependency.

Thanks for the reply.

I use this data to set the heightmap of the terrain, but that’s probably irrelevant as I do that a few frames later. My code looks like this:

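// Stopwatch here is System.Diagnostics.Stopwatch
Stopwatch s = new Stopwatch();
s.Start();
RenderTexture.active = targetTexture;
tex.ReadPixels(new Rect(0, 0, 1, 1), 0, 0);
s.Stop();
// ElapsedTicks / 10000 is only milliseconds when Stopwatch.Frequency is 10 MHz;
// Elapsed.TotalMilliseconds is correct regardless of the timer frequency
Debug.Log(s.Elapsed.TotalMilliseconds);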

The Debug.Log above reports an execution time of 40 ms. If I make my scene ridiculously simple it may go down to 15 ms, but even that is far beyond what others on the forum report in some older threads. The looped test I mentioned earlier looks like this:

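for (int i = 0; i < 100; ++i)
{
    // Time each ReadPixels call separately to see which iteration stalls
    Stopwatch s = new Stopwatch();
    s.Start();
    RenderTexture.active = targetTexture;
    tex.ReadPixels(new Rect(0, 0, 1, 1), 0, 0);
    s.Stop();
    // TotalMilliseconds does not depend on the Stopwatch tick frequency
    Debug.Log(s.Elapsed.TotalMilliseconds);
}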

The first iteration takes 40 ms, while the later ones are very fast, but that’s hard to interpret, as it may be caused by some kind of caching in the display driver.

I’m using this to convert a RenderTexture to a Texture2D in order to use it as a sprite. If the texture is fullscreen-sized, there is a noticeable lag every time this is done. However, it works better at smaller resolutions.

@Martin1 In the end it will still be used on the GPU right? Can’t you somehow just use the RenderTexture? ReadPixels is the last resort.

To be honest I don’t know exactly how Sprites work in Unity. However I couldn’t get a SpriteRenderer to work with a RenderTexture. Maybe Sprites have to be processed on the CPU in order to work with PolygonColliders?

If you have to use sprites then you are probably doing it the right way, but maybe you could just place a quad into the scene with your RenderTexture set in its material?

This would work if I just wanted to display the sprite, but the intention behind all this is to create polygon colliders. I’m still in the prototyping phase, so I’m not sure if this is the way to go. After all, the polygon collider also isn’t working as expected.
So I’m also looking for completely different approaches to this in my thread:
http://forum.unity3d.com/threads/create-colliders-for-dynamic-2d-terrain.339558/

Getting data from the GPU WILL cause a stall if the same texture is used that frame! The GPU is NOT done when you ask for it; hell, it might not even have started doing what you told it to!
What you want to do:
You tell the GPU to create the texture,
the NEXT FRAME you get the texture. (Make sure the texture is NOT used that frame!).
If you need to use the texture at the same time, then tell the GPU to copy it to a render texture, and read that the frame after.
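
The copy-then-read-a-frame-later idea above can be sketched roughly as follows. This is only an illustration under assumptions: the class name, field names, and the choice of Graphics.Blit for the GPU-side copy are mine, not something from this thread, and it does not guarantee that the stall disappears.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical helper; all names here are illustrative.
public class DelayedReadback : MonoBehaviour
{
    public RenderTexture source;   // texture the GPU renders into (assign in the Inspector)
    RenderTexture staging;         // private copy that nothing else binds
    Texture2D cpuCopy;             // CPU-side destination for ReadPixels

    void Start()
    {
        staging = new RenderTexture(source.width, source.height, 0, source.format);
        cpuCopy = new Texture2D(source.width, source.height, TextureFormat.RGBA32, false);
    }

    // Start with StartCoroutine(ReadBack()) after rendering into 'source'.
    public IEnumerator ReadBack()
    {
        // Frame N: queue a GPU-side copy into the staging texture.
        Graphics.Blit(source, staging);

        // Wait until the end of frame N+1 before touching the copy.
        yield return null;
        yield return new WaitForEndOfFrame();

        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = staging;
        cpuCopy.ReadPixels(new Rect(0, 0, staging.width, staging.height), 0, 0);
        RenderTexture.active = previous;
        // cpuCopy.GetPixels() now returns the data copied in frame N.
    }
}
```

The point of the staging texture is that the source can keep being rendered to or sampled while the read targets a texture nothing else is using.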

Even with this, it could potentially still stall. In that case you could try using a compute shader to copy the data out to a compute buffer, and see whether that causes stalls or not.

Well, once again, it’s probably not because of caching, but because by then the GPU has locked the texture and is letting you read it! (Or at the very least it is done and not using it, rather than waiting for it to be created.)

@Zicandar Thanks for the reply. Actually, that’s how I do things at the moment, but still the stall is very consistent. The RenderTexture and Texture2D are created at the start of the app and all operations on them are separated by a few frames to let the GPU keep up.

The strangest thing is that the more frames I wait after rendering to the RT, the longer the stall gets during the ReadPixels call (up to 100 ms). I have no idea what may cause that. It’s the opposite of what I expected.

The only thing I can think of in that case is cache?
Have you tried with compute shader as a go-between? (Output to a compute buffer perhaps even?)

Yes, I’ve tried them too and it was even slower in a full scene :eyes: I’ve directly profiled ComputeBuffer.GetData with a .NET Stopwatch.

Well, somehow streaming programs manage to get that data without THAT much issue, perhaps google for such?

My profiler in Unity 4 reports 2.5 ms for every ReadPixels call I make. It’s not much, but I would like to reduce this time.

The render texture I am reading from the GPU is very small (206x102 pixels). I also tried reading on the next frame, waiting until the end of the frame and all that, with no results. If I read just one pixel, it takes 2.5 ms as well.

No idea how to speed this up.

What is this… Pixel Buffer Objects… thing? Can we do that in Unity?

http://stackoverflow.com/questions/24495410/how-to-read-a-pixel-depth-value-without-stalling-the-pipeline

You can do asynchronous pixel transfers via Pixel Buffer Objects (PBOs). When you issue a read call without PBOs, the pipeline is flushed and the CPU has to wait for the GPU to finish rendering and transferring the data.
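
For readers on a newer Unity than the Unity 5 discussed in this thread: later releases (2018 onward) added an AsyncGPUReadback API that plays a similar role to PBOs. A minimal sketch, assuming such a version and a readable RGBA source; the class, field, and method names are hypothetical:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical component; requires a Unity version that ships AsyncGPUReadback,
// which Unity 5 does not.
public class AsyncReadbackExample : MonoBehaviour
{
    public RenderTexture source;   // texture to read back (assign in the Inspector)

    public void RequestReadback()
    {
        // Queues the readback and returns immediately; the callback runs on a
        // later frame, once the GPU has finished and the data has been transferred.
        AsyncGPUReadback.Request(source, 0, TextureFormat.RGBA32, OnReadback);
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        if (request.hasError)
        {
            Debug.LogWarning("GPU readback failed");
            return;
        }

        // The returned array is only valid for a short time; copy what you need.
        var pixels = request.GetData<Color32>();
        Debug.Log("First pixel: " + pixels[0]);
    }
}
```

The data arrives one or more frames late, which is the trade-off for not blocking the CPU.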

That is my general worry, and I’m not sure how this is treated on DirectX.
2.5 ms is actually a very small amount of time for a readback. Remember that with a dedicated graphics card the data lives in physically separate memory. Spikes of 15-100 ms are often seen!
Are you sure that the render texture is NOT BOUND during the entire frame in which you try to read from it? If it’s used at all (either for reading OR writing), you can run into problems.

Try double buffering the render texture. You’ll get frame data a frame late but it might improve matters.
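
One way to read the double-buffering suggestion, as a rough sketch only: the camera alternates between two render textures, and each frame the script reads the one rendered during the previous frame. The class name, fields, and the 256x256 size are assumptions for illustration, and whether this actually avoids the stall will depend on the platform and driver.

```csharp
using UnityEngine;

// Hypothetical component; names and texture size are illustrative.
public class DoubleBufferedReadback : MonoBehaviour
{
    public Camera sourceCamera;    // camera that renders the data to read back
    readonly RenderTexture[] buffers = new RenderTexture[2];
    Texture2D cpuCopy;
    int writeIndex;                // buffer the camera is currently targeting
    bool hasPreviousFrame;         // false until one buffer has been rendered

    void Start()
    {
        for (int i = 0; i < buffers.Length; i++)
            buffers[i] = new RenderTexture(256, 256, 16);
        cpuCopy = new Texture2D(256, 256, TextureFormat.RGBA32, false);
        sourceCamera.targetTexture = buffers[writeIndex];
    }

    void LateUpdate()
    {
        // LateUpdate runs before cameras render, so buffers[writeIndex]
        // still holds what the camera drew during the previous frame.
        int readIndex = writeIndex;
        writeIndex = 1 - writeIndex;
        sourceCamera.targetTexture = buffers[writeIndex];

        // Nothing has been rendered yet on the very first frame.
        if (!hasPreviousFrame) { hasPreviousFrame = true; return; }

        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = buffers[readIndex];
        cpuCopy.ReadPixels(new Rect(0, 0, 256, 256), 0, 0);
        RenderTexture.active = previous;
    }
}
```

The data in cpuCopy is always one frame old, which is the cost mentioned above.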