Read Pixels from GPU to CPU

Hello. What is the best / fastest way to read color values from a Texture or Texture2D that exists only on the GPU? It seems like all the GetPixel functions return what is in CPU memory (which is all grey for me), as the texture I want to read is generated at runtime.

I thought a CopyTexture() from Texture to Texture2D might allow me to use the various GetPixel methods, but it only seems to copy the texture from GPU to GPU, and the CPU version of the Texture2D is just grey.

I found a method which first copies to a RenderTexture, then uses ReadPixels() ... I got that working but it is way too slow.

And I see Texture has GetNativeTexturePtr() ...
And I see an AsyncGPUReadback avenue that looks promising ...

I don't need to modify the texture at all ... I just need to read colors from it as fast as possible.

Seems simple, but have been messing with this for a couple hours and am lost. What am I missing?

Thanks for any advice.

GetPixel() only reads from the CPU side data, which for most textures will be entirely blank. Once a Texture2D is uploaded to the GPU, the CPU side data is flushed unless the asset is explicitly set up to keep it (Read/Write Enabled for imported textures, or .isReadable for textures created via script).

CopyTexture() will copy the data between textures, but not across the GPU/CPU boundary. The data on the GPU is copied to the other texture's GPU side representation, and if both textures are readable on the CPU then it'll also copy the CPU side data to the other texture's CPU side representation.

Calling ReadPixels() on a RenderTexture is the primary way to get data back from the GPU to the CPU. As you noticed, this is slow. There are multiple reasons for this, but the short explanation is that GPUs and real-time rendering are designed for data to go from the CPU to the GPU, not the other way around. Calling ReadPixels() requires the GPU to stop what it's doing to copy data back to the CPU. If it's busy doing something else when you call that function, you're stalling the CPU while it waits for the GPU to finish, then you're stalling the GPU while it copies the data back. It's a bad time. If you're lucky about when you call the function, or your GPU is fast enough to have already finished rendering the previous frame by the time you call it, and the amount of data you're reading back is small (i.e. you only grab a single pixel rather than the entire image), it can be fast... but usually it is not.

The native texture pointer is a red herring. It's for letting you pass references to textures to other executables or native plugins that are handling their own GPU rendering. It's still only a pointer to the data that lives on the GPU, not the data itself, so it doesn't get anything onto the CPU.

Async readback is indeed the best option for getting data back from the GPU to the CPU. It's technically always going to take longer to deliver the data than ReadPixels(), but it lets the GPU finish everything it needs to and copy the data over when convenient. Depending on how GPU heavy your game is, and a few other factors, this can take between one and several frames before the data is returned. If you need the data that frame, you will have to use ReadPixels(). If you can wait a few frames, use async readback.

Here's a reference project showing how to use it:
https://github.com/keijiro/AsyncCaptureTest/tree/3d8dd3667c4c06f6149496b46c6b0a7acf90cd87
Note that this link deliberately points to an older commit rather than the latest version of the repo, because the latest version has been updated to use a new class introduced in Unity 2023. If you're using Unity 2023, use the latest version, otherwise use the version linked above.

However, one last note. If you're copying data from the GPU to the CPU, it almost always means you're doing the wrong thing and there's a better way to tackle the problem. About the only reason you should ever be copying data from the GPU back to the CPU is if it's data you need to save to disk, like a screenshot or the results of some expensive GPU side process for baking data. Oftentimes, if it's something time sensitive, the optimal solution is to skip the readback and just do those calculations on the CPU in the first place.


Thanks for all that great info. At least I know I am not missing something obvious now. Seems like reading from GPU into CPU is not a simple process if decent speed is needed.

I don't need this for a game. I am doing a sports training "replay" system where I am reading from multiple external cameras, storing the last several seconds worth of video footage from each, then replaying them in a loop right after. It is a teaching tool. I started with Unity's WebCamTexture, which showed promise, but was too slow to handle multiple cameras. Then I found a capture asset on the Asset Store that was much faster, and got my system up and running with 3 cameras at 90+ fps. So all good there.

Only problem is I have to manually "start" and "stop" each replay with space bar, which is OK ... but I thought if I could detect a ball moving through the frame of CAMERA 0, then I would know automatically at what point the newest "replay" should start and end based on that timing. As in ... I just saw the ball travel through the frame, I know I need to play footage starting 3 seconds ago and ending in 2 seconds .. and maybe grab some speed and location info from it as well if possible .... but it has to be done most every frame as the ball can travel fast through the frame.

The capture asset I found stores the camera footage on GPU textures, but the CPU side is blank. It turns out to have a GetPixels32 function, which I didn't see originally. It performs reasonably well considering, although it gave me fits at first. So that is what I am trying to make work now ... I just thought there must be a way to "see and process" the camera frame without copying anything into a new buffer to speed things up .. but doesn't look like that is possible with Unity?

Not super sure how to go about the detection though. Right now, I am holding a reference frame with no ball, and just trying to compare each new frame to the reference frame and find differences. The slight noise in each makes it difficult. And the ball could be any color, so I can't just look for a certain color. I just began to try to desaturate the frames and blur them slightly before comparison, but so far my crude comparisons don't yield good results, and take me down to 6 fps. Hahahaha. More work to do.

I have looked a little at OpenCV as I think it is designed to do stuff like this, but the learning curve looks steep and not sure I have the time to wade through it in full right now ...

Any ideas anyone??? Or am I stuck with SPACE to start, SPACE to stop each time ... LOL

Thanks!

I'm guessing that capture asset is a native plugin, meaning it's handling the camera footage itself outside of Unity, which is why it's so much faster. The GetPixels32 function it has is probably a way to copy the data from its own C++ code into Unity's C# memory, and that might be the only (and fastest) option you have for doing that. However, processing video on the CPU is going to be slow as heck even if the data copy is fast. You should absolutely be using something like OpenCV, or even just compute shaders, to handle this. Image processing on a video to find a ball is kind of what OpenCV exists to handle. Basically anything OpenCV is capable of you can do with compute shaders instead if you want, though you won't get the benefit of the preexisting algorithms OpenCV includes. Otherwise, detecting an object in a video feed is not a simple thing. There's a reason why computer vision is a doctorate-level subject in academia.


I ended up taking 3 seconds at startup to create an “average” REF frame to compare the live footage to. Averaging the RGB values over time seems to have eliminated the noise issue I was having without any per frame processing, and gives me a more stable REF frame to compare to. When the live footage rolls in I compare it to the REF. I only look at the side of the frame where the ball will enter. I use a separate low res camera at 60fps to detect the ball and only compare every 2nd or 3rd pixel to REF.
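A rough sketch of the averaging and sparse comparison described above, written in plain Python rather than Unity C# just to keep it self-contained. The frame layout (rows of (r, g, b) tuples), the function names, and the `x_end`, `step` and `tolerance` parameters are all assumptions for illustration, not the actual code from this project.

```python
def average_frames(frames):
    """Per-pixel, per-channel mean of equally sized frames.

    Each frame is a list of rows; each row is a list of (r, g, b) tuples.
    """
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    ref = []
    for y in range(height):
        row = []
        for x in range(width):
            sums = [0, 0, 0]
            for frame in frames:
                for c in range(3):
                    sums[c] += frame[y][x][c]
            row.append(tuple(s / count for s in sums))
        ref.append(row)
    return ref


def count_changed(frame, ref, x_end, step=2, tolerance=30):
    """Count sampled pixels in columns [0, x_end) that differ from ref.

    Only every `step`-th pixel in each direction is checked. A pixel counts
    as changed when any channel is more than `tolerance` away from ref.
    """
    changed = 0
    for y in range(0, len(frame), step):
        for x in range(0, x_end, step):
            if any(abs(frame[y][x][c] - ref[y][x][c]) > tolerance
                   for c in range(3)):
                changed += 1
    return changed
```

In Unity the same loops would presumably run over the Color32[] array that GetPixels32 returns, indexing with y * width + x. Restricting the columns with `x_end` and skipping pixels with `step` is what keeps the per-frame cost low.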

When there is no ball in the frame it finds between 0 and 10 pixels out of range … When the ball enters it jumps to 100+, so I know when the ball enters. It's crude and hacky, but it actually works pretty well for my purpose and maintains 80+ fps even on my older PC. If I hit space it records a new 3 second REF image just in case it starts to see false positives, but I really haven’t had to use it yet in my early testing.
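The jump from roughly 0-10 to 100+ out-of-range pixels can be turned into a single "ball entered" event with a simple threshold. This is only a minimal sketch: the class name, the threshold of 50 (picked to sit between the two ranges quoted above) and the fire-once-per-pass behaviour are assumptions, not something taken from the project.

```python
class EntryTrigger:
    """Fires once when the changed-pixel count rises above a threshold."""

    def __init__(self, threshold=50):
        self.threshold = threshold
        self.ball_present = False  # was the previous frame above threshold?

    def update(self, changed_count):
        """Return True only on the frame where the count first exceeds
        the threshold; stays False while the ball remains in view."""
        above = changed_count > self.threshold
        fired = above and not self.ball_present
        self.ball_present = above
        return fired
```

Feeding it the per-frame counts 3, 8, 120, 140, 5 would fire on the third frame only, which is the moment to schedule the replay window.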

From here it should be trivial to find the center-ish of the ball if I want to draw a trail or something at some point, but I am happy with the result so far. I have an automated, looping instant replay. When I have more time I would like to look into OpenCV or compute shaders and do this project more professionally, but as this project is just for me I should be good for now.
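One simple way to get a "center-ish" point is to average the coordinates of the pixels that differ from the reference. The sketch below assumes the same hypothetical frame layout as before (rows of (r, g, b) tuples); the function name and parameters are made up for illustration, and it will be thrown off if anything other than the ball is moving in the frame.

```python
def changed_centroid(frame, ref, step=2, tolerance=30):
    """Mean (x, y) of sampled pixels that differ from ref, or None.

    A pixel counts as changed when any channel is more than `tolerance`
    away from the reference. Only every `step`-th pixel is checked.
    """
    sum_x = sum_y = n = 0
    for y in range(0, len(frame), step):
        for x in range(0, len(frame[0]), step):
            if any(abs(frame[y][x][c] - ref[y][x][c]) > tolerance
                   for c in range(3)):
                sum_x += x
                sum_y += y
                n += 1
    if n == 0:
        return None  # nothing differs, so there is no ball to locate
    return (sum_x / n, sum_y / n)
```

Storing one such point per frame would be enough to draw a trail, and the distance between consecutive points gives a rough speed in pixels per frame.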

Thanks so much for your input. You obviously have a lot of knowledge!
