Hi!
I'm trying to use Unity as a simulation tool. I need a lot of cameras, and by a lot I mean 48 or more; the more I can have, the better. For now everything is on a single machine, and every render is sent over the network in a local environment.
Each camera renders at a low resolution (640x360), and my goal is only 3-5 captures per second per camera, which helps a lot.
Since last week I've been working on and testing a few things. I've understood the basics of going from Camera.Render() to a RenderTexture, then to a Texture2D, and finally to GetRawTextureData(). I use a coroutine to split the workload over several game frames: the main camera should keep running at 30 fps or more, and at 3-5 captures per second I have roughly 200 ms of budget per capture cycle for the 48 cameras, so spreading the work out is a must.
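In case it helps, here is a simplified version of what I have right now (class and field names are placeholders and the batch size is just an example, not my real code):

```csharp
using System.Collections;
using UnityEngine;

// Simplified sketch of my current capture pipeline.
public class MultiCameraCapture : MonoBehaviour
{
    public Camera[] cameras;              // the 48 capture cameras (disabled, rendered manually)
    public int camerasPerFrame = 8;       // how many cameras to process per game frame
    public float captureInterval = 0.25f; // ~4 capture cycles per second

    RenderTexture[] renderTextures;
    Texture2D[] textures;

    void Start()
    {
        renderTextures = new RenderTexture[cameras.Length];
        textures = new Texture2D[cameras.Length];
        for (int i = 0; i < cameras.Length; i++)
        {
            renderTextures[i] = new RenderTexture(640, 360, 24);
            textures[i] = new Texture2D(640, 360, TextureFormat.RGB24, false);
        }
        StartCoroutine(CaptureLoop());
    }

    IEnumerator CaptureLoop()
    {
        while (true)
        {
            // One capture cycle: all cameras, spread over several game frames.
            for (int first = 0; first < cameras.Length; first += camerasPerFrame)
            {
                int last = Mathf.Min(first + camerasPerFrame, cameras.Length);
                for (int i = first; i < last; i++)
                {
                    // Render into this camera's dedicated RenderTexture.
                    cameras[i].targetTexture = renderTextures[i];
                    cameras[i].Render();

                    // Read back to the CPU: this ReadPixels is the GPU-CPU sync I mention below.
                    RenderTexture.active = renderTextures[i];
                    textures[i].ReadPixels(new Rect(0, 0, 640, 360), 0, 0);
                    textures[i].Apply(false);

                    // Raw bytes for the network sender; allocates a new byte[] every call.
                    byte[] data = textures[i].GetRawTextureData();
                    // ... enqueue 'data' for sending ...
                }
                RenderTexture.active = null;
                yield return null; // give the main camera its frame, continue next game frame
            }
            yield return new WaitForSeconds(captureInterval); // aiming for 3-5 capture cycles per second
        }
    }
}
```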
One of the big challenges is Texture2D.ReadPixels: it's slow because it forces a GPU-CPU sync. Can that be bypassed somehow? Can I retrieve the data directly from RenderTexture.colorBuffer? Even then there is no magic; a sync has to happen at some point. How could I optimize this across multiple Texture2Ds? Right now I read them in sequence, and most of the time the first one takes a long time while the others are fast, which makes sense since the first read is the one that waits for the GPU. Could I use one big Texture2D and tile it, one tile per camera (see the sketch below)? But that way I can't spread the sync overhead over multiple frames.
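To make the tiling idea concrete, here is what I mean (just a sketch with an assumed 8x6 grid, reusing the placeholder fields from the snippet above; I haven't tested it):

```csharp
// One 5120x2160 Texture2D, tiled into an 8x6 grid of 640x360 regions,
// so there is a single Apply/GetRawTextureData for all 48 cameras.
// Each ReadPixels presumably still pays its own sync, though.
Texture2D atlas = new Texture2D(8 * 640, 6 * 360, TextureFormat.RGB24, false);

for (int i = 0; i < cameras.Length; i++)
{
    int col = i % 8;
    int row = i / 8;

    // ReadPixels can write into an offset of the destination texture,
    // so each camera gets its own tile.
    RenderTexture.active = renderTextures[i];
    atlas.ReadPixels(new Rect(0, 0, 640, 360), col * 640, row * 360);
}
RenderTexture.active = null;
atlas.Apply(false);

byte[] all = atlas.GetRawTextureData(); // one big buffer instead of 48 small ones
```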
I've been able to get something acceptable running in the editor at 4-6 captures per second with 48 cameras in an empty scene, but that doesn't include the last step, GetRawTextureData. Now the bottleneck is the GC: around 32 MB of allocations per capture cycle forces a collection that can take 20-40+ ms. Is there a way to pass in a byte array so I can reuse it? It's obviously not in the docs, but I don't understand why it isn't possible; it would help a lot with this kind of workload.
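To be clear about what I'm asking for: if the textures are RGB24, 48 × 640 × 360 × 3 bytes is roughly 32 MB per capture cycle, and it all comes from GetRawTextureData returning a fresh array every call. What I wish I could write is something like this (the overload taking a buffer does not exist, it's purely hypothetical):

```csharp
// Preallocated once per camera: 640 * 360 * 3 bytes = 675 KB each.
byte[] reusableBuffer = new byte[640 * 360 * 3];

// Hypothetical API that would let me reuse that buffer every capture:
//   textures[i].GetRawTextureData(reusableBuffer);

// What the API actually gives me today: a new ~675 KB allocation per camera per capture.
byte[] data = textures[i].GetRawTextureData();
```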
I found this article; it talks about how they solved the same problem with GetPixels, but they don't explain how they did it, only that they managed to run the GC on a background thread. I tried it, but maybe my implementation is terrible, because it doesn't help at all. Since I have a "huge" amount of memory to allocate, I think the GC gets called several times within the same compute frame. If I try the same approach and spread the impact over multiple frames, my compute frame time goes too high.
So, what would you suggest I do? I'd like to stay away from native C++ code, but if it's the best solution I can look into it.
(By the way, I'm not sure if this is the right section of the forum.)