AsyncGPUReadback spikes?

Are these spikes to be expected from an async readback?
100x100 default fractal readback to float


Here is the code:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Experimental.Rendering;
using Unity.Collections;

public class SimpleComputeShader : MonoBehaviour {
    public ComputeShader shader;
    [SerializeField] Vector2Int size;

    RenderTexture tex;
    NativeArray<float> data;
    ComputeBuffer buffer;

    void OnEnable()
    {
        // UAV-writable render texture the compute shader draws the fractal into.
        tex = new RenderTexture(size.x, size.y, 0) { enableRandomWrite = true };
        tex.Create();

        buffer = new ComputeBuffer(size.x * size.y, sizeof(float));
        StartCoroutine(AsyncExtract());
    }

    IEnumerator AsyncExtract()
    {
        while (true)
        {
            // compute
            shader.SetInt("width", tex.width);
            shader.SetInt("someValue", (int)Time.time);
            shader.SetTexture(0, "tex", tex);
            shader.SetBuffer(0, "data", buffer);
            shader.Dispatch(0, tex.width / 8, tex.height / 8, 1);

            // extract
            var request = AsyncGPUReadback.Request(buffer);

            // Poll until the GPU -> CPU copy has completed (typically a few frames later).
            yield return new WaitUntil(() => request.done);

            var dataz = request.GetData<float>();
        }
    }

    void OnDestroy()
    {
        buffer.Dispose();
        tex.Release();
    }
}
#pragma kernel CSMain

RWTexture2D<float4> tex;
RWStructuredBuffer<float> data;
int width;
int someValue;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // XOR/AND fractal pattern, animated by someValue.
    tex[id.xy] = float4(id.x & id.y, (id.x & someValue) / 15.0, (id.y & someValue) / 15.0, 1.0);
    // Mirror the green channel into the buffer that is read back on the CPU.
    data[id.x + id.y * width] = tex[id.xy].y;
}
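
As an aside, here is a minimal sketch (not part of the original repro; the class name and the Update-driven dispatch are illustrative) of the callback overload of AsyncGPUReadback.Request, which delivers the data when it is ready instead of polling request.done every frame from a coroutine. The namespace is UnityEngine.Experimental.Rendering on 2018.1 and UnityEngine.Rendering on later versions.

using UnityEngine;
using UnityEngine.Rendering; // UnityEngine.Experimental.Rendering on 2018.1
using Unity.Collections;

public class CallbackReadbackExample : MonoBehaviour
{
    public ComputeShader shader;
    [SerializeField] Vector2Int size = new Vector2Int(100, 100);

    RenderTexture tex;
    ComputeBuffer buffer;

    void OnEnable()
    {
        tex = new RenderTexture(size.x, size.y, 0) { enableRandomWrite = true };
        tex.Create();
        buffer = new ComputeBuffer(size.x * size.y, sizeof(float));
    }

    void Update()
    {
        shader.SetInt("width", tex.width);
        shader.SetInt("someValue", (int)Time.time);
        shader.SetTexture(0, "tex", tex);
        shader.SetBuffer(0, "data", buffer);
        shader.Dispatch(0, tex.width / 8, tex.height / 8, 1);

        // The callback fires a few frames later, once the GPU -> CPU copy has completed.
        AsyncGPUReadback.Request(buffer, OnReadbackComplete);
    }

    void OnReadbackComplete(AsyncGPUReadbackRequest request)
    {
        if (request.hasError)
        {
            Debug.LogWarning("GPU readback failed");
            return;
        }
        NativeArray<float> values = request.GetData<float>();
        // Use 'values' here; the array is only valid during this callback unless copied.
    }

    void OnDestroy()
    {
        buffer.Dispose();
        tex.Release();
    }
}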

Just for reference, I saw this on Twitter:


No, definitely not expected.
Are you sure those spikes are due to the async readback?
What platform are you on?

The spikes disappear when I remove the request; you can try it yourself.
The platform is Win10 / i5 8400 / GTX 1060.

DX11 or DX12? Do you by chance have a minimal project that reproduces the issue?

I don't know which DX version; the player is set to auto graphics API.
Here is the minimal project:

3357482–262772–BUG async spikes.7z (1.51 MB)

Did you try this? I get this on that line:

        tex.SetPixels32(buffer.ToArray());

Maybe something's off with my Windows install, because I don't imagine he would release a buggy project.

If you open the output_log.txt file that is created by the player, you should find something like this:

On Windows 10, the player log file is stored at:

I logged a bug (case 988876) so you can get more information about the system I'm using.

So I profiled the minimal project you posted and wasn't able to reproduce the spikes you're talking about (I tried with a 980 and a 1060 on Win10 with 2018.1.0b2).

However, I suggest you don't rely on profiling directly in the editor (as there can be spurious spikes due to the editor environment) but attach the profiler to a development build instead. See if you can reproduce the spikes by profiling a dev build.

Note that in the project you posted, you're reading back 4 MB of data, and this is not free. Depending on your bandwidth, it should take between 0.5 ms and 1 ms to transfer, and probably as much again for the CPU copy/conversion into the request buffer.
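
To put a rough number on that (a back-of-the-envelope sketch, not something from the thread; the bandwidth figure below is an assumption, Unity does not report it), the readback size is just the buffer's element count times its stride:

using UnityEngine;

public static class ReadbackCostEstimate
{
    // Hedged sketch: "assumedBandwidthBytesPerSec" is a guess at effective
    // GPU -> CPU throughput, not a value Unity exposes.
    public static void Log(ComputeBuffer buffer, double assumedBandwidthBytesPerSec = 8e9)
    {
        long bytes = (long)buffer.count * buffer.stride;   // e.g. 1000 * 1000 * 4 bytes = 4 MB
        double transferMs = bytes / assumedBandwidthBytesPerSec * 1000.0;
        Debug.Log($"Readback size: {bytes / (1024.0 * 1024.0):F2} MB, " +
                  $"roughly {transferMs:F2} ms of transfer at the assumed bandwidth " +
                  "(CPU-side copy/conversion not included)");
    }
}

At 4 MB and an assumed 8 GB/s that gives roughly 0.5 ms, which is in line with the 0.5-1 ms estimate above.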

Thanks for taking a look.
You're right, I'll keep that in mind. I've always only profiled C#, which is in the 10 ms range, so any spikes caused by the editor were buried in that; the GPU is another beast. :slight_smile:

What do you suggest?

In the image below, a 1000x1000 float array is being downloaded to the CPU. I can see 1 ms of RenderTexture work, which I'm not reading back asynchronously, but I don't see any such slowdown from downloading the array itself. What would it look like in the profiler?

From the build I can see time spent rendering, but where is the array transfer?

By the way, what's the preferred way to get rid of that RenderTexture.SetActive?
Self answer: Request(RenderTexture).
How do you combine requests? Say I want the buffer data and the RenderTexture but don't want to do two separate requests.
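
For what it's worth, here is a hedged sketch of that: the Request(RenderTexture) overload, plus one way to "combine" the readbacks by simply issuing both requests in the same frame and only consuming them once both are done. The class and field names are illustrative, and Color32 assumes an 8-bit RGBA render texture.

using UnityEngine;
using UnityEngine.Rendering; // UnityEngine.Experimental.Rendering on 2018.1
using Unity.Collections;

public class DualReadbackExample : MonoBehaviour
{
    public RenderTexture tex;     // assumed: already written by the compute shader
    public ComputeBuffer buffer;  // assumed: already filled by the compute shader

    AsyncGPUReadbackRequest texRequest;
    AsyncGPUReadbackRequest bufferRequest;
    bool pending;

    void Update()
    {
        if (!pending)
        {
            // Kick off both readbacks in the same frame; they complete independently.
            texRequest = AsyncGPUReadback.Request(tex);
            bufferRequest = AsyncGPUReadback.Request(buffer);
            pending = true;
        }
        else if (texRequest.done && bufferRequest.done)
        {
            pending = false;
            if (texRequest.hasError || bufferRequest.hasError)
                return;

            NativeArray<Color32> pixels = texRequest.GetData<Color32>(); // 8-bit RGBA assumed
            NativeArray<float> values = bufferRequest.GetData<float>();
            // Use pixels/values here.
        }
    }
}

As far as I know, each request targets a single resource, so tracking the pair together like this is about as "combined" as it gets.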

Your profiler screenshots with the dev build look OK to me now.

Yes, you should see async readback markers on the main and render threads (CPU profiler), but it probably lacks some GPU markers for them to appear in the GPU profiler.

About not seeing the transfer time: it can be overlapped with other computations (async DMA transfer while the GPU is still working). However, adding markers would disable that parallelism when profiling.

It's fine the way it is, because it's very visible on the CPU, and indeed this isn't free.

How would you download 4 MB from the GPU in Unity?
I'm wondering if this is possible with the way things work: slow down the transfer to the CPU, so the request may take twice as long but impacts everything else much less, kind of like spreading a calculation across more frames with a coroutine.

Or alternatively reduce the precision. I did just that, using half, half2 and half4 in the compute shader, but ... maybe the size of the data transferred is determined on the C# side. Is there a low-precision vector4?

    public NativeArray<half4> data;

Sure, the cost is linear in the size of your data. Make sure to read back only what's needed, in the form you need to read it back.

Normally the copy/conversion cost is on the render thread, but it seems you're using single-threaded rendering mode.

If the cost really is too high, you can always slice your readbacks (you can pass window/box parameters to your request), then assemble your texture/buffer from n frames of readbacks on the CPU. For instance, instead of reading 4 MB at once, you read 500 KB per frame for 8 frames, i.e. more latency to absorb the cost even further.
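
A hedged sketch of that slicing idea for a ComputeBuffer of floats (the slice count and all names are illustrative; the Request overload used here takes a size and offset in bytes):

using System.Collections;
using UnityEngine;
using UnityEngine.Rendering; // UnityEngine.Experimental.Rendering on 2018.1
using Unity.Collections;

public class SlicedReadbackExample : MonoBehaviour
{
    public ComputeBuffer buffer;  // assumed: a buffer of floats, filled on the GPU
    public int sliceCount = 8;    // e.g. 8 x 500 KB instead of one 4 MB readback

    float[] assembled;

    IEnumerator ReadInSlices()
    {
        assembled = new float[buffer.count];
        int floatsPerSlice = buffer.count / sliceCount;   // assumes count divides evenly
        int bytesPerSlice = floatsPerSlice * sizeof(float);

        for (int i = 0; i < sliceCount; i++)
        {
            // Read back only a window of the buffer: size and offset are in bytes.
            var request = AsyncGPUReadback.Request(buffer, bytesPerSlice, i * bytesPerSlice);
            yield return new WaitUntil(() => request.done);
            if (request.hasError)
                yield break;

            // Copy this slice into the CPU-side array at its final position.
            NativeArray<float> slice = request.GetData<float>();
            NativeArray<float>.Copy(slice, 0, assembled, i * floatsPerSlice, floatsPerSlice);
        }
        // 'assembled' now holds the full buffer, at the cost of sliceCount frames of latency.
    }

    void OnEnable()
    {
        StartCoroutine(ReadInSlices());
    }
}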


What's the most economical way to pass a 2D array of values?

Do you mean turning on render jobs would help?

I'm also seeing performance spikes, but I don't think it's AsyncGPUReadback.Request() anymore.

I see them both in the editor and in standalone builds. In the standalone build, I got huge perf spikes every other frame. In the editor, it appears to be more random, but the spikes are still there. The offending routine always appears to be under Camera.Render() / Gfx.WaitForPresent, which can range from not appearing at all to 20 ms, and I've seen as high as 83 ms in there.

I'm on Direct3D 11.0 [level 11.1] (according to output_log.txt), and the renderer is an NVIDIA GeForce GTX TITAN X.

I'm also noticing that the profiler says the GPU is always using 0.00 ms of time now.

I tried removing my AsyncGPUReadback.Requests, and it had no effect. I think it's primarily from loading up the GPU with a bit more than it can chew, and then it goes into some kind of degenerate state, presumably because the next frame I'm trying to render the same thing and it hasn't finished the last frame yet?

Is there some way to make renders synchronous so that I can see how much time they’re really taking?

Hey Dave,
I'm confused: does it have something to do with async readback?
Maybe you can file a bug report with a repro project.

Here is the page that explains how Unity Technologies wants us to report bugs: