App crashes when evaluating model with GPU compute on some Android devices

I am able to run my model on my laptop, and on CPU on mobile, but when running with GPUCompute the app crashes on a Motorola G9 Play (Android 11) device. However, the app runs on a Pixel 7 Pro (Android 13) with GPUCompute without issue.

When the app crashes, the final log lines are:

... (lots of layers)
01-12 04:53:00.567  5608  5652 I Unity   : Conv - name: Y, inputs: [/conv_act/Mul_output_0, conv_out.weight, conv_out.bias], fusedActivation: None, group: 1, strides: [1, 1], pads: [1, 1, 1, 1], dilations: [1, 1], autoPad: NotSet, kernelShape: [3, 3], fusedActivation: None
01-12 04:53:00.568  5608  5652 I Unity   : Unity.Sentis.DefaultVars

which appears to suggest that model evaluation has succeeded (“Y” is the name of the output), and reading from the GPU is the problem.

More adb logs from the time of crash:

01-12 05:15:41.194  8218  8401 W Adreno-GSL: <gsl_ldd_control:553>: ioctl fd 80 code 0xc040094a (IOCTL_KGSL_GPU_COMMAND) failed: errno 35 Resource deadlock would occur
01-12 05:15:41.194  8218  8401 W Adreno-GSL: <log_gpu_snapshot:462>: panel.gpuSnapshotPath is not set.not generating user snapshot
01-12 05:15:41.195  8218  8401 W Adreno-GSL: <gsl_ldd_control:553>: ioctl fd 80 code 0x400c0907 (IOCTL_KGSL_DEVICE_WAITTIMESTAMP_CTXTID) failed: errno 35 Resource deadlock would occur
01-12 05:15:41.195  8218  8401 W Adreno-GSL: <log_gpu_snapshot:462>: panel.gpuSnapshotPath is not set.not generating user snapshot
01-12 05:15:41.196  8218  8401 W Adreno-GSL: <gsl_ldd_control:553>: ioctl fd 80 code 0xc040094a (IOCTL_KGSL_GPU_COMMAND) failed: errno 35 Resource deadlock would occur
01-12 05:15:41.196  8218  8401 W Adreno-GSL: <log_gpu_snapshot:462>: panel.gpuSnapshotPath is not set.not generating user snapshot
01-12 05:15:41.196  8218  8401 W Adreno-GSL: <gsl_ldd_control:553>: ioctl fd 80 code 0x400c0907 (IOCTL_KGSL_DEVICE_WAITTIMESTAMP_CTXTID) failed: errno 35 Resource deadlock would occur
01-12 05:15:41.196  8218  8401 W Adreno-GSL: <log_gpu_snapshot:462>: panel.gpuSnapshotPath is not set.not generating user snapshot
01-12 05:15:41.204  1271  1351 E BufferQueueProducer: [SurfaceView - com.DefaultCompany.PixDiffusion/com.unity3d.player.UnityPlayerActivity#0](id:4f7000038e4,api:1,p:8218,c:1271) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
01-12 05:15:41.205  8218  8401 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)
01-12 05:15:41.246  8218  8401 E CRASH   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
01-12 05:15:41.246  8218  8401 E CRASH   : Version '2022.3.17f1 (4fc78088f837)', Build type 'Release', Scripting Backend 'mono', CPU 'armeabi-v7a'
01-12 05:15:41.246  8218  8401 E CRASH   : Build fingerprint: 'motorola/guamp_retailen/guamp:11/RPXS31.Q2-58-17-7-3/ad9c24:user/release-keys'
01-12 05:15:41.246  8218  8401 E CRASH   : Revision: 'pvt'
01-12 05:15:41.246  8218  8401 E CRASH   : ABI: 'arm'
01-12 05:15:41.251  8218  8401 E CRASH   : Timestamp: 2024-01-12 05:15:41.246966678+0000
01-12 05:15:41.251  8218  8401 E CRASH   : pid: 8218, tid: 8401, name: UnityGfxDeviceW  >>> com.DefaultCompany.PixDiffusion <<<
01-12 05:15:41.251  8218  8401 E CRASH   : uid: 10674
01-12 05:15:41.251  8218  8401 E CRASH   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr --------
01-12 05:15:41.251  8218  8401 E CRASH   : Cause: null pointer dereference
01-12 05:15:41.251  8218  8401 E CRASH   :     r0  00000000  r1  c1bcaa60  r2  00000002  r3  00000400
01-12 05:15:41.251  8218  8401 E CRASH   :     r4  c4755358  r5  00000001  r6  0000023e  r7  00000000
01-12 05:15:41.251  8218  8401 E CRASH   :     r8  00000000  r9  00000001  r10 00000008  r11 bd180104
01-12 05:15:41.251  8218  8401 E CRASH   :     ip  00000001  sp  bc24be70  lr  00000100  pc  c7218ca8
01-12 05:15:41.252  8218  8401 E CRASH   : backtrace:
01-12 05:15:41.252  8218  8401 E CRASH   :       #00 pc 00a9fca8  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #01 pc 00a9ae50  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #02 pc 00a58580  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #03 pc 00a8d634  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #04 pc 00a87bbc  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #05 pc 00bc3f55  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #06 pc 00bc32e1  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #07 pc 00bc305b  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #08 pc 004c8d57  /data/app/~~-ijLScJ941dgXfC5k-WHGg==/com.DefaultCompany.PixDiffusion-6v8IDz-WmvfxXPmk4gRRBQ==/lib/arm/libunity.so (BuildId: 72096e5a0ef6b55e448480e2451becd1c044b449)
01-12 05:15:41.252  8218  8401 E CRASH   :       #09 pc 0008170b  /apex/com.android.runtime/lib/bionic/libc.so (__pthread_start(void*)+40) (BuildId: 9d4f6aa585db1e76cb15e0aa4299910e)
01-12 05:15:41.252  8218  8401 E CRASH   :       #10 pc 0003a50d  /apex/com.android.runtime/lib/bionic/libc.so (__start_thread+30) (BuildId: 9d4f6aa585db1e76cb15e0aa4299910e)

The model is here: small_unet.onnx - Google Drive
It is a very small (6 MB) UNet model (from Hugging Face diffusers, a UNet2DModel to be precise) with fixed input sizes (“X”: float (2x2x32x32) and “T”: int (2)) and output (“Y”: float (2x2x32x32)).

GPUPixel also runs without crashing on mobile but gives totally incorrect results on both mobile and laptop - this is an unrelated issue.

I am using Unity 2023.2.5f1 and Sentis 1.3.0-pre.2 (but previously tried with Sentis 1.2 and Unity 2021.2). I have tried building with both the IL2CPP and Mono backends (though only IL2CPP on the Pixel, as it needs ARM64).

Any help would be really appreciated, thanks!

What does SystemInfo.supportsAsyncGPUReadback give you on the failing platform?

As a test, try bypassing our code and the internal async download request, and get the underlying ComputeBuffer yourself:

var tensorData = ComputeTensorData.Pin(outputTensor);
ComputeBuffer buffer = tensorData.buffer; // the underlying ComputeBuffer

From that buffer you can either download the data directly or issue a regular async download yourself.

That should tell you whether the download is the issue or not.
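The async variant of that test could look roughly like the sketch below. It relies on Unity's `AsyncGPUReadback.Request` plus the `ComputeTensorData.Pin` and `buffer` members mentioned above; the `OutputReadbackProbe` class and its method name are made up for illustration, and the output is assumed to be a `TensorFloat`.

```csharp
using Unity.Sentis;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper (not part of Sentis): reads an output tensor's GPU
// buffer back through Unity's own async readback path.
public static class OutputReadbackProbe
{
    public static void RequestReadback(IWorker worker, string outputName)
    {
        // PeekOutput returns a tensor that is still owned by the worker.
        var output = worker.PeekOutput(outputName) as TensorFloat;
        if (output == null)
        {
            Debug.LogError($"Output '{outputName}' is missing or not a TensorFloat");
            return;
        }

        // Pin the tensor to the GPU compute backend to reach its ComputeBuffer.
        ComputeTensorData tensorData = ComputeTensorData.Pin(output);

        AsyncGPUReadback.Request(tensorData.buffer, request =>
        {
            if (request.hasError)
            {
                Debug.LogError("Async GPU readback reported an error");
                return;
            }

            // The returned NativeArray is only valid inside this callback.
            var data = request.GetData<float>();
            Debug.Log($"Read back {data.Length} floats from '{outputName}'");
        });
    }
}
```

If the callback reports an error or never fires, the readback path is suspect; if it fires with plausible values, the problem lies elsewhere.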

Hi, thank you for the quick response! supportsAsyncGPUReadback returns True (as does supportsComputeShaders).

I looked into the ComputeBuffer method and have realised the situation is a lot weirder than I previously realised. It’s probably not the download; even if I remove all read-related code and just run worker.Execute(inputs) and return, the app crashes! But it crashes a few frames after, something I initially didn’t realise. I call worker.Execute() and exit Update(), there are about 5 more calls to Update() where I do nothing, and then the app mysteriously crashes.

I tried reading the output before the crash, but it appears to be entirely 0s:

TensorFloat output = worker.PeekOutput("Y") as TensorFloat;
ComputeTensorData tensorData = ComputeTensorData.Pin(output);
float[] modelOutput = new float[batchSize * channels * height * width];
tensorData.buffer.GetData(modelOutput);
// modelOutput is full of 0s

(apologies if the above is not the correct usage of ComputeBuffer, I am not familiar with it)

EDIT: I tried using AsyncGPUReadback, similar result. I run the network on the first frame, Update() is called 3 more times, then the callback fires: the length of the returned array is correct (8192) but all the elements are 0, and then two more Update()s later it crashes (the number of frames is probably irrelevant but is surprisingly consistent).

The Motorola G9 has an Adreno 610 and the Pixel 7 a Mali-G710 MP7; that’s a big difference in GPU capabilities.
I’d attribute the issue to either running out of memory (GPU or CPU) or a timeout because the model takes too long to execute…
Did you try slicing execution on the Motorola? It will relieve some of the pressure on the GPU.
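Slicing execution here means scheduling only a few layers each frame instead of the whole model at once. A minimal sketch, assuming the `IWorker.StartManualSchedule` iterator from Sentis 1.3 (the method may be named differently in other Sentis versions); the `SlicedExecution` class and the `layersPerFrame` parameter are illustrative names, not part of Sentis:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using Unity.Sentis;

// Hypothetical helper: spreads one inference over several frames.
public static class SlicedExecution
{
    // Intended to be driven by StartCoroutine from a MonoBehaviour.
    public static IEnumerator Run(
        IWorker worker,
        IDictionary<string, Tensor> inputs,
        int layersPerFrame)
    {
        int budget = Math.Max(1, layersPerFrame);

        // Assumption: each MoveNext() schedules one more layer of the model.
        IEnumerator schedule = worker.StartManualSchedule(inputs);

        int scheduled = 0;
        while (schedule.MoveNext())
        {
            scheduled++;
            if (scheduled % budget == 0)
            {
                // Budget for this frame is used up; continue on the next frame.
                yield return null;
            }
        }
        // All layers have been scheduled; the output can be read from here on.
    }
}
```

A MonoBehaviour could start it with something like `StartCoroutine(SlicedExecution.Run(worker, inputs, 20))`, where 20 is an arbitrary starting value to tune on the slower device, and read the output once the coroutine has finished.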

Oh, you absolutely got it. Thank you so much, I was totally stuck on that for ages!! Should’ve tried that earlier, I didn’t realise you could time out… Thanks for the link, v helpful example. I was confident it wasn’t a memory issue as the model is so tiny, but it is still very deep (600 layers now I check…) so it must have been timing out.

It is a bit scary that you can time out like this and totally crash the app with no error messages anywhere.

I guess I’ll hardcode a number of frames to split over, but it feels slightly sad to do this when different devices have very different capabilities. Do you think there would be much overhead from splitting over multiple (say, 10) frames on a high end phone when it is required for the low end one?

Anyway, thanks again!

Nice!
Yeah, we should automate this for you: give it a model and it figures out the best slicing scheme.

Typically I’d express the number of layers per frame as a percentage of the total layer count,
and then tie this percentage to the hardware capabilities of the device.
So on lower-end hardware you run only 5% of the layers per frame, and on higher-end hardware you run 100%.
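As a rough illustration of the percentage idea, the helper below turns a per-frame percentage into a whole number of layers. The `LayerBudget` class is hypothetical, and how the percentage is picked for a given device is left open, since the thread does not specify a rule for that.

```csharp
using System;

// Hypothetical helper: converts "percent of the model per frame" into a
// concrete layer count that a per-frame scheduler can use as its budget.
public static class LayerBudget
{
    public static int LayersPerFrame(int totalLayers, int percentPerFrame)
    {
        // Keep the percentage in the 1..100 range.
        int percent = Math.Min(100, Math.Max(1, percentPerFrame));

        // Integer ceiling division, so a partial layer rounds up.
        int layers = (totalLayers * percent + 99) / 100;

        // Always schedule at least one layer per frame.
        return Math.Max(1, layers);
    }
}
```

For the roughly 600-layer model discussed here, `LayersPerFrame(600, 5)` gives 30 layers per frame (20 frames per inference), while `LayersPerFrame(600, 100)` gives 600, i.e. everything in a single frame.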

But we should automate that and provide a hardware-dependent execution scheme.
I’ll keep it in mind.

Added a task for automated slicing. It’s known internally as Task 410
