The Barracuda documentation has badly outdated example code for Scheduled Execution:
Tensor ExecuteInParts(IWorker worker, Tensor I, int syncEveryNthLayer = 5)
{
    var executor = worker.ExecuteAsync(I);
    var it = 0;
    bool hasMoreWork;

    do
    {
        hasMoreWork = executor.MoveNext();
        if (++it % syncEveryNthLayer == 0)
            worker.WaitForCompletion();
    } while (hasMoreWork);

    return worker.CopyOutput();
}
This code uses multiple deprecated APIs (worker.ExecuteAsync(), worker.WaitForCompletion()).
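For reference, this is my own sketch of how I would expect the same pattern to look with the current API, assuming StartManualSchedule()/FlushSchedule() are the intended replacements for ExecuteAsync()/WaitForCompletion() and that CopyOutput() is still available as a helper:

Tensor ExecuteInParts(IWorker worker, Tensor I, int syncEveryNthLayer = 5)
{
    // Assumption: StartManualSchedule() returns the same kind of
    // layer-by-layer enumerator that ExecuteAsync() used to return.
    var schedule = worker.StartManualSchedule(I);
    var it = 0;
    bool hasMoreWork;

    do
    {
        hasMoreWork = schedule.MoveNext();
        if (++it % syncEveryNthLayer == 0)
            worker.FlushSchedule(true); // blocking flush as the periodic sync point
    } while (hasMoreWork);

    return worker.CopyOutput();
}

Is that roughly the intended migration, or is there a different recommended pattern?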
The documentation for IWorker gives a completely different approach for asynchronous inferencing:
IEnumerator ImageRecognitionCoroutine()
{
    //[...]
    using (var input = new Tensor(imageToRecognise, channels:3))
    {
        // execute neural network with specific input and get results back
        var output = worker.Execute(input).PeekOutput();

        // allow main thread to run until neural network execution has finished
        yield return new WaitForCompletion(output);
        //[...]
    }
}
I also found a third method described in a GitHub issue:
IEnumerator ImageRecognitionCoroutine()
{
    //[...]
    using (var input = new Tensor(imageToRecognise, channels:3))
    {
        // yielding the enumerator lets Unity advance the schedule as a nested coroutine
        yield return worker.StartManualSchedule(input);

        var output = worker.PeekOutput();
        //[...]
    }
}
First: Unity team, please update the documentation so that the “Model execution” page doesn’t use deprecated code.
Is there a preferred method for asynchronous execution? And is there any additional information on working with the second and third methods described above?
For context: we are trying to use Barracuda on the Microsoft HoloLens 2 to run inference on a low-resolution 3-channel image (currently testing with GPU inference). We’ve found that the second execution method described above (using yield return new WaitForCompletion(output)) takes about 1.1 seconds to run inference on our test network on the HL2. That would be acceptable for our use case, but the execution appears to be completely synchronous (the application freezes until inference is finished). The third execution method described above (using StartManualSchedule()) does run asynchronously on the HL2 (the application does not freeze), but inference takes 8 seconds, which is far too long.
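One thing we are considering, based on the loop in the deprecated example above and assuming the enumerator returned by StartManualSchedule() can likewise be stepped several times per frame, is to drive the schedule manually with a per-frame layer budget instead of yielding the enumerator directly, something like:

IEnumerator ImageRecognitionCoroutine()
{
    //[...]
    using (var input = new Tensor(imageToRecognise, channels:3))
    {
        var schedule = worker.StartManualSchedule(input);
        var it = 0;
        const int layersPerFrame = 20; // hypothetical budget, would need tuning

        // step several layers, then give the main thread one frame
        while (schedule.MoveNext())
        {
            if (++it % layersPerFrame == 0)
                yield return null;
        }

        var output = worker.PeekOutput();
        //[...]
    }
}

Would that be the recommended way to trade responsiveness against total inference time, or is there a better-supported pattern?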