I see in the ML-Agents blog that there is an allocation per episode, and Barracuda is the cause. Is this something you'll end up turning into 0 alloc, if it's even possible?
Hi @laurentlavigne ,
My understanding is that the metrics in the blog post "ML-Agents v2.0 release: Now supports training complex cooperative behaviors" exclude allocations from Barracuda:
These metrics exclude the memory used by Barracuda (the Unity Inference Engine that ML-Agents relies on for cross-platform inference).
However, at inference Barracuda does allocate some memory, in most cases 88 bytes per layer, i.e., the output Tensors. We definitely want to get rid of that, though; 0 alloc is the goal!
FYI @laurentlavigne, 0 alloc at inference has been added to our backlog (no ETA for now, however).
Thanks for the heads up @fguinier. You're right, I misquoted. Is it 88 bytes/layer/frame? I'm sure this isn't stopping anyone, it's just good to know ahead of time what to expect when I test performance.
Yes, that is the current expectation: 88 B per layer executed, so if you execute the model every frame, that's 88 B/layer/frame :).
As a note: you might see some layers allocating more for certain networks; those are usually a quick fix. Feel free to report them. We did a pass in recent releases (1.3/1.4), but I'm sure some remain.
Will do.
Are there any published projects using Barracuda? I'd like to get a sense of what's possible.
@laurentlavigne you can check out these projects from Keijiro Takahashi:
Hand Pose, Blaze Palm, Blaze Face, Iris tracking
Hey guys, any chance of getting some eyes on the GC allocs? I went through the code and there's tons of low-hanging fruit.
I hacked at some of the allocations and went from 12.7 KB to 1.1 KB per frame (executing Keijiro's BlazeFaceBarracuda). And I don't even know a thing about Barracuda; I just followed the deep profiler.
@apkdev how were you able to reduce the allocations per frame by yourself? I'm still encountering crazy GC allocations like you were.
I profiled the code, looked for the allocations, and got rid of them case by case. I didn't post the code because these were low-quality workarounds. They seemed to work well on BlazeFaceBarracuda, but I could have introduced some bugs; I didn't have the time to do extensive testing.
If you’re gonna do this, make sure you test each change; it’s easy to break something and then you have to backtrack a lot. AFAIR it was mostly classic GC fixes: don’t create temporary objects just to call one method, cache arrays for method arguments, basic stuff.
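For reference, here's the kind of pattern I mean. This is a made-up illustration, not actual Barracuda code; the class and method names are hypothetical:

```csharp
// Hypothetical sketch of a classic per-frame GC fix: cache argument arrays
// instead of allocating a fresh one on every call.
public class ShapeConsumer
{
    // Cached argument buffer, allocated once instead of per call.
    readonly int[] m_Shape = new int[4];

    public void SetShape(int[] shape) { /* consume the shape */ }

    // BAD: allocates a temporary array on every call -> per-frame garbage.
    public void UpdateEveryFrameBad(int n, int h, int w, int c)
    {
        SetShape(new int[] { n, h, w, c });
    }

    // BETTER: reuse the cached array; zero allocations per call.
    public void UpdateEveryFrameGood(int n, int h, int w, int c)
    {
        m_Shape[0] = n; m_Shape[1] = h; m_Shape[2] = w; m_Shape[3] = c;
        SetShape(m_Shape);
    }
}
```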
If you ask me, Unity should have a rule that per-frame GC allocations are considered bugs that need to be fixed before a new package version is published. Maybe even have tests to catch regressions - it seems doable with the performance testing package.
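Something along these lines with the Performance Testing package (com.unity.test-framework.performance) could catch regressions. This is just a sketch from memory, so treat the .GC() measurement call and the placeholder RunOneInferenceStep() as assumptions rather than a known-good test:

```csharp
using NUnit.Framework;
using Unity.PerformanceTesting;

public class InferenceAllocationTests
{
    [Test, Performance]
    public void Execute_DoesNotAllocatePerFrame()
    {
        // Repeatedly run the hot path and record GC allocation counts per measurement.
        Measure.Method(() => RunOneInferenceStep())
            .WarmupCount(5)
            .MeasurementCount(20)
            .GC()   // records GC.Alloc counts alongside timings
            .Run();
    }

    static void RunOneInferenceStep()
    {
        // Placeholder: execute the model once, e.g. worker.Execute(inputs); worker.PeekOutput();
    }
}
```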
Thanks! Yeah, I'm looking through the code now. It looks like for my use case (v3.0.0) the garbage comes from 3 places: SharedArrayTensorData.Download(), where a new array is initialized every time; TensorCachingAllocator.Reset(), where m_BusyTensor.Keys.ToList() allocates a new list; and TensorCachingAllocator.AllocTensorInternal(), where a new Tensor is created.
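For the Reset() case, the usual workaround is to reuse a scratch list instead of calling Keys.ToList() on every reset. A rough sketch of the pattern, with simplified types; this is not the actual Barracuda source:

```csharp
using System.Collections.Generic;

// Simplified sketch, not the real TensorCachingAllocator.
class TensorCachingAllocatorSketch
{
    class Tensor { }

    readonly Dictionary<Tensor, bool> m_BusyTensor = new Dictionary<Tensor, bool>();
    // Scratch list reused across Reset() calls instead of Keys.ToList() allocating a new one.
    readonly List<Tensor> m_KeysScratch = new List<Tensor>();

    public void Reset()
    {
        // BAD: m_BusyTensor.Keys.ToList() allocates a new list on every call.
        // BETTER: copy keys into a reused scratch list so the dictionary can be
        // modified safely while we release the tensors.
        m_KeysScratch.Clear();
        foreach (var t in m_BusyTensor.Keys)
            m_KeysScratch.Add(t);
        foreach (var t in m_KeysScratch)
            Release(t);
    }

    void Release(Tensor t) { m_BusyTensor.Remove(t); }
}
```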
I totally agree 100% with your comment about per-frame GC allocations. This is the last major source of GC allocations in my project.
@fguinier do you happen to know if these GC allocations are fixed in the upcoming v4 release?
Hello, we are making some GC improvements. @airoll, could you share your example with us again if you still experience this issue? The old link seems to have been removed.