I trained a neural network using ML-Agents. Training went without any problems on the PyTorch side, but running inference on the trained network with Barracuda gives me a whole list of errors.
My RL agent uses 2 Buffer Sensors (one to encode teammate info, one to encode room feature info), 1 Grid Sensor with resnet (to encode spatial info about the agent's neighborhood), and a vector observation. The agent setup is in the attached file (“Agent Setup.jpg”).
Upon adding the .onnx file created by ML-Agents to the Editor, I see errors such as “Cannot reshape array of size 453152 into shape with multiple of 15232 elements” at Unity.Barracuda.TensorExtensions.Reshape. The full error stack is in “Error Stack_on onnx file added.jpg”. Inspecting the onnx file, I see the warning “model detected as NCHW, but not natively in this layout, behaviour might be erroneous”.
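The numbers in that message can be checked by hand: a reshape that leaves one dimension to be inferred only works when the total element count is a whole multiple of the product of the known dimensions. Below is a quick sketch of that check in Python; the helper name is made up for illustration and is not a Barracuda API.

```python
def can_infer_reshape(total_elements: int, known_product: int) -> bool:
    """Return True if total_elements splits evenly into chunks of known_product."""
    return known_product > 0 and total_elements % known_product == 0


total, chunk = 453152, 15232  # values taken from the error message
print(can_infer_reshape(total, chunk))  # False
print(total / chunk)                    # 29.75
```

The quotient is 29.75, so the tensor reaching the Reshape does not fit the target shape. That would be consistent with the NCHW layout warning, although the message alone does not prove it.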
When using only Buffer Sensors or only the Grid Sensor, inference has no problem. It is only when both are used together that Barracuda seems to fail. I have uploaded my trained model in “NNModel Onnx File.zip”.
I am using:
ML-Agents Release 18 (uses Barracuda 2.0.0)
PyTorch 1.7.1
Unity Version 2019.4.1f1
Windows 10 OS
Some good news about Agent3_2_472_r18_resnet_and_attn.onnx:
On Barracuda 2.0.0 and up, the NN imports without any problem according to my test.
On ML-Agents 2.0.0 (and thus Barracuda 2.0.0), it also imports without any problem according to my test.
On ML-Agents 1.8.0 (and thus Barracuda 1.3.1), the import fails as you describe above.
On bleeding-edge Barracuda, inference matches the reference ONNX Runtime (apart from the RandomNormalLike node, more on this below). I expect this to have been true since Barracuda 2.0.0.
My guess is that you are using ML-Agents 1.8.0/Barracuda 1.3.1, and that the import bug was fixed in Barracuda 2.0.0 (itself used by ML-Agents 2.0.0). Does that make sense, and is it possible for you to give it a try with ML-Agents 2.0.0?
As a side note: ML-Agents 2.0.0 is a verified release while ML-Agents 1.8.0 is a preview package.
Final note: Barracuda can’t match RandomNormalLike for two reasons: the seed is not defined by the model and is up to the implementation, and the actual implementation of the random distribution is not standardized and is up to the inference library. However, replacing RandomNormalLike with Identity made inference match.
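For anyone who wants to reproduce that comparison, here is a rough sketch of the node swap in Python. It is only an assumption about how one might do it with the `onnx` package (`onnx.load` / `onnx.save`), not the exact procedure used for the test above, and the function names are made up for illustration. The patched copy is meant for comparing outputs only, since it removes the sampling noise from the model.

```python
from typing import Any


def replace_random_normal_like(model: Any) -> int:
    """Turn every RandomNormalLike node into Identity, in place.

    `model` is expected to look like an onnx.ModelProto: it has
    `graph.node`, and each node has `op_type` and `attribute`.
    Returns the number of nodes that were rewritten.
    """
    replaced = 0
    for node in model.graph.node:
        if node.op_type == "RandomNormalLike":
            node.op_type = "Identity"
            # Identity takes no attributes, so drop dtype/mean/scale/seed.
            del node.attribute[:]
            replaced += 1
    return replaced


def patch_file(src_path: str, dst_path: str) -> int:
    """Assumed workflow: load with the `onnx` package, patch, save a copy."""
    import onnx  # imported lazily; requires `pip install onnx`

    model = onnx.load(src_path)
    count = replace_random_normal_like(model)
    onnx.save(model, dst_path)
    return count
```

Calling `patch_file("model.onnx", "model_identity.onnx")` (both paths are placeholders) would write a patched copy and leave the original file untouched.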
I was using ML-Agents 2.1.0-exp.1/Barracuda 2.0.0-pre.3 when I encountered the errors above. See the versions I screen-captured from my Package Manager:
I updated my project’s Barracuda to 2.1.0-preview and got the same results as @WaxyMcRivers. So on my machine, at least, it was Barracuda 2.1.0-preview that resolved the errors.
I ran a local test with the Package Manager. It seems that we have:
ML-Agents 2.0.0 → Barracuda 2.0.0 → model imports fine
ML-Agents 2.1.0-exp.1 → Barracuda 2.0.0-pre.3 → error on import
ML-Agents 2.0.0-exp.1 → Barracuda 1.4.0-preview → error on import
ML-Agents 2.0.0-pre.3 → Barracuda 2.0.0-pre.3 → error on import
Also, as you said, Barracuda 2.1.0-preview → model imports fine
So it seems that Barracuda 2.0.0 and 2.1.0-preview are the minimum versions with the fix. This matches the behavior you are seeing, and it is good news, as it means both the official and the latest versions contain the fix.
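The results above can be summarised as "anything at or after the 2.0.0 release has the fix, pre-releases of 2.0.0 do not". Here is a small Python sketch of that rule, checked against the versions reported in this thread. The helper names and the simplified version ordering are assumptions made for illustration, not something taken from the Package Manager.

```python
import re

# Import results reported in this thread, keyed by Barracuda version.
OBSERVED = {
    "1.3.1": False,
    "1.4.0-preview": False,
    "2.0.0-pre.3": False,
    "2.0.0": True,
    "2.1.0-preview": True,
}


def version_key(version: str) -> tuple:
    """Sort key: (major, minor, patch, is_release).

    A pre-release such as 2.0.0-pre.3 sorts before the 2.0.0 release.
    Ordering between two pre-releases of the same x.y.z is not modelled.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(-.+)?$", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    is_release = 0 if match.group(4) else 1
    return (major, minor, patch, is_release)


def has_import_fix(version: str, first_fixed: str = "2.0.0") -> bool:
    """Guess whether a Barracuda version includes the import fix."""
    return version_key(version) >= version_key(first_fixed)


# The rule should agree with every result reported above.
for version, imported_ok in OBSERVED.items():
    assert has_import_fix(version) == imported_ok
```

This is only a guess that happens to fit the five data points; versions not listed here have not been tested.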
However, the documentation about the dependencies between ml-agents and barracuda does indeed seem to be wrong (as you pointed out: Releases · Unity-Technologies/ml-agents · GitHub). I will raise it with the ML-Agents team.