I used the FoodCollector environment.
I trained the model with PyTorch outside of Unity and converted the saved model file to ONNX.
I then imported it into the Unity Editor (in BehaviorParameters).
The results are not the same. The behavior is similar, but the agent acts a little strangely.
What should I check to get exactly the same inference?
I know this is not an officially supported feature,
but it would be nice if the conversion could work at least to some extent.
I confirmed that the PyTorch output and the ONNX output are the same.
The ‘action’ value is logprobs (log-probabilities), and I am a little unsure how sampling should be done at this point.
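To make the sampling question concrete, here is a minimal sketch of two ways an action index could be picked from a list of logprobs. The function names are hypothetical and this is not how ML-Agents does it internally; it only assumes the output is one log-probability per discrete action.

```python
import math
import random


def sample_from_logprobs(logprobs, rng=random):
    """Sample an action index in proportion to exp(logprob)."""
    finite = [lp for lp in logprobs if math.isfinite(lp)]
    if not finite:
        raise ValueError("no finite log-probability to sample from")
    top = max(finite)
    # Subtract the largest value before exponentiating for numerical stability.
    # -infinity (or any other non-finite entry) gets weight 0 and is never picked.
    weights = [math.exp(lp - top) if math.isfinite(lp) else 0.0 for lp in logprobs]
    return rng.choices(range(len(logprobs)), weights=weights)[0]


def greedy_from_logprobs(logprobs):
    """Pick the index with the largest log-probability (deterministic)."""
    return max(range(len(logprobs)), key=lambda i: logprobs[i])
```

If the two sides pick actions differently (one samples, the other takes the maximum), the behavior can differ even when the logprobs themselves match.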
I checked that the input observations (vector, ray) are in the correct order. I confirmed that it is (ray, vector) * stacked.
After tracking the values a bit more, I found that the ONNX model outputs ‘-infinity’ (in the Unity Editor).
(Perhaps the reason it looks weird is that the agent fails to act properly only when -infinity is output.)
In the PyTorch model, on the other hand, the logprobs values seem to be fine.
The output check mentioned above means that both models produce the same output when given the same input.
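A check of this kind can be sketched as below. It is only an illustration: `compare_outputs` is a hypothetical helper, the outputs are assumed to be flat lists of floats, and the tolerance is an arbitrary choice. It also flags non-finite values, since two -infinity entries would otherwise compare as equal.

```python
import math


def compare_outputs(reference, candidate, abs_tol=1e-5):
    """Return (index, reference, candidate) for entries that differ
    by more than abs_tol or where the candidate value is not finite."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    problems = []
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        same = math.isclose(ref, cand, rel_tol=0.0, abs_tol=abs_tol)
        if not same or not math.isfinite(cand):
            problems.append((i, ref, cand))
    return problems
```

Running it on the inputs recorded at the steps where the agent misbehaves, rather than on arbitrary test inputs, would show whether the -infinity depends on what Unity actually feeds the model.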
The overall output range is similar (-100 to 0), but then -infinity suddenly appears.
Since I checked that the outputs are the same and the error only shows up in the final output in Unity, I can only guess that the input type is slightly different.
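One way to probe the input-type guess is to round the test inputs to 32-bit floats before feeding them to the PyTorch model. This is a sketch under the assumption that the training side used 64-bit values while the Unity side passes 32-bit floats; `to_float32` is a hypothetical helper. It only shows how large the rounding difference is and does not by itself explain -infinity.

```python
import struct


def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_rounding_error(values):
    """Largest absolute change caused by casting the values to 32-bit floats."""
    return max(abs(v - to_float32(v)) for v in values)
```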
Where else should I look? Does mlagents preprocess the input?