Hello,
I am using Barracuda to integrate agents I trained with Unity ML-Agents into my game.
However, looking at the implementation of the discrete action output, I am confused.
I see that ApplierImpl.cs ultimately grabs the output tensor and converts it into the ActionBuffer.DiscreteActions form.
I also see that the output is put into the Eval function, where it samples a multinomial distribution based on info from the discrete_actions output from the network.
From here on I am confused. Since I am using Barracuda only, I would need to implement the conversion from the discrete_actions tensor to the DiscreteActions[ ] array myself.
What do I need to do to turn the network output into discrete actions?
I believe I figured it out…
I definitely think there should be a formal guide for going from ML-Agents to Barracuda-only, if ML-Agents isn’t meant to be shipped in games that only use trained networks (as I have been made aware). Specifically, it should cover the discrete_actions tensor output and how to decode it into what ML-Agents would end up giving you.
I couldn’t find any specific information in the docs about how this was done, and I ended up going into the source code to grab what was necessary, modifying the code for simple decoding of the output tensor given the array of branch sizes.
I’m sure that by now many hundreds of others have gone through this process… am I not searching with the right Google terms, or is everyone shipping their built game with ML-Agents inside it?
Judging from your other thread, sounds like you might not be doing this anymore. But just in case anyone is reading this in the future:
The output from Barracuda is the log-probability of each action in each action branch. For each branch, we convert these to probabilities, compute the cumulative distribution function, and then sample an action index based on that.
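For anyone landing here later, here is a minimal sketch of those steps. It is written in Python for brevity and should port to C# almost line for line. It is not the ML-Agents code. It assumes the network output is one flat array with each branch's log-probabilities concatenated in order, and that you already know the branch sizes. The names sample_branch and decode_discrete_actions are made up for illustration.

```python
import math
import random


def sample_branch(log_probs, rng):
    """Sample one action index from a single branch's log-probabilities."""
    # Subtract the maximum before exponentiating so exp() cannot overflow.
    max_lp = max(log_probs)

    # Build an unnormalized cumulative distribution over the actions.
    cdf = []
    total = 0.0
    for lp in log_probs:
        total += math.exp(lp - max_lp)
        cdf.append(total)

    # Draw a point in [0, total) and return the first action whose
    # cumulative mass exceeds it. Scaling the draw by total means the
    # probabilities never need to be normalized explicitly.
    threshold = rng.random() * total
    for index, cumulative in enumerate(cdf):
        if threshold < cumulative:
            return index
    return len(log_probs) - 1  # guard against floating-point edge cases


def decode_discrete_actions(flat_output, branch_sizes, rng=None):
    """Turn a flat log-probability array into one action index per branch.

    Assumption: flat_output holds the branches back to back, so its
    length equals sum(branch_sizes).
    """
    if rng is None:
        rng = random.Random()
    if len(flat_output) != sum(branch_sizes):
        raise ValueError("output length does not match the branch sizes")

    actions = []
    offset = 0
    for size in branch_sizes:
        branch = flat_output[offset:offset + size]
        actions.append(sample_branch(branch, rng))
        offset += size
    return actions
```

With branch sizes [3, 2], the first three values of the output are treated as branch 0 and the last two as branch 1, and the result is a list of two action indices. If you want deterministic behavior instead of sampling, taking the index of the largest value in each branch slice is a common alternative.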
Up through Release 12, the code in DiscreteActionOutputApplier.Apply was pretty convoluted and made lots of temporary allocations and copies. It’s a lot simpler as of Release 13 (out now!) and doesn’t make any allocations after the first time it’s called.