Approach for requesting multiple decisions before execution

Hi all,

Pretty new to ML training and I’m struggling to figure out how to approach my problem.

I currently have an environment with an NN brain with two branches (Move & Attack type), both of size 4. The game style is grid-based & we-go.

At the moment my flow is pretty typical: Request Decision → Get decision from both branches → Execute Branch 1 Move → Execute Branch 2 Attack → Set Reward
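That per-step flow can be sketched in plain Python (the function and parameter names here are placeholders, not the ML-Agents API):

```python
# Minimal sketch of the current flow: one decision per step, two branches
# (Move, Attack type), reward set immediately. All names are illustrative.
def step(policy, execute_move, execute_attack, evaluate_reward, obs):
    move, attack = policy(obs)      # Request Decision -> one output per branch
    execute_move(move)              # Execute Branch 1: Move
    execute_attack(attack)          # Execute Branch 2: Attack type
    return evaluate_reward()        # Set Reward
```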

The way the game flows for the player is different to how I'm currently training: they have to make their 4 decisions for the round upfront, inputting them during a preparation phase rather than choosing moves one at a time.

What I am trying to figure out is how to replicate this behaviour for the NN (if possible): request its 4 decisions upfront in the preparation phase, then execute them sequentially and reward based on the combined outcome of those 4 decisions as a group.

My only thought so far has been expanding the branches to convert 1 output into multiple actions, but that could create large branches and I'm not sure how that would affect performance.

Any thoughts/suggestions appreciated!

Hi,

If the player has to make decisions 4 at a time, it sounds to me like you actually need one decision with 4 different actions. If a single decision is 2 branches of size 4, then maybe you need 4 x 2 branches of size 4. Be careful though: ML-Agents is not able to train with episodes of length 1. I would not worry about performance, since you will be running the network only once instead of 4 times.
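That "one decision, 4 x 2 branches" layout can be sketched in plain Python, outside of ML-Agents itself. The branch ordering (round 1 move, round 1 attack, round 2 move, ...) and helper names are assumptions for illustration:

```python
# Hypothetical branch layout: 4 rounds x (move, attack), each branch of size 4,
# giving the "4 x 2 branches of size 4" suggested above.
NUM_STEPS = 4
BRANCH_SIZES = [4, 4] * NUM_STEPS  # 8 branches total

def decode_action(flat_action):
    """Split one multi-discrete decision into 4 sequential (move, attack) pairs."""
    assert len(flat_action) == len(BRANCH_SIZES)
    return [(flat_action[2 * i], flat_action[2 * i + 1]) for i in range(NUM_STEPS)]

def execute_round(flat_action, step_fn):
    """Execute all 4 (move, attack) pairs in order and sum the reward for the group."""
    total_reward = 0.0
    for move, attack in decode_action(flat_action):
        total_reward += step_fn(move, attack)
    return total_reward
```

The network is queried once per round, and the single reward at the end reflects the outcome of all 4 decisions together.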
Another option would be to generate them sequentially, but you will need the observations to be different and include the consequences of previous actions somehow. If you do, I think it is okay to add the rewards for the 4 actions at the end of the episode.
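The sequential alternative could look something like this sketch, where each of the 4 decisions sees an observation that includes the simulated consequences of the earlier ones plus which planning step the agent is on (all state fields and names here are illustrative assumptions):

```python
# Sketch of the sequential option: one decision per planning step, with the
# observation updated to reflect the consequences of previous actions, and
# the summed reward granted once at the end of the episode.

def build_observation(sim_state, step_index, num_steps=4):
    """Observation = simulated game state plus a one-hot of the planning step."""
    step_one_hot = [1.0 if i == step_index else 0.0 for i in range(num_steps)]
    return list(sim_state) + step_one_hot

def plan_round(initial_state, policy, simulate, num_steps=4):
    """Query the policy once per step, feeding back simulated consequences."""
    state, actions, total_reward = initial_state, [], 0.0
    for step in range(num_steps):
        obs = build_observation(state, step, num_steps)
        move, attack = policy(obs)
        state, reward = simulate(state, move, attack)  # consequences of this choice
        actions.append((move, attack))
        total_reward += reward  # accumulated, to be set as reward at episode end
    return actions, total_reward
```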

Thanks for the advice Vincent, I'll have a play around with adding in additional branches and see how it affects the training; it seems like the neater solution. Worst case scenario I'll just under-train the NN or adjust the observations to compensate for it being able to make its decisions more often than players can.