Hello everyone,
I’m working on a project using ML-Agents to train an agent in a board game. In my game, each move consists of several properties, such as type, origin cell index, direction, and target cell index.
I have defined my discrete actions using multiple branches like this: discrete_actions_branches: [5, 37, 6, 37]. Each branch represents a different property of the move. However, not all combinations of values across branches are valid moves, depending on the current state of the board.
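To make the problem concrete, here is a small plain-Python sketch (not ML-Agents code). The two valid moves are made up for illustration, and it assumes masking is applied to each branch independently, which is my understanding of how discrete action masking works. It shows that even if I only enable the values that appear in some valid move, the agent can still pick combinations that are not valid moves.

```python
from itertools import product

# Branch sizes from my config: type, origin cell, direction, target cell.
BRANCH_SIZES = [5, 37, 6, 37]

# Hypothetical set of valid moves for some board state (made-up values).
valid_moves = {
    (0, 3, 1, 4),
    (2, 10, 5, 16),
}

def per_branch_masks(moves, branch_sizes):
    """For each branch, allow a value if at least one valid move uses it."""
    masks = [[False] * size for size in branch_sizes]
    for move in moves:
        for branch, value in enumerate(move):
            masks[branch][value] = True
    return masks

def allowed_combinations(masks):
    """Every combination selectable when branches are masked independently."""
    allowed_values = [[v for v, ok in enumerate(mask) if ok] for mask in masks]
    return set(product(*allowed_values))

masks = per_branch_masks(valid_moves, BRANCH_SIZES)
reachable = allowed_combinations(masks)
invalid_but_reachable = reachable - valid_moves

print(len(reachable))              # 16 combinations pass the per-branch masks
print(len(invalid_but_reachable))  # 14 of them are not valid moves
```

With only two valid moves, each branch has two enabled values, so 2 x 2 x 2 x 2 = 16 combinations get through and 14 of them, such as (0, 10, 1, 4), are invalid.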
I need to mask specific combinations of values across the discrete action branches, not just individual values within a single branch, to prevent the agent from performing invalid moves in the game. Is there a way to achieve this? If not, do you have any suggestions on how I should define my actions or modify my approach to handle the complexity of valid and invalid moves?
Any help or guidance would be greatly appreciated. Thank you in advance!