Strategy board game AI

Hello, I'm a CS student, and in my university game dev course I decided to develop a version of the popular game Stratego (this version is much more complex). I need advice on using ML-Agents because I haven't seen any code example for a game like the one I want to develop. I would be glad for any help, especially with these questions:
How to model the states of the game?
How to make a good reward function?
Do I need to use two agents, one that plays and another that sets up the pieces on the board before the game?
Thank you all.

Hi avihay107, your implementation will vary wildly with how your game is written, but at a high level:

Your state should be the simplest representation of the board/game that is sufficient to play it. For instance, if your game takes place on a grid, you could represent it as an NxNxM matrix, where NxN are the dimensions of the board and M is the number of different types of pieces that may be on it.
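
As a rough illustration of the NxNxM idea, here is a minimal Python sketch that one-hot encodes a small grid. The piece names and the `encode_board` helper are made up for this example (they are not part of ML-Agents), and a real game would probably also need to encode which side owns each piece.

```python
from typing import List, Optional

# Hypothetical piece types, for illustration only; a real game defines its own.
PIECE_TYPES = ["scout", "miner", "bomb", "flag"]


def encode_board(board: List[List[Optional[str]]]) -> List[List[List[float]]]:
    """Return an N x N x M nested list with a one-hot vector for every cell."""
    index = {name: i for i, name in enumerate(PIECE_TYPES)}
    encoded = []
    for row in board:
        encoded_row = []
        for cell in row:
            one_hot = [0.0] * len(PIECE_TYPES)
            if cell is not None:  # empty cells stay all zeros
                one_hot[index[cell]] = 1.0
            encoded_row.append(one_hot)
        encoded.append(encoded_row)
    return encoded


# A 3x3 board with three pieces on it (None marks an empty cell).
board = [
    ["flag", None, None],
    [None, "scout", None],
    [None, None, "bomb"],
]
state = encode_board(board)
```

Here `state` is a 3x3x4 nested list, and flattening it gives a fixed-length vector that could be passed to the agent as its observation.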

A good reward function is dense enough (i.e. a reward is given frequently enough) that the agent can “stumble upon” rewards in random play, but not so dense that it dictates how the agent plays. You may, for instance, give the agent a small reward for taking an enemy piece and a big reward for winning; however, it will then be predisposed to capture as many enemy pieces as possible rather than try other strategies.
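
To make the small-reward/big-reward idea concrete, here is a minimal Python sketch. The constants and the `step_reward` function are assumptions chosen for illustration, and the loss penalty is an addition of mine that was not mentioned above; the actual values would need tuning for your game.

```python
from typing import Optional

# Illustrative values only (assumptions, not recommended settings).
CAPTURE_REWARD = 0.05   # small reward for taking an enemy piece
WIN_REWARD = 1.0        # big reward for winning
LOSS_REWARD = -1.0      # assumed penalty for losing


def step_reward(captured_enemy_piece: bool, outcome: Optional[str]) -> float:
    """Reward for a single move; outcome is "win", "loss", or None if the game goes on."""
    reward = 0.0
    if captured_enemy_piece:
        reward += CAPTURE_REWARD
    if outcome == "win":
        reward += WIN_REWARD
    elif outcome == "loss":
        reward += LOSS_REWARD
    return reward


capture_only = step_reward(True, None)
winning_move = step_reward(False, "win")
```

Keeping the capture reward much smaller than the win reward is one way to reduce the bias towards capturing pieces, though it does not remove it.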

Note that model-free deep RL (which is what ML-Agents uses) is a reactive technique (i.e. the algorithm chooses its next action based on what it currently sees) and does not plan ahead. Its strategic ability may be limited compared to algorithms such as MCTS (Monte Carlo Tree Search), which can plan ahead.

Stay tuned for some improvements for board games in ML-Agents - and good luck with your project.

Thank you very much for the comment.
So, if I understand you correctly, the states are all the possible actions that the agent can reach? And in the reward function I need to think about all the possibilities and rank them as positive or negative?
And how do I make the agent take only one step and then wait until the other player has played their turn?

The state isn’t the set of possible actions. The Agent (a neural network) takes the state as input and, based on that information, chooses an action, so the state needs to contain enough information for the agent to make the right decision. For the reward function: if you’re building a game, a useful way to think about it is as a “score”. If you were a human player, what should the game reward you (or punish you) for doing, so that the player with the largest score wins the game?
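
To show the difference between the state and the actions, here is a toy Python sketch. A simple linear scorer stands in for the neural network, and `choose_action`, the weights, and the `legal_actions` list are all hypothetical names invented for this example.

```python
from typing import Sequence


def choose_action(
    state: Sequence[float],
    weights: Sequence[Sequence[float]],
    legal_actions: Sequence[int],
) -> int:
    """Score each legal action as a dot product with the state and return the best one."""

    def score(action: int) -> float:
        return sum(w * s for w, s in zip(weights[action], state))

    return max(legal_actions, key=score)


# The state is what the agent observes; the actions are what it can choose from.
state = [1.0, 0.0, 1.0]
weights = [
    [0.2, 0.0, 0.1],  # action 0 scores 0.3 for this state
    [0.5, 0.9, 0.4],  # action 1 scores 0.9 for this state
    [0.0, 0.3, 0.1],  # action 2 scores 0.1 for this state
]
best_of_all = choose_action(state, weights, legal_actions=[0, 1, 2])
best_when_restricted = choose_action(state, weights, legal_actions=[0, 2])
```

The same state leads to a different choice when action 1 is not available, which is why the state and the action set are kept as separate things.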

For the last question, you can call RequestDecision manually - see an example of that in our Bouncer environment: https://github.com/Unity-Technologies/ml-agents/blob/master/Project/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs
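
The control flow behind turn-taking can be sketched in plain Python as below. This is not the ML-Agents API: `TurnManager` and the two policy functions are hypothetical, and `play_turn` only plays the role that a manual RequestDecision call would play in Unity, i.e. a player is asked for a move only when its turn comes up.

```python
from typing import Callable, List


class TurnManager:
    """Alternates turns and asks a player for a move only on that player's turn."""

    def __init__(self, players: List[Callable[[int], str]]) -> None:
        self.players = players
        self.current = 0          # index of the player whose turn it is
        self.log: List[str] = []  # moves in the order they were played

    def play_turn(self, turn_number: int) -> None:
        # Ask only the active player for a decision, then hand the turn over.
        move = self.players[self.current](turn_number)
        self.log.append(move)
        self.current = (self.current + 1) % len(self.players)


def agent_policy(turn_number: int) -> str:
    return f"agent move on turn {turn_number}"


def opponent_policy(turn_number: int) -> str:
    return f"opponent move on turn {turn_number}"


manager = TurnManager([agent_policy, opponent_policy])
for turn in range(4):
    manager.play_turn(turn)
```

After the loop, `manager.log` holds the agent's and the opponent's moves in strict alternation, because neither side is asked for a move out of turn.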