I am currently trying to use ML-Agents to learn to play the board game Kalah (rules here for those interested, though they are not strictly needed for this question). I am using the most basic board that still allows for some decision-making: 2 pits per side and 1 marble per pit.

If the Agent (player 1) moves the marble in the first pit (bottom left), they are guaranteed to lose, as whatever move the second player makes will secure them the win. If the Agent moves the marble in the second pit (bottom right), they gain a free move and can secure a win by performing the only valid move left, which is the first-pit move. With enough training, the Agent should therefore move the marble in pit 2 100% of the time, since that move secures a win and the only other move guarantees a loss. Despite this, even after around 250,000 games of training, it averages a 50/50 win/loss rate.

My current working theory is that, because a win requires the Agent to make both moves once each, it is evaluating them equally and essentially tossing a coin as to which it plays first. Even when I specifically give it a reward for moving the marble in the second pit first, this does not change.

My question as a whole is: can ML-Agents learn the importance of move order, have I set something up incorrectly, or is this something the Agent will never learn? Attached are pictures of the board, my results after 250,000+ games, and my general settings; code files can be attached as well if they would prove useful. Thank you in advance!
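As a sanity check on the claim that pit 2 is the forced win, here is a quick exhaustive game-tree search of this tiny board in plain Python. The board indexing and rule details (counterclockwise sowing that skips the opponent's store, a free move for landing in your own store, capture when the last marble lands in your own empty pit, and sweeping leftovers when one side empties) are my assumptions about the standard Kalah rules, not taken from the asker's project:

```python
# Exhaustive search of the 2-pits-per-side, 1-marble-per-pit Kalah board.
# Index layout (an assumption, not the asker's code):
#   0-1 = player 1's pits (0 = bottom left), 2 = player 1's store,
#   3-4 = player 2's pits, 5 = player 2's store.

PITS = {0: (0, 1), 1: (3, 4)}      # each player's pit indices
STORE = {0: 2, 1: 5}               # each player's store index
OPPOSITE = {0: 4, 1: 3, 3: 1, 4: 0}

def legal_moves(board, player):
    return [p for p in PITS[player] if board[p] > 0]

def apply_move(board, player, pit):
    """Sow from `pit`; return (new_board, next_player_to_move)."""
    board = list(board)
    marbles, i = board[pit], pit
    board[pit] = 0
    while marbles:
        i = (i + 1) % 6
        if i == STORE[1 - player]:             # skip the opponent's store
            continue
        board[i] += 1
        marbles -= 1
    if i == STORE[player]:                     # last marble in own store: free move
        return board, player
    # Capture: last marble landed in an own, previously empty pit
    if i in PITS[player] and board[i] == 1 and board[OPPOSITE[i]] > 0:
        board[STORE[player]] += board[i] + board[OPPOSITE[i]]
        board[i] = board[OPPOSITE[i]] = 0
    return board, 1 - player

def score(board):
    """Final score once one side is empty: sweep leftovers, return P1 minus P2."""
    board = list(board)
    for pl in (0, 1):
        for p in PITS[pl]:
            board[STORE[pl]] += board[p]
            board[p] = 0
    return board[2] - board[5]

def minimax(board, player):
    """P1-minus-P2 score with both sides playing optimally."""
    if any(all(board[p] == 0 for p in PITS[pl]) for pl in (0, 1)):
        return score(board)
    results = [minimax(*apply_move(board, player, m))
               for m in legal_moves(board, player)]
    return max(results) if player == 0 else min(results)

start = [1, 1, 0, 1, 1, 0]
for pit in (0, 1):
    print(f"P1 opens with pit {pit + 1}: score (P1 - P2) =",
          minimax(*apply_move(start, 0, pit)))
```

Under these rules the search confirms the analysis in the question: opening with pit 1 loses for player 1 no matter what, while opening with pit 2 (free move, then the forced pit-1 move) wins, so an agent that has actually learned the game should pick pit 2 every time.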
What does your code look like? It seems like a low number of actions and observations, possibly.
You need to make sure the Agent can know whether it is on move 1 or move 2. If you have observations for that, then I think it should work.
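To illustrate the idea: the observation vector should encode the move phase (or enough board state to infer it), so that the state before move 1 and the state before the free extra move look different to the policy. This is a minimal plain-Python sketch with hypothetical names; in ML-Agents itself this logic would live in the Agent's `CollectObservations(VectorSensor)` override in C#, using `sensor.AddObservation`:

```python
# Hypothetical sketch of building an observation vector for the tiny Kalah
# board: pit counts, store counts, plus a flag marking whether the Agent is
# currently taking the free extra move earned by landing in its own store.

def build_observation(p1_pits, p2_pits, p1_store, p2_store, is_free_move):
    """Flatten board state plus a move-phase flag into one float vector."""
    obs = [float(n) for n in p1_pits + p2_pits]
    obs += [float(p1_store), float(p2_store)]
    obs.append(1.0 if is_free_move else 0.0)   # move-phase signal
    return obs

# First decision of the episode: full board, flag is 0.0
print(build_observation([1, 1], [1, 1], 0, 0, False))
```

With only two pits, the pit and store counts alone may already distinguish the two decision points, but an explicit flag like this removes any ambiguity about which move the Agent is making.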