Hello Guys!
I am currently using ML-Agents to create agents that can play Connect Four through self-play.
I have trained the agents for multiple hours, but they are still too weak to win against me. What I have noticed is that the agent always tries to prioritize the center of the board, which is good as far as I know.
This is my Behaviour Parameters component for both agents:
Here are the observations that are collected and what happens when actions are processed:
I figured that the value 1 should always represent the agent's own pieces, while -1 represents the opponent's. Once a column is full, I mask it so that the agent can't put any more pieces into it. After a piece is inserted, the win conditions are always checked. On a win, the winning player receives +1 and the losing player -1. On a draw, both receive 0.
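To make the logic above concrete, here is a minimal sketch in plain Python rather than my actual Unity/C# agent code. The 6x7 board size, the 0/1/2 cell encoding, and all function names (`observe`, `action_mask`, `drop`, `is_win`, `rewards`) are assumptions made up for illustration; none of them are ML-Agents APIs.

```python
from typing import List, Optional, Tuple

ROWS, COLS = 6, 7  # assumed standard Connect Four board size

# Hypothetical cell encoding: 0 = empty, 1 = first player, 2 = second player.
Board = List[List[int]]


def new_board() -> Board:
    return [[0] * COLS for _ in range(ROWS)]


def observe(board: Board, me: int) -> List[float]:
    """Flatten the board from `me`'s point of view: own = 1, opponent = -1, empty = 0."""
    return [0.0 if cell == 0 else (1.0 if cell == me else -1.0)
            for row in board for cell in row]


def action_mask(board: Board) -> List[bool]:
    """True for every column that can still take a piece (row 0 is the top row)."""
    return [board[0][c] == 0 for c in range(COLS)]


def drop(board: Board, col: int, player: int) -> int:
    """Drop a piece into `col` and return the row it lands in."""
    for r in range(ROWS - 1, -1, -1):  # search from the bottom row upwards
        if board[r][col] == 0:
            board[r][col] = player
            return r
    raise ValueError("column is full; it should have been masked")


def is_win(board: Board, row: int, col: int) -> bool:
    """Check whether the piece at (row, col) is part of four or more in a line."""
    player = board[row][col]
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):  # horizontal, vertical, two diagonals
        count = 1
        for sign in (1, -1):  # walk in both directions along the line
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 4:
            return True
    return False


def rewards(board: Board, row: int, col: int) -> Optional[Tuple[float, float]]:
    """Return (mover_reward, opponent_reward) after a move, or None if the game goes on."""
    if is_win(board, row, col):
        return (1.0, -1.0)
    if not any(action_mask(board)):  # board full without a winner: draw
        return (0.0, 0.0)
    return None
```

The point of `observe` is that both agents see the same position with the signs flipped, so a single policy can play either side.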
Here is my training config:
Here are my questions:
- Looking at ELO in chess, a rating of 3000 has not been achieved yet. But my agents are already at ELO 65000 and still lose. Should ELO be capped somehow? I feel like agents with a five-figure ELO should already be unbeatable.
- Is my setup sufficient for training Connect Four? Since I see progress, I feel like I should be alright, but it is quite slow in my opinion.
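
For context on the ELO question, here is a minimal sketch of the standard Elo expected-score and update formulas as I understand them. Whether ML-Agents computes its reported value exactly this way is an assumption on my part, and the K-factor of 16 is a hypothetical value chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating_a: float, rating_b: float, score_a: float, k: float = 16.0) -> float:
    """A's rating after one game; score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.

    k is a hypothetical K-factor, not a value taken from ML-Agents.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# Only the rating difference enters the formula, so these two are equal:
same_gap_high = expected_score(65000, 64800)
same_gap_low = expected_score(1400, 1200)
```

Only the difference between the two ratings appears in the formula, so a rating only says something relative to the pool of opponents it was earned against. If that also holds for the value ML-Agents reports, an agent that only ever plays snapshots of itself would not be comparable to a chess rating or to a human player.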


