Two-player turn-based boardgames (TicTacToe, ConnectFour, ...)

I’m a game dev student working on ML-Agents implementations for simple two-player turn-based board games. Having had no results at all on 7x6 Connect Four and 4x4 Connect Three, I’m now training a TicTacToe agent, but I expect bad results there too. I wonder whether I’m doing something wrong or whether this is just not really possible (I’ve found nothing but similar failed projects on GitHub).

If anyone has a good sample project, that would be cool!

I have some questions:

  • What are good hyperparameters for something like this?
  • Is it better to train with self-play or just train a starting- and a responding agent?
  • Is it bad to train two agents at the same time?
  • Is it bad to train against a random move agent?
  • Is there any way to use a CNN/RNN for Connect Four instead of the default NN?

I’ll kick this over to the team for them to have a look, and forward any insight they share.

@TreyK-47 Thanks man!
Btw, there are still some bugs in the timing of the CollectObservations and CollectDiscreteActionMasks functions: sometimes they are called in the wrong order (after OnActionReceived). I’ve seen someone report the problem back in 2018, and someone from Unity said it was going to be low-priority, but it feels weird that it is still there after two years, and it makes the framework feel very unfinished.

No problem! As for those bugs, could you submit new bug reports for them? That way we can take another look.

@TreyK-47 Do I just do a normal bug report or is there a special place for MLAgents bugs?

Also, I noticed that a trained network doesn’t always play the same move in the same situation (most clearly visible in the varying first moves). How should I interpret this result? Does this mean the network values the different options equally (which seems very unlikely), or is there something else going on?
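One possible explanation, offered as a sketch rather than a confirmed diagnosis: PPO learns a stochastic policy, so if moves are sampled from the network's output distribution instead of being picked greedily, the same board can produce different moves even when one move is clearly preferred. The Python snippet below is illustrative only. The logits are made-up numbers, and `softmax`, `sample_move`, and `greedy_move` are hypothetical helpers, not ML-Agents API.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(probs, rng):
    """Stochastic choice: draw a move index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_move(probs):
    """Deterministic choice: always take the most probable move."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical scores for the 9 opening moves on an empty TicTacToe board.
# The centre (index 4) is preferred, but every other cell keeps some probability.
logits = [1.0, 0.5, 1.0, 0.5, 2.0, 0.5, 1.0, 0.5, 1.0]
probs = softmax(logits)

rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = [sample_move(probs, rng) for _ in range(1000)]
distinct_sampled = set(sampled)                            # several different moves
distinct_greedy = {greedy_move(probs) for _ in range(1000)}  # always the same move
```

If sampling is what happens at inference time, varying first moves would not imply that the network values them equally, only that each of them keeps a non-zero probability.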

I probably have too many hidden_units. Either way, I just noticed that Unity/Python is not using the hyperparameters from my .yaml file but some other parameters, shown below. Maybe these look familiar? Does anyone know whether this really means my .yaml hyperparameters are being ignored, and what a fix could be?

2020-07-25 01:29:01 INFO [stats.py:129] Hyperparameters for behavior name TicTacToeBehaviour-1:
  trainer_type: ppo
  hyperparameters:
    batch_size: 1024
    buffer_size: 10240
    learning_rate: 0.0003
    beta: 0.005
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    learning_rate_schedule: linear
  network_settings:
    normalize: False
    hidden_units: 128
    num_layers: 2
    vis_encode_type: simple
    memory: None
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  init_path: None
  keep_checkpoints: 5
  checkpoint_interval: 500000
  max_steps: 500000
  time_horizon: 64
  summary_freq: 50000
  threaded: True
  self_play: None
  behavioral_cloning: None

You probably need to rename the behaviour in your yaml so that it matches the behaviour name of your Unity brain. Just a quick little thing you could try.

@seboz123 Thanks man, that worked out fine!
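To illustrate why a name mismatch could produce the log above: if the trainer finds no config entry for a behaviour name, it presumably falls back to default settings, which would match the logged values (batch_size 1024, hidden_units 128). The Python sketch below imitates that lookup with a plain dict. `DEFAULTS`, `settings_for`, the key "TicTacToe", and the custom values are all hypothetical, and this is not the ML-Agents implementation.

```python
# Illustrative defaults, copied from the values in the log earlier in the thread.
DEFAULTS = {"batch_size": 1024, "buffer_size": 10240, "hidden_units": 128}

# Hypothetical contents of a trainer config, keyed by behaviour name.
# The key "TicTacToe" and the custom values are made up for this example.
config = {
    "behaviors": {
        "TicTacToe": {"batch_size": 64, "buffer_size": 640, "hidden_units": 32},
    }
}


def settings_for(config, behavior_name):
    """Return the config entry for behavior_name, or DEFAULTS if it is absent."""
    return config.get("behaviors", {}).get(behavior_name, DEFAULTS)


# Name as reported in the log; it does not match the key "TicTacToe".
unity_name = "TicTacToeBehaviour-1"
before_fix = settings_for(config, unity_name)  # falls back to DEFAULTS

# Renaming the config key to the exact behaviour name makes the lookup succeed.
config["behaviors"][unity_name] = config["behaviors"].pop("TicTacToe")
after_fix = settings_for(config, unity_name)   # now the custom settings
```

A quick check after renaming is to look at the "Hyperparameters for behavior name" log line again and confirm that the printed values match the .yaml file.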

As for the bug reporting process, yeah, just follow the process detailed here: Unity QA: Building quality with passion