Hi, my agent needs an initial round of training to learn some physics-related behaviour, like self-balancing. Once it has that figured out, it needs to compete against others. Apparently it’s possible to start training without self-play, pause at some point, add the self-play params to the config yaml, and then resume without getting any errors. I just wanted to make sure this is a viable option and that self-play will work as expected if it is added later like this. Thanks!
My intuition is that it should be OK, since self-play just pits policies against one another and doesn’t change the architecture of the network. I’ll poke our research guys and get back to you.
Confirmed, it’s OK.
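For anyone finding this later, here is a rough sketch of what the edited config might look like before resuming. The behavior name `MyAgent` is hypothetical and the values are only illustrative, not recommendations; everything already in the behavior's section stays as it was, and only the `self_play` block is new.

```yaml
behaviors:
  MyAgent:                  # hypothetical behavior name, use your own
    trainer_type: ppo       # assumed; keep whatever the first phase used
    # existing hyperparameters, network_settings and reward_signals stay unchanged
    self_play:              # block added after the initial training phase
      save_steps: 20000     # trainer steps between policy snapshots
      team_change: 100000   # trainer steps before the learning team switches
      swap_steps: 10000     # ghost steps between opponent swaps
      window: 10            # number of past snapshots to sample opponents from
      play_against_latest_model_ratio: 0.5
      initial_elo: 1200.0
```

Training would then be resumed with the same run ID, along the lines of `mlagents-learn <config>.yaml --run-id=<your-run-id> --resume`. Presumably the competing agents also need different team IDs in their Behavior Parameters for self-play to have opponents to match against.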
Great, thank you!