I am trying to understand how the self-play training loop works, so I created the diagram below. Can anyone confirm that it is correct for the following example values?
self_play:
  window: 10
  play_against_latest_model_ratio: 0.5
  save_steps: 50000
  swap_steps: 50000
  team_change: 200000
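To make my reading of these values concrete, here is a rough Python sketch of the schedule I think they imply. It is only an illustration of my understanding, not the trainer's actual code: the function name `self_play_events` and the event labels are made up, and it assumes that `save_steps`, `swap_steps`, and `team_change` all tick on one shared step counter, which may not match how the real trainer counts them. It also leaves out `play_against_latest_model_ratio`, which as I understand it only decides whether the opponent chosen at each swap is the latest policy or an older snapshot from the window.

```python
from collections import deque


def self_play_events(total_steps, save_steps, swap_steps, team_change, window):
    """Return (step, event) tuples, assuming a single shared step counter."""
    events = []
    snapshots = deque(maxlen=window)  # keeps only the `window` most recent snapshots
    learning_team = 0
    for step in range(1, total_steps + 1):
        if step % save_steps == 0:
            snapshots.append(step)
            events.append((step, f"save snapshot (pool size {len(snapshots)})"))
        if step % swap_steps == 0:
            events.append((step, "swap opponent"))
        if step % team_change == 0:
            learning_team = 1 - learning_team
            events.append((step, f"switch learning team to {learning_team}"))
    return events


if __name__ == "__main__":
    for step, event in self_play_events(200000, 50000, 50000, 200000, 10):
        print(step, event)
```

Under these assumptions, one `team_change` period of 200000 steps contains four snapshot saves and four opponent swaps, followed by a single switch of the learning team.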
Diagram:
