How does the self-play training loop work?

I am trying to understand how the self-play training loop works, so I drew the diagram below. Can anyone confirm that the diagram is correct for the following example values?

self_play:
  window: 10
  play_against_latest_model_ratio: 0.5
  save_steps: 50000
  swap_steps: 50000
  team_change: 200000
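
To make the timing concrete, here is a rough sketch of the schedule I think these values produce. It is not the actual trainer code: `simulate_schedule` is a made-up helper, it assumes a symmetric game where all three intervals count the same kind of step, and the order of events within a single step is my own guess.

```python
import random
from collections import deque


def simulate_schedule(total_steps, window=10, latest_ratio=0.5,
                      save_steps=50_000, swap_steps=50_000,
                      team_change=200_000, seed=0):
    """Hypothetical helper: list the self-play events up to total_steps.

    Assumption: every interval is measured in the same step unit.
    """
    rng = random.Random(seed)
    snapshots = deque(maxlen=window)  # only the newest `window` snapshots are kept
    events = []
    learning_team = 0

    # Only steps that are a multiple of one of the intervals can trigger an event.
    steps = sorted(
        set(range(save_steps, total_steps + 1, save_steps))
        | set(range(swap_steps, total_steps + 1, swap_steps))
        | set(range(team_change, total_steps + 1, team_change))
    )

    for step in steps:
        if step % save_steps == 0:
            # Save a snapshot of the current policy, identified here by its step.
            snapshots.append(step)
            events.append((step, "save", step))
        if step % swap_steps == 0:
            # Pick the opponent: the latest policy with probability latest_ratio,
            # otherwise a uniformly sampled past snapshot.
            if not snapshots or rng.random() < latest_ratio:
                opponent = "latest"
            else:
                opponent = rng.choice(list(snapshots))
            events.append((step, "swap", opponent))
        if step % team_change == 0:
            # Switch which team is learning (two teams assumed).
            learning_team = 1 - learning_team
            events.append((step, "team_change", learning_team))

    return events, list(snapshots)


if __name__ == "__main__":
    events, snapshots = simulate_schedule(400_000)
    for event in events:
        print(event)
```

With the values above, this gives four snapshot saves and four opponent swaps per team change, and once more than 10 snapshots exist the oldest ones drop out of the window.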

Diagram: [image not preserved]

Hi @dhyeythumar

This looks correct to me. Cool diagram!
