Episode length rises and then drops every 200k steps

I am training a modified version of the SoccerTwos environment using the POCA algorithm with self-play. The episode length increases over time, but after every 200k steps it drops steeply. I suspect this is related to the team swap that happens every 200k steps. What could be causing this, or is it expected behavior that I am misinterpreting as a problem?
Also, the ELO is increasing over time, but not steadily: it sometimes goes up by 40 points and then drops by 50. In the long run it is improving, just really slowly. The only rewards are +1 - steps/MaxSteps, and -1 for losing.
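
To make the reward scheme concrete, here is a minimal Python sketch of it. It is not my actual environment code: the names `win_reward` and `LOSS_REWARD` and the value `max_steps = 5000` are placeholders for illustration, and I am assuming the +1 - steps/MaxSteps term is the reward given to the winning team.

```python
# Sketch of the reward scheme described above.
# Assumption: the "+1 - steps/MaxSteps" reward goes to the winning team.
# All names and the max_steps value are hypothetical placeholders.

LOSS_REWARD = -1.0  # reward for the losing team


def win_reward(steps: int, max_steps: int) -> float:
    """Return +1 minus the fraction of the episode that has elapsed."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    return 1.0 - steps / max_steps


if __name__ == "__main__":
    max_steps = 5000  # placeholder value
    for steps in (500, 2500, 4500):
        # Print the win reward for an early, mid, and late win.
        print(steps, win_reward(steps, max_steps), LOSS_REWARD)
```

Under this scheme a win is worth more the earlier it happens, while a loss is always worth -1 regardless of episode length.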