Continue at last step for loaded training?

I have 3 different runs (different Unity and YAML config settings) for my agent training. When I continue training one of these past runs (using “… --train --load” on the blue run as pictured; it’s using linear_rate, by the way), how can I ensure that it continues from the last step where it stopped, instead of jumping back to the far left of the graph?

Thanks!

Hey Philipp - that’s strange, jumping back shouldn’t happen with --load. What does your model’s checkpoint file look like?
https://github.com/Unity-Technologies/ml-agents/issues/1047#issuecomment-409660869

The latest checkpoint file now contains:

model_checkpoint_path: "model-135397.cptk"
all_model_checkpoint_paths: "model-50000.cptk"
all_model_checkpoint_paths: "model-100000.cptk"
all_model_checkpoint_paths: "model-135397.cptk"
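
In case it helps anyone checking which step the checkpoint file actually points at, here is a minimal Python sketch that reads the text quoted above and pulls the step number out of each file name. The helper names (parse_checkpoint_index, step_from_path) are made up for illustration and are not part of ML-Agents or TensorFlow, and the sketch assumes the step is the number between the dash and the file extension. It only inspects the file; it does not change how the trainer restores its step counter.

```python
import re

# Contents of the checkpoint file exactly as quoted in this thread.
CHECKPOINT_TEXT = '''\
model_checkpoint_path: "model-135397.cptk"
all_model_checkpoint_paths: "model-50000.cptk"
all_model_checkpoint_paths: "model-100000.cptk"
all_model_checkpoint_paths: "model-135397.cptk"
'''


def parse_checkpoint_index(text):
    """Return (latest_path, all_paths) from the key: "value" lines."""
    latest = None
    all_paths = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key = key.strip()
        value = value.strip().strip('"')
        if key == "model_checkpoint_path":
            latest = value
        elif key == "all_model_checkpoint_paths":
            all_paths.append(value)
    return latest, all_paths


def step_from_path(path):
    """Extract the step embedded in a name like 'model-135397.cptk'.

    Assumption: the step is the run of digits between '-' and '.'.
    """
    match = re.search(r"-(\d+)\.", path)
    return int(match.group(1)) if match else None


latest, all_paths = parse_checkpoint_index(CHECKPOINT_TEXT)
print("latest checkpoint:", latest, "at step", step_from_path(latest))
print("all saved steps:", [step_from_path(p) for p in all_paths])
```

If the latest step printed here matches where training stopped, the checkpoint file itself is fine and the reset is happening somewhere else (for example, in how the step counter is restored or logged).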

I’m still having problems with this. Does anyone know what to do to continue the training exactly where it left off?

As it is, the Steps counter resets to zero every time I use --load, even though I know it does load the neural network (judging by its performance level). Picking “relative” in TensorBoard helps a bit, since it at least displays the lines chained side by side, but it still feels like the training cumulative success sometimes takes a brief but heavy fall before it recovers (I reckon that might be because it measures the training rate differently, as it thinks it’s on step 0 again and not, say, 100k).

(Now also cross-posted to StackOverflow)

When creating an agent with ML-Agents, everything went well for the first 9 million steps, but after that the agent became even worse than before. How can I go back to a certain step? My chart looks like this: Zrzut-ekranu-2022-11-03-213643 (screenshot hosted at ImgBB)