How does "initialize-from / init_path" actually work?

I am trying to train a Unity ragdoll to perform a task. As shown in the video, the ragdoll is capable of running.
But I failed to start a new training run from the trained running model, using either “initialize-from” on the command line or “init_path” in the configuration.
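Roughly, the two ways I tried are: on the command line, `mlagents-learn config.yaml --run-id=GurrenChase2 --initialize-from=GurrenRun`; or in the trainer configuration, an `init_path` entry pointing at the saved model. A sketch of the latter (exact keys depend on the ML-Agents version; the path just matches the log below, other hyperparameters omitted):

```yaml
# trainer_config.yaml (sketch only; exact layout depends on the ML-Agents version)
GurrenBattle:
  trainer: ppo
  # load network weights from the previous run's saved model
  init_path: ./models/GurrenRun/GurrenBattle
```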

I even tested it with the exact same environment and configuration that produced the running model. The performance quickly dropped and got stuck at a poor result. As the log below shows, the first 30000 steps were good with the trained running model, but it became worse as the new training progressed.

INFO [tf_policy.py:118] Loading model for brain GurrenBattle?team=0 from ./models/GurrenRun/GurrenBattle.
INFO [tf_policy.py:143] Starting training from step 0 and saving to ./models/GurrenChase2/GurrenBattle.
tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 30000. Time Elapsed: 20.532 s Mean Reward: 76.215. Std of Reward: 30.253. Training.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 60000. Time Elapsed: 33.016 s Mean Reward: 66.936. Std of Reward: 33.256. Training.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 90000. Time Elapsed: 45.812 s Mean Reward: 18.489. Std of Reward: 31.908. Training.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 120000. Time Elapsed: 60.078 s Mean Reward: 0.992. Std of Reward: 1.093. Training.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 150000. Time Elapsed: 76.910 s Mean Reward: 0.119. Std of Reward: 0.780. Training.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 180000. Time Elapsed: 94.182 s Mean Reward: -0.162. Std of Reward: 0.550. Training.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 210000. Time Elapsed: 111.804 s Mean Reward: -0.309. Std of Reward: 0.469. Training.
INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 240000. Time Elapsed: 127.550 s Mean Reward: -0.288. Std of Reward: 0.774. Training.

The docs noted

But after trying, I am still confused about how to use initialize-from / init_path correctly. Could anyone share how “initialize-from / init_path” actually works, and how to use it correctly?

Many thanks

(ML-Agents: Verified Package 1.0.8, Unity: 2020.3.34f1)

Sincerely,
Sherlore

I am not the most expert in this, but I have had success using initialize-from when I make the environment more complex. In other words, I train the first time in a simplified environment, and then train a second time in a more complex environment, using initialize-from to start from the first model. Building up the more complex model step by step like this sometimes works better than training in the complex environment from the start.

I hope that comment is helpful. :)

That sounds really great. Training a model for a complex scenario is exactly what I am trying to do with initialize-from. In your case it seems to work nicely and intuitively, and your comment is helpful information. Thank you!

May I ask what version of Unity ML-Agents you used to train the above model?

And if you trained a model A in an environment X, and then started a new training run with initialize-from model A in the exact same environment X, would the new training keep the trained performance and continue to improve? Or would its performance drop quickly, as in my case?

I am using ML Agents version 1.0.8.

The example that I have been using starts with a nearly empty environment, just the agent and a moving target, to train the initial model. I then add more obstacles to the same environment and train with initialize-from pointing at the first model. The new training did keep the trained performance from the first training, though it may drop off a bit at first because the problem has become harder. With further training, the agent improves at avoiding the new obstacles.
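Concretely, the two-stage setup is: first run `mlagents-learn` with a run id for the simple scene, then launch a second run with `--initialize-from` set to that run id (or, equivalently, an `init_path` entry in the trainer config). A sketch of the second stage's config (names here are illustrative, not my actual project):

```yaml
# Stage 2 trainer config (sketch): start from the stage-1 weights
MyAgent:
  trainer: ppo
  # model saved by the first, simple-environment run (run id SimpleRun)
  init_path: ./models/SimpleRun/MyAgent
```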

I have not yet tried applying the same model in a completely different environment, but will try that soon. It will be interesting to see whether the models generalize to a new environment or require retraining, in which case I would try initialize-from again.