Hi guys,
It seems that pausing/resuming is not working well for me.
I interrupted my training after 46h. When resuming it (with no change at all in the code or configuration, just pause and resume), something is clearly different in my agent’s behavior. You can see the massive drop in cumulative reward in the following pic (blue lines after resuming).
It’s weird because it doesn’t start from scratch either (as it would if I used the --force flag, which works well by the way). It’s as if the training went back in time, dropping to reward levels seen after 2-3 hours of the original run. Then, after only 4-5 hours of re-training, it climbs back to reward levels the original run only reached after 20-22 hours.
What am I doing wrong? The command I use to resume is:
mlagents-learn.exe configuration.yaml --run-id=STAND_STILL_SPRING1 --resume
Is just adding the --resume flag enough? Is there anything else I should keep in mind?
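For what it’s worth, before resuming I also sanity-check that the run folder actually contains the saved checkpoint files. A minimal sketch of what I run (the helper name list_run_artifacts and the paths are just my own, not anything from ML-Agents):

```python
from pathlib import Path

def list_run_artifacts(results_dir: str, run_id: str) -> list:
    """Return the file names saved under <results_dir>/<run_id>,
    or an empty list if the run directory does not exist yet."""
    run_dir = Path(results_dir) / run_id
    if not run_dir.is_dir():
        return []
    # Collect every file in the run folder, including nested ones.
    return sorted(p.name for p in run_dir.rglob("*") if p.is_file())

# Usage on my machine (hypothetical output, depends on the run):
# list_run_artifacts("results", "STAND_STILL_SPRING1")
```

In my case the folder does contain the checkpoint files, so the resume step is finding something to load.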
Thanks a lot in advance
PS: Looking at the console output, it seems the model is loaded properly and training resumes from the correct step number (see below), but the model still fails to pick up where it was paused:
2020-09-08 20:22:34 INFO [tf_policy.py:218] Loading model from results\STAND_STILL_SPRING1\WalkerDynamic.
2020-09-08 20:22:34 INFO [tf_policy.py:246] Resuming training from step 43697698.
PS2: My training settings:
behaviors:
  WalkerDynamic:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 90000000
    time_horizon: 1000
    summary_freq: 30000
    threaded: true