Massive reward drop after resuming training

Hi guys,

It seems that pausing/resuming is not working well for me.

I interrupted my training after 46 hours. When I resumed it (with no changes at all to the code or configuration, just pause and resume), something is clearly different in my agent’s behavior. You can see the massive drop in cumulative reward in the picture below (the blue lines are after resuming).

It’s weird because it doesn’t start from scratch either (as it would if I used the --force flag, which works well, by the way). It’s as if the training went back in time, producing reward levels last seen after 2-3 hours of training. Then, after only 4-5 hours of re-training, it gets back to reward levels that the original run only reached after 20-22 hours.

What am I doing wrong? The command I use to resume is:

mlagents-learn.exe configuration.yaml --run-id=STAND_STILL_SPRING1 --resume

Is just adding the --resume flag enough? Anything else I should keep in mind?
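
For reference, these are the run-management variants as I understand them. I have verified --resume and --force myself; --initialize-from is from the documentation of recent releases, so treat it as an assumption if you are on an older version:

mlagents-learn.exe configuration.yaml --run-id=STAND_STILL_SPRING1 --resume
(continues the same run from its last saved checkpoint)

mlagents-learn.exe configuration.yaml --run-id=STAND_STILL_SPRING1 --force
(overwrites the existing run and starts training from scratch)

mlagents-learn.exe configuration.yaml --run-id=NEW_RUN_ID --initialize-from=STAND_STILL_SPRING1
(starts a new run whose network weights are initialized from the old run)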

Thanks a lot in advance

PS: Looking at the console output, it seems the model loads properly and training resumes from the correct step number (see below), but the model still fails to pick up where it was paused:

2020-09-08 20:22:34 INFO [tf_policy.py:218] Loading model from results\STAND_STILL_SPRING1\WalkerDynamic.
2020-09-08 20:22:34 INFO [tf_policy.py:246] Resuming training from step 43697698.

PS2: My training settings:

behaviors:
  WalkerDynamic:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 90000000
    time_horizon: 1000
    summary_freq: 30000
    threaded: true

Hi @graphicDNA,
Coincidentally, there was a GitHub issue posted about this. One of my colleagues is looking into this now.

Thank you for raising this. This issue bubbled up on the ML-Agents repo this morning: Performance drop when resuming training · Issue #4459 · Unity-Technologies/ml-agents · GitHub. We are aware of this bug and will have a fix out shortly.

Thanks a lot guys, I’ll be looking forward to that fix.

The behavior I see in the agent is as if the model were partially correct, but for some reason some of the output values received in OnActionReceived(float[] continuousActions) were wrong.

I can tell because I’m training a NN to control a character’s joints, and after resuming, some of the joints seem to behave correctly while others don’t.

Maybe some of the values in that array are arriving in a different order than in the previous training?

I mean, correct values but in the wrong order. Something like that would definitely be consistent with the weird behavior I’m experiencing (see the sketch below).
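
To make the idea concrete, here is a hypothetical, stripped-down version of the kind of mapping I mean (the joint fields, the torque scale, and the SetTargetTorque helper are made up for illustration, not my actual code). The point is that the action array is consumed strictly by index:

using Unity.MLAgents;
using UnityEngine;

public class WalkerJointAgent : Agent
{
    // Hypothetical joints; the real agent drives many more of them.
    public ConfigurableJoint hip;
    public ConfigurableJoint knee;
    public ConfigurableJoint ankle;

    public override void OnActionReceived(float[] continuousActions)
    {
        // Each policy output is read at a fixed index, so this mapping is
        // only correct if the trainer emits the values in the same order
        // on every run (and after every resume).
        SetTargetTorque(hip, continuousActions[0]);
        SetTargetTorque(knee, continuousActions[1]);
        SetTargetTorque(ankle, continuousActions[2]);
    }

    // Illustrative helper: clamp the raw action and apply it as a torque.
    private void SetTargetTorque(ConfigurableJoint joint, float action)
    {
        float torque = Mathf.Clamp(action, -1f, 1f) * 150f; // 150 is a made-up scale
        joint.GetComponent<Rigidbody>().AddTorque(joint.transform.right * torque);
    }
}

If a restored checkpoint fed those outputs permuted (say, index 0 now carrying what used to be at index 2), the hip and knee would misbehave while other joints could still look fine, which is exactly the pattern I’m seeing.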

Thanks a lot.

Update: This seems to have been fixed on master here: #4463.
