When do you stop training?

At what point should I accept that no more significant progress will be made?
My cumulative reward seems to be at a plateau.

behaviors:
  My Behavior:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.00005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory:
        memory_size: 256
        sequence_length: 64
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        strength: 1.0
        gamma: 0.99
        encoding_size: 128
        demo_path: Assets/Demonstrations/KayakDemo.demo
    keep_checkpoints: 5
    max_steps: 50000000000
    checkpoint_interval: 100000
    time_horizon: 64
    summary_freq: 50000
    threaded: true

It depends on what the highest possible reward is. Just because the agent has reached a peak in its current understanding of the environment doesn’t mean it has stopped learning; it could very well find the next breakthrough and go up again. This really depends on the difficulty and variation of the task(s) you’re trying to get it to achieve and the reward(s) given.

For a well-trained brain I would expect the reward curve to be fairly flat, since it should consistently achieve the best result it can. But it doesn’t really matter when you stop: you can always carry on again if testing the brain shows the training hasn’t reached the point you need.
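
If you want something a bit less subjective than eyeballing the graph, here is a rough sketch of a plateau check. It assumes you have copied the mean cumulative reward from each summary (one value every summary_freq steps) into a plain Python list. The function name has_plateaued and the window and min_improvement parameters are made up for this example and are not part of ML-Agents. It only compares the average of the most recent summaries against the average of the ones before, so a True result means "flat lately", not "finished learning".

```python
from statistics import mean


def has_plateaued(rewards, window=20, min_improvement=0.01):
    """Return True if the last `window` rewards barely improved on the
    `window` rewards before them.

    rewards: mean cumulative reward per summary, oldest first.
    window: hypothetical number of summaries to average over.
    min_improvement: hypothetical relative gain (0.01 = 1%) below which
        the curve is treated as flat.
    """
    # Not enough history to compare two full windows yet.
    if len(rewards) < 2 * window:
        return False

    previous = mean(rewards[-2 * window:-window])
    recent = mean(rewards[-window:])

    # Scale the threshold by the size of the previous average so the
    # check works for both small and large reward values.
    threshold = min_improvement * max(abs(previous), 1e-8)
    return (recent - previous) < threshold


if __name__ == "__main__":
    still_improving = [float(i) for i in range(100)]
    flat = [5.0] * 100
    print(has_plateaued(still_improving))
    print(has_plateaued(flat))
```

Even when this reports a plateau, the point above still applies: test the brain, and if it isn’t good enough yet, carry on training from where you left off.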