Storing experiences & reusing them for future training runs

Hello,
I have just started learning ML-Agents.
I like how easy it is to start the whole learning process with the single command “mlagents-learn”.

It starts the game, the game requests decisions, delivers observations, gets back actions, and reports a reward.
Those are just recorded numbers. What is this data called? I could not find an official term for it, so I will call it an “experience” for now.

My bottleneck is not the performance of the training algorithm. The problem is that every time I try new training parameters, training starts over from scratch, completely ignoring the experiences collected in previous runs.

What I would like to do (rough sketch below):
store the experiences during a run,
reuse them for future trainings.
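Something like this is what I have in mind. It is only a rough sketch using the low-level Python API (mlagents_envs) and assumes a recent ML-Agents release with a single behavior; the file name, the step count and the random actions are placeholders, a real trainer would of course supply the actions:

import numpy as np
from mlagents_envs.environment import UnityEnvironment

# Connect to the Editor (press Play when prompted); file_name=None means "no build".
env = UnityEnvironment(file_name=None)
env.reset()

behavior_name = list(env.behavior_specs)[0]   # assumes a single behavior
spec = env.behavior_specs[behavior_name]

obs_log, action_log, reward_log = [], [], []

for _ in range(1000):                         # placeholder: record 1000 steps
    decision_steps, terminal_steps = env.get_steps(behavior_name)

    # Placeholder actions; during a real run these would come from the trainer.
    actions = spec.action_spec.random_action(len(decision_steps))
    env.set_actions(behavior_name, actions)

    # One "experience" record per agent that requested a decision this step.
    for i, agent_id in enumerate(decision_steps.agent_id):
        step = decision_steps[agent_id]
        obs_log.append(np.concatenate([o.flatten() for o in step.obs]))
        action_log.append(actions.continuous[i])
        reward_log.append(step.reward)

    env.step()

# Dump everything so it can be inspected or re-used later.
np.savez("experiences.npz",
         obs=np.array(obs_log),
         actions=np.array(action_log),
         rewards=np.array(reward_log))
env.close()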

Furthermore, it would be useful to be able to look into that data, to make sure the NN is being fed correct values.
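And later, to look at what was recorded (same made-up file name as in the sketch above):

import numpy as np

data = np.load("experiences.npz")
print(data["obs"].shape, data["actions"].shape, data["rewards"].shape)
print("reward mean / min / max:",
      data["rewards"].mean(), data["rewards"].min(), data["rewards"].max())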

I was surprised that I could not find any existing support for this in ML-Agents,
a clear hint that I might be doing something fundamentally wrong.
Or the feature is useful, exists, and I simply could not find it.

My training results also show that I am doing something fundamentally wrong and don’t really understand the process:
while the extrinsic value goes up and up, it suddenly drops to 0% success, and the algorithm keeps repeating the same useless actions (x: -1, y: -1), without exploring new options or reusing any of the previously rewarded actions.

But it is a strangely simple network anyway:
1 dummy input neuron (always sends a constant 1),
2 outputs: continuous actions, X & Y.
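For reference, this is roughly how I would double-check that the network really sees exactly that setup, again via mlagents_envs. Attribute names differ between ML-Agents releases (older ones expose observation_shapes instead of observation_specs), so adapt it to your installed version:

from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name=None)   # press Play in the Editor when prompted
env.reset()
for name, spec in env.behavior_specs.items():
    print("behavior:", name)
    print("observation shapes:", [obs.shape for obs in spec.observation_specs])
    print("continuous action size:", spec.action_spec.continuous_size)
env.close()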



5836507–619219–gotStupid_config.txt (442 Bytes)

Hi SurfingNerd. Thanks for the feature suggestion. Unfortunately, due to the nature of reinforcement learning training, re-using experiences as you describe is a non-trivial problem. Because of how the RL training signal works (the default PPO trainer is on-policy), the data being learned from should come from a policy as close as possible to the one currently being trained on that data; the greater the divergence, the less reliable the data.

Instead of saving the data from previous training runs, it may be better to reload the model from a previous checkpoint and continue training with the new hyperparameters.
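For example (the run IDs and config file name below are placeholders; recent ML-Agents versions support an --initialize-from option, see the docs of your installed version):

mlagents-learn new_config.yaml --run-id=NewRun --initialize-from=PreviousRun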