I’m curious about how --initialize-from works. Take the Walker example: I first train a model where the agent walks to a target location. If I then make the environment more complex, say the agent needs to search a set of rooms for the target without hitting a wall or touching the ground, the reward system would change from (matchspeedreward * lookattarget) to (+1 for touching the target and -0.001 for each step). Would I be able to use the weights of the previous model to train this, or would it be too big of a change?
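For context, here is a minimal sketch of how that sparse reward scheme might look in the agent script. The class name, tags, and collision handling are placeholders for illustration, not the actual Walker code:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical sparse-reward variant of the Walker agent for the room-search task.
public class RoomSearchWalkerAgent : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        // ... apply joint torques exactly as in the original Walker agent ...

        // Small time penalty each decision step to encourage finding the target quickly.
        AddReward(-0.001f);
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.collider.CompareTag("target"))
        {
            // Sparse success signal: +1 for touching the target, then end the episode.
            SetReward(1f);
            EndEpisode();
        }
        else if (collision.collider.CompareTag("wall") || collision.collider.CompareTag("ground"))
        {
            // Forbidden contact ends the episode; a negative reward could be added here too.
            EndEpisode();
        }
    }
}
```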
I’m just trying to figure out what is possible with --initialize-from, and what can and can’t be changed. I’m aware that the observations, actions, and hyperparameters can’t change.
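In case it helps frame the question, this is the usage I have in mind (the config path and run IDs are just examples), where the new run starts from the weights of the earlier run:

```
mlagents-learn config/walker_search.yaml --run-id=WalkerSearch --initialize-from=WalkerBase
```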
As far as I understand it, anything beyond the technical requirement of keeping the parameters compatible is an experimental choice that may or may not produce the desired results.
Training with the existing model may force your AI to relearn or unlearn too many things, but more likely it will give it a boost so that it can adapt more quickly to the changed environment.
I retrained the Walker without changes, starting from its previous 4-hour run, for another 8 hours. The result looked essentially the same from observation alone, but according to the TensorBoard stats the reward scores were significantly lower. On closer inspection, the retrained AI did have the occasional issue where it walked very slowly, probably preferring to play it safe.