I’m training an agent in an environment very similar to the basic example, with 16 training areas. Eleven of these training areas reach the goal, while the other five fail. When I use the exported brain (the .onnx file), it doesn’t reproduce the performance of the best agent. Is there a way to choose the brain from the training area with the highest reward?
Unless you’ve set distinct behaviour names for your 16 agent instances, there is only one trained model (brain) shared by all of them. If it’s always the same areas that succeed or fail, there is probably an inconsistency in your agent’s observations. Is it perhaps observing global instead of local positions? Make sure observations don’t depend on which of the areas an agent instance is located in.
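For anyone hitting the same issue, here’s a minimal sketch of what area-relative observations can look like in a `CollectObservations` override. The `ExampleAgent` class, the `goal` field, and the assumption that each agent is a child of its training area’s root transform are illustrative, not taken from your project:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ExampleAgent : Agent
{
    // Hypothetical goal reference, assigned per area in the Inspector.
    public Transform goal;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Problematic: world-space positions differ across the 16 duplicated
        // areas, so the shared policy sees area-dependent numbers.
        // sensor.AddObservation(transform.position);
        // sensor.AddObservation(goal.position);

        // Area-relative: convert world positions into the training area's
        // local frame (assumes the agent is parented to the area root).
        Transform area = transform.parent;
        sensor.AddObservation(area.InverseTransformPoint(transform.position));
        sensor.AddObservation(area.InverseTransformPoint(goal.position));
    }
}
```

Expressed this way, every area feeds the policy the same numbers for the same relative situation, so no copy of the environment is systematically disadvantaged.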
Ah! I realized I was training with absolute position instead of local position. That’s exactly it, thank you!