Hi!
I trained an agent that plays Capture the Flag. So far, it captures the enemy flag and brings it back to the allied flag. During training, I use 8 instances of the arena. The model trains properly and can successfully capture a flag. But at inference, with only 1 instance of the arena, the agent does not behave properly. If I use 8 instances for inference as well, the agent behaves properly, exactly as in training.
Could something be done about this? Is it known? What is happening?
I marked this as a bug, but I'm not sure if it is one.