For non-turn-based training (most of the examples provided), I can parallelize training by placing multiple training environments in a single scene. This works fine with the single Academy instance that manages its own stepping.
In a turn-based environment where I manage stepping myself by calling Academy.Instance.EnvironmentStep() after every turn, it seems that I cannot parallelize in the same way. The single Academy instance is shared across all environments and, of course, the different environments might go at different speeds.
Hmm… but now that I write this, I guess I just need one manager that waits for all environments to take a turn and then advances the step. And that should work fine. All the environments will move in sync, but, of course, the moves will be different.
Does that seem reasonable? Is that the right way to go about this?
Hi Dan - you won’t need to include multiple training environments in a single scene. In the trainer, you can use --num-envs=N to spin up multiple environments during training. This should make it simpler. See link to mlagents-learn params:
Thank you. Sorry, I should have mentioned that I knew I could do that. But the --num-envs=N option seems much more heavyweight, as each of the N environments brings up an entire Unity process. While I could easily train ~20 separate instances inside one scene, I don’t think my machine would like me launching 20 copies of the process at the same time.
Ah, I see. If that is the case, you may need to implement a solution the way you described it: once all the turns are complete, call the Academy step. How are you currently implementing the Academy step?
void Update() {
    // Each turn: ask the active player for a decision, then step the shared Academy
    if (player1Turn) {
        player1.RequestDecision();
        Academy.Instance.EnvironmentStep();
    }
    if (player2Turn) {
        player2.RequestDecision();
        Academy.Instance.EnvironmentStep();
    }
}
This lets me externally set when it is each player’s turn. So you can see how having multiple instances of this won’t work: each instance would step the shared Academy on its own. Instead, something like this in an external GameManager…
void Start() {
    // launch multiple instances of learning environment from prefab
}
void Update() {
    // check all instances until they report turn done, then step once
    Academy.Instance.EnvironmentStep();
}
// And each instance would do something like:
void Update() {
    if (player1Turn) {
        player1.RequestDecision();
        turnDone = true;
    }
    if (player2Turn) {
        player2.RequestDecision();
        turnDone = true;
    }
}
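To make the idea above a bit more concrete, here is a rough sketch of how the manager and the per-environment script might fit together. It is untested and rests on several assumptions: the names TurnBasedEnv, BeginNextTurn, envPrefab, envCount and spacing are made up for illustration, exactly one player is assumed to be on turn at a time, and the game logic (not shown) is assumed to flip player1Turn. It also assumes an ML-Agents version where Academy.Instance.AutomaticSteppingEnabled is available for turning off the Academy's own stepping.

```csharp
using System.Linq;
using Unity.MLAgents;
using UnityEngine;

// Hypothetical per-environment script: one copy lives on each prefab instance.
public class TurnBasedEnv : MonoBehaviour
{
    public Agent player1;
    public Agent player2;
    public bool player1Turn = true;   // assumed to be flipped by the game logic

    public bool TurnDone { get; private set; }

    void Update()
    {
        if (TurnDone) return;         // already waiting for the shared step

        // Assumption: exactly one player is on turn at any time.
        (player1Turn ? player1 : player2).RequestDecision();
        TurnDone = true;
    }

    // Called by the manager once the shared Academy step has run.
    public void BeginNextTurn() => TurnDone = false;
}

// Hypothetical manager: owns every environment and steps the Academy once per round.
public class GameManager : MonoBehaviour
{
    public TurnBasedEnv envPrefab;
    public int envCount = 20;
    public float spacing = 10f;       // keeps the instances from overlapping

    TurnBasedEnv[] envs;

    void Start()
    {
        // Stop the Academy from stepping by itself; we step it manually below.
        Academy.Instance.AutomaticSteppingEnabled = false;

        envs = new TurnBasedEnv[envCount];
        for (int i = 0; i < envCount; i++)
        {
            var position = new Vector3(i * spacing, 0f, 0f);
            envs[i] = Instantiate(envPrefab, position, Quaternion.identity);
        }
    }

    // LateUpdate runs after every Update in the frame, so all environments
    // have had a chance to request their decision before we check them.
    void LateUpdate()
    {
        if (!envs.All(e => e.TurnDone)) return;

        Academy.Instance.EnvironmentStep();
        foreach (var env in envs) env.BeginNextTurn();
    }
}
```

In a real project each MonoBehaviour would normally go in its own file named after the class. The check is placed in LateUpdate rather than Update because Unity does not guarantee the order in which different scripts' Update methods run.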