I’m working on a simple tic-tac-toe project that contains a game manager and two agents. The game manager calls RequestDecision() on each agent in turn. The trouble is that I can’t seem to find a way to call Academy.Instance.EnvironmentStep() without triggering the “EnvironmentStep called recursively” error.
If I don’t call EnvironmentStep(), the results the agents get in OnActionReceived() don’t match the output of their Heuristic() functions.
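For reference, here’s roughly the shape of my agent (simplified sketch; one discrete branch with 9 actions, one per board cell, and the helper methods are just placeholders for my actual move logic):

using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class TicTacToeAgent : Agent
{
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Write the chosen move index into branch 0 of the discrete actions.
        var discreteActions = actionsOut.DiscreteActions;
        discreteActions[0] = GetDesiredMove();
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // The same index should come back here when the decision is processed.
        int cell = actions.DiscreteActions[0];
        PlaceMark(cell);
    }

    int GetDesiredMove() { return 0; }   // placeholder
    void PlaceMark(int cell) { }         // placeholder
}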
Can you post the full call stack of the RecursionChecker exception? And maybe some of your code?
You should probably be setting Academy.Instance.AutomaticSteppingEnabled = false;
during your setup, and then your game step would look something like:
Agent currentAgent = (m_CurrentPlayer == PlayerX) ? m_AgentX : m_AgentO;
currentAgent.RequestDecision();
// With automatic stepping off, this processes the queued decision now.
Academy.Instance.EnvironmentStep();
bool didSomeoneWin = CheckForWin();
if (didSomeoneWin)
{
    // ...
}
else
{
    // Switch players for the next step.
    m_CurrentPlayer = (m_CurrentPlayer == PlayerX) ? PlayerO : PlayerX;
}
(I’m just guessing at your variable and method names, but that’s how I’d set it up.)
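To be concrete about where the setup goes, here’s a minimal sketch (assuming your manager is a MonoBehaviour; the class name is made up):

using Unity.MLAgents;
using UnityEngine;

public class GameManager : MonoBehaviour
{
    void Awake()
    {
        // Take manual control of stepping so the manager decides
        // exactly when each queued decision is processed.
        Academy.Instance.AutomaticSteppingEnabled = false;
    }

    void OnDestroy()
    {
        // Hand stepping back to the Academy when this manager goes away.
        if (Academy.IsInitialized)
            Academy.Instance.AutomaticSteppingEnabled = true;
    }
}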
On occasion, PlayerX would produce the same output as PlayerO’s previous output. I solved it by adding a slight delay (0.1 s) before calling RequestDecision() between decision requests.
Calling RequestDecision() and then EnvironmentStep() with automatic stepping disabled just made the application freeze for me. I think it may be related to this:
https://github.com/Unity-Technologies/ml-agents/issues/4991#issuecomment-785344327
I’ll try taking another crack at it after MultiAgentGroup is released.
There’s a known issue where a masked action has a small chance (somewhere around 1 in 100K to 1 in 1 million) of being selected by the trainer anyway. That might explain what you’re seeing, but I’m still skeptical.
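For anyone following along, “masked” here means disallowing occupied cells when the agent writes its action mask. Depending on your ML-Agents version, the override is CollectDiscreteActionMasks or WriteDiscreteActionMask; a sketch with the newer API (the board array is illustrative, 0 meaning empty):

using Unity.MLAgents.Actuators;

// Inside the agent class; `board` is an illustrative int[9].
public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
{
    for (int cell = 0; cell < 9; cell++)
    {
        if (board[cell] != 0)
        {
            // Branch 0, action index = cell: occupied cells can't be picked.
            actionMask.SetActionEnabled(0, cell, false);
        }
    }
}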
I don’t understand how the delay could make a difference, though. And I don’t think that GitHub issue is related; certainly not to the exception you initially reported or to the application freezing.
I wrote a simple tic-tac-toe example scene and agent, and was able to train it and see some increase in ELO (1344 after 8 minutes). Can you take a look and either see whether that approach works better for you, or modify it to reproduce the errors you’re seeing? It’s on this branch: https://github.com/Unity-Technologies/ml-agents/tree/tic-tac-toe
I uploaded the project to GitHub in case you want to take a look at it:
Link
The game manager requests a decision after every agent player action (end of turn). If I call RequestDecision() directly (through RequestAgentDecision()), the order of the two agents gets a bit out of sync. If I add a small delay by calling Invoke("RequestAgentDecision", 0.1f); (in GameManager → GetAgentDecision()) it works fine, but it trains slowly because of the delay.
I’m going to dig through your code to see if I can figure out where I went wrong.