I’m making a chess AI through self-play.
There are two players, A and B, and A plays first.
When A plays, everything works normally. The program calls SetMask(), and then OnActionReceived().
When B plays, it calls OnActionReceived() before SetMask(). (Lines 3 and 5 in the screenshot below)
Even weirder, on player B’s first move the agent uses player A’s mask. After that first move, player B still calls OnActionReceived() before SetMask(), but uses the mask from the previous round.
Here are my agent settings:
Player A:
Player B:
Well, I spent a few hours and finally found where the problem was: I need to yield WaitForFixedUpdate before I request the agent’s decision.
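Roughly, the fix looks like this. This is only a minimal sketch of my turn handling; the class and method names here are placeholders, and only WaitForFixedUpdate and Agent.RequestDecision() are the actual Unity / ML-Agents calls:

```csharp
using System.Collections;
using Unity.MLAgents;
using UnityEngine;

public class TurnManager : MonoBehaviour
{
    public Agent playerA;
    public Agent playerB;

    // Instead of requesting the next player's decision directly in the same
    // frame the previous action was applied, go through a coroutine.
    public void OnTurnFinished(Agent nextPlayer)
    {
        StartCoroutine(RequestNextDecision(nextPlayer));
    }

    private IEnumerator RequestNextDecision(Agent nextPlayer)
    {
        // Wait one physics step so the current agent step has fully finished;
        // otherwise OnActionReceived() can run before the new mask is written
        // for the next player.
        yield return new WaitForFixedUpdate();
        nextPlayer.RequestDecision();
    }
}
```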
Glad you were able to solve this. Out of curiosity, how come you are using stacked observations for a chess AI?
In general, I’d be really interested to see how this turns out.
My observation is simply the whole chess board. It’s like an image, where each pixel represents a single cell of the board. The reason the number of observations is 100 instead of 64 is that I’m using a 10×10 board.
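As a rough sketch of what that looks like in code (the `board` array and the piece encoding are placeholders for my own representation; CollectObservations and VectorSensor.AddObservation are the actual ML-Agents API):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class ChessAgent : Agent
{
    // 10x10 board; each cell holds a numeric code for the piece on it
    // (0 = empty, positive = my pieces, negative = opponent's pieces).
    private float[,] board = new float[10, 10];

    public override void CollectObservations(VectorSensor sensor)
    {
        // One observation per cell -> 100 observations in total.
        for (int x = 0; x < 10; x++)
        {
            for (int y = 0; y < 10; y++)
            {
                sensor.AddObservation(board[x, y]);
            }
        }
    }
}
```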
The stacked observation idea comes from AlphaGo’s architecture. AlphaGo used 17 stacked observation planes (the current state plus the past few moves) for training, so here I also use stacked observations, just a relatively small number of them.
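In ML-Agents the stacking itself is just the “Stacked Vectors” field on the Behavior Parameters component, so no extra code is needed; the only effect is that the policy input grows from 100 values to 100 × stack size. If you wanted to do the same thing by hand, a sketch could look like the following (the history queue and the sizes are my own assumptions, not anything ML-Agents provides):

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class StackedBoardAgent : Agent
{
    private const int BoardSize = 100;  // 10 x 10 cells
    private const int StackSize = 4;    // current state + 3 past states

    // Most recent flattened board states, newest last.
    private readonly Queue<float[]> history = new Queue<float[]>();

    // Call this after every move with a flattened copy of the board.
    public void PushBoardState(float[] flatBoard)
    {
        history.Enqueue(flatBoard);
        while (history.Count > StackSize)
        {
            history.Dequeue();
        }
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Pad with empty boards until the stack is full, then emit
        // StackSize * BoardSize = 400 observations in total.
        int missing = StackSize - history.Count;
        for (int i = 0; i < missing * BoardSize; i++)
        {
            sensor.AddObservation(0f);
        }
        foreach (var state in history)
        {
            foreach (var cell in state)
            {
                sensor.AddObservation(cell);
            }
        }
    }
}
```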
This is a screenshot I took from a YouTube video (AlphaGo Zero Tutorial Part 3 - Neural Network Architecture), and I believe it’s accurate.
When it comes to training performance, I don’t really know whether I benefit from stacked observations. I tried different settings, like a stack of 3, a stack of 4, etc., but there isn’t much difference between the different stack sizes.
The real problem for me is that at a certain point (say, after 1 million steps) both players stagnate: they stop exploring new policies and more often just repeat a certain (crappy) strategy. So it’s hard to tell whether training with stacked observations gives me a better policy.