I have a board game that is fairly complex and have completely remodeled my observation space architecture 3 times.
The first was to create an observation for every square (40), each one-hot encoded over around 400 “characters” that can be in that square.
The second was to create an observation only for each “character” and not for each location, reducing the observation space down to 14 but pushing the number of “characters” (now including their location) up to 16000. (Is that really inefficient for one-hot?)
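To put numbers on those first two layouts, here is a rough sketch of the arithmetic. It assumes every categorical observation gets expanded to a full one-hot vector before it reaches the network, which may not match how the sensor is actually configured; the helper name `one_hot_width` is just for illustration.

```python
# Rough input-width arithmetic for the first two layouts.
# Assumption: each categorical observation is expanded to a one-hot vector.
SQUARES = 40        # board squares (from the post)
CHARACTERS = 400    # approximate distinct "characters" per square (from the post)
PIECE_SLOTS = 14    # number of observations in the second layout (from the post)


def one_hot_width(num_observations: int, categories: int) -> int:
    """Total floats fed to the network when each observation is one-hot."""
    return num_observations * categories


# Layout 1: one observation per square, 400 categories each.
per_square = one_hot_width(SQUARES, CHARACTERS)

# Layout 2: one observation per character slot, character x location categories each.
per_character = one_hot_width(PIECE_SLOTS, CHARACTERS * SQUARES)

print(per_square)     # 16000
print(per_character)  # 224000
```

Under that assumption the second layout has fewer observations but a much wider input, since each of the 14 slots carries all 16000 combinations.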
Then I thought that separating the data would help the Agent learn, so I split it into “Type”, “Location” and “Trait”, using 3 observations per possible piece (13 pieces max, so 39 observations) and ignoring empty spaces. My initial thought was that it would be able to better learn the effect of each variable: all type 1 characters move like this, all trait 3 pieces are immobilized, etc. But then again, this opens up more opportunity for the Agent to confuse the reward / punishment across the observations and even attribute traits of one piece to another.
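A minimal sketch of what that third layout could look like if each of the three values were one-hot encoded and concatenated per piece. The counts `NUM_TYPES` and `NUM_TRAITS` are made-up placeholders (the real numbers are not given here), and padding unused piece slots with zeros is an assumption about how "ignoring empty spaces" might be handled.

```python
NUM_TYPES = 20       # hypothetical count, not from the post
NUM_LOCATIONS = 40   # board squares, from the post
NUM_TRAITS = 20      # hypothetical count, not from the post
MAX_PIECES = 13      # from the post

# Width of one piece block: type one-hot + location one-hot + trait one-hot.
PIECE_WIDTH = NUM_TYPES + NUM_LOCATIONS + NUM_TRAITS


def one_hot(index, size):
    """Return a list of `size` floats with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_piece(piece_type, location, trait):
    """Concatenate the three one-hot blocks that describe a single piece."""
    return (one_hot(piece_type, NUM_TYPES)
            + one_hot(location, NUM_LOCATIONS)
            + one_hot(trait, NUM_TRAITS))


def encode_board(pieces):
    """pieces: list of (type, location, trait) tuples, at most MAX_PIECES long."""
    if len(pieces) > MAX_PIECES:
        raise ValueError("too many pieces")
    obs = []
    for piece in pieces:
        obs.extend(encode_piece(*piece))
    # Assumption: unused piece slots are zero-padded so the length stays fixed.
    obs.extend([0.0] * (PIECE_WIDTH * (MAX_PIECES - len(pieces))))
    return obs


example = encode_board([(1, 5, 3), (0, 39, 0)])
print(len(example))  # 1040
```

With these placeholder counts the input is 13 x 80 = 1040 floats, and each piece always occupies the same block of the vector, which is the part that is supposed to keep one piece's traits from bleeding into another's.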
I ran 1.6 million test games on the first structure and already got pretty good evidence that it was learning well. Before committing to running tens or hundreds of millions of games for my Agent, I want to make sure the best option is in place.
There must be some best practice that skews either towards separating every possible combination into its own “character” (16000 variants), or towards increasing the number of observations to matrix out the variables, or towards some middle ground? My three structures are maybe all extremes, although I could definitely matrix out the variables further and fill a Quaternion (which, as I understand it, for MLAgents is one int plus 4 positions x, y, z, w worth of information), but that would increase the observation size to 65.
In my 13x3 architecture I also just passed 39 individual integers every observation call. I got to wondering (this is probably very ignorant): if I made them a Vector3, would the Agent group the info as relevant to one piece, since the ‘type’, ‘location’ and ‘traits’ should only affect the consideration of that piece?
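As far as I can tell, a vector observation ends up as one flat list of floats, and adding a `Vector3` to ML-Agents' `VectorSensor` just appends its three components. If that is right, the toy sketch below shows why grouping would not look any different to the network. `FlatSensor` is a hypothetical stand-in written for this example, not the ML-Agents API.

```python
class FlatSensor:
    """Toy stand-in for a vector sensor: it only stores a flat list of floats."""

    def __init__(self):
        self.values = []

    def add_scalar(self, x):
        # One value appended on its own.
        self.values.append(float(x))

    def add_vector3(self, x, y, z):
        # Three values appended together; no grouping information is kept.
        self.values.extend([float(x), float(y), float(z)])


# Two example pieces as (type, location, trait); the values are arbitrary.
pieces = [(1, 5, 3), (2, 17, 0)]

separate = FlatSensor()
for piece_type, location, trait in pieces:
    separate.add_scalar(piece_type)
    separate.add_scalar(location)
    separate.add_scalar(trait)

grouped = FlatSensor()
for piece_type, location, trait in pieces:
    grouped.add_vector3(piece_type, location, trait)

print(separate.values == grouped.values)  # True
```

Both paths produce the same six floats in the same order, so in this sketch any sense of "these three belong together" would have to come from keeping a consistent slot order per piece rather than from the container type.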
I promise I did look for these kinds of answers! Help a noob out!
TLDR:
How many is too many for one-hot?
Matrixing data vs. a single giant list?
Observation lean or “Character type” lean?
Can Vector3 / Quaternion work well for grouping character data in a matrix?
TIA