I’m trying to use ML-Agents for a simplified game of Splendor, and I’m having trouble setting up the right behaviours for taking a turn.
For now there are only two kinds of action, and the agent should take one of them each turn:
- Pick currency (either 2 or 3 according to some rules, can be from different stacks)
- Buy a card (from 12 different ones)
The closest example I could find among the provided ones is Match 3, and this is where I’m not sure how to proceed. As far as I can tell, the Match 3 example uses a single discrete action whose branch size equals the number of possible moves. The problem is that the actions I need are either picking currency OR picking a card. If I have two discrete actions, they both produce a result.
Now if I change it to a single discrete action with a branch size of 2 (either picking currency or picking a card), I run into the problem of getting an actual value for that action. Do I use
- 3 continuous actions for everything,
- 4 (3 for the currency picking and 1 for the card), or
- 6 (2 for 2 currency, 3 for 3 currency and 1 for the card)?
(The action branch could be given a branch size of 3 to include 2 and 3 currency as separate options.)
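For comparison with the layouts above, here is a rough sketch of the Match 3 style alternative: one discrete branch whose size is the total number of moves, so each index means exactly one move. The counts are assumptions based on the standard Splendor rules (3 tokens from different stacks, or 2 from the same stack) plus the 12 cards mentioned above, and `ActionLayout`, `MoveKind` and `Classify` are made-up names, not part of ML-Agents. Moves that are illegal on a given turn would still need to be masked or rejected.

```csharp
using System;

// Hypothetical layout of a single discrete branch. All counts are assumptions.
public static class ActionLayout
{
    public const int TakeThreeCount = 10; // ways to choose 3 different stacks out of 5
    public const int TakeTwoCount = 5;    // 2 tokens from one of the 5 stacks
    public const int CardCount = 12;      // cards on offer
    public const int BranchSize = TakeThreeCount + TakeTwoCount + CardCount;

    public enum MoveKind { TakeThree, TakeTwo, BuyCard }

    // Splits a flat action index into the kind of move and an index local to that kind.
    public static (MoveKind kind, int local) Classify(int action)
    {
        if (action < 0 || action >= BranchSize)
            throw new ArgumentOutOfRangeException(nameof(action));
        if (action < TakeThreeCount) return (MoveKind.TakeThree, action);
        action -= TakeThreeCount;
        if (action < TakeTwoCount) return (MoveKind.TakeTwo, action);
        return (MoveKind.BuyCard, action - TakeTwoCount);
    }
}
```

With this layout the branch size would be 27, and since only one index is chosen per decision, the agent can never pick currency and a card in the same turn.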
The currency is divided into multiple stacks and given directly to the agent as input. My main problem is how to get back a list of enums/ints from the action vectors, so that I can build the stack I want to pick. The observation is added as a one-hot encoding of the enum plus the normalized amount. As output I would ideally like a List of Currency values (or ints, for that matter).
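To make the "list of enums back" part concrete, here is a minimal sketch of one possible approach: build a fixed table of every currency pick once, then let a single discrete action index select a row. The pick rules (3 from different stacks, or 2 from the same stack) are assumed from standard Splendor, and `CurrencyPicks`, `All` and `Decode` are hypothetical names. The enum is repeated only so the snippet stands on its own.

```csharp
using System;
using System.Collections.Generic;

public enum Currency { Black = 0, Red = 1, Blue = 2, Green = 3, White = 4 }

public static class CurrencyPicks
{
    // Every pick the agent may choose, built once in a fixed order so that
    // a given index always means the same pick.
    public static readonly List<List<Currency>> All = Build();

    private static List<List<Currency>> Build()
    {
        var picks = new List<List<Currency>>();
        int n = Enum.GetValues(typeof(Currency)).Length;

        // Assumed rule: three tokens from three different stacks.
        for (int a = 0; a < n; a++)
            for (int b = a + 1; b < n; b++)
                for (int c = b + 1; c < n; c++)
                    picks.Add(new List<Currency> { (Currency)a, (Currency)b, (Currency)c });

        // Assumed rule: two tokens from the same stack.
        for (int a = 0; a < n; a++)
            picks.Add(new List<Currency> { (Currency)a, (Currency)a });

        return picks;
    }

    // Turns a discrete action index into the list of currencies to take.
    public static List<Currency> Decode(int action) => All[action];
}
```

Inside `OnActionReceived`, something like `CurrencyPicks.Decode(actions.DiscreteActions[0])` would then return the pick, provided the branch size matches `CurrencyPicks.All.Count` and the index has not been shifted by other move types.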
```
using System;
using Unity.MLAgents.Sensors;

public enum Currency { Black = 0, Red = 1, Blue = 2, Green = 3, White = 4 }

// Maps currentValue from [minValue, maxValue] to [0, 1].
private float NormalizeValue(float currentValue, float minValue, float maxValue) {
    return (currentValue - minValue) / (maxValue - minValue);
}

public override void CollectObservations(VectorSensor sensor) {
    var amount = Enum.GetValues(typeof(Currency)).Length;
    for (int index = 0; index < amount; index++) {
        // One-hot stack identity, then that stack's size normalized over 0 to 8.
        sensor.AddOneHotObservation(index, amount);
        sensor.AddObservation(NormalizeValue(gameLogic.boardCurrency[index], 0, 8));
    }
}
```
The code example covers only the currency; the cards are handled in a similar way.