Hi everyone, I’ve been experimenting with self-play and the idea of a cooperative/competitive environment.
The first version of the env is basically a team-based “Food Collector” (the one in the ML-Agents examples).
2 teams, 3 agents each.
Agents’ health decreases over time, there’s food all over the ground, and the last team standing wins the match.
Agents can shoot lasers to freeze other agents in place.
The reward is really simple: an agent gets -1 if its health drops to zero, and each agent on the winning team gets +1.
(It’s probably a good idea to also give -1 to the losing team.)
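To make the reward scheme above concrete, here is a minimal sketch in plain Python. It is not ML-Agents code: `AgentState`, `apply_death_penalty`, `apply_match_result`, and the `penalize_losers` flag are hypothetical names made up for illustration, and the example uses three agents instead of six to keep it short.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    # Hypothetical container for illustration; not an ML-Agents type.
    team: int
    health: float
    reward: float = 0.0
    death_penalized: bool = False


def apply_death_penalty(agent):
    """Give -1 once, the first time the agent's health reaches zero."""
    if agent.health <= 0 and not agent.death_penalized:
        agent.reward -= 1.0
        agent.death_penalized = True


def apply_match_result(agents, winning_team, penalize_losers=True):
    """Give +1 to each winner and, optionally, -1 to each loser."""
    for agent in agents:
        if agent.team == winning_team:
            agent.reward += 1.0
        elif penalize_losers:
            agent.reward -= 1.0


agents = [
    AgentState(team=0, health=5.0),  # survivor on the winning team
    AgentState(team=0, health=0.0),  # died, but the team won
    AgentState(team=1, health=0.0),  # died on the losing team
]
for a in agents:
    apply_death_penalty(a)
apply_match_result(agents, winning_team=0)
print([a.reward for a in agents])  # [1.0, 0.0, -2.0]
```

Note that with the optional loser penalty, an agent that dies on the losing team ends up with -2 in total, which may or may not be what you want.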
My question is about the agents with 0 health. How should I handle them while the episode is still in progress?
Right now, I just use gameObject.SetActive(false), and I reactivate them in OnEpisodeBegin().
How does being deactivated affect the training of the agents? How does an agent perceive what happens while it is inactive? How can it interpret being deactivated for most of the match and then getting rewarded at the end of the episode? I don’t even know if it can receive AddReward() while in that state.
Should I leave them on the ground, change their tag to “deadAgent”, add an isDead bool to the observation space, and just ignore their outputs while isDead (as I do for isFrozen)?
Or put them in a cage until the end of the match, like a hockey penalty box?
Hello. If you deactivate an agent, then it no longer sends or receives observations, actions, and rewards. As long as you punish the dead agent before deactivating it, the reward should be received. The issue with keeping the agent around is that it might learn the wrong relationships between the observations and actions.
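As a rough illustration of the ordering point (punish first, then deactivate), here is a toy model in plain Python. `ToyAgent`, `kill`, and `kill_wrong_order` are hypothetical names, and the rule that an inactive agent silently drops rewards is only an assumption taken from the description above, not a statement about how the ML-Agents `Agent` class is implemented.

```python
class ToyAgent:
    """Hypothetical stand-in for an agent; not the ML-Agents Agent class."""

    def __init__(self):
        self.active = True
        self.episode_reward = 0.0

    def add_reward(self, value):
        # Assumption for this sketch: an inactive agent receives nothing.
        if self.active:
            self.episode_reward += value


def kill(agent):
    agent.add_reward(-1.0)  # punish while the agent is still active
    agent.active = False    # then deactivate it


def kill_wrong_order(agent):
    agent.active = False
    agent.add_reward(-1.0)  # dropped in this toy model


good, bad = ToyAgent(), ToyAgent()
kill(good)
kill_wrong_order(bad)
print(good.episode_reward, bad.episode_reward)  # -1.0 0.0
```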
That’s what I was thinking. Punishing the agent before deactivation is obviously not an issue (the same goes for reactivating it before the final reward).
But the problem remains: how does an agent “interpret” being deactivated?
Any suggestions on how to handle them properly?
If you add an enum state (or a boolean) for dead/alive as an observation, that will solve the problem. Just freeze the agent so it doesn’t go anywhere; it can continue to train, but it will know that it’s dead. That way it will separate the states of being alive and being dead.
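A minimal sketch of that idea in plain Python, assuming a one-dimensional position and a single movement action. `ToyAgent`, `observe`, and `act` are hypothetical names for illustration only; in Unity the same logic would live in the agent's observation and action callbacks.

```python
from dataclasses import dataclass


@dataclass
class ToyAgent:
    # Hypothetical toy agent; not the ML-Agents Agent class.
    x: float = 0.0
    health: float = 1.0
    is_frozen: bool = False

    @property
    def is_dead(self):
        return self.health <= 0.0

    def observe(self):
        # The dead/frozen flags are part of the observation, so the policy
        # can tell "alive" states apart from "dead" ones.
        return [self.x, self.health, float(self.is_frozen), float(self.is_dead)]

    def act(self, move):
        # Ignore the policy's output while dead or frozen.
        if self.is_dead or self.is_frozen:
            return
        self.x += move


agent = ToyAgent()
agent.act(1.0)       # alive: moves to x = 1.0
agent.health = 0.0   # agent dies
agent.act(1.0)       # dead: action is ignored
print(agent.observe())  # [1.0, 0.0, 0.0, 1.0]
```

The agent stays in the scene and keeps producing observations, but its actions have no effect once the flag is set, which is the same treatment as for isFrozen.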