I have a small square arena environment in which two agents shoot each other. If an agent touches the wall, it loses, the other agent wins, and the episode ends. If an agent dies, it loses, the other agent wins, and the episode ends.
At first, I trained the agents without self-play, shooting at another agent with the same policy, to speed up training. That worked very well.
After that, I added self-play to the config file, changed learning_rate_schedule from "linear" to "constant", and resumed training on that run ID with --resume.
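For reference, the self-play section of a trainer config for this setup would look roughly like the sketch below. The behavior name "Fighter" and all parameter values are illustrative assumptions, not my exact settings:

```yaml
behaviors:
  Fighter:                               # assumed behavior name
    trainer_type: ppo
    hyperparameters:
      learning_rate: 3.0e-4
      learning_rate_schedule: constant   # changed from "linear" before --resume
    self_play:
      save_steps: 50000                  # how often to snapshot the policy
      team_change: 200000                # steps before the learning team swaps
      swap_steps: 10000                  # steps between opponent snapshot swaps
      window: 10                         # pool of past snapshots to sample from
      play_against_latest_model_ratio: 0.5
      initial_elo: 1200.0
```

Note that ML-Agents' self-play and its ELO tracking assume an adversarial, roughly zero-sum game between the teams.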
Here is my current reward code:
// When an agent dies
public void AgentDie(GameObject whatAgent)
{
    // Penalize slow wins: the later the kill, the smaller the winner's reward
    float rewardPenalty = penaltyRate * (float)resetTimer / (float)MaxEnvironmentSteps;
    var green = GreenAgent.GetComponent<Agent_Fighter_Demo>();
    var red = RedAgent.GetComponent<Agent_Fighter_Demo>();
    if (whatAgent == GreenAgent)
    {
        green.AddReward(-1f);
        red.AddReward(+1f - rewardPenalty);
    }
    else
    {
        green.AddReward(+1f - rewardPenalty);
        red.AddReward(-1f);
    }
    ResetScene();
}
// When an agent touches a wall
public void AgentTouchWall(GameObject whatAgent)
{
    var green = GreenAgent.GetComponent<Agent_Fighter_Demo>();
    var red = RedAgent.GetComponent<Agent_Fighter_Demo>();
    if (whatAgent == GreenAgent)
    {
        green.AddReward(-1f);
        red.AddReward(+1f);
    }
    else
    {
        green.AddReward(+1f);
        red.AddReward(-1f);
    }
    ResetScene();
}
// When a bullet hits an enemy (only the shooter is rewarded)
Shooter.GetComponent<Agent_Fighter_Demo>().AddReward(damage / maxHealth);
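One thing I considered: the per-hit bonus above rewards only the shooter, which makes the game positive-sum, while self-play's ELO model assumes a zero-sum game. A symmetric variant would look like this sketch, where "Victim" is a hypothetical reference to the agent that was hit (not a field in my current code):

```csharp
// Sketch: keep the per-hit shaping zero-sum for compatibility with self-play.
// "Victim" is a hypothetical reference to the agent the bullet hit.
float hitReward = damage / maxHealth;
Shooter.GetComponent<Agent_Fighter_Demo>().AddReward(+hitReward);
Victim.GetComponent<Agent_Fighter_Demo>().AddReward(-hitReward);
```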
After many attempts to reshape the reward and retrain, my agents always stop shooting each other; they just fire into the air. Even over longer runs, training overnight, the result is the same.
Does anyone know what is wrong here?
Why do my agents no longer try to hit each other?
Is this caused by my reward shaping?