Self-play rewards: how can an agent understand which of its actions are being rewarded?

Usually, the reward tells the agent that the action it just took was correct.
But if you follow the recommendations for self-play, a positive reward is given to the whole team immediately once a certain condition is met. How is an individual agent supposed to understand which of its actions led to that victory?

Hi @LexVolkov ,
From the self-play documentation:
The reward signal should still be used as described in the documentation for the other trainers and reward signals. However, we encourage users to be a bit more conservative when shaping reward functions due to the instability and non-stationarity of learning in adversarial games.

Since agents on the same “team” are executing the same policy, you can assume that rewards are working as they normally would for the current policy execution.
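
As a concrete illustration of what “conservative” shaping tends to look like, here is a toy sketch (plain Python, not the ML-Agents API; the team names and values are made up): a single sparse, zero-sum result handed to the whole team only when the episode-ending condition is met, with no dense per-step shaping terms.

```python
# Toy sketch of a conservative adversarial reward: one zero-sum result at the
# end of the episode, shared by every agent on the winning/losing team.
def team_rewards(blue_scored: bool) -> dict:
    """Per-team rewards handed out once, when a goal ends the episode."""
    if blue_scored:
        return {"blue": +1.0, "purple": -1.0}
    return {"blue": -1.0, "purple": +1.0}

# Every agent on a team receives its team's value; no agent is rewarded
# individually, which is exactly the situation the question describes.
```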

Hi @LexVolkov ,

What you’re identifying is a very hard problem faced by multi-agent scenarios (and also just sparse reward reinforcement learning in general) known as credit assignment. Consider the scenario where two soccer agents are on the same team, and one scores while the other is doing something nonsensical that does not impact the game play. You are correct, the agent might learn to associate this nonsense with goal scoring/reward which would be incorrect. This makes it a hard learning problem, and to combat this, we train the agents for many timesteps and use many samples per update. The idea is that if we use many, many training samples, we can in some sense “average out” nonsense actions like this.

TLDR; Team level rewards can create credit assignment issues for the individuals, so we use very large batches to “average out” these issues.
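
To make the “average out” idea concrete, here is a toy Monte Carlo sketch (plain numpy, not ML-Agents code; all numbers are made up). One teammate's action actually decides whether the team scores, the other teammate acts at random; with only a few samples the random action can look correlated with the team reward, but over many samples its apparent advantage shrinks toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def apparent_advantage_of_nonsense(num_samples: int) -> float:
    """How much the 'nonsense' teammate's action *appears* to matter.

    scorer_action (0/1) actually determines the team reward; nonsense_action
    (0/1) has no effect at all. We compare the average team reward observed
    when nonsense_action == 1 versus == 0; with enough samples the gap -> 0.
    """
    scorer_action = rng.integers(0, 2, size=num_samples)
    nonsense_action = rng.integers(0, 2, size=num_samples)
    team_reward = scorer_action.astype(float)  # +1 only if the scorer scores
    return (team_reward[nonsense_action == 1].mean()
            - team_reward[nonsense_action == 0].mean())

for n in (20, 200, 20_000):
    print(n, round(apparent_advantage_of_nonsense(n), 3))
# Small batches can assign large spurious credit to the nonsense action;
# very large batches drive it toward zero, which is the point of the TLDR.
```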

Hope this helps. Let us know if you have any more questions.

Thank you, this is what I wanted.

One more question, this time about the trainer configuration settings.
There is an official description, but for a simple user like me, without a deep understanding of neural networks, it is hard to work out which settings affect which aspects of the agent. I would like more details, or examples.

Hi @andrewcoh_unity, that was a really good answer, but I agree with @LexVolkov that documentation or links for the configuration settings of the neural network would be really helpful for beginners like us.

I was just talking with someone who has spent a ton of time experimenting with ML-Agents, and we both agreed that it would be incredibly useful if, even for the existing examples, Unity could do a hyperparameter search and provide detailed graphs for each run, for every parameter. It takes a ton of work on environments and agents to get any intuition about this, so a learning resource like that would be very helpful for many people, not only beginners.
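
Until something like that exists officially, one way to build intuition yourself is to script a small sweep over a single parameter and compare the TensorBoard curves per run. A minimal sketch (the config keys and values below are illustrative and depend on your ML-Agents version; `SoccerTwos` and the paths are placeholders):

```python
import copy
import subprocess
import yaml  # PyYAML

# Hypothetical base config for one behavior; check your ML-Agents version's
# documentation for the exact schema and sensible defaults.
base_config = {
    "behaviors": {
        "SoccerTwos": {
            "trainer_type": "ppo",
            "hyperparameters": {"batch_size": 2048, "buffer_size": 20480,
                                "learning_rate": 3.0e-4},
            "network_settings": {"hidden_units": 256, "num_layers": 2},
            "reward_signals": {"extrinsic": {"gamma": 0.99, "strength": 1.0}},
            "max_steps": 5000000,
        }
    }
}

# Sweep one hyperparameter at a time, one run-id per value.
for lr in (1e-4, 3e-4, 1e-3):
    config = copy.deepcopy(base_config)
    config["behaviors"]["SoccerTwos"]["hyperparameters"]["learning_rate"] = lr
    config_path = f"config_lr_{lr}.yaml"
    with open(config_path, "w") as f:
        yaml.safe_dump(config, f)
    subprocess.run(["mlagents-learn", config_path,
                    f"--run-id=soccer_lr_{lr}",
                    "--env=builds/SoccerTwos"],  # placeholder build path
                   check=True)
```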
