I’m currently working on my PhD thesis and want to use Unity ML-Agents to train two different agents at the same time. The first has to select a way to render an image, and the second should learn to navigate based on what it sees.
I wanted to know if it’s possible to train two agents with two different behaviors at the same time.
Hi @Julien_Desvergnes,
You can train multiple different behaviors by selecting each agent in the Hierarchy and, in the Inspector, setting the BehaviorName property of its BehaviorParameters component. Agents with different behavior names are trained as separate behaviors.
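As a rough mental model, each distinct behavior name gets its own policy and its own experience buffer (the trainer configuration is also keyed by behavior name), which is why agents with different names can train side by side. The Python sketch below is a simplified illustration, not ML-Agents' actual trainer code; `Experience`, `BehaviorTrainer`, `route_experiences` and the behavior names "Renderer" and "Navigator" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experience:
    agent_id: int
    behavior_name: str
    observation: List[float]
    reward: float


@dataclass
class BehaviorTrainer:
    """Hypothetical stand-in for one trainer/policy per behavior name."""
    name: str
    buffer: List[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.buffer.append(exp)


def route_experiences(experiences: List[Experience]) -> Dict[str, BehaviorTrainer]:
    """Group experiences by behavior name so each behavior trains its own policy."""
    trainers: Dict[str, BehaviorTrainer] = {}
    for exp in experiences:
        if exp.behavior_name not in trainers:
            trainers[exp.behavior_name] = BehaviorTrainer(exp.behavior_name)
        trainers[exp.behavior_name].add(exp)
    return trainers


# Two hypothetical behaviors from the original question, collected in one run.
batch = [
    Experience(0, "Renderer", [0.1], 0.0),
    Experience(1, "Navigator", [0.5, 0.2], 1.0),
    Experience(2, "Navigator", [0.3, 0.9], -0.5),
]
trainers = route_experiences(batch)
print({name: len(t.buffer) for name, t in trainers.items()})  # {'Renderer': 1, 'Navigator': 2}
```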
Thanks for your answer. My question was more about training two different agents at the same time rather than in two consecutive runs, and it turns out that works pretty well!
I have another question: could I inject human input while training is running, for example by pressing a key when a particular situation occurs?
In my work we are now considering human reinforcement learning (if you want to know a little more: https://www.ijcai.org/Proceedings/2019/0884.pdf), and I need to inject some human input during training.
Hi @Julien_Desvergnes,
You could always add code to your game that takes user input; I’m just not sure how it would interact with the rest of the RL loop. Would you make it a sparsely used observation? Would you override actions? Would you want to feed that user input back to the trainer? If so, how?
Thanks for your answer. I want to use that feedback to modify the total reward before the reinforcement step, which I think corresponds to your suggestion: “feed that user input back to the trainer”.
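One simple reading of “modify the total reward before the reinforcement step” is reward shaping: add a scaled human signal to the environment reward at each step. The Python sketch below assumes a hypothetical per-step feedback value in {-1, 0, +1} (for example from a key press) and a hypothetical weight `human_weight`; it is not an ML-Agents API. In a Unity project the same idea could be implemented by calling the agent's `AddReward()` from the code that reads the key press.

```python
from typing import List, Sequence


def shaped_rewards(env_rewards: Sequence[float],
                   human_feedback: Sequence[int],
                   human_weight: float = 0.5) -> List[float]:
    """Combine the environment reward with human feedback, step by step.

    human_feedback[t] is assumed to be -1 (bad), 0 (no input) or +1 (good).
    """
    if len(env_rewards) != len(human_feedback):
        raise ValueError("one feedback value per step is required")
    return [r + human_weight * h for r, h in zip(env_rewards, human_feedback)]


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Total discounted reward, accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


env = [0.0, 0.0, 1.0]
keys = [0, -1, 1]  # a "bad" key press at step 1 and a "good" one at step 2
shaped = shaped_rewards(env, keys, human_weight=0.5)
print(shaped)                                # [0.0, -0.5, 1.5]
print(discounted_return(shaped, gamma=1.0))  # 1.0
```

Keeping `human_weight` small relative to the task reward is one way to stop occasional key presses from dominating the learning signal.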
@Julien_Desvergnes How did you get it to train two different agents at the same time? I can get the code working for each, but how did you make two configs run simultaneously with mlagents-learn? Just by running two configs from two console windows?
@Julien_Desvergnes
Feeding human input is what GAIL is for. Take a look at the GAIL reward signal config options.
Essentially, you pre-record the human demonstrations (for example with the Demonstration Recorder component), then configure GAIL to feed them into training with a configurable strength. ML-Agents can use recorded demonstrations either for behavioral cloning (direct mimicking) or for GAIL (generative adversarial imitation learning). It’s quite powerful.
Take a look at the implementation details of GAIL in Unity ML-Agents.
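To build intuition for the “configurable strength”: in the trainer config each reward signal (for example `extrinsic` and `gail`) has its own `strength`, which scales that signal's reward. The Python sketch below simply sums the strength-scaled signals per step; this is a simplification for illustration, since the real trainer also keeps a separate value estimate per signal. The reward values are made up.

```python
from typing import Dict, List


def combine_reward_signals(signals: Dict[str, List[float]],
                           strengths: Dict[str, float]) -> List[float]:
    """Scale each reward signal by its strength and sum per step (simplified model)."""
    lengths = {len(v) for v in signals.values()}
    if len(lengths) != 1:
        raise ValueError("all signals must cover the same number of steps")
    n_steps = lengths.pop()
    total = [0.0] * n_steps
    for name, rewards in signals.items():
        strength = strengths[name]
        for t, r in enumerate(rewards):
            total[t] += strength * r
    return total


signals = {
    "extrinsic": [0.0, 0.0, 1.0],  # task reward from the environment
    "gail": [0.8, 0.2, 0.6],       # made-up "looks like the demonstrations" reward
}
combined = combine_reward_signals(signals, {"extrinsic": 1.0, "gail": 0.5})
print(combined)
```

Lowering the `gail` strength makes the demonstrations a gentler nudge; raising it makes the agent imitate more closely at the expense of the task reward.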
Relevant to my interests. What about two different behaviors?
Say I have one agent that learns to walk and to stand up after a fall. Doing both in one NN is just too complex to make work properly, so we split it into two NNs: one for walking and one for getting up.
If the code detects a fall, I would like it to switch to the second behavior. If getting up fails, I reset back to the start; if it succeeds within set limits, it switches back to the walking behavior.
I imagine swapping out the NN file at runtime, but I doubt it will be that easy, since the training process on the command line will time out on either of the two behaviors if it doesn’t receive any feedback from Unity for a while, correct?
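The switching logic described above can be sketched as a small state machine. This is a hypothetical Python sketch: `fell`, `stood_up`, the step limit and the mode names are placeholders for whatever the game code actually checks; in Unity, the returned mode would decide which behavior's agent requests the next decision.

```python
from enum import Enum, auto


class Mode(Enum):
    WALK = auto()
    GET_UP = auto()


class BehaviorSwitcher:
    """Hypothetical controller that hands control between two trained behaviors."""

    def __init__(self, max_get_up_steps: int = 200) -> None:
        self.mode = Mode.WALK
        self.max_get_up_steps = max_get_up_steps
        self.get_up_steps = 0

    def step(self, fell: bool, stood_up: bool) -> str:
        """Advance one tick and return "walk", "get_up" or "reset"."""
        if self.mode is Mode.WALK:
            if fell:
                self.mode = Mode.GET_UP
                self.get_up_steps = 0
                return "get_up"
            return "walk"
        # Currently in GET_UP mode.
        self.get_up_steps += 1
        if stood_up:
            self.mode = Mode.WALK  # success: hand control back to walking
            return "walk"
        if self.get_up_steps >= self.max_get_up_steps:
            self.mode = Mode.WALK  # failure: reset to the starting state
            return "reset"
        return "get_up"


sw = BehaviorSwitcher(max_get_up_steps=2)
print(sw.step(fell=False, stood_up=False))  # walk
print(sw.step(fell=True, stood_up=False))   # get_up
print(sw.step(fell=False, stood_up=True))   # walk
```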
However, if you really want two agents, one for walking and one for getting up, just train them separately.
For the getting-up agent, have something push the agent over each time it stands up. Reward it for standing up successfully, then start the next episode by pushing it over again. Just don’t push it over in exactly the same way each time, otherwise it will only learn one way of getting up.
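One way to vary the push is to sample its direction and strength at the start of every episode. A minimal Python sketch; the force range and the horizontal (x, 0, z) push vector are assumptions, and in Unity the resulting vector would be applied to the agent's Rigidbody, for example with `Rigidbody.AddForce`.

```python
import math
import random
from typing import Tuple


def sample_push(rng: random.Random,
                min_force: float = 200.0,
                max_force: float = 600.0) -> Tuple[float, float, float]:
    """Return a random horizontal push (x, y, z) with random direction and magnitude."""
    angle = rng.uniform(0.0, 2.0 * math.pi)        # random direction in the ground plane
    magnitude = rng.uniform(min_force, max_force)  # random strength
    return (magnitude * math.cos(angle), 0.0, magnitude * math.sin(angle))


rng = random.Random(42)  # seeded so runs are reproducible
pushes = [sample_push(rng) for _ in range(3)]
for p in pushes:
    print(tuple(round(c, 1) for c in p))
```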
For the walking agent, reward it for not falling, penalize it for falling, or treat a fall as a loss (-1 reward and end the episode).
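The three options differ only in what happens on a fall. Below is a hypothetical per-step reward function in Python; the option names, the survival bonus and the penalty size are illustrative assumptions. In ML-Agents, the last option corresponds to calling `SetReward(-1f)` followed by `EndEpisode()` on the agent.

```python
from typing import Tuple


def walking_reward(fell: bool, option: str,
                   alive_bonus: float = 0.01,
                   fall_penalty: float = -0.5) -> Tuple[float, bool]:
    """Return (reward, episode_done) for one step of the walking agent.

    option:
      "reward_upright" - small reward on every step the agent stays up
      "penalize_fall"  - penalty on a fall, episode continues
      "fall_is_loss"   - -1 on a fall and the episode ends
    """
    if option == "reward_upright":
        return (0.0 if fell else alive_bonus), False
    if option == "penalize_fall":
        return (fall_penalty if fell else 0.0), False
    if option == "fall_is_loss":
        return (-1.0, True) if fell else (0.0, False)
    raise ValueError(f"unknown option: {option}")


print(walking_reward(fell=True, option="fall_is_loss"))  # (-1.0, True)
```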
Having said that, what the agent needs to learn, for example maintaining balance, is common to both walking and getting up, so I’m not sure splitting this into two agents is a good idea. There is shared learning in both activities.
There definitely is, but I’m thinking about what to do when tasks get too complex for one agent to perform. At some point it becomes so hard for the agent to distinguish signal from noise, or to reward it properly at the right moments, that getting it to learn would take near-infinite time and steps.
Actually, now that I think about it: as long as the Unity instance is running, the Academy will keep stepping, and as long as the Academy is stepping, the ML-Agents Python side will NOT time out.
So you can freely switch between training one agent and training the other.
It would result in empty steps though, wouldn’t it? They would both be “continuous” models. If nothing is gathered during a step, won’t it fall apart? I should just try this, I suppose.
Well, a step for an agent’s brain (behavior) only happens when an agent of that type requests a decision. A step does not happen just because the Academy steps, or because an agent of a different type requests a decision.
You would have to trigger the decision requests manually (with Agent.RequestDecision()), since you are effectively alternating between agents. The DecisionRequester component, which requests decisions automatically, cannot be used here.
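To illustrate why alternating does not create empty steps, here is a toy Python simulation in which the Academy ticks every frame but a behavior only records a step when one of its agents requests a decision. It is not ML-Agents code: `ToyAcademy`, `request_decision` and the behavior names "Walker" and "GetUp" are hypothetical stand-ins for the Academy, `Agent.RequestDecision()` and the two behaviors.

```python
from collections import Counter
from typing import List


class ToyAcademy:
    """Toy model: the academy ticks every frame, but behaviors only step on decision requests."""

    def __init__(self) -> None:
        self.frames = 0
        self.behavior_steps: Counter = Counter()
        self._pending: List[str] = []

    def request_decision(self, behavior_name: str) -> None:
        self._pending.append(behavior_name)

    def tick(self) -> None:
        self.frames += 1
        for name in self._pending:
            self.behavior_steps[name] += 1  # only behaviors that requested a decision step
        self._pending.clear()


academy = ToyAcademy()
for frame in range(10):
    # Hypothetical manual alternation: walking for 6 frames, then getting up for 4.
    active = "Walker" if frame < 6 else "GetUp"
    academy.request_decision(active)
    academy.tick()

print(academy.frames)                # 10
print(dict(academy.behavior_steps))  # {'Walker': 6, 'GetUp': 4}
```

The idle behavior simply accumulates no steps while the other one is active, rather than receiving empty ones.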