How to record a "good demo" for imitation learning?

I trained the PushBlock sample with GAIL, using a demo file I recorded myself. But the agent just wandered around and never pushed the block into the green area. The model trained from Unity's ExpertPush.demo behaved normally, so the difference must be in how I play the game. Has anyone succeeded in training this sample with a self-made demo? Any tips? :face_with_spiral_eyes:

Usually the key is "more" demos; the ExpertPush demo has quite a few episodes. Also, try setting use_actions to false in the GAIL settings, and if that isn't sufficient, turn off behavioral_cloning. I'd also try increasing the stacked vectors size in the Behavior Parameters in the Unity editor.
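For reference, here is a minimal sketch of what those changes might look like in the trainer config YAML. The behavior name and demo path are placeholders for your own, and the strengths are just starting points, not tuned values:

```yaml
behaviors:
  PushBlock:
    trainer_type: ppo
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        strength: 0.01
        demo_path: Demos/MyPushBlock.demo  # your self-recorded demo
        use_actions: false   # imitate visited states only, not the exact actions
    # If setting use_actions to false isn't enough, try deleting
    # this behavioral_cloning section entirely:
    # behavioral_cloning:
    #   demo_path: Demos/MyPushBlock.demo
    #   strength: 0.5
```

The stacked vectors setting itself lives in the Unity editor (Behavior Parameters > Vector Observation > Stacked Vectors), not in this file.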

One of the issues is that human demos for the cube environments tend to contain a lot of stopping (i.e. stop and turn to face the cube, then go). The agent learns to imitate the stopping, but because the policy has no memory, it will just stop forever. The changes above should make it less sensitive to this.
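Since the root cause is the policy having no memory, stacking observations is one fix; another option worth trying (my suggestion, not something the PushBlock sample ships with) is to give the policy recurrent memory via network_settings in the trainer config:

```yaml
behaviors:
  PushBlock:
    network_settings:
      memory:
        memory_size: 128     # size of the recurrent (LSTM) state
        sequence_length: 64  # steps of history used during training
```

With memory, "stopped for a moment" and "stopped forever" become distinguishable states to the policy.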


Thanks for the reply! I'll try these tips. :)

I'm also a bit stuck. I have a scene like the Pyramids example, even a bit more complicated. The main task for the agent is to reach the target by crossing the entire arena diagonally. The arena contains obstacles, so the agent should find a way to the target while avoiding them. The agent and the target have constant coordinates. I use ray perception and several rewards and penalties: the agent is punished for any collision with a wall or other obstacle, it is also penalized over time, and it is rewarded for getting closer to the target and for reaching it.
So I recorded several demo episodes for this scene (~10) and then started training using the same parameters as the Pyramids example... and got no result. I ran training on 9 cloned arenas for 8 hours and the agent still cannot reach the target in a fair manner. It often gets stuck between a wall and a tree, etc. I tried increasing the number of layers in the network, but it did not help. I don't know... probably there should be some balance between the reward/penalty magnitudes and the IL signal, like more demonstrations meaning weaker rewards... any ideas will be appreciated. Thanks! I can add more details if anybody is interested in helping me.
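One way to express that balance is through the reward signal strengths in the trainer config. This is a sketch with guessed values to illustrate the idea, not a tuned setup; the behavior name and demo path are hypothetical. The point is to keep the extrinsic (target-reaching) signal primary and the GAIL signal weak, so ~10 demos guide exploration without dominating it:

```yaml
behaviors:
  MyNavigator:            # hypothetical behavior name
    reward_signals:
      extrinsic:
        strength: 1.0     # your target/collision rewards stay primary
        gamma: 0.99
      gail:
        strength: 0.02    # keep IL weak so the demos only guide
        demo_path: Demos/MyNavigation.demo
      curiosity:
        strength: 0.02    # the Pyramids example also uses curiosity
        gamma: 0.99       # to help with exploration
```

If the agent imitates your routes too literally and gets stuck on obstacles, lowering the gail strength further is the first knob I'd turn.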

I had this happen when I forgot to put the desired tag in the ray sensor's detectable tags list, so the agent was simply blind.

I have another question on this topic.

I am recording a demo for a very long and difficult task (difficult even for a human), so I sometimes fail it. Is it OK to record some failed episodes, or do I need to start all over even after a single failure?

The majority of the episodes in my demo recording are successful (90%+).

Could my trained brain ever reach a 100% success rate with this demo? Or would it make the same mistakes I did and only ever reach 90% success?

As long as the demonstration contains the Agent receiving a negative reward for failure, it should be fine.