Hi, I trained my agent with BC only for 150,000 steps and got fairly good results. Then I switched to GAIL + extrinsic reward and continued training. The mean reward immediately drops to negative and the agent hits the obstacle repeatedly. Why does this happen? It seems the agent forgets everything it learned with BC and starts learning from scratch. Or perhaps my demonstrations aren't good enough? My game is crane path planning in 3D space; the goal is to find the target. Here is my config file.
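For reference, a trainer config for this kind of setup (BC pretraining plus GAIL and an extrinsic reward in ML-Agents) might look like the sketch below; the behavior name, demo path, and values are illustrative assumptions, not my exact file:

```yaml
behaviors:
  CraneAgent:                      # hypothetical behavior name
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
      gail:
        strength: 0.05             # keep GAIL weak relative to extrinsic
        gamma: 0.99
        demo_path: Demos/Crane.demo  # hypothetical demo path
    behavioral_cloning:
      demo_path: Demos/Crane.demo
      strength: 0.5
      steps: 150000                # BC influence anneals over these steps
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```

The ML-Agents docs suggest keeping the GAIL `strength` low (around 0.01–0.1) when an extrinsic reward is also present, so the discriminator signal does not swamp the task reward; keeping `behavioral_cloning` active during the GAIL phase may also reduce the kind of forgetting described above.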
Hello! I haven't had this problem. Can I see your C# code for the crane path planning? My email is 1164072013@qq.com