making cars recover from stuck position

just after some pointers really. all the tutorials seem to be about training your car to navigate a track, which i can do, but they never seem to cope with the car being stuck against a wall (say, after an impact knocks it off course).

if i put complex obstacle patterns in the way during training, they just learn to avoid them, as expected.

am i asking too much of it? what would be the normal way to deal with such things? switch to a model that’s trained to get away from a wall when we detect we are stuck, and then switch back again?

It should be as easy as setting the “max step” on the Agent (not in the training configuration) to a reasonable number of steps. After that many steps the agent will reset and start over.
https://docs.unity3d.com/Packages/com.unity.ml-agents@1.0/api/Unity.MLAgents.Agent.html

I forgot to ask: I assume you are applying a penalty when they collide with an obstacle?

Side information:
I start with 500 on mine, because a car can get from any checkpoint to the next checkpoint in under that many steps. Anything larger means they failed. By doing this they find a solution to getting to the next point faster, but they don’t train to deal with being stopped very well because they don’t have time before they get reset.

After a few million steps, when the cars can get around the track successfully more often than not, I then increase it to 1500. They then have time to figure out more solutions to being wrecked, though they wreck fewer times. My opinion is that by knowing what they are supposed to do (get around the track) to get higher rewards, they also know what they are trying to accomplish after being wrecked (get back to going around the track). Whereas if you just have a long max step from the start, all they know is that they aren’t getting rewards, and they may not know what they need to do to get back to earning rewards.
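The two-stage limit described above could be sketched like this. It is only an illustration in plain Python, not ML-Agents code: the function name, the list of recent lap results, and the 0.5 threshold (standing in for "more often than not") are all assumptions; only the 500 and 1500 values come from the post.

```python
def choose_max_step(recent_lap_results, short_limit=500, long_limit=1500,
                    success_threshold=0.5):
    """Pick the episode step limit from recent lap outcomes.

    recent_lap_results: list of booleans, True if the car completed the lap.
    The helper and its threshold are hypothetical, for illustration only.
    """
    if not recent_lap_results:
        # No evidence yet that the cars can drive, so keep episodes short.
        return short_limit
    success_rate = sum(recent_lap_results) / len(recent_lap_results)
    # Only give the cars extra time once they succeed more often than not.
    return long_limit if success_rate > success_threshold else short_limit


print(choose_max_step([True, False, False]))  # early training: short limit
print(choose_max_step([True, True, False]))   # mostly succeeding: long limit
```

The chosen value would then be applied to the Agent's max step by hand or by whatever mechanism your project uses.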

I started by not only having a penalty but also ending the episode on wall impact, but this meant they didn’t know how to deal with it when they didn’t die.

So now I have a large penalty for hitting a wall and a smaller penalty for each step they are still touching it.

I tried a 600 max step but maybe that’s too high; the car can complete a lap in that time.
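As a rough sketch of that reward scheme (plain Python rather than the actual Agent code; the function name and both penalty values are made-up placeholders, not the numbers used in the project):

```python
def wall_penalty(touching_now, touching_before,
                 impact_penalty=-1.0, contact_penalty=-0.05):
    """Reward contribution from wall contact for a single step.

    impact_penalty and contact_penalty are hypothetical values.
    """
    if not touching_now:
        return 0.0              # clear of the wall, nothing to punish
    if not touching_before:
        return impact_penalty   # first step of contact: the large one-off hit
    return contact_penalty      # still touching: the smaller per-step penalty


# One impact followed by two steps of scraping, then driving clear.
contacts = [False, True, True, True, False]
total = sum(wall_penalty(now, before)
            for before, now in zip(contacts, contacts[1:]))
print(total)
```

The episode is not ended on impact here, matching the change described above.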

they even learned to scrape the wall as it meant they could get more reward by travelling faster and hitting the checkpoints quicker!

increasing the steps after training didn’t give them any further ability to navigate away from a wreck. they just start travelling further instantly.

I don’t feel any more enlightened at the moment, maybe I’m missing what others perceive to be obvious. if yours is successfully able to recover from a wreck, may I ask what observations you are using?

In addition to tweaking reward functions, it might be worth considering action masking. It specifies that some actions are impossible for the next decision. For example, you can disable the move forward action when a wall is near the agent. See here for details.

In my case, it improved the training results as well as the training time. I was a little reluctant to restrict certain actions by hand because I wanted the agent to learn that behavior, but it may be useful if there are actions that you really want to avoid.
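To make the idea concrete, here is a minimal sketch of the masking decision itself, written in plain Python and not using the ML-Agents API. The action names, the distance input, and the clearance threshold are all assumptions for illustration.

```python
# Hypothetical discrete action set for the car.
ACTIONS = ("forward", "reverse", "left", "right")


def action_mask(distance_ahead, min_clearance=1.0):
    """Return a dict mapping each action to True (allowed) or False (masked).

    distance_ahead: distance to the nearest wall in front of the car,
    e.g. from a forward raycast. min_clearance is a made-up threshold.
    """
    mask = {action: True for action in ACTIONS}
    if distance_ahead < min_clearance:
        # A wall is right in front of the car, so forbid driving forward
        # for the next decision.
        mask["forward"] = False
    return mask


print(action_mask(0.5))
print(action_mask(5.0))
```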


I just got a chance to look at this again. Sadly it seems action masking only works on discrete actions.

It’s possible you could begin some episodes with the car in different positions, such as already facing a wall, to give it training experience in these outlier cases. I haven’t looked into this too much myself, but adding more variety to the starting conditions does tend to make agents more robust.

i’m already doing that, but i’ve not yet been able to get it to do anything sensible. if i put some obstacles directly in the way it doesn’t make any real effort to go anywhere yet, they just sort of wibble around. feels very much like i’m missing something quite basic…

Another option would be to punish the agent more for a perpendicular collision than a more parallel one (could be a dot product of the wall’s normal vector and the car’s direction). This would give the bot a gradient to learn against rather than having a static reward for all collisions.
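A minimal sketch of that angle-dependent penalty, in plain Python with 2D vectors. The function name and max_penalty are placeholders, and taking the absolute value of the dot product is my own choice so that it does not matter which way the wall normal points.

```python
import math


def collision_penalty(car_forward, wall_normal, max_penalty=1.0):
    """Scale the collision penalty by how head-on the impact is.

    car_forward and wall_normal are (x, y) tuples and need not be unit length.
    max_penalty is a hypothetical value for a fully perpendicular hit.
    """
    fx, fy = car_forward
    nx, ny = wall_normal
    f_len = math.hypot(fx, fy)
    n_len = math.hypot(nx, ny)
    if f_len == 0.0 or n_len == 0.0:
        return 0.0  # degenerate input, no direction to compare
    # Cosine of the angle between the car's heading and the wall normal.
    cos_angle = (fx * nx + fy * ny) / (f_len * n_len)
    # 1.0 means driving straight into the wall, 0.0 means sliding along it.
    head_on = abs(cos_angle)
    return -max_penalty * head_on


print(collision_penalty((0.0, 1.0), (0.0, -1.0)))  # head-on impact
print(collision_penalty((1.0, 1.0), (0.0, -1.0)))  # 45 degree glancing hit
print(collision_penalty((1.0, 0.0), (0.0, -1.0)))  # travelling parallel
```

Because the penalty shrinks smoothly as the car turns away from the wall, a partial turn already scores better than none.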