After digging through the internet for a while, my team and I are creating this thread to find some help and resources.
We are creating a procedural maze with a start position and a random-ish end position. Our idea is to have a machine learning agent drive the car provided by Unity's Standard Assets, but we are not having any success with it. We managed to make a simpler agent run through the maze and find the end, but for some reason the car keeps running into walls. We have tried different hyperparameters and observations, and we also tried PPO, SAC, and even imitation learning.
If someone has any advice or resources, I would appreciate the help.
Below is the agent code → CarAgent.cs
Hi @DiogoQueiroz ,
Have you tried using the Raycast sensor to detect the walls? We helped an internal team train karts to drive before; they trained relatively quickly and learned to avoid the walls.
Also, your per-step reward penalty may be incentivizing the agent to “kill itself” quickly to avoid accumulating an even lower reward.
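One way to sanity-check that is to make sure the crash penalty is clearly worse than the sum of step penalties the agent would “save” by ending the episode early. Roughly something like this (not your actual CarAgent; the tag names and values are just placeholders to adjust for your setup):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical sketch, not the actual CarAgent from this thread.
public class RewardSketchAgent : Agent
{
    const float stepPenalty = -0.001f; // small, so "staying alive" isn't worse than crashing
    const float wallPenalty = -0.5f;   // clearly worse than any realistic sum of step penalties
    const float goalReward  = 1.0f;    // best possible outcome

    public override void OnActionReceived(ActionBuffers actions)
    {
        // ... apply steering/throttle here ...
        AddReward(stepPenalty); // tiny time pressure per decision
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("wall"))
        {
            AddReward(wallPenalty);
            EndEpisode();
        }
    }

    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("goal"))
        {
            AddReward(goalReward);
            EndEpisode();
        }
    }
}
```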
Hi @christophergoy , I believe we are using the Raycast sensor to detect walls, but we might not be using it correctly since this area of study is new for us. Below is a snapshot; you can also see in it that the Raycast can detect the finish point as well.
Also, how could we make the car avoid killing itself and go directly to the finish point?
Below is the code for the MoveAgent
I’m on @DiogoQueiroz 's team, and I just want to complement his answer and give you some more info.
We are trying to train it by increasing the complexity little by little. Training starts in a small, empty area like the one in Diogo’s snapshot; after about 30 episodes, the complexity increases a bit, and once the complexity is at its maximum, we increase the maze size and start the process again. Should we try a different flow?
Also, we are currently using a decision requester period of 3. Since the agent moves fast, we tried a lower value (1), but then the agent basically doesn’t move away from the start. What should we look at to find the best value for this?
I’ve been running a training for almost 10 million steps and the agent keeps getting stuck against the wall like this…
Thanks for all of the info @DiogoQueiroz and @casbas ,
Just to clarify, do the walls and the goal target have different tags that are detectable by the raycasts? I could imagine a situation where it thinks the goals and the walls are the same if they aren’t differentiated. It may see the wall and think it’s headed toward the goal.
This sounds reasonable to me. There is a proper workflow for this within ML-Agents called curriculum learning that you could use. It allows you to pass different environment parameters to the Unity environment from Python based on how well the agent is doing in the current curriculum.
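On the Unity side, the agent reads those parameters at the start of each episode via `Academy.Instance.EnvironmentParameters`. A rough sketch (the parameter names `maze_complexity` and `maze_size` are just examples, you'd define your own in the trainer config):

```csharp
using Unity.MLAgents;
using UnityEngine;

// Illustrative only; parameter names must match your curriculum config.
public class MazeCurriculumAgent : Agent
{
    public override void OnEpisodeBegin()
    {
        // Values come from the curriculum in the trainer config;
        // the defaults are used when running in the Editor without Python attached.
        float complexity = Academy.Instance.EnvironmentParameters
            .GetWithDefault("maze_complexity", 0f);
        float mazeSize = Academy.Instance.EnvironmentParameters
            .GetWithDefault("maze_size", 5f);

        // ... rebuild/reset the maze using those values ...
        Debug.Log($"Episode starting with complexity={complexity}, size={mazeSize}");
    }
}
```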
A period of 3 sounds reasonable; you could try bumping it up to 5 to see if you get better results.
For the kart game we worked with, the raycasts were spread all around the vehicle. I’m not sure whether your car can back up or not, but it doesn’t seem to have any raycast vision behind the front bumper, which may make it think it can just back up and turn a certain way when, in fact, it cannot.
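You’d normally configure this in the Inspector, but just to show which pieces matter for both the tag question and the rear coverage, here’s a rough sketch in code (the child object names and tags are placeholders for whatever you use):

```csharp
using System.Collections.Generic;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Illustrative sketch of the ray sensor setup; in practice set this up in the Inspector.
public class CarSensorSetupExample : MonoBehaviour
{
    void Awake()
    {
        // Front-facing rays on a child object at the front bumper.
        AddRaySensor("FrontRays", transform.Find("FrontSensorMount"));

        // A second, rear-facing sensor (mount rotated 180 degrees) so the
        // agent can "see" behind it when it reverses.
        AddRaySensor("RearRays", transform.Find("RearSensorMount"));
    }

    void AddRaySensor(string sensorName, Transform mount)
    {
        var rays = mount.gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        rays.SensorName = sensorName;
        // Walls and the goal need separate tags, otherwise the agent
        // cannot tell them apart in the observation.
        rays.DetectableTags = new List<string> { "wall", "goal" };
        rays.RaysPerDirection = 5;
        rays.MaxRayDegrees = 90f;
        rays.RayLength = 20f;
    }
}
```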
I’m working on something similar and found a few things useful:
This may seem obvious and you’ve probably already done it: play in heuristic mode and log all rewards. Drive the car around and test every case you can think of to make sure rewards/penalties are being sent to the agent with the values you’d expect (see the sketch at the end of this post).
Curiosity + penalties can cause the survivorship bias mentioned above. I found it worthwhile to go as bare-bones as possible with the network/RL algorithm parameters and aim for the simplest version of your goal (i.e. no curiosity and the simplest version of the task).
Train only on the first part of the curriculum (empty area with walls) until you get a stable model that doesn’t run into walls; that model can then be used with initialize-from for the next step (pretraining). This also serves as a sanity check that things are coded properly: if your car is still running into walls after training in the open area, something is wrong with your perception.
You can consider using GAIL and/or Behavioral Cloning to jump-start your learning a little via demonstrations. This page in the ML-Agents docs is very informative. If you do this, you will most likely have to create a sparser reward system.
I have to give huge props to mbaske: his YouTube videos and his repos are great to learn from. His grid sensor example includes a self-driving car that you may be able to pull inspiration from.
My naive guess is that either your perception is messed up (tags/layers) or your rewards aren’t representing the concept of your goal to the agent.
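Here’s roughly what I mean by the heuristic-mode check, assuming a continuous steer/throttle action space (your MoveAgent will obviously differ; the tag and penalty value are just examples):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical sketch of a heuristic-mode reward check; adapt to your own action space.
public class HeuristicCheckAgent : Agent
{
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Drive the car yourself with the arrow keys while Behavior Type is
        // set to "Heuristic Only" in the Behavior Parameters component.
        var continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxis("Horizontal"); // steering
        continuousActions[1] = Input.GetAxis("Vertical");   // throttle
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("wall"))
        {
            AddReward(-0.5f);
            // Log the running total so you can verify each penalty/reward
            // actually arrives with the value you expect.
            Debug.Log($"Hit wall, cumulative reward: {GetCumulativeReward()}");
            EndEpisode();
        }
    }
}
```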
Hey @WaxyMcRivers , thanks for the hints. We are still trying to find a good way to train it.
At the moment I’m trying to get a stable model, as you suggested, in just an empty area.
That YouTube channel has a lot of good stuff, and I hope it will help. Thanks!