Agent can't figure out how to walk around a wall

In my scene the agent learns to simply pick up a cube target and navigate around some obstacles. It does so pretty well; however, there is a certain situation where it gets confused. Whenever the target is directly in front of the agent (by “in front” I mean it’s directly south, east, west, or north of it) but obscured by a wall, it gets confused and doesn’t know where to go. Here’s an image example of this situation:


In this case the agent randomly moves left, right, forward, and backward for a while until the max steps counter runs out and the target changes position. How can I train it so that it learns to simply walk around the wall?

Here’s some info about the agent:

Observations:

Raycast observations all around the agent that detect walls (I don’t want the agent to touch walls because it ends up grinding against them)
Vector observations for target and self position

Hyperparameters:

behaviors:
  HiderAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 4
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 400000
    time_horizon: 64
    summary_freq: 10000

The agent script:

using System.Collections;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

public class AgentCube_v0_4 : Agent
{
    public float agentRunSpeed = 2f;
    public float spawnAreaMarginMultiplier = 0.9f;
    public Transform Target;
    public GameObject ground;
    Bounds areaBounds;
    Rigidbody m_AgentRb;

    void Start()
    {
        m_AgentRb = GetComponent<Rigidbody>();
        areaBounds = ground.GetComponent<Collider>().bounds;
    }

    public override void OnEpisodeBegin()
    {
        // Move the target to a new spot
        Target.localPosition = GetRandomSpawnPos();
    }

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        MoveAgent(actionBuffers);

        // Give reward and end the episode once the agent is close enough to the target
        float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);
        if (distanceToTarget < 1.5f)
        {
            SetReward(1.0f);
            EndEpisode();
            return; // don't let the step penalty below overwrite the terminal reward
        }
        // Punish the agent if it is taking too long
        SetReward(-5f / 1000f);

        // Punish the agent for touching the walls
        Collider[] hitColliders = Physics.OverlapBox(gameObject.transform.position, transform.localScale / 2);
        if (hitColliders.Length > 0)
        {
            if (hitColliders[0].name == "Wall")
                SetReward(-0.1f);
        }
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Target and Agent positions
        sensor.AddObservation(Target.localPosition);
        sensor.AddObservation(this.transform.localPosition);
    }

    public void MoveAgent(ActionBuffers actionBuffers)
    {
        var dirToGo = Vector3.zero;
        var movementAction = actionBuffers.DiscreteActions[0];

        switch (movementAction)
        {
            case 1:
                dirToGo = transform.forward * 1f;
                break;
            case 2:
                dirToGo = transform.forward * -1f;
                break;
            case 3:
                dirToGo = transform.right * -0.75f;
                break;
            case 4:
                dirToGo = transform.right * 0.75f;
                break;
        }
        m_AgentRb.AddForce(dirToGo * agentRunSpeed,
            ForceMode.VelocityChange);
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var discreteActionsOut = actionsOut.DiscreteActions;
        discreteActionsOut[0] = 0;
        if (Input.GetKey(KeyCode.D))
        {
            discreteActionsOut[0] = 4;
        }
        else if (Input.GetKey(KeyCode.W))
        {
            discreteActionsOut[0] = 1;
        }
        else if (Input.GetKey(KeyCode.A))
        {
            discreteActionsOut[0] = 3;
        }
        else if (Input.GetKey(KeyCode.S))
        {
            discreteActionsOut[0] = 2;
        }
    }

    public Vector3 GetRandomSpawnPos()
    {
        var foundNewSpawnLocation = false;
        var randomSpawnPos = Vector3.zero;
        while (foundNewSpawnLocation == false)
        {
            var randomPosX = Random.Range(-areaBounds.extents.x * spawnAreaMarginMultiplier,
                areaBounds.extents.x * spawnAreaMarginMultiplier);

            var randomPosZ = Random.Range(-areaBounds.extents.z * spawnAreaMarginMultiplier,
                areaBounds.extents.z * spawnAreaMarginMultiplier);
            randomSpawnPos = ground.transform.localPosition + new Vector3(randomPosX, 0.5f, randomPosZ);
            var worldspacepos = transform.TransformPoint(randomSpawnPos);

            if (Physics.CheckBox(worldspacepos, new Vector3(0.6f, 0.1f, 0.6f)) == false)
            {
                foundNewSpawnLocation = true;
            }
        }
        return randomSpawnPos;
    }
}

Hi. I think the problem here is that the agent knows how to do the final step very well, but hasn’t learned the first steps at all. I would suggest removing the observation for the target location (but then be sure to add the target’s tag to the raycast sensor component) and letting the agent first learn how to move around on its own and around the walls, then “stumble upon” the target without knowing where it is.

Also, I think the greater part of the problem is that you’re rewarding the agent based on proximity to the target. If being close to or touching the wall doesn’t violate the reward conditions and the overall reward gain stays positive, the agent will keep grinding against the wall and STILL collect some reward because it’s close enough to the target. So I think you should reward the agent for actually colliding with the target, not just for proximity.
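For example, a minimal sketch of a collision-based terminal reward, assuming the target object carries a “Target” tag (the tag name is an assumption, not something from your script):

void OnCollisionEnter(Collision collision)
{
    // Only pay out the +1 once the agent physically reaches the target.
    if (collision.transform.CompareTag("Target"))
    {
        SetReward(1.0f);
        EndEpisode();
    }
}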

Also, why 4 layers of 256 neurons? I don’t think this problem is that complicated. I’d suggest 3 layers of 128 neurons for better and faster computation. A complicated neural network isn’t always a good neural network; if the problem’s complexity isn’t proportionate to the network’s complexity, it might actually end up being a horrible idea (over-analysis of a simple problem resulting in a lack of decision, also known as “analysis paralysis”).
Best of luck :smile:


Hello @seyyedmahdi69
Thanks for your input. Based on your suggestions I have changed the reward so it is given to the agent only once it collides with the target, and reduced the number of layers/neurons. I have considered using raycasts to detect the target, but there are several issues with that.
First, I do actually want it to know the location of the target, as I plan to expand this project further and compare how this AI agent performs against human players (who will be able to see the target), so for a fair comparison I want it to know the location.
Second, I have attempted this approach and it did work, albeit I had to re-train it 2-3 times, as it would sometimes either stay in one corner and wait for the target to appear in its sight until max steps was reached, or it would hug a corner and never end an episode if I didn’t use max steps, hence never learning. It learned successfully one time, where it would just circle around the arena until it hit the target with its rays.
Also, for it to catch the target with the raycasts I would need a large number of them to cover all angles. I’m afraid it would be terrible practice and would hurt training performance to have so many raycasts on every agent. Here’s an image of what I’m talking about. The white rays are what I’ve used before to detect walls. The green ones are for the target.

Any idea if there’s a way to train it properly while giving it the location of the target?

Hi @MrpHDanny, and my pleasure.
Firstly, it’s a little unclear to me why you would use two raycast components if they don’t have different offsets (for example, one for diagonal mid-air objects and one for straight-up agent-level ones). If they are both for horizontal use, why not use one and give it as many tags as you want it to detect (e.g. wall, target, etc.)? Also, don’t you think the number of green rays is a bit too large? I think you could reduce the number of rays to less than half of what you’re currently using, especially since it’s a 180-degree spread with a long range.
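If it helps, those settings can also be tweaked from a script, assuming a single RayPerceptionSensorComponent3D on the agent (just a sketch; normally you’d set these in the Inspector, and the class name, tag names, and numbers here are only illustrative):

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents.Sensors;

public class RaySensorSetup : MonoBehaviour
{
    void Awake()
    {
        // One sensor that reports both walls and the target, with a modest ray count.
        var raySensor = GetComponent<RayPerceptionSensorComponent3D>();
        raySensor.DetectableTags = new List<string> { "Wall", "Target" };
        raySensor.RaysPerDirection = 5;   // 2 * 5 + 1 = 11 rays in total
        raySensor.MaxRayDegrees = 90f;    // spread the rays over 180 degrees
        raySensor.RayLength = 10f;
    }
}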

Secondly, if you want the situation to be fair for the AI and the players, go ahead and just add the target position to the observations; yes, that would be fair. As for the wall detection, as long as you use the raycast sensor properly and punish the agent to a proper extent for touching the walls, it shouldn’t pose a problem and the agent should learn fairly quickly.

You could make it a bit more fun and challenging by reducing the raycast visibility angle to (say) 50 degrees and having the agent actually turn to detect the walls, but that’s up to you.
Also, for better performance on the agent’s side, you could add a tiny amount of negative reward for every decision the agent makes. This would hopefully prevent the agent from becoming dormant or slacking, and encourage it to finish the job faster.
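As a concrete example, the example environments that ship with ML-Agents usually scale this existential penalty by the episode length via the Agent’s MaxStep field (a sketch, not your exact code):

public override void OnActionReceived(ActionBuffers actionBuffers)
{
    MoveAgent(actionBuffers);

    // Tiny per-decision penalty, so finishing faster yields more total reward.
    // AddReward accumulates on top of other rewards instead of overwriting them like SetReward does.
    AddReward(-1f / MaxStep);
}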
I hope I’ve been useful. Let me know how it goes, please :slight_smile:

@seyyedmahdi69

Good question. It does make sense to use one raycast component with two detectable tags, but in my previous attempts to do so it didn’t work. The rays were of length 20, meaning they would always hit a wall regardless of the agent’s position, and it never learned to avoid walls. My guess is it was either because the agent got confused, since no matter where it went its rays hit a wall, or because I just messed something up (my bet is on the latter). Only when I shortened the rays that detect the walls did the agent immediately start to learn to avoid them, and such short rays wouldn’t be much good for the target.

My fear with using fewer rays is that the bigger the distance to the target, the larger the gap between the rays, creating many ‘blind spots’, so the agent would have trouble spotting the target over larger distances. Nevertheless, since I do want to make it work with the position of the target known, I won’t need the rays for that.

I will consider adding rotation in the future if I have the time before my deadline (this is part of my bachelor’s dissertation). Without the raycasts it’s not a vital feature, so it’ll stay in the back of my mind for now.

I was, in fact, already using a negative reward for taking too long to find the target (this is from my code above):

// Punish the agent if it is taking too long
SetReward(-5f / 1000f);

I may have to play around with the reward size more.
And the punishment for touching the wall is -0.1f every step:

if (hitColliders[0].name == "Wall")
    SetReward(-0.1f);

I will keep changing bits and bobs to try to find a solution to the wall problem, and I’ll report back if I find anything.
If you have any other ideas, please share them. Your help is greatly appreciated.

@MrpHDanny
Are you sure about this bit?

if (hitColliders[0].name == "Wall")
    SetReward(-0.1f);

I think there could be a potential bug with this. The agent is constantly in collision with the ground, so when it’s touching a wall it’s also touching the ground, and the hitColliders array would have more than one entry. Maybe it’s my lack of coding knowledge, but why would that code guarantee that the wall is always the first index in the list of colliders?
Try logging a debug message to see if the code above really works. Personally I’d go with something like this:

    public void OnCollisionEnter(Collision collision)
    {
        if (collision.transform.tag == "Wall")
        {
            // do stuff
        }
    }

@seyyedmahdi69
Oh, I’m sorry, I forgot that I changed that part after I uploaded the code here. I’ve fixed the issue since.
Here’s that full bit in the new source code:

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        MoveAgent(actionBuffers);
        // Penalty for wandering too much
        SetReward(-5f / 1000f);

        // Thin box on y (0.1f) so it overlaps walls and the target but not the ground
        Collider[] hitColliders = Physics.OverlapBox(gameObject.transform.position, new Vector3(0.6f, 0.1f, 0.6f));
        if (hitColliders.Length > 0)
        {
            for (int i = 0; i < hitColliders.Length; i++)
            {
                if (hitColliders[i].name == "Wall")
                {
                    SetReward(-0.1f);
                }

                if (hitColliders[i].name == "Target")
                {
                    SetReward(1.0f);
                    EndEpisode();
                }
            }
        }
    }

I’m using a box with a y half-extent of 0.1f, so it doesn’t touch the ground and only senses walls or the target. The rest just runs through all the colliders and does the appropriate thing if one is a wall or the target. Your suggestion would work too, I expect.

@MrpHDanny I’ve only used Physics.OverlapBox() to search for a suitable spawning location in random-spawning scenarios, so I am not sure how it would perform in the current situation. But I am fairly sure about the performance of OnCollisionEnter(). You can use either, of course :slight_smile:

@seyyedmahdi69
I think my issue with OnCollisionEnter() was that it only triggered once when the agent touched the wall, but wouldn’t trigger again while the agent was still grinding against the wall, until it actually stepped back and touched it again.

Here’s a video of my best attempt at training the agent yet, which runs fine up until the point at 2:00 where it just gets stuck on the wall.

@MrpHDanny I couldn’t load the video, fam. It said it’s private. If it’s not a privacy problem, I’d be happy to take a look at the actual project and see if I can do anything to improve it :slight_smile:

As for the issue with the collision, you can always use OnCollisionStay, which also accounts for continuous contact.
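For instance, a minimal sketch, assuming the walls carry a "Wall" tag (the penalty size is only an example):

void OnCollisionStay(Collision collision)
{
    // Fires every physics step while the agent keeps touching the wall,
    // so grinding along it accumulates a steady penalty.
    if (collision.transform.CompareTag("Wall"))
    {
        AddReward(-0.005f);
    }
}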

UPDATE:
I saw the video. I think training it for a couple more million steps would help.

@seyyedmahdi69
I’ve fixed the privacy issue on the video. Should work if you try to load it up in my previous comment now.
I’ve uploaded my project onto github, best of luck to you trying to decipher my terrible file management haha. If you have trouble getting it to run let me know. Scene 2 is where you want to look and the prefab should have the latest script added to the agent.
https://github.com/MrpHDanny/UndergraduateProject

EDIT: I will try training it for a longer period (my current max steps amount is 400k; I’ll try 2 million). I’ll be mighty mad if that fixes it after so much struggle.

@MrpHDanny No problem, haha. I am pretty sure you’re better at it than I am. I’ll get back to you with the result.

@seyyedmahdi69
I’ve trained the agent for 2 million steps (41 minutes of training) and it seems like the issue is solved!
So glad to have this figured out, haha.
Thanks for all your help, I’ll mark this thread as solved.


I did the same with a couple of changes.
First, it’s good practice to normalize the input, so you might want to set normalize to true in the YAML file. I increased beta a bit to let the agent experience more situations, and made a couple of changes in the script. But I am glad you figured it out. Was fun :slight_smile:

@seyyedmahdi69
Don’t you have to normalize every input yourself? I know there was a formula that does it somewhere in the ml-agents docs. How does that benefit the training?
Can I ask what your changes in the script were?

Normalizing the data helps prevent something called exploding gradients, in which (long story short) the network has a really bad time trying to decide what the output for each input should be. Normalizing usually scales the values down to something between 0 and 1, which is very good for the network both in the clarity of the outputs and in processing speed. In this case, no, you don’t have to do it yourself. Just set normalize: true in the .yaml file and you’ll be good.
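If you do want to do it by hand, the formula in the docs is normalizedValue = (currentValue - minValue) / (maxValue - minValue). A minimal sketch of applying it in CollectObservations, assuming a square arena whose local x/z coordinates stay within ±10 (an illustrative value, not taken from your project):

// Hypothetical arena half-size in local units; adjust to the real ground dimensions.
const float arenaHalfSize = 10f;

static float Normalize(float value, float min, float max)
{
    // normalizedValue = (currentValue - minValue) / (maxValue - minValue)
    return (value - min) / (max - min);
}

public override void CollectObservations(VectorSensor sensor)
{
    // Normalized target and agent positions on the ground plane (x and z only).
    sensor.AddObservation(Normalize(Target.localPosition.x, -arenaHalfSize, arenaHalfSize));
    sensor.AddObservation(Normalize(Target.localPosition.z, -arenaHalfSize, arenaHalfSize));
    sensor.AddObservation(Normalize(transform.localPosition.x, -arenaHalfSize, arenaHalfSize));
    sensor.AddObservation(Normalize(transform.localPosition.z, -arenaHalfSize, arenaHalfSize));
}

With normalize: true in the trainer config you can skip this and keep the raw observations as they are.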

Sadly I don’t have a GitHub account, otherwise I’d contact you there. My changes in the script:
I changed the collision system, added a life counter (an int) for the agent, and ended the episode when it reached 0 (reducing one life on each collision with a wall).
I also added a constant negative reward for staying in collision with walls, and tweaked the raycast sensor a little bit.
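Roughly, the idea looks something like this (the field name and values here are just placeholders, not the exact changes):

// Hypothetical number of wall hits the agent is allowed per episode.
int life = 3;

public override void OnEpisodeBegin()
{
    life = 3;
    Target.localPosition = GetRandomSpawnPos();
}

void OnCollisionEnter(Collision collision)
{
    if (collision.transform.CompareTag("Wall"))
    {
        life--;              // lose a life on each fresh wall contact
        AddReward(-0.1f);    // one-off penalty for the hit itself
        if (life <= 0)
        {
            EndEpisode();    // too many hits: cut the episode short
        }
    }
}

The constant penalty for staying in contact can then go in OnCollisionStay, as in the earlier snippet.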

@MrpHDanny ,

Hey bro, I am building an evacuation simulation project where the agent has to navigate through a building and find the exit, which is extremely similar to your project.
My issue is that my agent fails to complete the episode after some time and keeps grinding against the wall.
How did you solve this issue?
