Agent REALLY seems to like walls...unless it can see the goal in RayPerceptionSensor?

mrmiketheripper · April 23, 2021, 3:02pm

Unity 2020.3.3f / ml-agents 0.23.0 / communicator 1.3.0 / PyTorch 1.7.0 on macOS Big Sur 11.2.3

I’ve been excitedly playing with ML-Agents on and off for about a month now. I followed some of the basic tutorials on YouTube and have been trying to find ANY kind of reading material I can on the topic. Unfortunately, most search results seem to lead to the same few threads, so I thought I’d ask my question here.

I have a basic training environment set up. It’s a small plane surrounded by 4 walls. The agent has a RayPerceptionSensor3D mounted at about mid height. It casts rays all around the agent and only detects two tags: “Obstacle” and “Goal”. The sensor’s layer mask includes the “Default” and “TerrainLayer” layers. The walls are on “TerrainLayer” and the goal object is on “Default”. I have confirmed that the RayPerceptionSensor3D does detect both the walls and the goal, but I’m still fuzzy on how the agent uses that information and how to strengthen the associations.

The agent also has 9 observations of its own:

Normalized agent position (x,y,z)
Normalized goal position (x,y,z)
Agent Forward (x,y,z)

My agent script detects when the Agent hangs out against a wall for too long and ends the episode with a negative reward.

For the first 40 times, the agent needs to move from its spawn to the goal. It receives a +1 reward for touching the goal and receives an additive -.075 reward while it’s touching any walls. After 40 attempts, the agent gets pretty good at this so I add walls by switching the environment. The environment size stays the same but 3 walls are placed in the environment, also on the “TerrainLayer” with tag “Obstacle”.

Once the walls are added, the agent does REALLY well if the goal is within immediate sight of the RayPerceptionSensor3D. However, when the agent is on one side of a wall and the goal is on the other, it just keeps trying to move into the wall and grinds against it until I end the episode. I would expect odd behaviour like this while it figures things out, but this is all it seems to do. Occasionally it even fails the simple environment (no walls) because it moves into a corner, locks onto the wall, and takes the negative reward until it’s respawned.

I did have one training run that ran overnight (about 6 million steps) that resulted in a good cumulative reward, but when I tried to use that brain to run through the training I observed something similar: sometimes the agent would go right for the goal, other times it would just seemingly give up and run itself into walls.

I’m not quite sure what I’m doing wrong here and as stated previously, the limited threads on this fairly new topic make finding answers quite confusing. I’ve tried tweaking things such as the length of the rays. Initially, the rays were long enough to span and touch all sides of the room which I strongly believe confused the AI into thinking it had limited moves.

I tried shortening the rays significantly hoping it would push the AI away when it gets close, but it just seems to latch onto the wall once it sees it.

A few days before posting this thread, I found I was normalizing the coordinates incorrectly and thought fixing that would be it. Now I calculate the bounds of the level and normalize the coordinates against those bounds. From my debug view the values are correct, but I’m not sure it made much of a difference.
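For reference, the per-axis normalization described here can be sketched as a small helper. This is only a sketch under assumptions: the name `ObservationMath.Normalize01` is hypothetical, and the clamping and the zero-extent guard are additions that the `NormalizePositions` method in the script below does not have. Without the guard, an axis whose bounds have zero extent would produce NaN or infinity in the observation.

```csharp
using System;

public static class ObservationMath
{
    // Maps value from [min, max] to [0, 1], clamping anything outside the range.
    // Returns 0 when the range has (almost) no extent, to avoid NaN or infinity.
    public static float Normalize01(float value, float min, float max)
    {
        float extent = max - min;
        if (Math.Abs(extent) < 1e-6f)
        {
            return 0f;
        }

        float t = (value - min) / extent;
        return Math.Max(0f, Math.Min(1f, t));
    }
}
```

Inside Unity, `Mathf.InverseLerp(min, max, value)` behaves similarly for a single axis, so the helper is mainly useful for seeing what the clamping does.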

I have, of course, tried tuning hyperparameters, but that doesn’t seem to make a huge difference in the training.

To me, it seems like the AI is not understanding that hugging walls is being negatively reinforced. I hesitate to make the negative rewards too strong (< -1) since that could potentially screw with AI normalization?

I had a small negative reward every step, but removed it because I thought it was detrimental to training around the walls.
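On the per-step penalty: a common pattern in the ML-Agents example environments is to scale it by the episode length, so the total time penalty over a full episode stays bounded no matter how many steps the episode has. A minimal sketch, assuming the agent has a positive `MaxStep`; the `StepPenalty` helper and its `totalBudget` parameter are hypothetical names, not part of ML-Agents.

```csharp
using System;

public static class StepPenalty
{
    // Per-step reward chosen so that maxStep steps sum to about -totalBudget.
    public static float PerStep(int maxStep, float totalBudget = 1f)
    {
        if (maxStep <= 0)
        {
            // MaxStep = 0 means "no limit" on an Agent, so there is nothing to scale by.
            throw new ArgumentOutOfRangeException(nameof(maxStep), "MaxStep must be positive.");
        }

        return -totalBudget / maxStep;
    }
}
```

From `OnActionReceived` this would be called as `AddReward(StepPenalty.PerStep(MaxStep));`, which matches the `AddReward(-1f / MaxStep)` idiom. Keeping the total at or below 1 in magnitude means the time penalty can never outweigh the +1 goal reward.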

Hyperparameters:

behaviors:
  FindExit:
    framework: pytorch
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
    #   batch_size: 4096
    #   buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 6.0e6
    time_horizon: 64
    summary_freq: 12000
    threaded: true

Agent:

public class MyAgent : Agent
{
    private Vector3 _SpawnPoint;

    [Header("References")]
    private ThirdPersonCharacter _thirdPersonController;
    [SerializeField] BoundsCollector _LevelBounds;
    [SerializeField] Transform ExtractionPoint;
    [SerializeField] Testing_FindRandomPosition Training_RandomizeBoundsContainer;

    [Header("Events")]
    [SerializeField] UnityEvent _OnEpisodeBegin;
    [SerializeField] UnityEvent _OnEpisodePass, _OnEpisodeFail;

    [Header("Properties")]
    [SerializeField] private bool _IsTouchingWall = false;
    [SerializeField] float _TimeTouchingWall = 0f;
    [SerializeField] float MoveSpeed = 2f;

    [Header("Observations")]
    [Tooltip("Now Normalized Extraction Point coordinate")]
    [SerializeField] public Vector3 NormalizedDistanceFromGoal;
    [SerializeField] public Vector3 NormalizedPlayerPosition;

    [Header("Input")]
    [SerializeField] private Vector3 _MovementVector3 = Vector3.zero;

    private bool jump = false;
    private bool fullyGrounded = false;

    public override void Initialize()
    {
        base.Initialize();
        _thirdPersonController = GetComponent<ThirdPersonCharacter>();
        _SpawnPoint = transform.position;
    }

    private void Respawn()
    {
        transform.position = _SpawnPoint;
    }

    public override void OnEpisodeBegin()
    {
        base.OnEpisodeBegin();

        _MovementVector3 = Vector3.zero;

        _TimeTouchingWall = 0f;

        _OnEpisodeBegin?.Invoke();

        transform.position = _SpawnPoint;
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var discreteOut = actionsOut.DiscreteActions;

        if (Input.GetKey(KeyCode.A)) discreteOut[0] = 1; // L
        else if (Input.GetKey(KeyCode.D)) discreteOut[0] = 2; // R

        if (Input.GetKey(KeyCode.W)) discreteOut[0] = 3; // Up/forward
        else if (Input.GetKey(KeyCode.S)) discreteOut[0] = 4; // down/backward
    }

    public override void OnActionReceived(ActionBuffers actionsOut)
    {
        var discreteActions = actionsOut.DiscreteActions;
        switch((int)discreteActions[0])
        {
            case 1: //L
                _MovementVector3 = MoveSpeed * Vector3.left;
                break;
            case 2: // R
                _MovementVector3 = MoveSpeed * Vector3.right;
                break;
            case 3: // Up
                _MovementVector3 = MoveSpeed * Vector3.forward;
                break;
            case 4: //down
                _MovementVector3 = MoveSpeed * Vector3.back;
                break;
            case 0: _MovementVector3 = Vector3.zero;
                break;
        }

        _thirdPersonController.Move(_MovementVector3, false, jump);
        jump = false;


        if(_IsTouchingWall
            && _TimeTouchingWall > 75f)
        {
            _IsTouchingWall = false;
            _TimeTouchingWall = 0f;

            SetReward(-.1f);
            EndEpisode();
            _OnEpisodeFail?.Invoke();
        }
    }

    private void FixedUpdate()
    {
        if(_IsTouchingWall)
        {
            _TimeTouchingWall += 1.0f * Time.fixedDeltaTime;
        }

        if(transform.position.y < -10f)
        {
            SetReward(-0.5f);
            EndEpisode();
        }
    }


    private void OnCollisionExit(Collision collision) => OnCollisionTriggerExit(collision.collider);
    private void OnTriggerExit(Collider other) => OnCollisionTriggerExit(other);
    private void OnCollisionEnter(Collision collision) => OnCollisionTriggerEnterStay(collision.collider);
    private void OnTriggerEnter(Collider other) => OnCollisionTriggerEnterStay(other);

    private void OnCollisionTriggerExit(Collider other)
    {
        if((other.gameObject.tag == "Water"
            || other.gameObject.tag == "Obstacle") && _IsTouchingWall)
        {
            _IsTouchingWall = false;
        }
    }

    private void OnCollisionTriggerEnterStay(Collider other)
    {
        if (other.gameObject.tag == "Goal")
        {
            SetReward(1f);
            EndEpisode();
            _OnEpisodePass?.Invoke();
        }

        if (other.gameObject.tag == "Water"
            || other.gameObject.tag == "Obstacle")
        {
            if (_IsTouchingWall == false) _IsTouchingWall = true;

            AddReward(-.075f);
            //EndEpisode();
            //_OnEpisodeFail?.Invoke();
        }
    }

    private Vector3 NormalizePositions(Vector3 input, Vector3 min, Vector3 max, out Vector3 vec)
    {
        vec.x = (input.x - min.x) / (max.x - min.x);
        vec.y = (input.y - min.y) / (max.y - min.y);
        vec.z = (input.z - min.z) / (max.z - min.z);
        return vec;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // the bounds are determined on the fly when level starts,
        // so this is a simple getter.
        var b = _LevelBounds.GetGroupedBounds;

        /// OBSERVATION #1 - 3x float (x, y, z)
        if (ExtractionPoint != null)
        {
            NormalizePositions(ExtractionPoint.position, b.min, b.max, out NormalizedDistanceFromGoal);

            sensor.AddObservation(NormalizedDistanceFromGoal);
        }
        else
        {
            Debug.Log($"My extraction point is null.", gameObject);
            sensor.AddObservation(Vector3.zero);
        }

        NormalizePositions(transform.position, b.min, b.max, out NormalizedPlayerPosition);
        NormalizedPlayerPosition.y += .28f;

        // 3x float (x,y,z)
        sensor.AddObservation(NormalizedPlayerPosition);

        // 3x float (x,y,z)
        sensor.AddObservation(transform.forward);

        //AddReward(-0.00006f);
    }
}
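
One detail in the script above may be worth checking. The description says the agent receives an additive -.075 reward while it is touching a wall, but only `OnCollisionEnter` and `OnTriggerEnter` are forwarded to `OnCollisionTriggerEnterStay`, so `AddReward(-.075f)` appears to run once per new contact instead of on every physics step. In addition, `_TimeTouchingWall` accumulates seconds and is compared against 75f, and a single exit clears `_IsTouchingWall` even if another wall is still in contact. The following is a sketch of one way to track contact, assuming a continuous penalty is the intent; `WallContactTracker` and all of its members are hypothetical and not part of Unity or ML-Agents.

```csharp
public sealed class WallContactTracker
{
    // Number of wall colliders currently in contact with the agent.
    private int _contacts;

    // Seconds spent in uninterrupted contact with at least one wall.
    public float SecondsInContact { get; private set; }

    public bool IsTouching => _contacts > 0;

    // Call when a wall contact begins.
    public void OnEnter()
    {
        _contacts++;
    }

    // Call when a wall contact ends; the timer resets once no wall is touching.
    public void OnExit()
    {
        if (_contacts > 0) _contacts--;
        if (_contacts == 0) SecondsInContact = 0f;
    }

    // Call once per physics step. Returns the (non-positive) reward for this step.
    public float Tick(float deltaTime, float penaltyPerSecond)
    {
        if (!IsTouching) return 0f;
        SecondsInContact += deltaTime;
        return -penaltyPerSecond * deltaTime;
    }

    // Call at the start of each episode.
    public void Reset()
    {
        _contacts = 0;
        SecondsInContact = 0f;
    }
}
```

The agent would call `OnEnter` and `OnExit` from the collision callbacks, pass the result of `Tick(Time.fixedDeltaTime, penaltyPerSecond)` to `AddReward` in `FixedUpdate`, and call `Reset` from `OnEpisodeBegin`. Expressing the penalty per second keeps it independent of the physics time step.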

As is, if the AI happens to spawn close to the goal cube and can see it, it goes right for it. Other than that, it seems to be “no thoughts, head empty” when it comes to interpreting the goal’s normalized position.

If there’s any other information you guys need, let me know. Thank you in advance for your help with figuring this new & exciting technology out!

mrmiketheripper · April 23, 2021, 3:03pm

Video of the agent after about 200,000 steps:

EDIT: Something else I thought of but don’t think it’s messing the AI up that much:

You can see there’s about a .6f difference in Y between the goal position and the player position. It’s pretty small, but could it be huge to the AI? Another interesting observation: when the AI doesn’t know what to do, it wants to go to the upper limits on X/Z.

mrmiketheripper · April 23, 2021, 4:33pm

Update: Even after removing all Ray Perception Sensors and making sure the only observations were NormalizedPosition, NormalizedGoalPosition, and NormalizedRotation, it still seems to only want to run into walls and mostly runs towards the two extremes (x=1, z=1) of the map. It seems to have 0 interest in moving towards the goal.

mrmiketheripper · April 23, 2021, 5:58pm

Ah, I think I’ve figured it out. I stripped everything down to a basic set: no ray sensors, just observing the normalized goal position and normalized player position, and it still performed very poorly. Well, I made the mistake of basing my player on the Standard Assets ThirdPersonCharacter, which continually rotates whatever GameObject it’s applied to. This was definitely causing confusion on the AI’s part. I changed it to a super simple Rigidbody movement script with no rotation, and within 50,000 steps the AI already seems to have grasped the basic concept of moving towards its goal. It even figured out how to (eventually) go around the walls.

I’m doing a test run now with a RayPerceptionSensor3D attached that just detects walls. I’m excited to see how this run goes!
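
The replacement movement script is not shown in the thread. As a rough sketch of what a rotation-free mapping could look like, the helper below turns the discrete action index used in `OnActionReceived` (0 = stay, 1 = left, 2 = right, 3 = forward, 4 = back) into a fixed world-space direction. `DiscreteMove` is a hypothetical name, and this is an assumption about the approach, not the actual script.

```csharp
public static class DiscreteMove
{
    // Returns a world-space (x, z) direction for the given discrete action.
    // Unknown actions are treated like action 0 (stay in place).
    public static (float x, float z) ToDirection(int action)
    {
        switch (action)
        {
            case 1: return (-1f, 0f); // left
            case 2: return (1f, 0f);  // right
            case 3: return (0f, 1f);  // forward
            case 4: return (0f, -1f); // back
            default: return (0f, 0f); // stay
        }
    }
}
```

In the agent, the result could drive a Rigidbody whose rotation is frozen (`RigidbodyConstraints.FreezeRotation`), for example `rb.velocity = new Vector3(x * MoveSpeed, rb.velocity.y, z * MoveSpeed);`. Because the directions never depend on which way the agent is facing, the same action always produces the same displacement, which makes the position observations easier to act on.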

celion_unity · April 26, 2021, 6:31pm

Sorry for the delayed response, but glad you got it sorted out…
