Why is my Pac-Man agent unable to learn the movement?

Hi, I’m working on an agent that plays Pac-Man. The game is based on this GitHub project. I’m using Unity 2022.3.20f1 (LTS) and ML-Agents Release 21. This is my current repository.

I have defined simple rewards for eating single pellets (0.01 for a normal pellet and 0.05 for a big one) and for eating all pellets (1.0). There is also a small penalty (-0.001) added at every step, to encourage the agent to finish in fewer steps, and a death reward (-1.0) if Pac-Man is eaten by a ghost. The death reward doesn’t matter at first, since I’m training without ghosts for the time being.

An episode ends when Pac-Man dies, when all pellets have been eaten (the round is won), or when the agent fails to complete the round within 4000 steps. Eaten pellets are reset only on death or on successful completion of the round, not when the maximum number of steps is exceeded.
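
To put these magnitudes side by side, here is a rough sketch of the per-episode reward budget. The reward values and the 4000-step limit are the ones described in this post. The pellet counts (240 normal pellets, 4 big ones) are an assumption taken from the classic arcade maze and may not match this project's level, and `RewardBudget` is a hypothetical helper, not part of the project.

```csharp
public static class RewardBudget
{
    // Values described in the post.
    public const float PelletReward = 0.01f;
    public const float PowerPelletReward = 0.05f;
    public const float StepPenalty = -0.001f;
    public const float WinReward = 1f;
    public const int MaxSteps = 4000;

    // ASSUMPTION: pellet counts of the classic arcade maze.
    // Replace these with the real counts of the level in use.
    public const int AssumedPellets = 240;
    public const int AssumedPowerPellets = 4;

    // Total positive reward for eating every pellet and winning the round.
    public static float MaxPositiveReward()
    {
        return AssumedPellets * PelletReward
             + AssumedPowerPellets * PowerPelletReward
             + WinReward;
    }

    // Accumulated per-step penalty after the given number of steps.
    public static float StepPenaltyAfter(int steps)
    {
        return steps * StepPenalty;
    }
}
```

With the assumed counts, `RewardBudget.MaxPositiveReward()` comes to about 3.6, while `RewardBudget.StepPenaltyAfter(RewardBudget.MaxSteps)` comes to about -4.0. In runs that use the per-step penalty, an episode that hits the step limit could therefore lose more to the penalty than the whole maze is worth.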

I currently use a camera sensor for the observations. Before that I tried raycasts and plain vector observations. None of these has worked so far.

Now to the problem in more detail. I think the agent is not able to learn the movement properly. I have made several attempts and let the agent train for a long time. In one run it trained for 8 million steps without ghosts and without the per-step penalty, which took about 16 hours, I think. The resulting model was badly overfitted: it always took the same path and did not seem to understand how the game works or how to move.
Another run also went for over 8 million steps and took about 17 hours, this time with the per-step penalty and with one ghost added. The agent did not learn to avoid the ghost in that time, and its movement decisions look completely random.
I watched a video in which the creator also built a Pac-Man AI. He had the same problem, although my agent doesn’t get stuck in corners as often as his did. His solution was to define the movement locally, from Pac-Man’s own perspective, instead of globally, from the top-down perspective. I tried to implement this but couldn’t get it to work.
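
One way to read the "local" movement idea is to let the agent pick relative actions (keep going, turn left, turn right, reverse) and convert them into a world direction based on Pac-Man's current heading. The sketch below is an assumption about that approach, not the video's actual code. `RelativeMove` and `RelativeMovement.ToWorld` are hypothetical names, and directions are plain integer pairs so the snippet runs outside Unity.

```csharp
public enum RelativeMove { Forward = 0, Left = 1, Right = 2, Back = 3 }

public static class RelativeMovement
{
    // Converts a relative action into a world-space grid direction.
    // heading is the current direction, e.g. (1, 0) for right or (0, 1) for up.
    public static (int x, int y) ToWorld((int x, int y) heading, RelativeMove move)
    {
        switch (move)
        {
            case RelativeMove.Left:
                // Rotate 90 degrees counter-clockwise: (x, y) -> (-y, x)
                return (-heading.y, heading.x);
            case RelativeMove.Right:
                // Rotate 90 degrees clockwise: (x, y) -> (y, -x)
                return (heading.y, -heading.x);
            case RelativeMove.Back:
                // Reverse the heading.
                return (-heading.x, -heading.y);
            default:
                // Forward: keep the current heading.
                return heading;
        }
    }
}
```

In `OnActionReceived`, the result could be converted to a `Vector2` and passed to `movement.SetDirection`, with `movement.direction` as the heading. The heading has to be non-zero for this to work, since every rotation of (0, 0) is still (0, 0).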

I also don’t think that training for a 2D game like Pac-Man should take this long (over 10 hours). I train with 4 instances at the same time, all running at 20x speed.
I have tried many other rewards and configurations, all without success. Do you have any idea what the cause could be? Is it really the movement, or something else?

using System;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

[RequireComponent(typeof(Movement))]
public class PacManAgent : Agent
{
    [SerializeField] private AnimatedSprite deathSequence;
    private SpriteRenderer spriteRenderer;
    private Movement movement;

    private new Collider2D collider;

    private GameManager gamemanager;
    private int currentAction;

    // Constants
    private const float PelletReward = 0.01f;
    private const float PowerPelletReward = 0.05f;
    private const float NegativeRewardPerStep = -0.001f;
    private const float DeathReward = -1f;

    private const float WinReward = 1f;

    public override void Initialize()
    {
        spriteRenderer = GetComponent<SpriteRenderer>();
        movement = GetComponent<Movement>();
        collider = GetComponent<Collider2D>();
        gamemanager = FindObjectOfType<GameManager>();
        currentAction = 3;
    }

    private void OnTriggerEnter2D(Collider2D collision)
    {
        if (collision.CompareTag("Pellet"))
        {
            // timeSinceLastPellet = 0f;
            AddReward(PelletReward);
        }

        if (collision.CompareTag("PowerPellet"))
        {
            // timeSinceLastPellet = 0f;
            AddReward(PowerPelletReward);
        }
    }

    // For the gamemanager
    public void GiveWinReward()
    {
        AddReward(WinReward);
    }

    // For the gamemanager
    public void GiveDeathReward()
    {
        AddReward(DeathReward);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Convert discrete actions to movement directions
        int movementAction = actions.DiscreteActions[0];
        Vector2 direction = Vector2.zero;

        switch (movementAction)
        {
            case 0:
                direction = Vector2.up;
                break;
            case 1:
                direction = Vector2.down;
                break;
            case 2:
                direction = Vector2.left;
                break;
            case 3:
                direction = Vector2.right;
                break;
        }

        if (!gamemanager.GameIsWon)
        {
            AddReward(NegativeRewardPerStep);
        }

        movement.SetDirection(direction);

        float angle = Mathf.Atan2(movement.direction.y, movement.direction.x);
        transform.rotation = Quaternion.AngleAxis(angle * Mathf.Rad2Deg, Vector3.forward);
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Allows manual control for testing purposes
        ActionSegment<int> discreteActions = actionsOut.DiscreteActions;

        if (Input.GetKey(KeyCode.W) || Input.GetKey(KeyCode.UpArrow))
        {
            currentAction = 0;
        }
        else if (Input.GetKey(KeyCode.S) || Input.GetKey(KeyCode.DownArrow))
        {
            currentAction = 1;
        }
        else if (Input.GetKey(KeyCode.A) || Input.GetKey(KeyCode.LeftArrow))
        {
            currentAction = 2;
        }
        else if (Input.GetKey(KeyCode.D) || Input.GetKey(KeyCode.RightArrow))
        {
            currentAction = 3;
        }

        discreteActions[0] = currentAction;
    }

    public override void OnEpisodeBegin()
    {
        enabled = true;
        spriteRenderer.enabled = true;
        collider.enabled = true;
        deathSequence.enabled = false;
        movement.ResetState();
        gamemanager.GameIsWon = false;
        gameObject.SetActive(true);
    }

    public void DeathSequence()
    {
        enabled = false;
        spriteRenderer.enabled = false;
        collider.enabled = false;
        movement.enabled = false;
        deathSequence.enabled = true;
        deathSequence.Restart();
    }
}

using UnityEngine;
using UnityEngine.UI;

public class GameManager : MonoBehaviour
{
    public static GameManager Instance { get; private set; }

    [SerializeField] private Ghost[] ghosts;

    [SerializeField] private PacManAgent pacmanagent;

    [SerializeField] private Transform pellets;

    [SerializeField] private Text gameOverText;

    [SerializeField] private Text scoreText;

    [SerializeField] private Text livesText;

    public int totalLives;

    private int ghostMultiplier = 1;
    private int lives;
    private int score = 0;
    public bool GameIsWon = false;

    public int Lives => lives;
    public int Score => score;

    private void Awake()
    {
        if (Instance != null)
        {
            DestroyImmediate(gameObject);
        }
        else
        {
            Instance = this;
            DontDestroyOnLoad(gameObject);
        }
    }

    private void Start()
    {
        pacmanagent = pacmanagent.GetComponent<PacManAgent>();
        SetLives(totalLives);
        NewGame();
    }

    private void Update()
    {
        if (lives <= 0)
        {
            NewGame();
        }
    }

    private void NewGame()
    {
        SetScore(0);
        SetLives(totalLives);
        NewRound();
    }

    private void NewRound()
    {
        gameOverText.enabled = false;

        foreach (Transform pellet in pellets)
        {
            pellet.gameObject.SetActive(true);
        }

        ResetState();
    }

    private void ResetState()
    {
        for (int i = 0; i < ghosts.Length; i++)
        {
            ghosts[i].ResetState();
        }
        // pacmanagent.startTime = Time.time;
        pacmanagent.OnEpisodeBegin();
    }

    private void GameOver()
    {
        gameOverText.enabled = true;

        for (int i = 0; i < ghosts.Length; i++)
        {
            ghosts[i].gameObject.SetActive(false);
        }
        pacmanagent.gameObject.SetActive(false);
    }

    private void SetLives(int lives)
    {
        this.lives = lives;
        livesText.text = "x" + lives.ToString();
    }

    private void SetScore(int score)
    {
        this.score = score;
        scoreText.text = score.ToString().PadLeft(2, '0');
    }

    public void PacmanEaten()
    {
        pacmanagent.GiveDeathReward();
        pacmanagent.EndEpisode();
        pacmanagent.DeathSequence();

        SetLives(lives - 1);

        if (lives > 0)
        {
            ResetState();
        }
        else
        {
            GameOver();
        }
    }

    public void GhostEaten(Ghost ghost)
    {
        int points = ghost.points * ghostMultiplier;
        SetScore(score + points);

        ghostMultiplier++;
    }

    public void PelletEaten(Pellet pellet)
    {
        pellet.gameObject.SetActive(false);

        SetScore(score + pellet.points);

        if (!HasRemainingPellets())
        {
            pacmanagent.GiveWinReward();
            pacmanagent.EndEpisode();
            pacmanagent.gameObject.SetActive(false);
            GameIsWon = true;
            NewRound();
        }
    }

    public void PowerPelletEaten(PowerPellet pellet)
    {
        for (int i = 0; i < ghosts.Length; i++)
        {
            ghosts[i].frightened.Enable(pellet.duration);
        }

        PelletEaten(pellet);
        CancelInvoke(nameof(ResetGhostMultiplier));
        Invoke(nameof(ResetGhostMultiplier), pellet.duration);
    }

    public bool HasRemainingPellets()
    {
        foreach (Transform pellet in pellets)
        {
            if (pellet.gameObject.activeSelf)
            {
                return true;
            }
        }

        return false;
    }

    public Transform GetPellets()
    {
        return this.pellets;
    }

    private void ResetGhostMultiplier()
    {
        ghostMultiplier = 1;
    }
}

using UnityEngine;

[RequireComponent(typeof(Rigidbody2D))]
public class Movement : MonoBehaviour
{
    public float speed = 8f;
    public float speedMultiplier = 1f;
    public Vector2 initialDirection;
    public LayerMask obstacleLayer;

    public new Rigidbody2D rigidbody { get; private set; }
    public Vector2 direction { get; private set; }
    public Vector2 nextDirection { get; private set; }
    public Vector3 startingPosition { get; private set; }

    private void Awake()
    {
        rigidbody = GetComponent<Rigidbody2D>();
        startingPosition = transform.position;
    }

    private void Start()
    {
        ResetState();
    }

    public void ResetState()
    {
        speedMultiplier = 1f;
        direction = initialDirection;
        nextDirection = Vector2.zero;
        transform.position = startingPosition;
        rigidbody.isKinematic = false;
        enabled = true;
    }

    private void Update()
    {
        // Try to move in the next direction while it's queued to make movements
        // more responsive
        if (nextDirection != Vector2.zero)
        {
            SetDirection(nextDirection);
        }
    }

    private void FixedUpdate()
    {
        Vector2 position = rigidbody.position;
        Vector2 translation = direction * speed * speedMultiplier * Time.fixedDeltaTime;

        rigidbody.MovePosition(position + translation);
    }

    public void SetDirection(Vector2 direction, bool forced = false)
    {
        // Only set the direction if the tile in that direction is available
        // otherwise we set it as the next direction so it'll automatically be
        // set when it does become available
        if (forced || !Occupied(direction))
        {
            this.direction = direction;
            nextDirection = Vector2.zero;
        }
        else
        {
            nextDirection = direction;
        }
    }

    private bool Occupied(Vector2 direction)
    {
        // If no collider is hit then there is no obstacle in that direction
        RaycastHit2D hit = Physics2D.BoxCast(
            transform.position,
            Vector2.one * 0.75f,
            0f,
            direction,
            1.5f,
            obstacleLayer
        );
        return hit.collider != null;
    }
}

Did you manage to solve it?

No, unfortunately not. I gave up on the project and built an AI with Stable Baselines3 and Gymnasium instead, which worked well.

For Pacman?

Yes.