The wonderfully confusing world of MiniMax!

Hi all,

I’ve been scared of AI, not because of terminator, it just seemed hard to implement so just avoided it!
That is until I accidentally stumbled into a set of problems building what I initially thought was a “simple” 2D game.

I’m considering going down the Minimax route with Alpha beta pruning to find best move for the AI, I think I understand the basics of MiniMax, but I’m unsure about how to go about implementing it, assigning values to moves and especially what is called “the heuristic evaluation function”. I’ve found a code snippet here:

http://snipd.net/minimax-algorithm-with-alpha-beta-pruning-in-c But the eval function changes from game to game, how do you even start to write the evaluation function, that part seems to be lacking in explanation in the places I’ve looked. Is that the part that’s assigning or weighting the values of moves?

Has anyone got any experience of designing 2D AI with minimax? Are there any decent examples or tutorials? The only one’s I’ve found are youtube lectures that give you a taste with no real substance.

All the links with maths or theory are welcome!! Got into surreal numbers last night, good times :slight_smile:

Well.
Weeeeeeeell.

The first thing you need to do is to take care of the agent’s (the AI’s) sensors. What information does the NPC have about the world? Depending on the game, you’ll might want to limit it to prevent the NPC from being all-knowing.
The core of your AI is the function that takes the input from the sensors and, based on it, decides which action to take next. That’s where your graphs/trees and the algorithms to search the solutions within them sit.

The edges of your graph (or tree - you have to decide) represent an action. The nodes they connect represent the states of the world prior and after performing said action. It is basically a “what would be if”-graph (or tree). Given the current situation, you look at all the options you could take, then, looking from all those new situations, you look at all the options the opponent could take from there, etc. Once you reach a certain depth (or the end), you use minimax, expectiminimax, or whatever to determine the best step to take.

You said that the evaluation function is confusing you. That’s no shame since they are the meat of the algorithms you’re working with. They make or break everything.

What is the “heuristic evaluation function”?
Let’s take the example from your link: Tic-Tac-Toe.
So you’ve constructed your tree and you have a set of “final-states” and, by traversing the tree back to its root, the sequence of actions required to reach said state (let’s assume you have “x” of them, with x being a rather big number).
Now, imagine that this wasn’t a computer but you, in real life. I come to you, presenting all those possible outcomes. The obvious question you need to ask yourself is:
“Which one is the best for me?”

That’s where the evaluation function comes into play. Your evaluation function takes a given state (situation) and, knowing your goal, it rates it; It gives the state a score. You (that is: the algorithm) then can simply look at the scores and choose the best one.

How does such a function look like?
Well, that’s the big question isn’t it? It’s the central question and the meat of your game’s AI. The creation of the trees, the priority queues and all the other algorithms and data structures can be looked up and copied. It’s the evaluation function that you need to tailor towards your game.
Let’s look at a really simple example: Pathfinding (or: How do I get to where I need to be?)
The algorithm you’ll probably use will be A* (or some variation thereof). The evaluation of a state is super-simple:
traveled_distance + remaining_distance
The lower the result, the better. Think about what happens.
The state you’d be in after taking a step towards your goal will be rated better, than the one where you took a step away from it, because of the remaining distance. Getting closer to your destination = better.
Furthermore, taking a step away from it and then walking towards it will be worse than going towards it from the beginning (because of the first summand). Shorter paths = better.
The end result will be that A* will find and choose the shortest path that gets you to your destination.

I hope this helps you a bit.
You have to be able to look at various situations in your game and put a number on them. Once you have that, the rest is easy.

2 Likes

Oh my I didn’t think anyone would answer this thank you very much @TheSniperFan for taking the time to give such a detailed post.

I’ve been digesting more and more information about the subject, the game I’ve been trying to make is Chomp, I think the logic would be very similar to tic-tac-to, I’ve come across things such as perfect positions in the game and I’m thinking weight those decisions much higher than the other options, and if the human player doesn’t take the perfect position, move into that position, and then symmetrically reflect the plays of the human.

I think I’ve got the algorithm down in my head, but when thinking about coding it I’m confused at how I scan the matrix or organise the values in a n x m grid. For instance in Chomp when a move is made all pieces to the right and north are “eaten”, this leaves an L like shape, I’ve been looping round to destroy the points with a for loop from i to gridResolutionX, but that approach breaks down on the second move, as you have these L types of chunks taken out leaving scattered arrays. I think I need an array per row to keep track of the moves.

I’m not sure whether to get into using lists for this to keep track of the remaining objects in the particular rows. Or whether it’s possible to look at the size of an array at a particular index in a multidimensional array.

That should work.
The most naïve implementation for Tic-Tac-Toe is to only use three numbers 1, 2 and 3. A state scores a 1, if the player wins, a 2 if it’s either undecided or no one won and a 3, if the opponent wins.

I’ve just looked into this game for a second. Apparently you have a matrix where every entry is either filled or empty. Each value in your grid has one of two possible values. That is convenient, because you can store it extremely efficiently, but more on that later.
The easiest way is to go with arrays. In C# you have three ways to deal with that situation:

  • Two-dimensional array (bool[,] field)
  • Jagged array (bool[ ][ ] field)
  • One-dimensional array (bool[ ] field) (You can store every multi-dimensional array in a one dimensional one, with a bit of simple maths)

I would recommend you to go with the jagged array, which is an array of arrays. For you there is no practical difference between the first two options, but the latter tends to be faster for many use-cases due to how the data is stored/accessed in memory. Each line is an array, all of which are stored in an “array of lines”.

// This is the array at the beginning of a 3 x 3 game
true, true, true
true, true, true
true, true, true

// After you've taken the one in the upper right
false, false, false
true,  true,  false
true,  true,  false

// This is the same field but stored in a one-dimensional array
// See if you can understand how it works, just by looking at it closely,
// comparing it to the above one
false, false, false, true, true, false, true, true, false

Now, I mentioned an “extremely efficient” way to store this. I would recommend you to not look into this until you have finished your game though. Do it one step at a time.
The idea is that you can store that entire field within a single number. No array of numbers, just a single variable. It’s called a “bitmask”.
You use the fact that your computer works in the binary system. So a number is, internally, a sequence of ones and zeros. Rather than using an array of booleans for storing the values, you can store the same kind of data in the individual digits within the number.

byte f = bool[8]
short f = bool[16]
int f = bool[32]
long f = bool[64]

If you’re done with your game and still want to improve it, look into this. First convert your game from using a jagged array to using a one-dimensional one. After that, look into bit-wise operators, logic and bitmasks to store your game’s field in a single number (or an array of numbers, if you want fields which contain more than 64 tiles).

1 Like

Wow thank you again for such an insightful response, it’s truly appreciated :slight_smile: I think I understand and grasp some the concepts you’re talking about, so basically store an array of bools to see what is on/off, and decipher the best move from their, potentially I could have another array of bools for perfect positions, and could compare the two together. So scan though the matrix to see what bits are left, cross check with winning position table, and if there is a winning position left move into it.

I think I’m starting to wrap my head around the bitmask bit, (but I did not know an int has 32 bits in it), I used to mess around with Arduino’s and made a few button pad thingys a year or two ago, and think I used the >> bit shifting stuff to scan through a button matrix.

So an 8 x 15 grid is 120, so would you use two longs? I think two separate arrays might fit my game a bit better as I make a separation of X and Y, I’ve been parsing and naming, so it’s easier to keep track of! But everything is in a one dimensional array called grid, so I think it should be fairly straightforward to fit a bitmask next to it in the Start function.

This is pretty much the finished game, obv without a good AI yet, you can see my first attempt using random, but it’s a slimey cheating cretin that skips turns at the end! I’m gonna give the array madness a crack over the next few days and see if I can get something working. Thank you for your help you’ve certainly helped clear things up for me.

using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using UnityEngine.UI;

public class TransformGridVSComputer : MonoBehaviour {

    public Transform prefab; // used with a cube of scale 0.5.

    public int gridResoX = 15;
    public int gridResoY = 8;

    private int newGridResoX;
    private int newGridResoY;

    private int row;
    private int column;
    private bool playerTurn;
    private float delaySec = 3f;

    public Text playerGo;
    public Button restart;
    public GameObject button;
    private Vector3 lastPosition;

    Transform[] grid;

    public void StartPoints ()
    {
        grid = new Transform[gridResoX * gridResoY];
            for (int i = 0, y = 0; y < gridResoY; y++) {
                for (int x = 0; x < gridResoX; x++) {
                    grid[i] = CreatePoints(x, y);
            }
        }
        //newGridResoX = gridResoX;
        //newGridResoY = gridResoY;
        playerTurn = true;
        UpdatePlayerGo ();
    }

    void Awake ()
    {
        StartPoints();
    }

    void UpdateMouse()
    {
        lastPosition = Input.mousePosition;
    }

    void UpdatePlayerGo()
    {
        if (playerTurn == true)
        {
            playerGo.text = "Your Go...";
            playerTurn = false;
        }
        else
        {
            playerGo.text = "AI's Go...!!";
            playerTurn = true;
        }
    }

    void Update()
    {
        if (Input.mousePosition != lastPosition) {
            RaycastHit hit;
            Ray ray = Camera.main.ScreenPointToRay (Input.mousePosition);
            if (Physics.Raycast (ray, out hit))
            {
                if (Input.GetMouseButtonDown (0) && hit.collider.tag == "Player")
                {
                    if (playerTurn == false)
                    {
                        UpdatePlayerGo ();

                        DestroyPoints (hit.collider.name);

                        UpdateMouse ();
                        StartCoroutine(AIPlay());
                    }
                }
            }
        }
    }

    private static int numberOfMoves(int x, int y)
    {
        return x * y;
    }
       
    IEnumerator AIPlay()
    {
        int x = Random.Range (1, gridResoX);
        int maxY = newGridResoY;

        int y = Random.Range (1, gridResoY);
        int maxX = newGridResoX;

        if (x > maxY && maxX != 0)
        {
            x = Random.Range (1, maxY);
        //    Debug.Log ("X =" + x);
        }

        if (y > maxX && maxY != 0)
        {
            y = Random.Range (1, maxX);
            //    Debug.Log ("X =" + x);
        }

        WaitForSeconds wait = new WaitForSeconds (3f);
        yield return wait;
        string pointRandom = x + "/" + y;
        DestroyPoints(pointRandom);
        UpdatePlayerGo ();
        Debug.Log("AI's move: " + pointRandom);

    }

    void GameEnd()
    {
        button.gameObject.SetActive(true);
    }

    void DestroyPoints(string name)
    {
        string[] lines = name.Split(new char[]{'/'});
        column = int.Parse(lines[0]);
        row = int.Parse(lines[1]);

        if (row == 0 && column == 0)
        {
            //Debug.Log("happens");
            if (playerTurn == true)
            {
                playerGo.text = "Sorry you Loose!...";
            }
            else
            {
                playerGo.text = "Well done you won!...";
            }
            GameEnd ();
        }

        StartCoroutine (Destruction ());
        newGridResoX = column;
        Debug.Log ("Grid X =" + newGridResoX);
        newGridResoY = row;
        Debug.Log ("Grid Y =" + newGridResoY);
        //hit.collider.gameObject.SetActive(false);
    }

    private IEnumerator Destruction ()
    {
        WaitForSeconds wait = new WaitForSeconds (0.1f);
        for (int x = column; x < gridResoX; x++)
        {
            for (int y = row; y < gridResoY; y++)
            {
                string backParse = x + "/" + y;
                Destroy (GameObject.Find (backParse), 0.01f);
                //GameObject.Find(backParse).SetActive(false);
            }
            yield return wait;
        }
    }

    Transform CreatePoints(int x, int y)
    {
        Transform point = Instantiate<Transform>(prefab);
        int x1 = x + 1;
        int y1 = y + 1;
        point.name =  x + "/" + y;
        point.parent = transform;
        point.localPosition = GetCoordinates (x, y);
        point.GetComponent<MeshRenderer> ().material.color = new Color
        (
            (float)x / gridResoX,
            (float)y / gridResoY,
            (float)y / gridResoX
        );
        return point;
    }

    Vector2  GetCoordinates(int x, int y) {
        return new Vector2
        (
            x - (gridResoX - 1) * 0.5f,
            y - (gridResoY - 1) * 0.5f
        );
    }
}

If you are new to AI, I recommend first implementing something more simple like Dijkstra or more preferrably A* and when you understand that you add the pruning techinique. Which basically speeds the algorithm up. If you do that minmax will be much easier to understand. ^^

It’s not that I don’t understand the theory, as ever with me, it’s the mechanics of the implementation, getting the right data structures. Or maybe it could be the recursion part that makes my brain overflow a little!

Is A* really easier than minimax? I’ve avoided pathfinding as I don’t make those kind of games!

MinMax is pathfinding pruning. A* is just pathfinding but finding the right board state in say chess is done with pathfinding. Instead of expanding nodes you are walking to you expand possible board states. (atleast thats one way of doing it) And MinMax with alpha beta pruning is just removing some of the possible board states so it’s like removing possible walkable nodes that are for some reason not interesing in pathfinding to make it run faster. If you look at it that way its easier to grasp thats what I meant with my ‘tip’.

A* is easier to implement than minmax, yes since minmax already needs some sort of pathfinding. And If you do implement pathfinding you will already have (most of) the datastructures already set up.

You think they are separated these things but its the same. You basically dont do MinMAx without pathfinding anyway, ergo its not the implementation its the theory you dont grasp yet

Wow you’re right, I totally didn’t look at it like that way at all, I did think Minimax and pathfinding were very separate entities, that actually cleared things up mentally, I guess either way it’s searching for the best possible solution. I think I was a little to quick to respond. So does A* have a heuristic evaluation function as well? Would that be measuring distance and rewarding most for the shortest possible path for instance?

They differ in my head as pathfinding seems more concrete than minimax, transversing a graph seems more logical and concrete, minimax seems slightly more abstract in a way as you’re considering the opponents future moves, as well as what the best move is to play, so in my head I couldn’t really equate distance search to best move search (best-first over depth-first search), but you’ve cleared that up for me, I guess it’s just a slightly different reward or heuristic evaluation function.

I’ve done a bit of research seems A* and Minimax are different entities with different use cases, best first search is informed and Depth first and breadth first search are un-informed, best first uses problem specific information whereas DFS, BFS are essentially blind searches. Do you mean searches are the same as pathfinding? As in you basically don’t do MiniMax without searching?

I didn’t know Chomp was gonna get this deep damn it!

A* and minimax are very different indeed. The most fundamental difference is that minimax is for games with more than one player, whereas A* is not. Neither algorithm is complicated (as you can see by looking at their short pseudo-code). It’s more about the datastructures in both cases. It’s true that if you have done A*, it should be easy to do minimax, but the same thing applies the other way around. In terms of difficulty I would say that A* and minimax are equally difficult, whereas alpha-beta-pruning is harder than both of them (on account of it being minimax + something extra).

Once you have a graph, you can search it with BFS, DFS, UCS, A*, minimax, expectiminimax and about a million other algorithms and variations thereof. Once you have the graph that is.

Since chomp is a two-player game, you will use minimax, expectiminimax or something comparable. Using A* would result in an AI that would be trivial to outplay.

I did not mean A* and minmax were the same :slight_smile: I meant if you do a pruning techinique you will need the data structures AND the pathfinding in place anyway, so I would do it incrimentally in steps. Im sorry for being confusing (and/or abit wrong)