Playing games with AI
Let’s say you play a game with a friend. And the metric of “how good you perform in that game” is measured by a numeric score which increases when you get closer to “winning the game” than your friend and decreases when your friend gets closer to “winning the game” than you. In this setting, you can think of yourself as trying to maximize the score and your friend as trying to minimize it.
We can come up with an algorithm able to make good decisions and win such a game by modeling the above situation in the following way: we will have 2 entities (functions) that call each other; one tries to maximize the score, the other to minimize it. Basically, these 2 functions will mimic the two players.
This algorithm is also a good example of AI that’s not ML. Some people make this confusion and think that AI = ML; in reality, ML is a subset of AI. Some AI techniques don’t involve ML. The minimax algorithm is such an algorithm that makes computers behave intelligently but they are not learning anything. And despite that, it works quite well in many games.
If we think of a game in terms of these 2 players, Max & Min, changing turns with each other, then we can represent the game as a tree of decisions. Let’s look at a quite simple example:
Each node in this tree (except for the terminal nodes) represents a decision that should be made at that moment in the game. We decide which move to take. In this example, we have to choose from only 2 movements, but in general, we can have any number of movements, and this number can vary from node to node, depending on the state of the game.
The top node (the one at depth 0) is the current state of the game. Here is where we have to decide the next movement in the game. Here we start with Max, as it is the player that wants what we do: to maximize the score. Instead of deciding based only on the next possible states of the game that Max can reach from this point, he thinks: “After I will do one of these moves, what will do my enemy Min?”
So, it calls Min and says: “Hey, what move will you take if I choose left?”, and after that: “What move will you take if I choose right?”.
After Max finds what Min would do, he chooses the branch that will give him the maximum score.
But wait. How will Min decide what to do to minimize the score? He will apply the same strategy as Max. Min in turn will call Max to ask him what he will do for each possible choice of Min. But after that, instead of making the choice that maximizes the score, Min will do the opposite: will make the choice that minimizes the score.
And so on … Each one will call the other, building a big tree in a recursive fashion like this until they reach a terminal state.
A terminal state should be, preferably, a state in which the game is ended. But that’s usually too expensive computationally; letting the algorithm explore all the possible moves until the game ends may take an extremely long time. So, we set a maximum depth. A game state will be considered terminal when either the game is over, or the maximum depth is reached.
In the example above, the terminal states are the ones on the bottom (depth 2).
What will happen on the terminal nodes?
Whoever player will reach terminal states, be it Max or Min, will not be able to apply the same strategy used so far; the strategy of calling the other player for help.
Now, in terminal nodes, we need to calculate the score of the game in each one of these terminal states.
This may not be so obvious in some games, but we need to, at least estimate the score based on that state of the game.
In our example above, the scores of the terminal states are those numbers on the bottom: 1, 2, 3, and 4.
The only thing that the Max player will do at this last level is to return these scores to the calling Min on the previous level.
After this is done, our tree will look like:
The choices that Min will do are marked with an arrow. The scores are propagated from level 2 to level 1 (Min’s level).
Now, after we know what will happen on depth 1, we let Max do his move on level 0:
This means that the optimal move to do at the current state of the game is the one represented by the right edge of that top node, and the score that we will get is at least 3.
I say “at least” because this algorithm runs as if both Max & Min always choose their best move. But if, in reality, our enemy is worse than Min, we may get a better score.
So now, how can we express this algorithm clearly in code? The above tree was just to describe it intuitively. A computer doesn’t need to construct such a tree explicitly. All we need are 2 functions: maximize & minimize that will call each other.
Here is a sketch of the above-described algorithm:
maximize() function returns a tuple that contains on its first position the child state of the game that maximizes the score, and on the second position is an estimate of the score that will be reached by following that state. An analog quantity is returned by the
decision() function below takes as input the current state of the game and returns the state that should be followed if we want to maximize our score.
eval() above is a function that estimates the score of its input state of the game.
Minimax with α — β pruning
The larger the depth, the better our player will be. But this algorithm can take a while if we set a large depth for it. So, we should choose the maximum possible depth that still fits our time requirements. Can we make this algorithm a little faster, so that we can use a larger value for depth, while also obeying the time constraint?
Let’s look at the following example:
To decide the move of the top node, do we need to evaluate all the nodes in the tree?
α — β pruning is a strategy that we can use to improve the Minimax algorithm by ignoring some branches of the tree that we know ahead of time they won’t help us in making the optimal decision. The name α — β pruning comes from the 2 parameters used in this algorithm which are α and β.
Now, how this method works?
Each time we establish the score of a node, we also propagate it upwards in the tree to its parent and use it to set a lower or upper bound of the outcome of the parent node.
For example, in our tree, if we start evaluating the nodes from left to right after we establish the score of the leftmost terminal node (which is 4) we know that the outcome for the parent node cannot be greater than 4. That’s because the parent is a Min node, and if we already have 4, the parent node cannot choose something bigger than this.
But, at this point, this information doesn’t help too much. Let’s continue and see what happens. We evaluate the next terminal node and find that the value for the parent is 3 and that the top node cannot be smaller than 3. And this piece of information will help.
If we continue to evaluate the nodes, we discover the terminal with a score of 2, and this means that its parent should be ≤ 2.
Now, do we need to also evaluate the last node? No. We already know that the top node can get at least a 3 if it follows the left branch. And we know that from the right branch we cannot get more than 2. So, we can just ignore the right branch from this point on.
In this small example, it may not seem like a big improvement as we skipped only one terminal node. But in a bigger tree, one such node may hold a quite large subtree. And note that here we skipped only one sibling node because that was the only remaining one in that branch. If there were more remaining nodes, we would skip all of them. For example, if the right branch had 2, 1, and 0 as terminal nodes, then after we evaluate the 2, we will be able to skip both 1 and 0.
Also, how many nodes will be skipped will depend on the ordering of the terminal nodes. You probably noticed that for this example I reversed the scores from 1, 2, 3, 4 to 4, 3, 2, 1. That was not at random. When the scores are in ascending order, that’s the worst-case scenario in which no pruning happens. But there are few chances this bad ordering will arise in practice. On average, α — β pruning allows the minimax algorithm to go almost twice as deep in the same amount of time compared with no pruning.
Below is the pseudocode for the above-described algorithm.
Alpha is the biggest lower bound among (direct or indirect) parent Max nodes, and Beta is the smallest upper bound among (direct or indirect) parent Min nodes. In maximize, while iterating over child nodes, we update the alpha value to be the maximum score found so far and if this maximum score is bigger than beta we know that there is some parent Min node that will not choose our current branch, so we stop exploring the remaining child nodes. An analog thing happens to minimize.
The -/+ infinity in the decision function (first call to maximize) means that we begin the algorithm with no restriction on what the resulting score can be.
So, the minimax algorithm is a relatively easy algorithm that works well on simple games (low branching factor). It is also a good example of AI which isn’t ML.
In the next couple of articles, I will show how to use this algorithm (along with Selenium WebDriver) to create an AI capable of playing the 2048 game live on our screens.
I hope you found this information useful and thanks for reading!
This article is also posted on Medium here. Feel free to have a look!