CSE 399-004: Python Programming Lecture 3.5: Alpha-beta Pruning January 22, 2007 http://www.seas.upenn.edu/~cse39904/
Slides mostly as shown in lecture
Scoring an Othello board and AIs
A simple way to "score" an Othello board: (number of white pieces) - (number of black pieces).
The white player wants to maximize this number.
The black player wants to minimize this number.
An AI for each side is either trying to maximize or minimize that number for the final board position.
Minimax
But there's no way to predict what the final board position will be.
So white could choose moves such that the minimum score at any potential final board is as great as possible.
This gives rise to so-called minimax algorithms.
Drawing games as trees
We can draw two-player games as trees.
Each node in the tree represents a "position", for example the board in an Othello game.
At each position, it is one player's turn to move.
An arrow points from position X to position Y if the player's move at position X results in Y.
[Figure: game tree rooted at the current position (your move). The scores shown in the tree are 45 47 64 50 33 41 20 37 52. Successive slides annotate the candidate moves:]
Bad move! You'll end at 41.
Bad move! You'll end at 37.
Best move! You'll end at 45.
We've maximized the minimum score we can get.
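To make "maximize the minimum score" concrete, here is a small exhaustive minimax sketch. The nested-list tree encoding and the numbers are made up for illustration; this is not the tree drawn on the slides.

```python
def minimax(node, maximizing):
    """Exhaustive minimax. A node is either a number (a final score) or a
    list of child nodes; the two players alternate at every level."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Made-up tree: we choose one of three moves, then the opponent replies.
tree = [[3, 8], [6, 7], [1, 9]]

# After each of our moves it is the opponent's (minimizing) turn.
best = max(range(len(tree)), key=lambda i: minimax(tree[i], False))
print(best, minimax(tree, True))  # 1 6
```

The opponent can hold us to 3, 6, or 1 depending on our move, so the second move (index 1) is the one that maximizes our minimum.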
Computation is expensive
Looking ahead to the end of the game is usually not feasible: too much computation is involved.
Cheap solution: compute the score for the board after each possible move you can make, then choose the move that gives you the highest score.
Problem: what looks good here might look really bad after a few more moves.
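A toy illustration of the cheap solution and its problem. The (score, replies) tuples and every number below are made up; the point is only that the move with the best immediate score can have the worst follow-up.

```python
# Each position is a (score, replies) pair, where replies lists the positions
# reachable one move later. All names and numbers here are made up.
tempting = (50, [(10, []), (12, [])])  # scores 50 now, but every reply is bad
modest = (30, [(28, []), (35, [])])    # scores 30 now, and stays near 30
moves = [tempting, modest]

def greedy_choice(moves):
    """Index of the move with the highest immediate score."""
    return max(range(len(moves)), key=lambda i: moves[i][0])

def worst_case(position):
    """Lowest score the opponent can steer us to one move later."""
    _, replies = position
    return min(score for score, _ in replies)

picked = greedy_choice(moves)
print(picked, worst_case(moves[picked]))  # 0 10
print(worst_case(modest))                 # 28
```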
Alpha-beta pruning
We can't scan the entire tree, and the cheap solution is too cheap.
Intermediate solution: alpha-beta pruning.
It looks ahead some number of moves (called the "depth").
It attempts to avoid examining moves which are obviously incredibly bad.
The trick: stop examining a move if it's worse than one you've already found.
A note about the following code
The code is very similar to that found on Wikipedia: http://en.wikipedia.org/wiki/alpha-beta_pruning
Alpha-beta pruning: pseudocode

function abp(node, depth, alpha, beta):
    if depth is zero:
        return the score of node
    if node has no children:
        return the score of node
    for each child of node:
        alpha = max(alpha, -abp(child, depth-1, -beta, -alpha))
        if alpha >= beta:
            return alpha
    return alpha

Notes on the pseudocode:
alpha is the maximum minimum-score you've found so far.
beta is the minimum maximum-score your opponent has found so far.
Thus, when alpha >= beta, give up because someone did something non-optimal.
Initially, call abp() with alpha = -infinity and beta = +infinity.
In the recursive call, reverse the roles of alpha and beta (pass -beta, -alpha), because this next call is from the viewpoint of your opponent.
For the same reason, negate the result of this call.
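The pseudocode can be transcribed into Python roughly as follows. The Node class is a hypothetical stand-in for a game position, and the sketch assumes each node's score is given from the point of view of the player whose turn it is at that node (the negation trick relies on that).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical position type: a score (from the point of view of the
    # player to move here) plus the positions reachable in one move.
    score: float = 0.0
    children: list = field(default_factory=list)

def abp(node, depth, alpha, beta):
    """Line-by-line transcription of the pseudocode."""
    if depth == 0:
        return node.score
    if not node.children:
        return node.score
    for child in node.children:
        # Swap and negate the window: the child is seen by the opponent.
        alpha = max(alpha, -abp(child, depth - 1, -beta, -alpha))
        if alpha >= beta:
            return alpha  # prune the remaining children
    return alpha

# Made-up tree: two moves for us, two replies each for the opponent.
root = Node(children=[
    Node(children=[Node(score=4), Node(score=7)]),
    Node(children=[Node(score=2), Node(score=9)]),
])
print(abp(root, 2, -math.inf, math.inf))  # 4
```

In the second subtree the leaf 9 is never examined: after seeing 2, that move is already worse than the 4 guaranteed by the first one.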
Problems with the pseudocode
It returns a score, which is great, but you really need to return what move gets you that score!
±infinity are for a generic version: you can replace -infinity with the least score possible and +infinity with the greatest score possible.
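One possible way to address both points, as a sketch: a wrapper that remembers which move produced the best score, with the infinities replaced by -64 and 64 (the extremes of white minus black on an 8x8 board). The Node class, the move labels, and the best_move name are all assumptions for illustration; scores are again taken from the point of view of the player to move.

```python
from dataclasses import dataclass, field

LOWEST, HIGHEST = -64, 64  # least and greatest possible piece difference

@dataclass
class Node:
    # Hypothetical position type; children maps a move label to the
    # position that move leads to.
    score: int = 0
    children: dict = field(default_factory=dict)

def abp(node, depth, alpha, beta):
    """Same algorithm as the pseudocode; returns only a score."""
    if depth == 0 or not node.children:
        return node.score
    for child in node.children.values():
        alpha = max(alpha, -abp(child, depth - 1, -beta, -alpha))
        if alpha >= beta:
            return alpha
    return alpha

def best_move(node, depth):
    """Return (score, move) for the player to move at node.
    move is None if node has no children."""
    best, alpha = None, LOWEST
    for move, child in node.children.items():
        value = -abp(child, depth - 1, -HIGHEST, -alpha)
        # Strict comparison: a pruned search only proves "no better than
        # alpha", so ties must not replace the current best move.
        if best is None or value > alpha:
            best = move
            alpha = max(alpha, value)
    return alpha, best

root = Node(children={
    "left": Node(children={"x": Node(score=4), "y": Node(score=7)}),
    "right": Node(children={"x": Node(score=2), "y": Node(score=9)}),
})
print(best_move(root, 2))  # (4, 'left')
```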
Completely new slides (mainly for Homework 4)
Pseudocode

function abp(node, depth, alpha, beta):
    if depth is zero:
        return the score of node
    if node has no children:
        return the score of node
    for each child of node:
        alpha = max(alpha, -abp(child, depth-1, -beta, -alpha))
        if alpha >= beta:
            return alpha
    return alpha

See that highlighted minus sign? Big bug I accidentally made in class.
This version of the alpha-beta pruning algorithm attempts to maximize alpha. It is the corrected version of what I showed in class.

Pseudocode: Problems!
It doesn't return an actual move.
It doesn't worry about the case where one player goes twice in a row (this can happen in Othello!).
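A sketch of one way to cope with a player moving twice in a row; this is an assumption about how it could be done, not the course's solution. Each hypothetical Node records whose turn it is (to_move is +1 for white, -1 for black) and keeps its score as white minus black. The window is swapped and the result negated only when the turn actually changes hands.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical position type. score is white minus black;
    # to_move is +1 if white moves here, -1 if black does.
    score: int = 0
    to_move: int = 1
    children: list = field(default_factory=list)

def abp(node, depth, alpha, beta):
    """Value of node from the point of view of the player to move there."""
    if depth == 0 or not node.children:
        return node.to_move * node.score
    for child in node.children:
        if child.to_move == node.to_move:
            # The opponent had to pass: same player again, so keep the
            # window as it is and do not negate.
            value = abp(child, depth - 1, alpha, beta)
        else:
            value = -abp(child, depth - 1, -beta, -alpha)
        alpha = max(alpha, value)
        if alpha >= beta:
            return alpha
    return alpha

# Made-up tree. After move p, black has no move and white goes again;
# after move q, black replies as usual.
p = Node(to_move=1, children=[Node(score=6, to_move=-1), Node(score=3, to_move=-1)])
q = Node(to_move=-1, children=[Node(score=8, to_move=1), Node(score=5, to_move=1)])
root = Node(to_move=1, children=[p, q])
print(abp(root, 2, -math.inf, math.inf))  # 6
```

Under p white picks again and reaches 6; under q black holds white to 5; so the root is worth 6 to white.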
[Figure: a game tree of depth 2. The root, node A, has three children: B, D, and E. B has leaves 10 (node C) and 20; D has leaves 5, 60, and 30; E has leaves 30 and 15.]
Circles: our move. Squares: opponent's move.
We want to maximize the minimum score we can get.
The tree here has depth 2 and it is our move at node A, so we call: abp(a, 2, -infinity, +infinity)
At node A: alpha = -infinity.
We'll keep track of the current values of alpha and beta in the call to abp() on a given node.
A has children, so make a recursive call to abp: abp(b, 1, -infinity, +infinity). At A, alpha = -infinity; at B, alpha = -infinity.
B has children, so make a recursive call to abp: abp(c, 0, -infinity, +infinity)
At C, depth is zero, so just return 10. At B, -10 is greater than -infinity, so update alpha at B to -10. (Remember that minus sign in the algorithm?)
On the slides, nodes that we've already considered are greyed out.
We now call abp() on the other child of B, which returns 20. At B, -20 is not better than -10. No update.
That call to abp(b, 1, -infinity, +infinity) from before returns -10 now. At A, we update alpha to 10. (Again, remember that minus sign?)
Move on to the next child of A: abp(d, 1, -infinity, -10)  # note the values here!
At A, alpha = 10; at D, alpha = -infinity and beta = -10.
We're at D. Call abp() on the first child of D. That returns 5. Update alpha at D to -5.
We're still at D. Fiddlesticks: alpha >= beta (-5 >= -10). Simply return alpha, skip the rest of D's children.
The leaves we skipped (60 and 30) are erased from the picture. We're done with D, which returned -5. At A, 5 is not greater than 10, so no update.
A has one more child. Call abp(e, 1, -infinity, -10). At A, alpha = 10; at E, alpha = -infinity and beta = -10.
After the first child of E (30): alpha = -30 at E.
After the second child of E (15): alpha = -15 at E.
The call to abp() on E returns -15. At A, 15 > 10, so update alpha to 15. Our original call to abp(a, 2, -infinity, +infinity) returns 15!
Looking back at the entire, original tree (leaves 10 20 under B, 5 60 30 under D, 30 15 under E), this looks right: the opponent can hold us to 10 under B, 5 under D, and 15 under E, and 15 is the largest of those.
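The walkthrough can be replayed in Python. This is a sketch under two simplifications that are not in the slides: the tree is encoded as nested lists (a number is a leaf), and the depth parameter is dropped because every leaf of this tree sits exactly at depth 2. The visited list is added only to show which leaves get examined.

```python
import math

visited = []  # leaves whose score was actually examined, in order

def abp(node, alpha, beta):
    """The pseudocode's loop, on nested lists; a number is a leaf."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    for child in node:
        alpha = max(alpha, -abp(child, -beta, -alpha))
        if alpha >= beta:
            return alpha  # skip the remaining children
    return alpha

B = [10, 20]
D = [5, 60, 30]
E = [30, 15]
A = [B, D, E]

result = abp(A, -math.inf, math.inf)
print(result)   # 15
print(visited)  # [10, 20, 5, 30, 15]
```

The leaves 60 and 30 under D never appear in visited, matching the step where D's remaining children are skipped.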
Additional resources
http://en.wikipedia.org/wiki/alpha-beta_pruning
Probably overkill, as usual. In fact, maybe only read this if you like really dry text. But it is where I got the pseudo-code from.
http://www.seanet.com/~brucemo/topics/alphabeta.htm
I don't like the code example here so much, but the introductory text is pretty good. So read the intro stuff, skip the code.