CS 331: Artificial Intelligence
Adversarial Search

Games we will consider
- Deterministic
- Discrete states and decisions
- Finite number of states and decisions
- Perfect information, i.e. fully observable
- Two agents whose actions alternate
- Their utility values at the end of the game are equal and opposite (we call this zero-sum)

It's not enough for me to win, I have to see my opponents lose.
Which of these games fit the description?
Two-player, zero-sum, discrete, finite, deterministic games of perfect information
What makes games hard?
- Hard to solve, e.g. chess has a search tree with about 10^40 distinct nodes
- Need to make a decision even though you can't calculate the optimal decision
- Need to make a decision with time limits

Formal Definition of a Game
A quintuplet (S, I, Succ(), T, U):
- S: Finite set of states. States include information on which player's turn it is to move.
- I: Initial board position and which player is first to move.
- Succ(): Takes a current state and returns a list of (move, state) pairs, each indicating a legal move and the resulting state.
- T: Terminal test, which determines when the game ends. Terminal states: the subset of S where the game has ended.
- U: Utility function (aka objective function or payoff function): maps from a terminal state to a real number.
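The five components of the formal definition can be bundled into a small container. The following is a minimal Python sketch, not something taken from the slides: the `Game` class, its field names, and the two-move toy game are all hypothetical, and the state set S is left implicit as whatever is reachable from the initial state.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Tuple

State = Hashable
Move = str


@dataclass(frozen=True)
class Game:
    """Hypothetical container for (S, I, Succ(), T, U); S is implicit."""
    initial: State                                            # I
    successors: Callable[[State], List[Tuple[Move, State]]]   # Succ()
    is_terminal: Callable[[State], bool]                      # T
    utility: Callable[[State], float]                         # U


# Toy game (an assumption for illustration): a single move from "start"
# leads straight to one of two terminal states.
_TREE = {"start": [("left", "win"), ("right", "loss")]}
_UTILITY = {"win": 1.0, "loss": -1.0}

toy = Game(
    initial="start",
    successors=lambda s: _TREE.get(s, []),
    is_terminal=lambda s: s in _UTILITY,
    utility=lambda s: _UTILITY[s],
)

# List each legal move from the initial state with its resulting state.
for move, state in toy.successors(toy.initial):
    print(move, state, toy.is_terminal(state), toy.utility(state))
```

Keeping Succ(), T and U as plain functions means a search routine only needs these three calls and never has to know the rules of the particular game.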
Challenge

Nim
- Many different variations. We'll do this one.
- Start with 9 beaver logos
- On one player's turn, that player can remove 1, 2 or 3 beaver logos
- The person who takes the last beaver logo wins
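The rules of this Nim variant map directly onto a successor function, a terminal test and a utility function. The sketch below is one possible encoding in Python. The state representation `(player to move, logos left)`, the player names `"MAX"` and `"MIN"`, all function names, and the utility values +1 and -1 are assumptions made for illustration.

```python
from typing import List, Tuple

NimState = Tuple[str, int]  # (player to move, logos left)

START: NimState = ("MAX", 9)


def other(player: str) -> str:
    """Return the opponent of `player`."""
    return "MIN" if player == "MAX" else "MAX"


def nim_successors(state: NimState) -> List[Tuple[int, NimState]]:
    """Return (move, state) pairs; a move is the number of logos removed."""
    player, left = state
    return [(k, (other(player), left - k)) for k in (1, 2, 3) if k <= left]


def nim_terminal(state: NimState) -> bool:
    """The game ends when no logos are left."""
    return state[1] == 0


def nim_utility(state: NimState) -> int:
    """Utility for MAX (assumed values): +1 if MAX took the last logo, else -1."""
    player_to_move, left = state
    if left != 0:
        raise ValueError("utility is only defined for terminal states")
    # The player who is NOT to move took the last logo and therefore won.
    return 1 if player_to_move == "MIN" else -1


print(nim_successors(START))
```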
Formal Definition of Nim
Notation: whose move (number of matches left)
A quintuplet (S, I, Succ(), T, U), with one row each for S, I, Succ(), T and U.
[Table entries not recoverable: the state labels and utility values were lost in extraction.]

Nim
Nim Game Tree
We'll call the players Max and Min, with Max starting first.

How to Use a Game Tree
- Max wants to maximize his utility
- Min wants to minimize Max's utility
- Max's strategy must take into account what Min does since they alternate moves
- A move by Max or Min is called a ply
The Minimax Value of a Node
The minimax value of a node is the utility for MAX of being in the corresponding state, assuming that both players play optimally from there to the end of the game.

\[
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal state} \\
\max_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
\]

The minimax value maximizes the worst-case outcome for MAX.
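The recursive definition of the minimax value can be applied directly to the 9-logo Nim game. The following Python sketch is an illustration under stated assumptions: the function name `minimax_value`, the player labels `"MAX"` and `"MIN"`, and the terminal utilities +1 (MAX took the last logo) and -1 (MIN took it) are not given in the slides.

```python
def minimax_value(player: str, left: int) -> int:
    """Minimax value, for MAX, of the Nim state (player to move, logos left)."""
    if left == 0:
        # Terminal state: the player who is NOT to move took the last logo.
        return 1 if player == "MIN" else -1
    nxt = "MIN" if player == "MAX" else "MAX"
    # A move removes 1, 2 or 3 logos, never more than are left.
    values = [minimax_value(nxt, left - k) for k in (1, 2, 3) if k <= left]
    # MAX nodes take the maximum over successors, MIN nodes the minimum.
    return max(values) if player == "MAX" else min(values)


print(minimax_value("MAX", 9))
```

Under these assumptions the root value is +1: Max can force a win from 9 logos, whereas a player who must move with 8 logos left loses against optimal play.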
Minimax Values in Nim Game Tree
Minimax decision at the root: taking this action results in the successor with the highest minimax value.
Another Example
MAX node A (maximizing player) has three MIN successors (minimizing player), each with three terminal successors:
- B: 3, 12, 8
- C: 2, 4, 6
- D: 14, 5, 2

Minimax values of the MIN nodes: B = 3, C = 2, D = 2
Minimax value of the MAX node: A = max(3, 2, 2) = 3

The MINIMAX Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -Infinity
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +Infinity
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
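The MINIMAX pseudocode translates almost line for line into Python. The sketch below runs it on the A/B/C/D example tree. The dictionary encoding of the tree and the move labels (`a1`, `b1`, and so on) are hypothetical names introduced only for this illustration.

```python
import math

# Example tree: each internal state maps to a list of (move, state) pairs.
# Terminal states are represented by their integer utility.
TREE = {
    "A": [("a1", "B"), ("a2", "C"), ("a3", "D")],
    "B": [("b1", 3), ("b2", 12), ("b3", 8)],
    "C": [("c1", 2), ("c2", 4), ("c3", 6)],
    "D": [("d1", 14), ("d2", 5), ("d3", 2)],
}


def successors(state):
    return TREE[state]


def terminal_test(state):
    return isinstance(state, int)


def utility(state):
    return state


def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _a, s in successors(state):
        v = max(v, min_value(s))
    return v


def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _a, s in successors(state):
        v = min(v, max_value(s))
    return v


def minimax_decision(state):
    # Pick the action whose successor has the highest minimax value;
    # ties go to the first such action.
    return max(successors(state), key=lambda pair: min_value(pair[1]))[0]


print(minimax_decision("A"), max_value("A"))
```

Here `minimax_decision` scores each successor directly instead of first computing v and then searching for the matching action; both formulations select the same move.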
The MINIMAX Algorithm
- Computes the minimax decision from the current state
- Depth-first exploration of the game tree
- Time complexity: O(b^m), where b = number of legal moves and m = maximum depth of the tree
- Space complexity:
  - O(bm) if all successors are generated at once
  - O(m) if only one successor is generated at a time (each partially expanded node remembers which successor to generate next)

Minimax With 3 Players
Players A, B and C move in turn (A at the root, then B, then C). Terminal utility vectors, left to right:
(1,2,6) (4,2,3) (6,1,2) (7,4,1) (5,1,1) (1,5,2) (7,7,1) (5,4,5)
- We now have a vector of utilities for players (A, B, C).
- All players maximize their utilities.
- Note: in two-player, zero-sum games, we have a single value because the values are always opposite.
Minimax With 3 Players
Backed-up values at the C nodes (each C node picks the successor with the highest third component):
(1,2,6) (6,1,2) (1,5,2) (5,4,5)
Backed-up values at the B nodes (each B node picks the successor with the highest second component):
(1,2,6) (1,5,2)
Backed-up value at the root A: (1,2,6). A's own utility is 1 under either B node, so the tie is broken in favor of the left one.

Subtleties With Multiplayer Games
- Alliances can be made and broken
- For example, if A and B are weaker than C, they can gang up on C
- But A and B can turn on each other once C is weakened
- But society considers the player that breaks the alliance to be dishonorable
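The three-player backup rule, in which each player maximizes its own component of the utility vector, can be sketched in a few lines of Python. The nested-list encoding of the tree and the function name `multi_value` are assumptions for illustration. Ties are broken in favor of the leftmost successor, which is what Python's `max` does and what the example's root value suggests.

```python
# Three-player example tree: internal nodes are lists of children,
# leaves are utility vectors for players (A, B, C).
TREE = [
    [[(1, 2, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (1, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]


def multi_value(node, depth=0):
    """Back up utility vectors; the player to move at `depth` is depth % 3."""
    if isinstance(node, tuple):
        return node  # terminal state: return its utility vector
    player = depth % 3  # 0 = A, 1 = B, 2 = C
    # The player to move picks the successor vector that is best for itself.
    return max(
        (multi_value(child, depth + 1) for child in node),
        key=lambda u: u[player],
    )


print(multi_value(TREE))
```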
Pruning
Can we improve on the time complexity of O(b^m)?
Yes, if we prune away branches that cannot possibly influence the final decision.

Pruning in Nim
If we know that there are only two possible outcomes (a win or a loss for Max), what branches do we not need to explore when minimax backtracks?
What happens if we have more than just two outcomes?
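Pruning Intuition (General Case)
Figure: a MAX root with two MIN successors. The left MIN node has leaves 5 and 10 and minimax value 5; the first leaf of the right MIN node has value 1.
- Suppose we just went down the branch to the leaf with value 1. We know that the minimax value of its parent will be ≤ 1.
- The max player will never choose the right subtree once it knows that it is upper bounded by 1.

Pruning Example
MAX node A has MIN successors B, C and D with leaves:
- B: 3, 12, 8
- C: 2, x, y
- D: 14, 5, 2

\[
\begin{aligned}
\text{MINIMAX-VALUE}(\text{root}) &= \max(\min(3,12,8),\ \min(2,x,y),\ \min(14,5,2)) \\
&= \max(3,\ \min(2,x,y),\ 2) \\
&= \max(3,\ z,\ 2) \quad \text{where } z = \min(2,x,y) \le 2 \\
&= 3
\end{aligned}
\]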
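The claim that the root value does not depend on x and y can be checked numerically. The short Python sketch below evaluates the root expression over a grid of sample values; the function name `root_value` and the sampled range are arbitrary choices for illustration, not a proof.

```python
def root_value(x, y):
    """Minimax value of the root when C's unseen leaves are x and y."""
    return max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))


# min(2, x, y) can never exceed 2, so it can never beat B's value of 3.
samples = range(-50, 51)
print(all(root_value(x, y) == 3 for x in samples for y in samples))
```

Because the result is the same for every sampled pair, a search that has already seen C's first leaf (2) can skip the other two.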
Pruning Intuition
Remember that minimax search is DFS. At any one time, we only have to consider the nodes along a single path in the tree.
In general, let:
- α = highest minimax value of all of the MAX player's choices expanded on the current path
- β = lowest minimax value of all of the MIN player's choices expanded on the current path
If at a MIN player node, prune if the minimax value of the node ≤ α.
If at a MAX player node, prune if the minimax value of the node ≥ β.

ALPHA-BETA Pseudocode

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  inputs: state, current state in game
          α, the value of the best alternative for MAX along the path to state
          β, the value of the best alternative for MIN along the path to state
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
ALPHA-BETA Pseudocode

function MIN-VALUE(state, α, β) returns a utility value
  inputs: state, current state in game
          α, the value of the best alternative for MAX along the path to state
          β, the value of the best alternative for MIN along the path to state
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v

Illustrating the Pseudocode
- In the example to follow, the notation (-∞, +∞) represents the (α, β) values for the corresponding node.
- This example is intended to illustrate how the actual implementation of Alpha-beta pruning works.
- The tree is the earlier one: a maximizing-player root A with minimizing-player successors B, C and D.
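The ALPHA-BETA pseudocode can be sketched in Python as follows and run on the A/B/C/D tree with leaves 3, 12, 8 / 2, 4, 6 / 14, 5, 2. The nested-list encoding, the move labels, and the `visited` list (which records the leaves actually evaluated) are assumptions added for illustration. The top-level function also tracks the best action directly rather than looking it up by value afterwards.

```python
import math

# Each internal state is a list of (move, state) pairs; leaves are integers.
ROOT = [
    ("B", [("b1", 3), ("b2", 12), ("b3", 8)]),
    ("C", [("c1", 2), ("c2", 4), ("c3", 6)]),
    ("D", [("d1", 14), ("d2", 5), ("d3", 2)]),
]

visited = []  # leaves evaluated, in order (bookkeeping for illustration only)


def max_value(state, alpha, beta):
    if isinstance(state, int):
        visited.append(state)
        return state
    v = -math.inf
    for _a, s in state:
        v = max(v, min_value(s, alpha, beta))
        if v >= beta:
            return v  # MIN above would never allow this node: prune
        alpha = max(alpha, v)
    return v


def min_value(state, alpha, beta):
    if isinstance(state, int):
        visited.append(state)
        return state
    v = math.inf
    for _a, s in state:
        v = min(v, max_value(s, alpha, beta))
        if v <= alpha:
            return v  # MAX above already has something at least as good: prune
        beta = min(beta, v)
    return v


def alpha_beta_search(state):
    """Return (best action, value) for the MAX player at the root."""
    visited.clear()
    best_action, alpha = None, -math.inf
    for a, s in state:
        v = min_value(s, alpha, math.inf)
        if v > alpha:
            best_action, alpha = a, v
    return best_action, alpha


print(alpha_beta_search(ROOT), visited)
```

On this tree the search chooses the move to B with value 3, and leaves 4 and 6 under C are never evaluated.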
Alpha-beta Pruning Example
a) A starts with (-∞, +∞). Descend to B with (-∞, +∞).
b) B sees leaf 3: B becomes (-∞, 3).
c) B sees leaf 12: B stays at (-∞, 3).
d) B sees leaf 8: B stays at (-∞, 3).
e) B returns 3: A becomes (3, +∞).
f) Descend to C with (3, +∞).
g) C sees leaf 2.
h) Pruning happens: 2 ≤ α (= 3), so C's remaining successors are not explored.
Alpha-beta Pruning Example (continued)
i) Descend to D with (3, +∞).
j) D sees leaf 14: D becomes (3, 14).
k) D sees leaf 5: D becomes (3, 5).
l) D sees leaf 2. Pruning happens: 2 ≤ α (= 3), but not much is pruned since we're at the bottom.

Effectiveness of Alpha-beta
- Depends on the order of successors
- Best case: Alpha-beta reduces complexity from O(b^m) for minimax to O(b^(m/2))
- This means Alpha-beta can look ahead about twice as far as minimax in the same amount of time
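The dependence on successor order can be seen on the small example tree by counting how many leaves alpha-beta evaluates under different orderings. The Python sketch below is an illustration only: the function names and the two reordered trees are hypothetical, and a tree this small cannot demonstrate the asymptotic O(b^(m/2)) bound.

```python
import math


def alphabeta(node, alpha, beta, maximizing, counter):
    """Alpha-beta value of `node`; counter[0] counts the leaves evaluated."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, counter))
            if v >= beta:
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, counter))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v


def leaves_evaluated(tree):
    """Return (root value, number of leaves evaluated) for a MAX root."""
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


as_drawn = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
good_order = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # D's worst leaf first
bad_order = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]    # best MIN node last

for tree in (as_drawn, good_order, bad_order):
    print(leaves_evaluated(tree))
```

All three orderings give the same root value of 3, but they evaluate 7, 5 and 9 leaves respectively, so exploring the most promising successors first leaves more to prune.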
Implementation Details
- In games we have the problem of transposition
- Transposition means different permutations of the move sequence that end up in the same position
- Results in lots of repeated states
- Use a transposition table to remember the states you've seen (similar to a closed list)

What you should know
- Be able to draw up a game tree
- Know how the Minimax algorithm works
- Know how the Alpha-beta algorithm works
- Be able to do both algorithms by hand
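A transposition table can be as simple as a dictionary keyed by state. The Python sketch below applies one to minimax on 9-logo Nim, where removing 1 then 2 logos reaches the same position as removing 2 then 1. The function name `nim_value`, the `stats` counter, and the utilities +1 and -1 are assumptions for illustration.

```python
def nim_value(state, table, stats):
    """Minimax value for MAX of a Nim state (player to move, logos left).

    `table` is a dict used as the transposition table, or None to disable it.
    `stats["expanded"]` counts the states actually evaluated.
    """
    if table is not None and state in table:
        return table[state]  # state seen before: reuse the stored value
    stats["expanded"] += 1
    player, left = state
    if left == 0:
        # The player who is NOT to move took the last logo and won.
        value = 1 if player == "MIN" else -1
    else:
        nxt = "MIN" if player == "MAX" else "MAX"
        children = [
            nim_value((nxt, left - k), table, stats)
            for k in (1, 2, 3)
            if k <= left
        ]
        value = max(children) if player == "MAX" else min(children)
    if table is not None:
        table[state] = value
    return value


plain = {"expanded": 0}
cached = {"expanded": 0}
table = {}
v_plain = nim_value(("MAX", 9), None, plain)
v_cached = nim_value(("MAX", 9), table, cached)
print(v_plain, v_cached, plain["expanded"], cached["expanded"])
```

With the table, each distinct state is evaluated once, so the number of expansions equals the number of table entries instead of the number of nodes in the full game tree.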