Adversarial Search and Game Playing Russell and Norvig: Chapter 5
Typical case: 2-person game. Players alternate moves. Zero-sum: one player's loss is the other's gain. Perfect information: both players have access to complete information about the state of the game; no information is hidden from either player. No chance (e.g., dice) involved. Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello. Not: Bridge, Solitaire, Backgammon, ...
Nim Game: start with several piles of tokens. At each turn, a player chooses one of the piles and picks up one or two tokens from it. The player who picks up the last token LOSES.
NIM(2,2) Try the following case: two piles with two tokens each (pile 1, pile 2).
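As a sketch, the rules above can be checked with a tiny recursive search in Python. The state is a tuple of pile sizes; the names `moves` and `wins` are illustrative, not from the slides:

```python
from functools import lru_cache

def moves(piles):
    """All states reachable by taking one or two tokens from a single pile."""
    for i, p in enumerate(piles):
        for take in (1, 2):
            if take <= p:
                yield tuple(sorted(piles[:i] + (p - take,) + piles[i + 1:]))

@lru_cache(maxsize=None)
def wins(piles):
    """True if the player to move can force a win (taking the last token LOSES)."""
    if sum(piles) == 0:
        return True  # the opponent just took the last token and lost
    return any(not wins(nxt) for nxt in moves(piles))
```

For NIM(2,2), `wins((2, 2))` comes out False: whatever the first player does, the second player can reply so as to leave piles (0, 1), forcing the first player to take the last token.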
Game Search Formulation. Two players, MAX and MIN, take turns (with MAX playing first). State space; initial state; successor function; terminal test; a utility function that tells whether a terminal state is a win (for MAX), a loss, or a draw. MAX uses the search tree to determine its next move.
Game Tree for NIM(2,2)
Optimal strategies. Find the contingent strategy for MAX assuming an infallible MIN opponent. Assumption: both players play optimally! Given a game tree, the optimal strategy can be determined by using the minimax value of each node:
MINIMAX-VALUE(n) =
  UTILITY(n)                                          if n is a terminal node
  max over s in successors(n) of MINIMAX-VALUE(s)     if n is a MAX node
  min over s in successors(n) of MINIMAX-VALUE(s)     if n is a MIN node
Optimal Play (figure: a MAX/MIN game tree over the leaf values 2, 7, 1, 8; the highlighted branch is the optimal play).
Minimax Tree (legend: MAX node; MIN node; f value = value computed by minimax).
Two-Ply Game Tree
Two-Ply Game Tree. The minimax decision: minimax maximizes the worst-case outcome for MAX.
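A minimal sketch of the MINIMAX-VALUE recursion, run on a two-ply tree of this shape. The leaf values 3, 12, 8 and 14, 5, 2 match the alpha-beta example later in the slides; the middle branch's 4 and 6 are invented for illustration:

```python
def minimax_value(state, to_move, successors, terminal, utility):
    """MINIMAX-VALUE(n): utility at terminals, else max/min over successors."""
    if terminal(state):
        return utility(state)
    nxt = "MIN" if to_move == "MAX" else "MAX"
    values = [minimax_value(s, nxt, successors, terminal, utility)
              for s in successors(state)]
    return max(values) if to_move == "MAX" else min(values)

# Two-ply tree: MAX at the root A, MIN nodes B, C, D, numeric leaves.
TREE = {"A": ["B", "C", "D"], "B": [3, 12, 8], "C": [2, 4, 6], "D": [14, 5, 2]}

root_value = minimax_value("A", "MAX",
                           successors=lambda s: TREE[s],
                           terminal=lambda s: isinstance(s, int),
                           utility=lambda s: s)
# root_value == 3: MAX moves to B, whose worst-case (MIN) outcome is 3
```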
Min-Max Game Tree for NIM(2,2)
What if MIN does not play optimally? Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX. But if MIN does not play optimally, MAX can do even better.
Properties of Minimax. Complete? Yes, if the tree is finite. Optimal? Yes, against an optimal opponent. Time: O(b^m). Space: O(bm) (depth-first exploration).
Problem of minimax search: the number of game states is exponential in the number of moves. Solution: do not examine every node ==> alpha-beta pruning: remove branches that do not influence the final decision.
Alpha-beta pruning. We can improve on the performance of the minimax algorithm through alpha-beta pruning. Basic idea: "If you have an idea that is surely bad, don't take the time to see how truly awful it is." -- Pat Winston. Example: the left MIN node has value min(2, 7) = 2, so the root (MAX) is >= 2; once the right MIN node sees the leaf 1, its value is <= 1, so we don't need to compute the value of its remaining successor. No matter what it is, it can't affect the value of the root node.
Alpha-Beta Example. Do DF-search until first leaf. Range of possible values: [-∞, +∞] [-∞, +∞]
Alpha-Beta Example (continued) [-∞, +∞] [-∞, 3]
Alpha-Beta Example (continued) [3, +∞] [3, 3]
Alpha-Beta Example (continued) [3, +∞] This node is worse for MAX. [3, 3] [-∞, 2]
Alpha-Beta Example (continued) [3, 14] [3, 3] [-∞, 2] [-∞, 14]
Alpha-Beta Example (continued) [3, 5] [3, 3] [-∞, 2] [-∞, 5]
Alpha-Beta Example (continued) [3, 3] [3, 3] [-∞, 2] [2, 2]
General alpha-beta pruning. Consider a node n somewhere in the tree. If the player has a better choice at the parent node of n, or at any choice point further up, then n will never be reached in actual play. Hence, once enough is known about n, it can be pruned.
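The rule above can be sketched as the standard alpha-beta recursion. This is a minimal Python sketch; the tree encoding and parameter names are assumptions, and the leaf values follow the worked example in the earlier slides (with the middle leaves 4 and 6 invented for illustration):

```python
import math

def alphabeta(state, to_move, alpha, beta, successors, terminal, utility):
    """Minimax value of `state`, skipping branches that cannot affect the root."""
    if terminal(state):
        return utility(state)
    if to_move == "MAX":
        value = -math.inf
        for s in successors(state):
            value = max(value, alphabeta(s, "MIN", alpha, beta,
                                         successors, terminal, utility))
            if value >= beta:      # MIN above would never allow this branch
                break
            alpha = max(alpha, value)
    else:
        value = math.inf
        for s in successors(state):
            value = min(value, alphabeta(s, "MAX", alpha, beta,
                                         successors, terminal, utility))
            if value <= alpha:     # MAX above would never allow this branch
                break
            beta = min(beta, value)
    return value

# Same two-ply tree as the worked example.
TREE = {"A": ["B", "C", "D"], "B": [3, 12, 8], "C": [2, 4, 6], "D": [14, 5, 2]}
seen = []                          # leaves actually evaluated

def utility(leaf):
    seen.append(leaf)
    return leaf

value = alphabeta("A", "MAX", -math.inf, math.inf,
                  lambda s: TREE[s], lambda s: isinstance(s, int), utility)
# value == 3, and the leaves 4 and 6 of C are pruned (never evaluated)
```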
Imperfect Real-time Decisions. In general the search tree is too big to make it possible to reach the terminal states! Examples: Checkers: ~10^40 nodes; Chess: ~10^120 nodes. Shannon (1950): cut off search earlier (replace TERMINAL-TEST by CUTOFF-TEST) and apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta).
Cutting off search. Change: if TERMINAL-TEST(state) then return UTILITY(state) into: if CUTOFF-TEST(state, depth) then return EVAL(state). This introduces a fixed depth limit; the depth is selected so that the amount of time used will not exceed what the rules of the game allow. When the cutoff occurs, the evaluation function is applied.
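The change above can be sketched as a depth-limited minimax. Here `cutoff` and `eval_fn` stand in for CUTOFF-TEST and EVAL, and the heuristic values are invented for illustration:

```python
def h_minimax(state, depth, to_move, successors, cutoff, eval_fn):
    """Minimax with TERMINAL-TEST/UTILITY replaced by CUTOFF-TEST/EVAL."""
    if cutoff(state, depth):
        return eval_fn(state)
    nxt = "MIN" if to_move == "MAX" else "MAX"
    values = [h_minimax(s, depth + 1, nxt, successors, cutoff, eval_fn)
              for s in successors(state)]
    return max(values) if to_move == "MAX" else min(values)

# Cut off at depth 1 and score the frontier with an (invented) heuristic,
# instead of searching all the way down to the true terminal states.
TREE = {"A": ["B", "C", "D"]}
HEURISTIC = {"B": 5, "C": 1, "D": 9}

choice_value = h_minimax("A", 0, "MAX",
                         successors=lambda s: TREE[s],
                         cutoff=lambda s, d: d >= 1,
                         eval_fn=lambda s: HEURISTIC[s])
# choice_value == 9: with this shallow cutoff, MAX prefers move D
```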
Evaluation function. An evaluation function, or static evaluator, is used to evaluate the "goodness" of a game position. Contrast with heuristic search, where the evaluation function was a non-negative estimate of the cost from the start node to a goal passing through the given node. The zero-sum assumption allows us to use a single evaluation function to describe the goodness of a board with respect to both players. f(n) >> 0: position n good for me and bad for you. f(n) << 0: position n bad for me and good for you. f(n) near 0: position n is a neutral position. f(n) = +infinity: win for me. f(n) = -infinity: win for you.
Evaluation function examples. Example of an evaluation function for Tic-Tac-Toe: f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you], where a 3-length is a complete row, column, or diagonal. Alan Turing's function for chess: f(n) = w(n)/b(n), where w(n) = sum of the point values of White's pieces and b(n) = sum of Black's. Most evaluation functions are specified as a weighted sum of position features: f(n) = w1*feat1(n) + w2*feat2(n) + ... + wk*featk(n). Example features for chess are piece count, piece placement, squares controlled, etc. Deep Blue had about 6000 features in its evaluation function.
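The tic-tac-toe function above can be sketched directly: count, for each player, the rows, columns, and diagonals that contain no opponent mark. A minimal sketch; the flat 9-cell board encoding is an assumption:

```python
# Board: 9 cells indexed 0..8, each 'X', 'O', or ' '.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def open_lines(board, opponent):
    """Lines still winnable by a player: no cell holds the opponent's mark."""
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))

def eval_ttt(board, me, you):
    """f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you]."""
    return open_lines(board, you) - open_lines(board, me)

# On an empty board both players have all 8 lines open, so f = 0.
# After 'X' takes the center, X still has 8 open lines while O loses the
# 4 lines through the center, so f = 8 - 4 = 4.
```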
Issues Choice of the horizon Size of memory needed Number of nodes examined
Games that include chance. White has rolled (6,5). Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16).
Games that include chance: chance nodes. Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16). The rolls [1,1] and [6,6] each have chance 1/36; all other rolls have chance 1/18.
Games that include chance. [1,1] and [6,6] each have chance 1/36; all other rolls have chance 1/18. We cannot calculate a definite minimax value, only an expected value.
Expected minimax value.
EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                        if n is a terminal node
  max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)          if n is a MAX node
  min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)          if n is a MIN node
  sum over s in successors(n) of P(s) * EXPECTED-MINIMAX-VALUE(s)   if n is a chance node
These equations can be backed up recursively all the way to the root of the game tree.
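These equations can be sketched with a small tree of tagged tuples. The node encoding and the example probabilities are assumptions for illustration:

```python
def expectiminimax(node):
    """EXPECTED-MINIMAX-VALUE over ("max"|"min"|"chance", children) tuples;
    chance children are (probability, subtree) pairs; numbers are terminals."""
    if isinstance(node, (int, float)):
        return node                      # UTILITY(n) at a terminal
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average over the dice outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between two dice-dependent continuations:
tree = ("max", [
    ("chance", [(0.5, ("min", [3, 12])), (0.5, ("min", [2, 4]))]),
    ("chance", [(1.0, ("min", [5, 8]))]),
])
# First option: 0.5*3 + 0.5*2 = 2.5; second option: 5.0 -> root value 5.0
```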
Position evaluation with chance nodes. Left: A1 wins; right: A2 wins. With chance nodes, the move chosen may change when the evaluation values are scaled differently, even if their order is preserved. Behavior is preserved only by a positive linear transformation of EVAL.
Checkers Jonathan Schaeffer
Backgammon: branching factor of several hundred. TD-Gammon v1: 1-step lookahead; learns by playing games against itself. TD-Gammon v2.1: 2-ply search; does well against world champions. TD-Gammon has changed the way experts play backgammon.
Reversi/Othello Jonathan Schaeffer