Game Playing: Adversarial Search. Chapter 5


Outline
  Games
  Perfect play: minimax search, α–β pruning
  Resource limits and approximate evaluation
  Games of chance
  Games of imperfect information

Games vs. Search Problems
In games we have:
  Unpredictable opponent ⇒ the solution is a strategy, specifying a move for every possible opponent reply.
  Time limits ⇒ unlikely to find the goal; do the best that you can.
Game playing goes back a long way:
  Computer considers possible lines of play (Babbage, 1846)
  Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
  Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
  First chess program (Turing, 1951)
  Machine learning to improve evaluation (Samuel, 1952–57)
  Pruning to allow deeper search (McCarthy, 1956)

Types of Games

                        deterministic                   chance
perfect information     chess, checkers, go, othello    backgammon, monopoly
imperfect information   battleships, blind tictactoe    bridge, poker, scrabble, war

Two-Player Games
Two players, MAX and MIN, who take turns playing.
Main game components:
  Initial state: the initial game position.
  Actions: the set of legal moves.
  Transition function: returns a list of legal moves, each paired with its resulting state.
  Terminal test: determines when the game is over.
  Utility function: the value of a terminal state. Also called an objective or payoff function.
Generally we'll deal with zero-sum games.
Later we'll talk about a static evaluation function, which gives a value to every game state.

Game Tree (2-player, deterministic, turns)
[Figure: partial game tree for tic-tac-toe. Levels alternate between MAX (X) and MIN (O) from the empty board down to TERMINAL positions, which carry utilities −1, 0, or +1.]

Minimax
Perfect play for deterministic, perfect-information games.
Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play.
E.g., 2-ply game:
[Figure: MAX root (value 3) with moves A1, A2, A3 leading to MIN nodes with values 3, 2, 2. Leaves under A1 (A11, A12, A13): 3, 12, 8; under A2 (A21, A22, A23): 2, 4, 6; under A3 (A31, A32, A33): 14, 5, 2.]

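Minimax Value
$$
\mathrm{MinimaxValue}(n) =
\begin{cases}
\mathrm{Utility}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \mathrm{Successors}(n)} \mathrm{MinimaxValue}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \mathrm{Successors}(n)} \mathrm{MinimaxValue}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
$$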

Minimax Algorithm

Function Minimax-Decision(state) returns an action
    inputs: state, current state in game
    return the a in Actions(state) maximizing Min-Value(Result(a, state))

Function Max-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← −∞
    for s in Successors(state) do v ← Max(v, Min-Value(s))
    return v

Function Min-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← +∞
    for s in Successors(state) do v ← Min(v, Max-Value(s))
    return v
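A minimal Python sketch of the minimax recursion, assuming the game tree is given explicitly as nested lists (a leaf is a number, an internal node is a list of children). The nested-list encoding and the names `max_value`, `min_value`, `minimax_decision`, and `TWO_PLY` are illustrative assumptions, not part of the slides; a real game would generate successors from a state instead.

```python
import math

def max_value(node):
    """Minimax value of a node where MAX is to move."""
    if not isinstance(node, list):      # Terminal-Test: a leaf holds its utility
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child))
    return v

def min_value(node):
    """Minimax value of a node where MIN is to move."""
    if not isinstance(node, list):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child))
    return v

def minimax_decision(root):
    """Index of the root move whose resulting MIN node has the highest value."""
    return max(range(len(root)), key=lambda i: min_value(root[i]))

# The 2-ply example tree: three MIN nodes with three leaves each.
TWO_PLY = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

if __name__ == "__main__":
    print(max_value(TWO_PLY))         # value of the root
    print(minimax_decision(TWO_PLY))  # 0-based index of MAX's best move
```

On the 2-ply tree the three MIN nodes evaluate to 3, 2, and 2, so the root value is 3 and the first move is chosen.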

Properties of Minimax
Complete: Yes, if the tree is finite. (Chess has specific rules for this.)
Optimal: Yes, against an optimal opponent. Otherwise??
Time complexity: $O(b^m)$
Space complexity: $O(bm)$ (depth-first exploration)
For chess, $b \approx 35$, $m \approx 100$ for reasonable games ⇒ an exact solution is completely infeasible.
But do we need to explore every path?

α–β Pruning
Game tree search is inherently exponential.
However, we can speed things up by pruning parts of the search space that are guaranteed to be inferior.
α–β pruning returns the same move as minimax, but prunes branches that can't affect the final outcome.

α–β Pruning Example
[Figure: the 2-ply example tree searched left to right. The first MIN node evaluates leaves 3, 12, 8 and returns 3, so the MAX root is worth at least 3. The second MIN node sees leaf 2, so it is worth at most 2, and its two remaining leaves (marked X) are pruned. The third MIN node evaluates leaves 14, 5, 2 and returns 2. The root value is 3.]

The General Case
[Figure: a path alternating MAX and MIN levels from the root down to a MIN node with value V.]
α is the best value (to MAX) found so far.
If V is worse than α, MAX will avoid it, so this node won't be reached in play ⇒ prune that branch.
Define β similarly for MIN.

The General Case
α is the value of the best (i.e., maximum) choice we have found so far for MAX.
β is the value of the best (i.e., minimum) choice we have found so far for MIN.
α–β search updates the values of α and β as it progresses.
Note: It prunes the remaining branches at a node as soon as they are known to be worse than the current α (for MAX) or β (for MIN) value.
The α values of MAX nodes can never decrease.
The β values of MIN nodes can never increase.

α–β Search
Observe:
  Search can be discontinued below any MAX node whose α value is ≥ the β value of any of its MIN ancestors. The final value of this MAX node can then be set to its α value.
  Search can be discontinued below any MIN node whose β value is ≤ the α value of any of its MAX ancestors. The final value of this MIN node can then be set to its β value.
Main point (again):
  The α value of a MAX node = the current largest final value of its successors.
  The β value of a MIN node = the current smallest final value of its successors.

The α–β Algorithm

Function Alpha-Beta-Decision(state) returns an action
    v ← Max-Value(state, −∞, +∞)
    return the a in Actions(state) with value v

The α–β Algorithm

Function Max-Value(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if Terminal-Test(state) then return Utility(state)
    v ← −∞
    for s in Successors(state) do
        v ← Max(v, Min-Value(s, α, β))
        if v ≥ β then return v    /* discontinue since MIN can do better elsewhere */
        α ← Max(α, v)
    return v

Function Min-Value(state, α, β) returns a utility value
    same as Max-Value but with roles of α, β reversed

This is slightly simpler than the algorithm in the 3rd ed.
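The Max-Value / Min-Value pair can be sketched in Python as follows, again assuming an explicit nested-list game tree (leaf = number, internal node = list of children). The function names and the tree encoding are illustrative assumptions; only the cutoff tests and the α, β updates follow the pseudocode.

```python
import math

def ab_max_value(node, alpha, beta):
    """Alpha-beta value of a MAX node; alpha/beta bound what the ancestors will accept."""
    if not isinstance(node, list):
        return node
    v = -math.inf
    for child in node:
        v = max(v, ab_min_value(child, alpha, beta))
        if v >= beta:          # MIN already has a better alternative elsewhere
            return v
        alpha = max(alpha, v)
    return v

def ab_min_value(node, alpha, beta):
    """Alpha-beta value of a MIN node: same as above with the roles reversed."""
    if not isinstance(node, list):
        return node
    v = math.inf
    for child in node:
        v = min(v, ab_max_value(child, alpha, beta))
        if v <= alpha:         # MAX already has a better alternative elsewhere
            return v
        beta = min(beta, v)
    return v

def alpha_beta_value(root):
    """Value of the root (a MAX node), starting with the widest possible window."""
    return ab_max_value(root, -math.inf, math.inf)

TWO_PLY = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

if __name__ == "__main__":
    print(alpha_beta_value(TWO_PLY))
```

On the 2-ply example this returns the same root value as plain minimax (3) while skipping the leaves 4 and 6 under the second MIN node.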

Properties of α–β
Pruning does not affect the final result.
Good move ordering improves the effectiveness of pruning.
With perfect ordering, time complexity = $O(b^{m/2})$ ⇒ doubles the solvable depth.
Q: What if you reverse a perfect ordering?
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning).
Unfortunately, for chess, $35^{50}$ is still impossible!
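To make the effect of move ordering concrete, here is a small sketch that counts how many leaves α–β evaluates on the 2-ply example tree under two orderings. The trees `BEST_FIRST` and `WORST_FIRST` are rearrangements of the same nine leaves chosen for illustration, and the helper names are assumptions.

```python
import math

def alphabeta_count(node, alpha, beta, maximizing, counter):
    """Alpha-beta value of `node`; counter[0] accumulates the leaves evaluated."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta_count(child, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, child_value)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v

def leaves_evaluated(tree):
    """Return (root value, number of leaves evaluated) for a MAX root."""
    counter = [0]
    value = alphabeta_count(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]

# Same leaves as the 2-ply example, best replies first at every level ...
BEST_FIRST = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]
# ... and the exact reverse of that ordering at every level.
WORST_FIRST = [[14, 5, 2], [6, 4, 2], [12, 8, 3]]

if __name__ == "__main__":
    print(leaves_evaluated(BEST_FIRST))
    print(leaves_evaluated(WORST_FIRST))
```

Both orderings give the same root value, but the best-first tree needs 5 leaf evaluations while the reversed tree needs all 9, i.e., no pruning at all on this small example.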

Resource Limits
Most games cannot be exhaustively searched; we usually have to terminate the search before hitting a goal state.
Standard approach:
  Use Cutoff-Test instead of Terminal-Test, e.g., a depth limit.
  Use Eval instead of Utility/Goal-Test, i.e., an evaluation function that estimates the desirability of a position.
Suppose we have 100 seconds and explore $10^4$ nodes/second ⇒ $10^6$ nodes per move ≈ $35^{8/2}$ ⇒ α–β reaches depth 8 ⇒ a pretty good chess program (if we have a good static evaluation function).
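A hedged sketch of the Cutoff-Test / Eval idea on a toy game that is not from the slides: a single pile of stones, each player removes 1 to 3 stones, and whoever takes the last stone wins. The game, the two evaluation functions, and all names below are assumptions made purely for illustration.

```python
def cutoff_minimax(stones, max_to_move, depth, eval_fn):
    """Depth-limited minimax value (from MAX's point of view) of a pile position."""
    if stones == 0:                           # Terminal-Test
        # The previous mover took the last stone and won.
        return -1 if max_to_move else +1      # Utility
    if depth == 0:                            # Cutoff-Test: depth limit reached
        return eval_fn(stones, max_to_move)   # Eval replaces Utility here
    values = [
        cutoff_minimax(stones - k, not max_to_move, depth - 1, eval_fn)
        for k in (1, 2, 3) if k <= stones
    ]
    return max(values) if max_to_move else min(values)

def unknown_eval(stones, max_to_move):
    """Uninformative evaluation: every non-terminal position looks like a draw."""
    return 0

def mod4_eval(stones, max_to_move):
    """Assumed heuristic for this toy game: the mover loses iff stones % 4 == 0."""
    mover_loses = stones % 4 == 0
    if max_to_move:
        return -1 if mover_loses else +1
    return +1 if mover_loses else -1

if __name__ == "__main__":
    print(cutoff_minimax(21, True, 2, unknown_eval))  # shallow search, weak Eval
    print(cutoff_minimax(21, True, 2, mod4_eval))     # shallow search, good Eval
```

With only two plies of lookahead from 21 stones, the uninformative evaluation reports 0, while the informed one already reports a win for MAX, which is the point of pairing a depth limit with a good static evaluation function.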

Evaluation Functions
[Figure: two chess positions. Left: Black to move, White slightly better. Right: White to move, Black winning.]
For chess, typically a linear weighted sum of features:
$$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$$
e.g., $w_1 = 9$ with $f_1(s)$ = (# of white queens) − (# of black queens), etc.
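The weighted sum can be sketched in a few lines of Python. Only the queen weight of 9 comes from the slide; the other weights are the conventional material values and are an assumption here, as are the function names and the piece-count dictionaries.

```python
# Weight per feature. Only "queen": 9 is taken from the slide; the rest are
# conventional material values, assumed for illustration.
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def material_features(white, black):
    """f_i(s) = (# of white pieces of type i) - (# of black pieces of type i)."""
    return {piece: white.get(piece, 0) - black.get(piece, 0) for piece in WEIGHTS}

def eval_linear(features):
    """Eval(s) = w_1 f_1(s) + ... + w_n f_n(s)."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

if __name__ == "__main__":
    white = {"queen": 1, "pawn": 8}
    black = {"rook": 2, "pawn": 8}
    print(eval_linear(material_features(white, black)))
```

In this made-up position White's queen (9) is outweighed by Black's two rooks (10), so the evaluation is −1 from White's point of view.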

Evaluation Functions: Issues
Quiescence vs. non-quiescence
  Search to a quiescent area (i.e., where the static evaluation function doesn't change much between moves).
  Or (pretty much the same thing): if the static evaluation function changes radically between moves, keep searching.
Horizon effect
  A problem arises if there is an unavoidable loss that can be pushed beyond the cutoff by other moves.

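Digression: Exact Values Don't Matter
[Figure: two game trees with the same shape. Left: leaves 1, 2 and 2, 4 give MIN values 1 and 2. Right: leaves 1, 20 and 20, 400 give MIN values 1 and 20. MAX chooses the same move in both.]
Behaviour is preserved under any monotonic transformation of Eval.
Only the order matters: payoff in deterministic games acts as an ordinal utility function.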
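A quick check of this claim on the leaf values from the figure, as a sketch. The helper `minimax_choice` and the extra cubic transformation are assumptions added for illustration.

```python
def minimax_choice(tree):
    """Index of MAX's best move in a 2-ply tree given as a list of MIN nodes."""
    min_values = [min(leaves) for leaves in tree]
    return max(range(len(tree)), key=lambda i: min_values[i])

def transform_leaves(tree, f):
    """Apply f to every leaf, keeping the tree shape."""
    return [[f(x) for x in leaves] for leaves in tree]

ORIGINAL = [[1, 2], [2, 4]]
FIGURE_MAP = {1: 1, 2: 20, 4: 400}     # the order-preserving relabelling in the figure

if __name__ == "__main__":
    relabelled = transform_leaves(ORIGINAL, FIGURE_MAP.get)
    cubed = transform_leaves(ORIGINAL, lambda x: x ** 3)   # another monotonic map
    print(minimax_choice(ORIGINAL), minimax_choice(relabelled), minimax_choice(cubed))
```

All three trees lead MAX to the same move, because max and min only ever compare values and never add or average them.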

Deterministic Games in Practice: Checkers
Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database giving perfect play for all positions with 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
Checkers is now completely solved (by computer).

Deterministic Games in Practice: Chess
Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 200 million positions per second, used very sophisticated evaluation, and used undisclosed methods for extending some lines of search up to 40 ply.

Deterministic Games in Practice: Othello
Human champions refuse to compete against computers, which are too good.
Makes a good AI assignment!

Deterministic Games in Practice: Go
Until recently, human champions refused to compete against computers, which were too bad.
In chess there are around $10^{40}$ positions; in Go there are around $10^{170}$.
Go was considered hard because the search space is staggering and it was extremely difficult to evaluate a board position.
However, in March 2016, AlphaGo beat Lee Sedol (winner of 18 world titles) 4 games to 1.
AlphaGo combines learning via neural networks with Monte Carlo tree search.

Deterministic Games in Practice: DeepBlue vs. AlphaGo
Deep Blue
  Handcrafted chess knowledge
  Alpha-beta search guided by a heuristic evaluation function
  200 million positions / second
AlphaGo
  Knowledge learned from expert games and self-play
  Monte-Carlo search guided by policy and value networks
  60,000 positions / second
Q: Which seems the more human-like?

Nondeterministic Games: Backgammon
[Figure: backgammon board with the points numbered 0 to 25.]

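Nondeterministic Games in General
In nondeterministic games, chance is introduced by dice, card-shuffling, etc.
Simplified example with coin-flipping:
[Figure: MAX root over two CHANCE nodes with values 3 and 1. Each chance node has two branches of probability 0.5 leading to MIN nodes with values 2, 4 and 0, 2. Leaves, left to right: 2, 4 | 7, 4 | 6, 0 | 5, 2.]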

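ExpectiMinimax Value
$$
\mathrm{ExpectiMinimaxValue}(n) =
\begin{cases}
\mathrm{Utility}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \mathrm{Successors}(n)} \mathrm{ExpectiMinimaxValue}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \mathrm{Successors}(n)} \mathrm{ExpectiMinimaxValue}(s) & \text{if } n \text{ is a MIN node} \\
\sum_{s \in \mathrm{Successors}(n)} P(s) \cdot \mathrm{ExpectiMinimaxValue}(s) & \text{if } n \text{ is a chance node}
\end{cases}
$$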

Algorithm for Nondeterministic Games
Expectiminimax gives perfect play.
Given the chance nodes, MAX may not get the best outcome, but MAX's move gives the best expected outcome.
The algorithm is just like Minimax, except that we must also handle chance nodes:
    ...
    if state is a MAX node then
        return the highest ExpectiMinimax-Value of Successors(state)
    if state is a MIN node then
        return the lowest ExpectiMinimax-Value of Successors(state)
    if state is a chance node then
        return the probability-weighted average of ExpectiMinimax-Value of Successors(state)
    ...
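A minimal Python sketch of expectiminimax, assuming each internal node is a `(kind, children)` tuple where `kind` is `"max"`, `"min"`, or `"chance"`, and a chance node's children are `(probability, subtree)` pairs. This encoding and the name `COIN_FLIP` are assumptions; the leaf values are the ones transcribed from the coin-flipping figure.

```python
def expectiminimax(node):
    """Expectiminimax value of an explicit game tree with chance nodes."""
    if isinstance(node, (int, float)):           # terminal node: its utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":
        # Probability-weighted average over the chance outcomes.
        return sum(p * expectiminimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")

# Coin-flipping example: MAX chooses a coin, the coin picks a MIN node.
COIN_FLIP = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, 2]))]),
])

if __name__ == "__main__":
    print(expectiminimax(COIN_FLIP))
```

The two chance nodes evaluate to 0.5·2 + 0.5·4 = 3 and 0.5·0 + 0.5·2 = 1, so MAX prefers the first one.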

Nondeterministic Games in Practice
Dice rolls increase b: 21 possible rolls with 2 dice.
Backgammon ≈ 20 legal moves (can be 6,000 with a 1-1 roll):
  depth 4 = $20 \times (21 \times 20)^3 \approx 1.5 \times 10^9$
As depth increases, the probability of reaching a given node shrinks ⇒ the value of lookahead is diminished.
α–β pruning is much less effective.
TDGammon uses depth-2 search + a very good Eval ≈ world-champion level.

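Digression: Exact Values DO Matter
[Figure: two trees with a MAX root over two DICE (chance) nodes, each with branch probabilities 0.9 and 0.1 leading to MIN nodes. Left: MIN values 2, 3 and 1, 4 give chance values 2.1 and 1.3. Right: MIN values 20, 30 and 1, 400 give chance values 21 and 40.9. MAX's preferred move changes.]
Behaviour is preserved only by a positive linear transformation of Eval.
Hence Eval should be proportional to the expected payoff.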
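The numbers in the figure can be replayed directly. The sketch below assumes each chance node is a list of `(probability, value)` pairs; the helper names and the specific linear map `10 * v + 5` are illustrative assumptions.

```python
def best_move(chance_nodes):
    """Index of the chance node with the highest expected value."""
    expected = [sum(p * v for p, v in outcomes) for outcomes in chance_nodes]
    return max(range(len(expected)), key=lambda i: expected[i])

def rescale(chance_nodes, f):
    """Apply f to every outcome value, leaving the probabilities unchanged."""
    return [[(p, f(v)) for p, v in outcomes] for outcomes in chance_nodes]

# Values under the two dice nodes in the left-hand tree of the figure.
ORIGINAL = [[(0.9, 2), (0.1, 3)], [(0.9, 1), (0.1, 4)]]
# The order-preserving but nonlinear relabelling used in the right-hand tree.
MONOTONIC = {1: 1, 2: 20, 3: 30, 4: 400}

if __name__ == "__main__":
    print(best_move(ORIGINAL))                                 # expected 2.1 vs 1.3
    print(best_move(rescale(ORIGINAL, MONOTONIC.get)))         # expected 21 vs 40.9
    print(best_move(rescale(ORIGINAL, lambda v: 10 * v + 5)))  # positive linear map
```

The monotonic relabelling flips MAX's choice from the first move to the second, while the positive linear map leaves it unchanged, because expectation commutes with linear maps but not with arbitrary monotonic ones.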

Games of Imperfect Information
E.g., card games, where the opponent's initial cards are unknown.
Typically we can calculate a probability for each possible deal.
Seems just like having one big dice roll at the beginning of the game.
Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals.
Special case: if an action is optimal for all deals, it's optimal.
GIB, current best bridge program, approximates this idea by
  1. generating 100 deals consistent with bidding information
  2. picking the action that wins most tricks on average
but in fact this doesn't quite work out (as discussed next).

Example
Four-card bridge/whist/hearts hand, MAX to play first.
[Figure: MAX/MIN game trees for three variants of the deal; the suit symbols and tree layout were lost in extraction.]

Commonsense Example
1. Road A leads to a small heap of gold pieces.
   Road B leads to a fork: take the left fork and you'll find a mound of jewels; take the right fork and you'll be run over by a bus.
2. Road A leads to a small heap of gold pieces.
   Road B leads to a fork: take the left fork and you'll be run over by a bus; take the right fork and you'll find a mound of jewels.
3. Road A leads to a small heap of gold pieces.
   Road B leads to a fork: guess correctly and you'll find a mound of jewels; guess incorrectly and you'll be run over by a bus.

Proper Analysis
The intuition that the value of an action is the average of its values in all actual states is WRONG.
With partial observability, the value of an action depends on the information state or belief state that the agent is in.
Can generate and search a tree of information states.
Leads to rational behaviors such as:
  Acting to obtain information
  Signalling to one's partner
  Acting randomly to minimize information disclosure
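The road example can be turned into a small calculation that contrasts the two analyses. The utilities (gold = 1, jewels = 10, bus = −100), the equal prior over the two worlds, and all names below are assumptions chosen only to make the contrast visible.

```python
GOLD, JEWELS, BUS = 1, 10, -100            # assumed utilities, not from the slides
WORLDS = ["left", "right"]                 # which fork hides the jewels; equally likely
TURNS = ["left", "right"]

def fork_payoff(world, turn):
    """Payoff of taking `turn` at the fork when the jewels are on side `world`."""
    return JEWELS if turn == world else BUS

def averaging_over_worlds():
    """Flawed analysis: solve each world with full knowledge, then average."""
    value_b = sum(max(fork_payoff(w, t) for t in TURNS) for w in WORLDS) / len(WORLDS)
    return "B" if value_b > GOLD else "A"

def belief_state_analysis():
    """At the fork the agent cannot tell the worlds apart, so it must pick one
    turn for both of them; its value is the best such expected payoff."""
    value_b = max(sum(fork_payoff(w, t) for w in WORLDS) / len(WORLDS) for t in TURNS)
    return "B" if value_b > GOLD else "A"

if __name__ == "__main__":
    print(averaging_over_worlds())   # road chosen by averaging over worlds
    print(belief_state_analysis())   # road chosen by reasoning over the belief state
```

Under these assumed utilities, averaging over worlds values Road B at 10 and takes it, while the belief-state analysis values it at (10 − 100) / 2 = −45 and takes Road A.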

Summary
Games are fun to work on!
They illustrate several important points about AI:
  perfection is unattainable ⇒ must approximate
  good idea to think about what to think about
  uncertainty constrains the assignment of values to states
  optimal decisions depend on information state, not real state