CS440/ECE448 Lecture 9: Minimax Search. Slides by Svetlana Lazebnik 9/2016 Modified by Mark Hasegawa-Johnson 9/2017


Why study games? Games are a traditional hallmark of intelligence. Games are easy to formalize. Games can be a good model of real-world competitive or cooperative activities: military confrontations, negotiation, auctions, etc.

Game AI: Origins. Minimax algorithm: Ernst Zermelo, 1912. Chess playing with evaluation function, quiescence search, selective search: Claude Shannon, 1949 (paper). Alpha-beta search: John McCarthy, 1956. Checkers program that learns its own evaluation function by playing against itself: Arthur Samuel, 1956.

Types of game environments

                    Perfect information       Imperfect information
                    (fully observable)        (partially observable)
Deterministic       Chess, checkers, go       Battleship
Stochastic          Backgammon, Monopoly      Scrabble, poker, bridge

Zero-sum Games

Alternating two-player zero-sum games: Players take turns. Each game outcome or terminal state has a utility for each player (e.g., 1 for win, 0 for loss). The sum of both players' utilities is a constant.

Games vs. single-agent search: We don't know how the opponent will act. The solution is not a fixed sequence of actions from start state to goal state, but a strategy or policy (a mapping from state to best move in that state).

Game tree: a game of tic-tac-toe between two players, MAX and MIN.

http://xkcd.com/832/

A more abstract game tree: a two-ply game, with terminal utilities (for MAX) at the leaves.

Minimax Search

The rules of every game: Every possible outcome has a value (or "utility") for me. Zero-sum game: if the value to me is +V, then the value to my opponent is −V. Phrased another way: my rational action, on each move, is to choose a move that will maximize the value of the outcome; my opponent's rational action is to choose a move that will minimize the value of the outcome. Call me MAX; call my opponent MIN.

Game tree search. Minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming perfect play on both sides. Minimax strategy: choose the move that gives the best worst-case payoff.

Computing the minimax value of a node:

Minimax(node) =
    Utility(node)                             if node is terminal
    max_action Minimax(Succ(node, action))    if player = MAX
    min_action Minimax(Succ(node, action))    if player = MIN
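This recurrence translates directly into code. Below is a minimal Python sketch, not from the lecture: the game tree is a toy nested dict whose internal nodes map actions to children and whose leaves are terminal utilities for MAX (the values are those of the standard two-ply example tree from AIMA).

```python
def minimax(node, maximizing=True):
    """Minimax value of `node` (utility for MAX), assuming perfect play."""
    if not isinstance(node, dict):      # terminal state: node is its utility
        return node
    values = (minimax(child, not maximizing) for child in node.values())
    return max(values) if maximizing else min(values)

# Two-ply example: MAX moves at the root, MIN at the middle layer.
tree = {"a1": {"b1": 3, "b2": 12, "b3": 8},
        "a2": {"b1": 2, "b2": 4, "b3": 6},
        "a3": {"b1": 14, "b2": 5, "b3": 2}}
assert minimax(tree) == 3   # MIN can force 3, 2, 2; MAX takes the best: 3
```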

Optimality of minimax: The minimax strategy is optimal against an optimal opponent. What if your opponent is suboptimal? Your utility will be AT LEAST AS HIGH as if you were playing an optimal opponent! A different strategy may work better against a suboptimal opponent, but it will necessarily be worse against an optimal opponent. Example from D. Klein and P. Abbeel.

More general games: more than two players, non-zero-sum. Utilities are now tuples, e.g., (4,3,2), with one entry per player. Each player maximizes their own utility at their node. Utilities get propagated (backed up) from children to parents.
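A short Python sketch of this backup rule (often called max^n), not from the lecture: leaves are utility tuples, internal nodes are (player, children) pairs, and each player picks the child tuple that is best in their own coordinate.

```python
def maxn(node):
    """Back up utility tuples in an n-player, non-zero-sum game tree."""
    if isinstance(node, tuple) and all(isinstance(x, (int, float)) for x in node):
        return node                      # leaf: a utility tuple
    player, children = node              # internal: who moves, and the options
    return max((maxn(c) for c in children), key=lambda t: t[player])

# Toy three-player tree: player 0 moves at the root, player 1 below.
root = (0, [(1, [(4, 3, 2), (1, 5, 2)]),
            (1, [(7, 4, 1), (7, 7, 1)])])
print(maxn(root))   # (7, 7, 1): player 1 backs up (1,5,2) and (7,7,1);
                    # player 0 then prefers first coordinate 7 over 1
```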

Alpha-Beta Pruning

Alpha-beta pruning: It is possible to compute the exact minimax decision without expanding every node in the game tree.


Alpha-Beta Pruning. Key point that I find most counter-intuitive: MIN needs to calculate which move MAX will make, and MAX would never choose a suboptimal move. So if MIN discovers that, at a particular node in the tree, she can make a move that's REALLY REALLY GOOD for her, she can assume that MAX will never let her reach that node, so she can prune it away from the search and never consider it again.

Alpha-beta pruning: α is the value of the best choice for the MAX player found so far at any choice point above node n. More precisely: α is the highest number that MAX knows how to force MIN to accept. We want to compute the MIN-value at n. As we loop over n's children, the MIN-value decreases. If it drops below α, MAX will never choose n, so we can ignore n's remaining children.

Alpha-beta pruning: β is the value of the best choice for the MIN player found so far at any choice point above node m. More precisely: β is the lowest number that MIN knows how to force MAX to accept. We want to compute the MAX-value at m. As we loop over m's children, the MAX-value increases. If it rises above β, MIN will never choose m, so we can ignore m's remaining children.

Alpha-beta pruning. An unexpected result: α is the highest number that MAX knows how to force MIN to accept, and β is the lowest number that MIN knows how to force MAX to accept. So searching below a node is worthwhile only while α < β; as soon as α ≥ β, the node's remaining children can be pruned.

Alpha-beta pruning

Function action = Alpha-Beta-Search(node)
    v = Max-Value(node, −∞, +∞)
    return the action from node with value v

α: best alternative available to the Max player
β: best alternative available to the Min player

Function v = Min-Value(node, α, β)
    if Terminal(node) return Utility(node)
    v = +∞
    for each action from node
        v = Min(v, Max-Value(Succ(node, action), α, β))
        if v ≤ α return v
        β = Min(β, v)
    end for
    return v

Alpha-beta pruning

Function v = Max-Value(node, α, β)
    if Terminal(node) return Utility(node)
    v = −∞
    for each action from node
        v = Max(v, Min-Value(Succ(node, action), α, β))
        if v ≥ β return v
        α = Max(α, v)
    end for
    return v
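The pseudocode above maps almost line for line onto Python. A minimal sketch, not from the lecture, reusing the toy nested-dict tree from the minimax example (internal nodes map actions to children; leaves are utilities for MAX):

```python
import math

def max_value(node, alpha, beta):
    if not isinstance(node, dict):               # terminal
        return node
    v = -math.inf
    for child in node.values():
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:            # MIN above will never allow this node
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if not isinstance(node, dict):               # terminal
        return node
    v = math.inf
    for child in node.values():
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:           # MAX above will never choose this node
            return v
        beta = min(beta, v)
    return v

def alpha_beta_search(tree):
    """Pick the root action for MAX, tightening alpha across root children."""
    best_action, alpha = None, -math.inf
    for action, child in tree.items():
        v = min_value(child, alpha, math.inf)
        if v > alpha:
            alpha, best_action = v, action
    return best_action

tree = {"a1": {"b1": 3, "b2": 12, "b3": 8},
        "a2": {"b1": 2, "b2": 4, "b3": 6},
        "a3": {"b1": 14, "b2": 5, "b3": 2}}
assert alpha_beta_search(tree) == "a1"   # same decision as plain minimax
```

On this tree the a2 subtree is cut off after its first leaf: min_value sees 2 ≤ α = 3 and returns immediately, without looking at the remaining children.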

Alpha-beta pruning: Pruning does not affect the final result. The amount of pruning depends on move ordering: we should start with the best moves (highest-value for MAX or lowest-value for MIN). For chess, we can try captures first, then threats, then forward moves, then backward moves. We can also try to remember "killer moves" from other branches of the tree. With perfect ordering, the time to find the best move is reduced from O(b^m) to O(b^(m/2)): the depth of search is effectively doubled.
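In code, move ordering is just a sort of the children before the alpha-beta loop. A sketch under stated assumptions: `score` is any cheap, game-specific heuristic for ranking moves (a hypothetical callback, not anything defined in the lecture).

```python
def ordered_children(node, maximizing, score):
    # Visit the most promising children first so cutoffs happen earlier;
    # `score` only has to rank moves roughly (e.g., captures before quiet
    # moves in chess), not compute exact values.
    return sorted(node.values(), key=score, reverse=maximizing)
```

Inside max_value the loop would then read `for child in ordered_children(node, True, score)`, and symmetrically in min_value.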

Limited-Horizon Computation

Games vs. single-agent search: We don't know how the opponent will act. The solution is not a fixed sequence of actions from start state to goal state, but a strategy or policy (a mapping from state to best move in that state). Efficiency is critical to playing well: the time to make a move is limited, and the branching factor, search depth, and number of terminal configurations are huge. In chess, the branching factor is about 35 and the search depth about 100, giving a search tree of about 10^154 nodes, while the number of atoms in the observable universe is about 10^80. This rules out searching all the way to the end of the game.
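The 10^154 figure is just the b^m node count rewritten in powers of ten:

$$35^{100} = \left(10^{\log_{10} 35}\right)^{100} = 10^{100 \times 1.544} \approx 10^{154}$$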

Evaluation function: Cut off search at a certain depth and compute the value of an evaluation function for a state, instead of its minimax value. The evaluation function may be thought of as the probability of winning from a given state, or the expected value of that state. A common evaluation function is a weighted sum of features:

Eval(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)

For chess, w_k may be the material value of a piece (pawn = 1, knight = 3, rook = 5, queen = 9) and f_k(s) may be the advantage in terms of that piece. Evaluation functions may be learned from game databases or by having the program play many games against itself.
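A minimal Python sketch of this weighted-sum form for chess material, not from the lecture; the board is abstracted into a hypothetical piece-count dict rather than a real chess representation.

```python
# Material weights from the slide: pawn = 1, knight = 3, rook = 5, queen = 9.
WEIGHTS = {"pawn": 1, "knight": 3, "rook": 5, "queen": 9}

def evaluate(counts):
    """Eval(s) = sum_k w_k * f_k(s), where f_k(s) is MAX's piece-count
    advantage in piece type k. `counts` maps piece -> (MAX count, MIN count)."""
    return sum(w * (counts[p][0] - counts[p][1]) for p, w in WEIGHTS.items())

# Example: MAX is up a knight (+3) but down a pawn (-1) -> Eval = 2.
print(evaluate({"pawn": (7, 8), "knight": (2, 1),
                "rook": (2, 2), "queen": (1, 1)}))   # 2
```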

Cutting off search. Horizon effect: you may incorrectly estimate the value of a state by overlooking an event that is just beyond the depth limit, for example, a damaging move by the opponent that can be delayed but not avoided. Possible remedies: Quiescence search: do not cut off search at positions that are unstable, for example, when you are about to lose an important piece. Singular extension: a strong move that should be tried when the normal depth limit is reached.
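A sketch of how a quiescence check changes depth-limited minimax, again on the toy nested-dict trees; `evaluate` and `is_quiet` are hypothetical game-specific callbacks, not functions given in the lecture.

```python
def value(node, maximizing, depth, evaluate, is_quiet):
    # Depth-limited minimax: at the depth limit, trust the evaluation only
    # in quiet (stable) positions, and keep searching otherwise.
    if not isinstance(node, dict):      # true terminal state
        return node
    if depth <= 0 and is_quiet(node):
        return evaluate(node)
    children = (value(c, not maximizing, depth - 1, evaluate, is_quiet)
                for c in node.values())
    return max(children) if maximizing else min(children)
```

A real quiescence search also bounds the extension, typically by considering only forcing moves (such as captures) past the depth limit, so that an unstable line cannot recurse forever.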

Advanced techniques: Transposition table to store previously expanded states. Forward pruning to avoid considering all possible moves. Lookup tables for opening moves and endgames.

Chess playing systems. Baseline system: 200 million node evaluations per move (3 min), minimax with a decent evaluation function and quiescence search: 5-ply ≈ human novice. Add alpha-beta pruning: 10-ply ≈ typical PC, experienced player. Deep Blue: 30 billion evaluations per move, singular extensions, evaluation function with 8000 features, large databases of opening and endgame moves: 14-ply ≈ Garry Kasparov. More recent state of the art (Hydra, ca. 2006): 36 billion evaluations per second, advanced pruning techniques: 18-ply ≈ better than any human alive?

Summary. A zero-sum game can be expressed as a minimax tree. Alpha-beta pruning finds the correct solution; in the best case, it has half the exponent of minimax (it can search twice as deeply at a given computational cost). Limited-horizon search is always necessary (you can't search to the end of the game), and always suboptimal. Estimate your utility at the end of your horizon using some type of learned utility function. Quiescence search: don't cut off the search in an unstable position (you need some way to measure "stability"). Singular extension: have one or two super-moves that you can test at the end of your horizon.