CSE 473: Artificial Intelligence, Fall 2017 (10/13/17)
Adversarial Search: Game Playing State-of-the-Art, Deterministic Games, Zero-Sum Games

CSE 473: Artificial Intelligence, Fall 2017
Adversarial Search: Minimax, Pruning, Expectimax
Dieter Fox
Based on slides adapted from Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Dan Weld, Stuart Russell, and Andrew Moore

Game Playing State-of-the-Art (2017)

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

Othello: Human champions refuse to compete against computers, which are too good.

Go: In March 2016, AlphaGo beat 9-dan master Lee Sedol, winning four games to one. It combines Monte-Carlo tree search with deep reinforcement learning.

Poker: In December 2016, a computer beat professional players at no-limit Texas hold 'em.

Adversarial Search

Game Playing

Many different kinds of games! Choices:
- Deterministic or stochastic?
- One, two, or more players?
- Perfect information (can you see the state)?
We want algorithms for calculating a strategy (policy) which recommends a move in each state.

Deterministic Games

Many possible formalizations; one is:
- States: S (start at s0)
- Players: P = {1...N} (usually take turns)
- Actions: A (may depend on player / state)
- Transition function: S x A -> S
- Terminal test: S -> {t, f}
- Terminal utilities: S x P -> R
A solution for a player is a policy: S -> A.

Zero-Sum Games

Zero-sum games: agents have opposite utilities (values on outcomes). This lets us think of a single value that one agent maximizes and the other minimizes. Adversarial, pure competition.

General games: agents have independent utilities (values on outcomes). Cooperation, indifference, competition, and more are all possible.
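As a concrete illustration of this formalization, here is a minimal sketch in Python. The game (a toy Nim variant where players remove 1 or 2 stones and whoever takes the last stone wins) and all names (NimGame, result, etc.) are illustrative assumptions, not from the lecture:

```python
class NimGame:
    """A toy deterministic, zero-sum game matching the formalization:
    states are (stones_left, player_to_move); players 0 and 1 alternate,
    removing 1 or 2 stones; whoever takes the last stone wins (+1)."""

    def __init__(self, stones=5):
        self.start = (stones, 0)                  # s0

    def actions(self, state):                     # A (may depend on state)
        stones, _ = state
        return [a for a in (1, 2) if a <= stones]

    def result(self, state, action):              # transition: S x A -> S
        stones, player = state
        return (stones - action, 1 - player)

    def is_terminal(self, state):                 # terminal test: S -> {t, f}
        return state[0] == 0

    def utility(self, state, player):             # terminal utilities: S x P -> R
        mover = 1 - state[1]                      # the player who just moved wins
        return 1 if player == mover else -1


game = NimGame(stones=5)
s = game.result(game.start, 2)
print(s)  # (3, 1): three stones left, player 1 to move
```

Because the utilities of the two players always sum to zero, this is exactly the zero-sum setting described above.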

Single-Agent Trees / Value of a State

Value of a state: the best achievable outcome (utility) from that state. The value of a non-terminal state is the best value among its children; the values of terminal states are part of the game. (Figure: a single-agent tree with terminal values 8, 2, 0, 2, 6, 4, 6.)

Adversarial Game Trees / Minimax Values

States under the agent's control take the maximum of their children's values; states under the opponent's control take the minimum. Terminal states' values are again part of the game. (Figure: an adversarial game tree with minimax values at each node; figure: the tic-tac-toe game tree.)

Adversarial Search (Minimax)

Deterministic, zero-sum games (tic-tac-toe, chess, checkers): one player maximizes the result, the other minimizes the result. Minimax search:
- A state-space search tree
- Players alternate turns
- Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
Minimax values are computed recursively; terminal values are part of the game. (Figure: a two-ply minimax tree over terminal values 8, 2, 5, 6 with minimax value 5 at the root.)
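The recursive definition of a state's value can be written directly. This is a sketch over a hand-built toy tree (nested lists for internal nodes, numbers for terminal utilities); the tree here is illustrative, not the one pictured in the slides:

```python
def minimax_value(node, maximizing=True):
    """Value of a state, computed recursively: leaves are numbers (terminal
    utilities, part of the game); internal nodes are lists of child nodes,
    with the two players alternating levels."""
    if isinstance(node, (int, float)):
        return node                               # terminal state
    child_values = [minimax_value(c, not maximizing) for c in node]
    return max(child_values) if maximizing else min(child_values)


# MAX moves at the root, MIN at the next layer.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree))  # 3: MAX picks the branch whose MIN value is largest
```

Setting maximizing=False at the top computes the value for a state under the opponent's control instead.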

Minimax Implementation / Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

Minimax Properties
- Optimal? Yes, against a perfect player. Otherwise?
- Time complexity: O(b^m)
- Space complexity: O(bm)
- For chess, b ≈ 35 and m ≈ 100, so an exact solution is completely infeasible
- But do we need to explore the whole tree?
(Figure: a concrete minimax example over leaf values 10, 10, 9, 100.)

Pruning Example / α-β Pruning

General configuration:
- α is the best value that MAX can get at any choice point along the current path
- If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
- Define β similarly for MIN
(Figure: alternating Player/Opponent layers, with α carried down to node n as the search progresses.)
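A runnable version of the dispatch pseudocode above might look like this in Python; the TreeGame wrapper and its state encoding are illustrative assumptions standing in for a real game interface:

```python
import math

def value(state, game):
    """Dispatch on whose turn it is, mirroring the slide pseudocode."""
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == 'MAX':
        return max_value(state, game)
    return min_value(state, game)

def max_value(state, game):
    v = -math.inf
    for successor in game.successors(state):
        v = max(v, value(successor, game))
    return v

def min_value(state, game):
    v = math.inf
    for successor in game.successors(state):
        v = min(v, value(successor, game))
    return v

class TreeGame:
    """Stand-in 'game' whose states are numbers (terminals) or
    ('MAX' | 'MIN', [children]) tuples."""
    def is_terminal(self, s): return isinstance(s, (int, float))
    def utility(self, s): return s
    def to_move(self, s): return s[0]
    def successors(self, s): return s[1]

root = ('MAX', [('MIN', [3, 12, 8]), ('MIN', [2, 4, 6])])
print(value(root, TreeGame()))  # 3
```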

Alpha-Beta Pruning Properties
- The pruning has no effect on the final result at the root
- Values of intermediate nodes might be wrong! But they are bounds
- Good child ordering improves the effectiveness of pruning
- With perfect ordering, time complexity drops to O(b^(m/2)), which doubles the solvable depth!
- A full search of, e.g., chess is still hopeless

Alpha-Beta Implementation

α: MAX's best option on the path to the root
β: MIN's best option on the path to the root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v

Resource Limits

We cannot search to the leaves, so we use depth-limited search:
- Instead, search only a limited depth of the tree
- Replace terminal utilities with an evaluation function for non-terminal positions
- The guarantee of optimal play is gone
- Example: suppose we have 100 seconds and can explore 10K nodes/sec, so we can check 1M nodes per move; α-β then reaches about depth 8, a decent chess program

Evaluation Functions

- Evaluation functions score non-terminal positions
- The ideal function would return the true utility of the position
- In practice, it is typically a weighted linear sum of features, with, e.g., f1(s) = (number of white queens - number of black queens), etc.
(Demos: which algorithm? α-β at depth 4 with a simple evaluation function vs. a better one.)
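Extending the dispatch sketch with the α and β thresholds gives a runnable alpha-beta search; the leaf counter is added only to show that pruning visits fewer terminal states, and the tree and TreeGame encoding are illustrative:

```python
import math

def value(state, alpha, beta, game):
    if game.is_terminal(state):
        game.leaves += 1                  # count leaves to show the savings
        return game.utility(state)
    if game.to_move(state) == 'MAX':
        return max_value(state, alpha, beta, game)
    return min_value(state, alpha, beta, game)

def max_value(state, alpha, beta, game):
    v = -math.inf
    for s in game.successors(state):
        v = max(v, value(s, alpha, beta, game))
        if v >= beta:                     # MIN above will never allow this: prune
            return v
        alpha = max(alpha, v)             # MAX's best option on the path to the root
    return v

def min_value(state, alpha, beta, game):
    v = math.inf
    for s in game.successors(state):
        v = min(v, value(s, alpha, beta, game))
        if v <= alpha:                    # MAX above will never allow this: prune
            return v
        beta = min(beta, v)               # MIN's best option on the path to the root
    return v

class TreeGame:
    def __init__(self): self.leaves = 0
    def is_terminal(self, s): return isinstance(s, (int, float))
    def utility(self, s): return s
    def to_move(self, s): return s[0]
    def successors(self, s): return s[1]

game = TreeGame()
root = ('MAX', [('MIN', [3, 12, 8]), ('MIN', [2, 4, 6]), ('MIN', [14, 5, 2])])
print(value(root, -math.inf, math.inf, game))  # 3, same root value as plain minimax
print(game.leaves)  # 7: two of the nine leaves were pruned
```

Note the root value is unchanged by pruning, exactly as the properties slide states; only the amount of work differs.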

Worst-Case vs. Average Case

(Figure: two trees over the same leaf values 10, 10, 9, 100; one has a min layer under max, the other a chance layer.)
Idea: uncertain outcomes are controlled by chance, not by an adversary!

Expectimax Search / Minimax vs. Expectimax

Why wouldn't we know what the result of an action will be?
- Explicit randomness: rolling dice
- Unpredictable opponents: the ghosts respond randomly
- Actions can fail: when moving a robot, wheels might slip
Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes.

Expectimax search: compute the average score under optimal play.
- Max nodes behave as in minimax search
- Chance nodes are like min nodes, but the outcome is uncertain: calculate their expected utilities, i.e. take the weighted average (expectation) of the children's values
Later, we'll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes. (Demo: minimax vs. expectimax with 3-ply look-ahead; the ghosts move randomly.)

Expectimax Pseudocode

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

Worked example: a chance node whose children 8, 24, and -12 are reached with probabilities 1/2, 1/3, and 1/6 has value
v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10
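The expectimax pseudocode above, made runnable; chance nodes here store explicit (probability, child) pairs, an encoding chosen for this sketch rather than taken from the slides:

```python
import math

def value(state):
    """Dispatch: states are numbers (terminal), ('MAX', [children]),
    or ('EXP', [(probability, child), ...])."""
    if isinstance(state, (int, float)):
        return state
    return max_value(state) if state[0] == 'MAX' else exp_value(state)

def max_value(state):
    v = -math.inf
    for successor in state[1]:
        v = max(v, value(successor))
    return v

def exp_value(state):
    v = 0.0
    for p, successor in state[1]:
        v += p * value(successor)         # weighted average (expectation)
    return v

# The slide's worked example: children 8, 24, -12 with probabilities 1/2, 1/3, 1/6.
chance = ('EXP', [(1/2, 8), (1/3, 24), (1/6, -12)])
print(value(chance))  # 10.0 (up to floating-point rounding)
```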

Expectimax Example / Expectimax Pruning?

(Figure: an expectimax tree over leaf values such as 3, 12, 9, 2, 4, 6, 15, 6, 0.) Pruning is not possible in general: without bounds on the leaf values, a child we have not yet seen could still change a chance node's expected value.

Depth-Limited Expectimax

(Figure: depth-limited expectimax trees; the values shown at the cutoff, e.g. 400 and 300 vs. 492 and 362, are estimates of the true expectimax value, which would require a lot of work to compute.)
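Depth-limited expectimax can be sketched by adding a depth parameter and an evaluation function applied at the cutoff; the tree, the eval_fn, and the depth accounting here are all illustrative assumptions:

```python
def expectimax(state, depth, eval_fn):
    """Depth-limited expectimax: at depth 0 a non-terminal state is scored
    with eval_fn, an estimate standing in for the true expectimax value."""
    if isinstance(state, (int, float)):
        return state                              # true terminal utility
    if depth == 0:
        return eval_fn(state)                     # cutoff: estimate only
    kind, children = state
    if kind == 'MAX':
        return max(expectimax(c, depth - 1, eval_fn) for c in children)
    return sum(p * expectimax(c, depth - 1, eval_fn) for p, c in children)


tree = ('MAX', [('EXP', [(0.5, 3), (0.5, 12)]),
                ('EXP', [(0.5, 9), (0.5, 2)])])

true_value = expectimax(tree, depth=10, eval_fn=lambda s: 0)       # deep enough: exact
estimate = expectimax(tree, depth=1, eval_fn=lambda s: len(s[1]))  # crude heuristic
print(true_value, estimate)  # 7.5 2
```

As on the slide, the cutoff value is only an estimate of the true expectimax value; how good the estimate is depends entirely on the evaluation function.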