CS 188: Artificial Intelligence


CS 188: Artificial Intelligence Adversarial Search Instructor: Stuart Russell University of California, Berkeley

Game Playing State-of-the-Art
Checkers:
  1950: First computer player.
  1959: Samuel's self-taught program.
  1994: First computer world champion: Chinook ended the 40-year reign of human champion Marion Tinsley, using a complete 8-piece endgame database.
  2007: Checkers solved! Endgame database of 39 trillion states.
Chess:
  1945-1960: Zuse, Wiener, Shannon, Turing, Newell & Simon, McCarthy.
  1960s onward: gradual improvement under the standard model.
  1997: Special-purpose chess machine Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second and extended some lines of search up to 40 ply. Current programs running on a PC rate > 3200 (vs. 2870 for Magnus Carlsen).
Go:
  1968: Zobrist's program plays legal Go, barely (b > 300!).
  2005-2014: Monte Carlo tree search enables rapid advances: current programs beat strong amateurs, and professionals with a 3-4 stone handicap.
Pacman

Behavior from Computation [Demo: mystery pacman (L6D1)]

Video of Demo Mystery Pacman

Types of Games
Many different kinds of games!
Axes:
  Deterministic or stochastic?
  One, two, or more players?
  Turn-taking or simultaneous?
  Zero sum?
  Perfect information (fully observable)?
Want algorithms for calculating a contingent plan (a.k.a. strategy or policy) which recommends a move for every possible eventuality.

Standard Games
Standard games are deterministic, observable, turn-taking, two-player, zero-sum.
Game formulation:
  Initial state: s_0
  Players: Player(s) indicates whose move it is
  Actions: Actions(s) for the player on move
  Transition model: Result(s,a)
  Terminal test: Terminal-Test(s)
  Terminal values: Utility(s,p) for player p
    Or just Utility(s) for the player making the decision at the root
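The game formulation above can be sketched as a handful of plain functions. The game itself is a hypothetical toy that is not in the slides: players alternate removing 1 or 2 stones from a pile, and whoever takes the last stone wins. The function names mirror the slide's Player, Actions, Result, Terminal-Test and Utility; the state encoding is an assumption made for illustration.

```python
from typing import List, Tuple

# Hypothetical toy game (not from the slides): a state is
# (stones remaining, player to move), with players "MAX" and "MIN".
State = Tuple[int, str]

def initial_state(stones: int = 5) -> State:
    """Initial state s_0: a full pile with MAX to move."""
    return (stones, "MAX")

def player(s: State) -> str:
    """Player(s): whose move it is."""
    return s[1]

def actions(s: State) -> List[int]:
    """Actions(s): take 1 or 2 stones, as far as the pile allows."""
    return [n for n in (1, 2) if n <= s[0]]

def result(s: State, a: int) -> State:
    """Result(s, a): remove a stones and pass the turn."""
    stones, to_move = s
    return (stones - a, "MIN" if to_move == "MAX" else "MAX")

def terminal_test(s: State) -> bool:
    """Terminal-Test(s): the game ends when the pile is empty."""
    return s[0] == 0

def utility(s: State) -> int:
    """Utility(s) from MAX's point of view (zero-sum, so MIN's is the negation).
    The player who just moved took the last stone and wins."""
    return -1 if player(s) == "MAX" else +1
```

Any search routine that only calls these six functions works unchanged on another game that provides the same six functions.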

Zero-Sum Games
  Agents have opposite utilities
  Pure competition: one maximizes, the other minimizes
General Games
  Agents have independent utilities
  Cooperation, indifference, competition, shifting alliances, and more are all possible

Adversarial Search

Single-Agent Trees
[Figure: single-agent search tree with node values 8, 2, 0, 2, 6, 4, 6]

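Value of a State
Value of a state: the best achievable outcome (utility) from that state.
Non-terminal states: V(s) = max over successors s' of V(s')
Terminal states: V(s) is known (given by the game).
[Figure: single-agent tree with node values 8, 2, 0, 2, 6, 4, 6]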

Tic-Tac-Toe Game Tree

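Minimax Values
MAX nodes (under the agent's control): V(s) = max over successors s' of V(s')
MIN nodes (under the opponent's control): V(s') = min over successors s of V(s)
Terminal states: V(s) is known.
[Figure: game tree with node values -8, -8, -10, -8, -5, -10, +8]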

Minimax Implementation
What kind of search? Depth-first
function minimax-decision(s) returns an action
  return the action a in Actions(s) with the highest min-value(result(s,a))
function max-value(s) returns a value
  if Terminal-Test(s) then return Utility(s)
  initialize v = -∞
  for each a in Actions(s):
    v = max(v, min-value(result(s,a)))
  return v
function min-value(s) returns a value
  if Terminal-Test(s) then return Utility(s)
  initialize v = +∞
  for each a in Actions(s):
    v = min(v, max-value(result(s,a)))
  return v
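The pseudocode above can be turned into a short runnable sketch. The following assumes a game tree given as nested Python lists whose leaves are numeric utilities, which is a representation chosen for illustration rather than anything prescribed by the slides. The sample tree uses the leaf values of the minimax example slide, assumed to be grouped in threes under three MIN nodes.

```python
import math

def is_terminal(node):
    """In this sketch a terminal state is simply a number (its utility)."""
    return isinstance(node, (int, float))

def max_value(node):
    """Value of a node where MAX is to move."""
    if is_terminal(node):
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child))
    return v

def min_value(node):
    """Value of a node where MIN is to move."""
    if is_terminal(node):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child))
    return v

def minimax_decision(root):
    """Index of the root move with the highest min-value."""
    return max(range(len(root)), key=lambda i: min_value(root[i]))

# Assumed shape: MAX at the root, three MIN nodes, three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(max_value(tree), minimax_decision(tree))  # prints "3 0"
```

The three MIN nodes evaluate to 3, 2 and 2, so MAX picks the first move for a root value of 3.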

Alternative Implementation
function minimax-decision(s) returns an action
  return the action a in Actions(s) with the highest value(result(s,a))
function value(s) returns a value
  if Terminal-Test(s) then return Utility(s)
  if Player(s) = MAX then return max over a in Actions(s) of value(result(s,a))
  if Player(s) = MIN then return min over a in Actions(s) of value(result(s,a))

Minimax Example
[Figure: game tree with leaf values 3, 12, 8, 2, 4, 6, 14, 5, 2]

Minimax Efficiency
How efficient is minimax?
  Just like (exhaustive) DFS
  Time: O(b^m)
  Space: O(bm)
Example: For chess, b ≈ 35, m ≈ 100
  Exact solution is completely infeasible
  But, do we need to explore the whole tree?
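To get a feel for why the exact solution is infeasible, the O(b^m) bound can be evaluated for the chess figures on the slide. This is only a back-of-the-envelope sketch; the helper name `log10_nodes` is made up for illustration.

```python
import math

def log10_nodes(b: float, m: int) -> float:
    """Base-10 logarithm of b**m, the number of leaves in a uniform tree
    with branching factor b and depth m."""
    return m * math.log10(b)

# Chess figures from the slide: b is about 35, m is about 100.
print(f"chess: about 10^{log10_nodes(35, 100):.0f} nodes")
```

With b ≈ 35 and m ≈ 100 the exponent comes out a little above 154, far beyond anything that can be enumerated.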

Resource Limits

Resource Limits
Problem: In realistic games, we cannot search to the leaves!
Solution 1: Bounded lookahead
  Search only to a preset depth limit or horizon.
  Use an evaluation function for non-terminal positions.
Guarantee of optimal play is gone.
More plies make a BIG difference.
Example:
  Suppose we have 100 seconds and can explore 10K nodes/sec.
  So we can check 1M nodes per move.
  For chess, b ≈ 35, so this reaches about depth 4: not so good.
[Figure: depth-limited game tree with max and min levels; values 4, -2, 4, -1, -2, 4, 9 and unexplored nodes marked "?"]
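Bounded lookahead can be sketched as ordinary minimax with a depth counter and an evaluation function applied at the cutoff. The nested-list tree and the toy evaluation function `mean_leaf` (the average of the leaves below a node) are assumptions for illustration, not part of the lecture. The helper `reachable_depth` reproduces the "about depth 4" estimate from the node budget and branching factor.

```python
import math

def leaves(node):
    """All leaf utilities below a node (a leaf is a plain number)."""
    if isinstance(node, (int, float)):
        return [node]
    return [x for child in node for x in leaves(child)]

def mean_leaf(node):
    """Toy evaluation function (assumed): average of the leaves below."""
    vals = leaves(node)
    return sum(vals) / len(vals)

def depth_limited_value(node, depth, maximizing, evaluate):
    """Minimax value, cutting off at `depth` plies and scoring
    non-terminal cutoff positions with `evaluate`."""
    if isinstance(node, (int, float)):   # true terminal: exact utility
        return node
    if depth == 0:                       # horizon reached: estimate
        return evaluate(node)
    values = [depth_limited_value(c, depth - 1, not maximizing, evaluate)
              for c in node]
    return max(values) if maximizing else min(values)

def reachable_depth(node_budget, b):
    """Depth d such that b**d equals the node budget."""
    return math.log(node_budget) / math.log(b)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(depth_limited_value(tree, 2, True, mean_leaf))  # full depth: exact value
print(depth_limited_value(tree, 1, True, mean_leaf))  # cutoff: an estimate
print(round(reachable_depth(1_000_000, 35), 2))       # chess, 1M nodes
```

With the cutoff at depth 1 the root value is an estimate that differs from the true minimax value, which is the sense in which the guarantee of optimal play is gone.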

Depth Matters
Evaluation functions are always imperfect.
Deeper search => better play (usually).
Or, deeper search gives the same quality of play with a less accurate evaluation function.
An important example of the tradeoff between complexity of features and complexity of computation.
[Demo: depth limited (L6D4, L6D5)]

Video of Demo Limited Depth (2)

Video of Demo Limited Depth (10)

Evaluation Functions

Evaluation Functions
Evaluation functions score non-terminals in depth-limited search.
Ideal function: returns the actual minimax value of the position.
In practice: typically a weighted linear sum of features:
  EVAL(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
  E.g., w_1 = 9, f_1(s) = (num white queens - num black queens), etc.
Terminate search only in quiescent positions, i.e., no major changes expected in feature values.
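A weighted linear evaluation function is a one-line dot product once the features are defined. In the sketch below, the queen feature and its weight of 9 come from the slide; the dictionary-based position, the pawn feature and its weight of 1 are assumptions added only to make the example concrete.

```python
from typing import Callable, Dict, Sequence

Position = Dict[str, int]  # assumed encoding: piece counts by name

def linear_eval(s: Position,
                weights: Sequence[float],
                features: Sequence[Callable[[Position], float]]) -> float:
    """EVAL(s) = w_1 f_1(s) + ... + w_n f_n(s)."""
    return sum(w * f(s) for w, f in zip(weights, features))

def queen_diff(s: Position) -> int:
    """f_1 from the slide: white queens minus black queens."""
    return s["white_queens"] - s["black_queens"]

def pawn_diff(s: Position) -> int:
    """Assumed second feature: white pawns minus black pawns."""
    return s["white_pawns"] - s["black_pawns"]

weights = [9.0, 1.0]            # 9 is from the slide; 1 is an assumption
features = [queen_diff, pawn_diff]

position = {"white_queens": 1, "black_queens": 0,
            "white_pawns": 6, "black_pawns": 8}
print(linear_eval(position, weights, features))  # 9*1 + 1*(-2) = 7.0
```

Keeping the features as separate functions makes it easy to add or re-weight terms without touching the search code.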

Evaluation for Pacman

Video of Demo Smart Ghosts (Coordination)

Generalized Minimax
What if the game is not zero-sum, or has multiple players?
Generalization of minimax:
  Terminals have utility tuples
  Node values are also utility tuples
  Each player maximizes its own component
  Can give rise to cooperation and competition dynamically
[Figure: three-player game tree with utility tuples such as (8,8,1), (7,7,2), (0,0,7), (0,0,8), (1,1,6), (9,9,0)]
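The tuple-valued generalization can be sketched in a few lines. The node encoding (a dict with a `player` index and a list of `children`, with terminals as plain tuples) and the shape of the sample tree are hypothetical; only the utility tuples themselves are borrowed from the slide's figure.

```python
def maxn_value(node):
    """Utility tuple backed up to `node`: the player to move picks the
    child whose tuple is best in that player's own component."""
    if isinstance(node, tuple):          # terminal: a utility tuple
        return node
    p = node["player"]
    return max((maxn_value(c) for c in node["children"]),
               key=lambda u: u[p])

def n(player, *children):
    """Small helper to build an internal node (assumed encoding)."""
    return {"player": player, "children": list(children)}

# Hypothetical three-player tree: player 0 at the root, then player 1,
# then player 2 choosing among terminal tuples.
tree = n(0,
         n(1, n(2, (8, 8, 1), (7, 7, 2)),
              n(2, (0, 0, 7), (1, 1, 6))),
         n(1, n(2, (9, 9, 0), (0, 0, 8)),
              n(2, (9, 9, 0), (8, 8, 1))))

print(maxn_value(tree))  # (8, 8, 1)
```

In this tree players 0 and 1 always share the same payoff, so they effectively cooperate against player 2 even though nothing in the algorithm says so explicitly.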

Game Tree Pruning

Minimax Example
[Figure: game tree with leaf values 3, 12, 8, 2, 4, 6, 14, 5, 2]

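Alpha-Beta Example
α = best option so far from any MAX node on this path
[Figure: the minimax example tree under alpha-beta pruning, with α = 3 after the first subtree; leaf values shown: 3, 12, 8, 2, 14, 5, 2]
The order of generation matters: more pruning is possible if good moves come first.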

Alpha-Beta Pruning
General case (pruning children of a MIN node):
  We're computing the MIN-VALUE at some node n.
  We're looping over n's children.
  n's estimate of the children's min is dropping.
  Who cares about n's value? MAX.
  Let α be the best value that MAX can get so far at any choice point along the current path from the root.
  If n becomes worse than α, MAX will avoid it, so we can prune n's other children (it's already bad enough that it won't be played).
Pruning children of a MAX node is symmetric:
  Let β be the best value that MIN can get so far at any choice point along the current path from the root.
[Figure: path from the root through alternating MAX and MIN levels down to node n, with α marked at a MAX ancestor]
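The pruning rule described above can be sketched on the same nested-list tree representation used for illustration elsewhere (an assumption, as is the optional `visited` list, which only records which leaves get evaluated). This version cuts off when the running value reaches the bound, including ties; a stricter test is equally valid for the root value.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf,
              maximizing=True, visited=None):
    """Minimax value of `node` with alpha-beta pruning.
    alpha: best value MAX can already guarantee on the path to the root.
    beta:  best value MIN can already guarantee on the path to the root."""
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)         # record evaluated leaves
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:                # MIN above will never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                   # MAX above already has better
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, visited=seen), seen)

# Same tree with the last MIN node's children in a better order.
reordered = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]
seen_reordered = []
print(alphabeta(reordered, visited=seen_reordered), seen_reordered)
```

On the first tree the leaves 4 and 6 are never evaluated, because the second MIN node is already worse than α = 3 after its first child. Reordering the third MIN node so that its best reply for MIN comes first prunes two more leaves while the root value stays the same.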

Alpha-Beta Pruning Properties
Theorem: This pruning has no effect on the minimax value computed for the root!
Good child ordering improves the effectiveness of pruning.
  Iterative deepening helps with this.
With "perfect ordering":
  Time complexity drops to O(b^(m/2)).
  Doubles solvable depth!
  1M nodes/move => depth = 8, respectable.
This is a simple example of metareasoning (computing about what to compute).
[Figure: small max/min tree with leaf values 10, 10, 0]

Alpha-Beta Quiz

Alpha-Beta Quiz 2

Minimax Revisited
[Figure: game tree with root moves a, b, c and node values 100, 101, 100, 99, 500, 500, 99, 99, 99]
Minimax acts as if the leaf values are exact.
In fact they are estimates with some uncertainty; probably b is a better choice than a.

Games with uncertain outcomes

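Chance Outcomes in Trees
Minimax: tic-tac-toe, chess [Figure: tree with leaf values 10, 10, 9, 100]
Expectimax: Tetris, investing [Figure: tree with leaf values 10, 10, 9, 100]
Expectiminimax: backgammon, Monopoly [Figure: tree with leaf values 10, 9, 10, 9, 10, 100]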

Minimax
function decision(s) returns an action
  return the action a in Actions(s) with the highest value(result(s,a))
function value(s) returns a value
  if Terminal-Test(s) then return Utility(s)
  if Player(s) = MAX then return max over a in Actions(s) of value(result(s,a))
  if Player(s) = MIN then return min over a in Actions(s) of value(result(s,a))

Expectiminimax
function decision(s) returns an action
  return the action a in Actions(s) with the highest value(result(s,a))
function value(s) returns a value
  if Terminal-Test(s) then return Utility(s)
  if Player(s) = MAX then return max over a in Actions(s) of value(result(s,a))
  if Player(s) = MIN then return min over a in Actions(s) of value(result(s,a))
  if Player(s) = CHANCE then return sum over a in Actions(s) of Pr(a) * value(result(s,a))
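The three-way case split in the pseudocode maps directly onto a small recursive function. The node encoding below is an assumption for illustration: a terminal is a number, and an internal node is a pair of a kind ("max", "min" or "chance") and its children, where chance children carry their probabilities. The chance node uses the probabilities 1/2, 1/3, 1/6 and values 8, 24, -12 from the lecture's expectimax example; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def value(node):
    """Expectiminimax value of a node (assumed tuple encoding)."""
    if isinstance(node, (int, float, Fraction)):
        return node                      # terminal: its utility
    kind, children = node
    if kind == "max":
        return max(value(c) for c in children)
    if kind == "min":
        return min(value(c) for c in children)
    if kind == "chance":
        # children are (probability, child) pairs
        return sum(p * value(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")

chance = ("chance", [(Fraction(1, 2), 8),
                     (Fraction(1, 3), 24),
                     (Fraction(1, 6), -12)])
print(value(chance))                     # 10

# A MAX node choosing between the gamble and a MIN node worth 9.
root = ("max", [chance, ("min", [9, 100])])
print(value(root))                       # 10
```

Dropping the "min" case gives plain expectimax, and dropping the "chance" case gives plain minimax.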

Reminder: Expectations
The expected value of a random variable is the average, weighted by the probability distribution over outcomes.
Example: How long to get to the airport?
  Time:        20 min   30 min   60 min
  Probability: 0.25     0.50     0.25
  Expected time: 20 x 0.25 + 30 x 0.50 + 60 x 0.25 = 35 min

Expectimax Pseudocode
sum over a in Actions(s) of Pr(a) * value(result(s,a))
Example: a chance node whose outcomes have probabilities 1/2, 1/3, 1/6 and values 8, 24, -12:
v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10

Example: Backgammon
Dice rolls increase b: 21 possible rolls with 2 dice.
  Backgammon ≈ 20 legal moves
  Depth 2 = 20 x (21 x 20)^3 ≈ 1.5 x 10^9
As depth increases, the probability of reaching a given search node shrinks.
  So usefulness of search is diminished.
  So limiting depth is less damaging.
  But pruning is trickier.
Historic AI: TDGammon uses depth-2 search + very good evaluation function + reinforcement learning: world-champion level play.

What Values to Use?
[Figure: leaf values 0, 40, 20, 30; under x -> x^2 they become 0, 1600, 400, 900]
For worst-case minimax reasoning, evaluation function scale doesn't matter.
  We just want better states to have higher evaluations (get the ordering right).
  Minimax decisions are invariant with respect to monotonic transformations on values: x > y => f(x) > f(y).
Expectiminimax decisions are invariant with respect to positive affine transformations: f(x) = Ax + B where A > 0.
Expectiminimax evaluation functions have to be aligned with actual win probabilities!
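The invariance claims can be checked numerically on the slide's leaf values. The sketch assumes those values sit under two branches, [0, 40] and [20, 30], and that chance outcomes are equally likely; both are assumptions about the figure. Each helper returns the index of the branch the root would choose after applying a transformation f to the leaves.

```python
def minimax_choice(branches, f=lambda x: x):
    """Branch chosen by MAX when each branch is a MIN node."""
    return max(range(len(branches)),
               key=lambda i: min(f(x) for x in branches[i]))

def expectimax_choice(branches, f=lambda x: x):
    """Branch chosen by MAX when each branch is a uniform chance node."""
    return max(range(len(branches)),
               key=lambda i: sum(f(x) for x in branches[i]) / len(branches[i]))

branches = [[0, 40], [20, 30]]           # assumed grouping of the leaves

def square(x):
    return x * x                         # monotonic on non-negative values

def affine(x):
    return 2 * x + 5                     # positive affine: A = 2, B = 5

print(minimax_choice(branches), minimax_choice(branches, square))
print(expectimax_choice(branches), expectimax_choice(branches, affine))
print(expectimax_choice(branches, square))
```

Squaring leaves the minimax choice unchanged, but it flips the expectimax choice: the branch averages go from 20 vs. 25 to 800 vs. 650. The affine map keeps the expectimax choice, as the slide states.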

Summary
Games require decisions when optimality is impossible
  Bounded-depth search and approximate evaluation functions
Games force efficient use of computation
  Alpha-beta pruning
Game playing has produced important research ideas
  Reinforcement learning (checkers)
  Iterative deepening (chess)
  Rational metareasoning (Othello)
  Monte Carlo tree search (Go)
  Solution methods for partial-information games in economics (poker)
Video games present much greater challenges: lots to do!
  b = 10^500, |S| = 10^4000, m = 10,000