CITS3001 Algorithms, Agents and Artificial Intelligence
Semester 2, 2016, Tim French
School of Computer Science & Software Engineering, The University of Western Australia

8. Game-playing (AIMA, Ch. 5)

Objectives

We will motivate the investigation of games in AI
We will apply our ideas on search to game trees
  o Minimax
  o Alpha-beta pruning
We will introduce the idea of an evaluation function
  o And some concepts important to their design

Broadening our worldview

In our discussions so far, we have assumed that world descriptions have been
  o Complete: all information needed to solve the problem is available to the search algorithm
  o Deterministic: the effects of actions are uniquely determined and predictable
But this is rarely the case with real-world problems!
Sources of incompleteness include
  o Sensor limitations: it may be impossible to perceive the entire state of the world
  o Intractability: the full state description may be too large to store, or too large to compute
Sources of non-determinism are everywhere
  o e.g. people, weather, mechanical failure, dice, etc.
Incompleteness vs. non-determinism?
  o Both imply uncertainty
  o Addressing them involves similar techniques

Three approaches to uncertainty

Contingency planning
  o Build all possibilities into the plan
  o Often makes the tree very large
  o Can only guarantee a solution if the number of contingencies is tractable
Interleaving, or adaptive planning
  o Alternate between planning, acting, and sensing
  o Requires extra work during execution
  o Unsuitable for offline planning
Strategy learning
  o Learn, from examples, strategies that can be applied in any situation
  o Must decide on parameterisation, state-evaluation, suitable examples to study, etc.

Why do we study games?

Games provide
  o An abstraction of the real world
  o Well-defined, clear state descriptions
  o Limited operations with well-defined consequences
  o A way of making incremental, controllable changes
  o A way of including hostile agents
So they provide a forum for investigating many of the real-world issues outlined previously
  o More like the real world than previous examples
The initial state and the set of actions (the moves of the game) define a game tree that serves as the search tree
  o But of course different players get to choose actions at various points
  o So our previous search algorithms don't work!
Games are to AI as F1 is to car design.

Example: noughts and crosses (tic-tac-toe)

Each level down represents a move by one player
  o Known as one ply
  o Stop when we get to a goal state (three in a line)
What is the size of this problem?

Noughts and crosses: vital statistics

The game tree as drawn above has 9! = 362,880 possible games
  o But that includes games that continue after a victory
  o Removing these gives 255,168 games
  o Combining equivalent game boards (mostly resolving rotations and reflections) leaves 26,830 games
Each square can be a cross, a circle, or empty
  o Therefore there are 3^9 = 19,683 distinct boards
  o But that includes (e.g.) boards with five crosses and two circles
  o Removing these gives 5,478 distinct legal boards
  o Resolving rotations and reflections leaves 765 distinct legal boards
The takeaway message is: think before you code!
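These counts are easy to verify by brute force. A quick Python sketch (the helper names are ours, not from the lecture) that enumerates every game, stopping at a victory, and records every board position reached along the way:

```python
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_games(board=('.',) * 9, player='X', seen=None):
    """DFS over all games; returns the number of complete games and
    collects every reachable board position into `seen`."""
    if seen is not None:
        seen.add(board)
    if winner(board) or '.' not in board:
        return 1                      # game over: a win, or a full board
    games = 0
    for i in range(9):
        if board[i] == '.':
            child = board[:i] + (player,) + board[i+1:]
            games += count_games(child, 'O' if player == 'X' else 'X', seen)
    return games

seen = set()
print(count_games(seen=seen))  # 255168 games that stop at a victory
print(len(seen))               # 5478 distinct legal (reachable) boards
```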

Noughts and crosses scenarios

You get to choose your opponent's moves, and you know the goal, but you don't know what is a good move
  o Normal search works, because you control everything
  o What is the best uninformed search strategy? How many states does it visit?
  o What is a good heuristic for A* here? How many states does it visit?
Your opponent plays randomly
  o Does normal search work?
  o Uninformed strategy?
  o A* heuristic?
Your opponent tries to win
  o We know it's a draw really

Noughts and crosses example

One important difference with games is that we don't get to dictate all of the actions chosen
  o The opponent has a say too!

[Figure: an example game tree, with terminal boards labelled cross wins, circle wins, or draw]

Perfect play: the Minimax algorithm

Consider a two-player game between MAX and MIN
  o Moves alternate between the players
Assume it is a zero-sum game
  o Whatever is good for one player is bad for the other
Assume also that we have a utility function that we can apply to any game position: utility(s) returns r ∈ ℝ, which is
  o +∞ if s is a win for MAX
  o positive if s is good for MAX
  o 0 if s is even
  o negative if s is good for MIN
  o −∞ if s is a win for MIN
Whenever MAX has the move in position s, they choose the move that maximises the value of utility(s), assuming that MIN chooses optimally; conversely for MIN.

  Minimax(s) = utility(s),                                    if terminal(s)
             = max{ Minimax(result(s, a)) : a ∈ actions(s) }, if player(s) = MAX
             = min{ Minimax(result(s, a)) : a ∈ actions(s) }, if player(s) = MIN
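The recursion above translates directly into code. A minimal sketch in Python, assuming a game object exposing terminal(s), utility(s), player(s), actions(s), and result(s, a) in the AIMA style (the interface is our assumption, matching the notation above):

```python
def minimax(s, game):
    """Value of state s under optimal play by both players."""
    if game.terminal(s):
        return game.utility(s)
    values = [minimax(game.result(s, a), game) for a in game.actions(s)]
    return max(values) if game.player(s) == 'MAX' else min(values)

def minimax_decision(s, game):
    """The move MAX should choose in state s."""
    return max(game.actions(s),
               key=lambda a: minimax(game.result(s, a), game))
```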

Minimax operation

We imagine that the game tree is expanded to some definition of terminals
  o This will depend on the search depth (in the figure, two ply)
  o This will depend on the available resources
  o In general, it won't be uniform across the tree
The tree is generated top-down, starting from the current position
  o Then Minimax is applied bottom-up, from the leaves back to the current position
At each of MAX's choices, they (nominally) choose the move that maximises the utility
  o Conversely for MIN

Minimax performance

Complete: yes, for a finite tree
Optimal: yes, against an optimal opponent
Time: O(b^m), all nodes examined
Space: O(bm), depth-first (or depth-limited) search

Minimax can be extended straightforwardly to multi-player games
  o Section 5.2.2 of AIMA
But for a big game like chess, expanding to the terminals is completely infeasible. The standard approach is to employ
  o A cut-off test, e.g. a depth limit, possibly with quiescence search
  o An evaluation function: a heuristic used to estimate the desirability of a position
This will still be perfect play
  o If we have a perfect evaluation function

Example: chess

Average branching factor is 35
  o The search tree has maybe 35^100 nodes
  o Although only around 10^40 distinct legal positions
Clearly we cannot solve it by brute force
  o Its intractable nature forces incomplete search
  o So offline contingency planning is impossible
Interleave time- or space-limited search with moves
  o This lecture
  o Algorithm for perfect play [Von Neumann, 1944]
  o Finite-horizon, approximate evaluation [Zuse, 1945]
  o Pruning to reduce search costs [McCarthy, 1956]
Or use/learn strategies to facilitate move-choice based on the current position
  o Later in CITS3001
What do humans do?

Evaluation functions

If we cannot expand the game tree to terminal nodes, we expand as far as we can and apply some judgement to decide which positions are best.
A standard approach is to define a linear weighted sum of relevant features
  o e.g. in chess: 1 for each pawn, 3 for each knight or bishop, 5 for each rook, 9 for each queen
  o Plus positional considerations, e.g. centre control
  o Plus dynamic considerations, e.g. threats

[Figure: board positions illustrating material advantage and positional advantage]

  eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

  o e.g. w1 = 9
  o e.g. f1(s) = number of white queens − number of black queens
Non-linear combinations are also used
  o e.g. reward pairs of bishops
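As a concrete (and deliberately crude) sketch of the weighted-sum idea, here is a material-only evaluation for chess; the piece weights follow the slide, while the board-as-string representation is purely our illustrative assumption:

```python
# Material weights from the slide: pawn 1, knight/bishop 3, rook 5, queen 9.
WEIGHTS = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def evaluate(board):
    """eval(s) = sum_i w_i * f_i(s), where f_i(s) is the count of white
    pieces of type i minus the count of black pieces of type i.
    `board` is assumed to be a string of piece letters: uppercase for
    white, lowercase for black."""
    return sum(w * (board.count(p) - board.count(p.lower()))
               for p, w in WEIGHTS.items())

print(evaluate("QQqrr"))  # 9*(2-1) + 5*(0-2) = -1: black is slightly ahead
```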

Properties of good evaluation functions

Usually the quality of the player depends critically on the quality of the evaluation function. An evaluation function should
  o Agree with the utility function on terminal states
  o Reflect the probability of winning
  o Be time-efficient, to allow maximum search depth
Note that the exact values returned seldom matter
  o Only the ordering matters
An evaluation could also be accompanied by a measure of certainty
  o e.g. we may prefer high certainty when we are ahead, low certainty when we are behind

Cutting off search

We can cut off search at a fixed depth
  o Works well for simple games
  o Depth-limited search
Often we are required to manage the time taken per move
  o Can be hard to turn time into a cut-off depth
  o Use iterative deepening, an anytime algorithm (see the sketch below)
  o Sacrifice (some) depth for flexibility
Sometimes we are required to manage the time taken for a series of moves
  o More complicated again
  o Sometimes we can anticipate changes in the branching factor
We seldom want the cut-off depth to be uniform across the tree
  o Two particular issues that arise often are quiescence and the horizon effect
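One common implementation of the anytime idea is to wrap a depth-limited search in a deepening loop and keep the move from the deepest completed iteration. A minimal sketch, assuming a depth-limited search(state, depth) that returns a (value, move) pair (both names are illustrative):

```python
import time

def iterative_deepening(state, search, time_budget):
    """Anytime move choice: search to depth 1, 2, 3, ... until time runs out,
    keeping the move found by the deepest search that completed."""
    deadline = time.monotonic() + time_budget
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        _, best_move = search(state, depth)  # depth-limited minimax/alpha-beta
        depth += 1
    return best_move
```

In practice the inner search would also be interrupted when the deadline passes; as written, the final iteration can overrun its budget.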

Quiescence

A quiescent situation is one where values from the evaluation function are unlikely to change much in the near future.
Using a fixed search depth can mean relying on the evaluations of non-quiescent situations
  o We can avoid this by e.g. extending the search to the end of a series of captures
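A standard realisation of this idea is a quiescence search used in place of the raw evaluation at the cut-off: it "stands pat" on the static evaluation, but keeps expanding noisy moves such as captures until the position is quiet. A minimal negamax-style sketch; evaluate(s) (scored for the side to move), noisy_moves(s), and result(s, a) are assumed, illustrative callbacks:

```python
def quiescence(s, alpha, beta, game):
    """Static evaluation of s, extended along capture-like moves only."""
    stand_pat = game.evaluate(s)     # baseline: the value of doing nothing noisy
    if stand_pat >= beta:
        return stand_pat             # opponent would never allow this position
    alpha = max(alpha, stand_pat)
    for a in game.noisy_moves(s):    # e.g. the captures available in s
        v = -quiescence(game.result(s, a), -beta, -alpha, game)
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return alpha
```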

The horizon effect

If we are searching to k ply, something bad that will happen on the (k+1)th ply (or later) will be invisible.
In extreme cases, we may even select bad moves, simply to postpone the inevitable
  o If the inevitable scores x, any move that scores better than x in the search window looks good
  o Even if the inevitable is still guaranteed to happen later!
There is no general solution to this problem
  o It is fundamentally a problem of lack of depth

Alpha-beta pruning

One way we can reduce the number of nodes examined by Minimax is to identify nodes that cannot be better than those we have already seen
  o This will enable a deeper search in the same time
Consider again Fig. 5.2, working from left to right:

  Minimax(A) = max(min(_, _, _), min(_, _, _), min(_, _, _))

First we inspect the 3, 12, and 8:

  Minimax(A) = max(3, min(_, _, _), min(_, _, _))

Next we inspect the first 2:

  Minimax(A) = max(3, min(2, _, _), min(_, _, _))

This is less than the 3, so the next two leaves are immediately irrelevant:

  Minimax(A) = max(3, min(_, _, _)) = max(3, 2) = 3

We do not need to inspect the 5th and 6th leaves
  o But we do need to inspect the 8th and 9th

Alpha-beta operation

We need to keep track of the range of possible values for each internal node.

Alpha-beta general case

In Fig. 5.6, if
  o on the left sub-tree, we know definitely that we can choose a move that gives score m, and
  o on the right sub-tree, we know that the opponent can choose a move that limits the score to n ≤ m,
then we will never (rationally) choose the move that leads to the right sub-tree.

Alpha-beta pseudo-code

αβsearch(s):
    return the a ∈ actions(s) with value maxvalue(s, −∞, +∞)

maxvalue(s, α, β):
    if terminal(s) return utility(s)
    else
        v = −∞
        for a in actions(s):
            v = max(v, minvalue(result(s, a), α, β))
            if v ≥ β return v
            α = max(α, v)
        return v

minvalue(s, α, β):
    if terminal(s) return utility(s)
    else
        w = +∞
        for a in actions(s):
            w = min(w, maxvalue(result(s, a), α, β))
            if w ≤ α return w
            β = min(β, w)
        return w
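The same procedure in runnable Python, a sketch assuming the game interface used in the earlier minimax sketch:

```python
import math

def alphabeta(s, game, alpha=-math.inf, beta=math.inf):
    """Minimax value of s with alpha-beta pruning: same answer, fewer nodes."""
    if game.terminal(s):
        return game.utility(s)
    if game.player(s) == 'MAX':
        v = -math.inf
        for a in game.actions(s):
            v = max(v, alphabeta(game.result(s, a), game, alpha, beta))
            if v >= beta:
                return v             # the MIN player above will avoid this branch
            alpha = max(alpha, v)
        return v
    else:
        w = math.inf
        for a in game.actions(s):
            w = min(w, alphabeta(game.result(s, a), game, alpha, beta))
            if w <= alpha:
                return w             # the MAX player above will avoid this branch
            beta = min(beta, w)
        return w
```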

Alpha-beta in action

αβsearch(A) = maxvalue(A, −∞, +∞), v = −∞
call minvalue(B, −∞, +∞), w = +∞
  o call maxvalue(B1, −∞, +∞): returns 3, so w = 3, β = 3
  o call maxvalue(B2, −∞, 3): returns 12
  o call maxvalue(B3, −∞, 3): returns 8
  o returns 3, so v = 3, α = 3
call minvalue(C, 3, +∞), w = +∞
  o call maxvalue(C1, 3, +∞): returns 2, so w = 2
  o returns 2 (w ≤ α, so the remaining children of C are pruned)
call minvalue(D, 3, +∞), w = +∞
  o call maxvalue(D1, 3, +∞): returns 14, so w = 14, β = 14
  o call maxvalue(D2, 3, 14): returns 5, so w = 5, β = 5
  o call maxvalue(D3, 3, 5): returns 2, so w = 2
  o returns 2
returns 3
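The trace is easy to reproduce with the alphabeta function above by encoding Fig. 5.2's small tree directly. (The two pruned leaves under C have values 4 and 6 in AIMA's figure; the trace never needs to name them.)

```python
# Fig. 5.2: MAX root A; MIN children B, C, D; three leaves each.
LEAVES = {'B': [3, 12, 8], 'C': [2, 4, 6], 'D': [14, 5, 2]}

class Fig52:
    def terminal(self, s):  return isinstance(s, int)  # leaves are plain ints
    def utility(self, s):   return s
    def player(self, s):    return 'MAX' if s == 'A' else 'MIN'
    def actions(self, s):   return range(3)
    def result(self, s, a): return 'BCD'[a] if s == 'A' else LEAVES[s][a]

print(alphabeta('A', Fig52()))  # 3, after visiting only 7 of the 9 leaves
```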

Alpha-beta discussion

Pruning does not affect the final result
  o It simply gets us there sooner
A good move ordering means we can prune more
  o e.g. if we had inspected D3 first, we could have pruned D1 and D2
We want to test expected good moves first
  o Good from the point of view of that node's player
Perfect ordering can double our search depth
  o Obviously perfection is unattainable, but e.g. in chess we might test captures, then threats, then forward moves, then backward moves (see the sketch below)
Sometimes we can learn good orderings
  o Known as speedup learning
  o We can then play either faster at the same standard, or better in the same time
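In code, move ordering is usually just a sort applied before the loop over actions(s). A tiny sketch, assuming a hypothetical classify(a) that tags each move with one of the slide's categories:

```python
# Lower rank = searched earlier; categories as suggested on the slide.
ORDER = {'capture': 0, 'threat': 1, 'forward': 2, 'backward': 3}

def ordered_actions(s, game, classify):
    """Sort moves so likely-good ones are searched, and cause cut-offs, first."""
    return sorted(game.actions(s), key=lambda a: ORDER[classify(a)])
```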

Game-playing agents: history and state-of-the-art

Checkers (draughts)
  o Marion Tinsley ruled checkers for forty years, losing only seven games in that time
  o In 1994 Tinsley's health forced him to resign from a match against Chinook, which was crowned world champion shortly afterwards
  o At that time, Chinook used a database of 443,748,401,247 endgame positions
  o Checkers has since been proved to be a draw with perfect play (the proof was announced in this very room!)
  o Chinook now plays perfectly, using αβ search and a database of 39,000,000,000,000 positions
Chess
  o Deep Blue defeated Garry Kasparov in a six-game match in 1997
  o Deep Blue searches 200,000,000 positions/second, up to 40 ply deep
Othello
  o Look-ahead is very difficult for humans in Othello
  o The Moor became world champion in 1980
  o These days computers are banned from championship play

Contd.

Go
  o 19x19 Go has a branching factor of over 300, making look-ahead very difficult for programs
  o Programs play at a good amateur level, although they are still improving
  o They are much better at 9x9
  o DeepMind's AlphaGo defeated Go champion Lee Sedol in March 2016
Backgammon
  o Dice rolls increase the branching factor: 21 possible rolls with two dice
  o About 20 legal moves with most positions and rolls, although approximately 6,000 are sometimes possible with a 1-1!
  o Depth 4 means 20 × (21 × 20)^3 ≈ 1,500,000,000 possibilities
  o Obviously most of this search is wasted, so the value of look-ahead is much diminished
  o TD-Gammon (1992) used depth-2 search plus a very good evaluation function to reach almost world-champion level; players have since copied its style!
  o Modern programs based on neural networks are believed to be better than the best humans