CSE 332: Data Structures and Parallelism
Games, Minimax, and Alpha-Beta Pruning

This handout describes the most essential algorithms for game-playing computers. NOTE: These are only partial algorithms: you will need to work out the details when doing P3.

Playing Games

To play a game of Tic-Tac-Toe, two players (X and O) alternate making moves. The first player to get three of their letter in a row wins. Usually, the board starts empty, but in the interest of a reasonable example, we'll look at a partially played game instead:

[figure: a game tree rooted at a partially played board ("X must choose one of these moves"), showing X's possible moves, O's replies to each, and the terminal boards as leaves]

We make a few observations about the game above:

- If X and O are both playing optimally, O will win. (Why?)
- The leaves of the tree are terminal positions of the game.
- Moves alternate between the two players.

This diagram is called a game tree, and it's generated by starting at a position and recursively generating all the possible moves that could be made until the game ends. Putting this idea into pseudocode, we have:

    void printTerminalPositions(Position p) {
        if (p is a terminal position) {
            print p;
        }
        else {
            for (move in p.getMoves()) {
                p.applyMove(move);
                printTerminalPositions(p);
                p.undoMove();
            }
        }
    }

Notice that this is a recursive backtracking algorithm. The definitions of getMoves, applyMove, and undoMove depend on the game that we're playing. For example, in Tic-Tac-Toe, getMoves returns a list of all the valid X moves (or O moves, depending on whose turn it is).
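To make getMoves, applyMove, and undoMove concrete, here is a minimal sketch of what they might look like for Tic-Tac-Toe. This is not the interface P3 gives you; the class name, the representation, and the move encoding are all illustrative assumptions.

    // A minimal Tic-Tac-Toe position. Not the P3 interface; the names and
    // representation are illustrative only.
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    class TicTacToePosition {
        private final char[] board = "         ".toCharArray(); // 9 empty squares
        private char toMove = 'X';
        private final Deque<Integer> history = new ArrayDeque<>();

        // A move is just the index of an empty square; whose letter gets
        // placed there is determined by whose turn it is.
        List<Integer> getMoves() {
            List<Integer> moves = new ArrayList<>();
            for (int i = 0; i < 9; i++) {
                if (board[i] == ' ') {
                    moves.add(i);
                }
            }
            return moves;
        }

        void applyMove(int square) {
            board[square] = toMove;
            toMove = (toMove == 'X') ? 'O' : 'X';
            history.push(square); // remember the move so undoMove can reverse it
        }

        void undoMove() {
            board[history.pop()] = ' ';
            toMove = (toMove == 'X') ? 'O' : 'X';
        }
    }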

In a two-player game (like Tic-Tac-Toe), there are three possible outcomes: I win, I lose, or we draw. Because every leaf must be one of these options, we can give them numerical values to evaluate how good they are. Since there are only these three, we use +1 for a win, 0 for a draw, and -1 for a loss. Importantly, these values are the only thing about the position that we actually care about! If we know a move is a +1, it doesn't matter what exactly the series of moves we made is. So, taking this into account, we re-draw our game tree (from X's perspective):

[figure: the same game tree with each leaf replaced by its value (+1, 0, or -1); the root is labeled "X must choose one of these moves"]

Now, to figure out which move to make, all we have to do is figure out the values of the blue moves. To do this, we make a major assumption: our opponent will make the best possible move they can. Intuitively, if we give our opponent the benefit of the doubt, then we can't be surprised by any move they make. The best possible move for our opponent is the worst possible move for us.

To figure out the values of the blue moves, we recursively figure out the values of the moves below them in the game tree. There are two cases:

- If it's our turn, then we'll take the best possible move we can. In other words, we take the maximum of the children's values.
- If it's our opponent's turn, then they will give us the worst possible move they can. In other words, we take the minimum of the children's values.

So, on the lines labeled "X's Turn", we take the maximum of the moves below, and on the lines labeled "O's Turn", we take the minimum of the moves below. The filled-in game tree looks like this (from X's perspective):

[figure: the same game tree with every internal node filled in by this rule; all of the moves available to X at the root have value -1]

Unfortunately for us, since all of the choices we have are -1, no matter what we do, a perfect opponent can always force us to lose this game of Tic-Tac-Toe. If we follow the -1s down the game tree, we can see the moves in every case that make us lose.
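Before compressing these two cases into one (as the Minimax algorithm in the next section does), it may help to see them written out directly. Here is a sketch over a hypothetical Node type (not part of P3); only the take-the-max-on-our-turn, take-the-min-on-theirs structure matters.

    import java.util.List;

    // The two cases written out explicitly. Node is a hypothetical stand-in
    // for a game position, not part of P3.
    class TwoCaseMinimax {
        record Node(int value, List<Node> children) {} // value matters only at leaves

        // Returns the game value from our perspective: +1 win, 0 draw, -1 loss.
        static int gameValue(Node n, boolean ourTurn) {
            if (n.children().isEmpty()) {
                return n.value();
            }
            int best = ourTurn ? Integer.MIN_VALUE : Integer.MAX_VALUE;
            for (Node child : n.children()) {
                int v = gameValue(child, !ourTurn);
                // Our turn: maximize. Opponent's turn: minimize.
                best = ourTurn ? Math.max(best, v) : Math.min(best, v);
            }
            return best;
        }
    }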

Minimax

The idea we just used to fill in the Tic-Tac-Toe board is a general one called Minimax. First, we describe the general algorithm, and then we get into some important changes that must be implemented in practice.

The Algorithm

    int minimax(Position p) {
        if (p is a leaf) {
            // evaluate tells us the
            // value of the current
            // position
            return p.evaluate();
        }

        int bestValue = -∞;
        for (move in p.getMoves()) {
            p.applyMove(move);
            int value = -minimax(p);
            p.undoMove();
            if (value > bestValue) {
                bestValue = value;
            }
        }
        return bestValue;
    }

This really is the same algorithm that we described on the previous page. Notice the - in front of the recursive call. This is because the move after ours is our opponent's, and they are attempting to do the opposite thing from us. Mathematically, this works because max(a, b) = -min(-a, -b).

When writing a bot to play a game, we'd also need to keep track of the actual move corresponding to the best score. This involves a small addition to the if statement where we update the best score. Notice that we're only interested in the very next move; we're using the future moves to help us understand the next move better. The version of the algorithm we've described here is usually called negamax, because it uses this property to reduce code redundancy.

Using Minimax in a Real Game

Since our goal is ultimately to implement a chess bot, let's do some back-of-the-envelope calculations on a chess game. The branching factor of a tree is the number of children a node has. Since some positions in chess have more moves than others, we work with the average branching factor instead. It turns out that in chess, the average branching factor is approximately 35, and the average chess game lasts approximately 40 moves. Putting these numbers together, we would need to evaluate at least 35^40 ≈ 5.8 × 10^61 leaves to use this method in a real chess game. If we were able to evaluate 1 trillion leaves per second, it would take at least 10^48 seconds (which is more than 10^30 times the number of seconds the universe has existed). This is clearly infeasible.

So, in the real world, instead of evaluating all the way down to the leaves, we estimate the leaves by going several moves ahead. Although this is less accurate, it's the best we can do. The only change this makes to the code is to add a second parameter depth and change our base case to check depth == 0 in addition to checking for a leaf (see the sketch at the end of this section). Unfortunately, this also makes our evaluation function more complicated, because we must estimate how good a position is without knowing if it actually leads to a win.

A natural question to ask is how many levels ahead we can look (we call these ply). You will determine this yourself experimentally on the homework, but the best chess bots in the world can look about 20 ply ahead; you should expect your bot to be able to do a few less than half of that.

To review, in the real world...

- We only look a few moves ahead instead of going to the end of the game.
- The evaluation function takes on a much larger range of numbers than just -1, 0, and +1, because we're less sure of the value of the position.
- In P3, you will be provided with a reasonable evaluation function. You may edit it if you like, but it's not required.
- Your bot will be given three minutes for each game (and it will gain two seconds every time it makes a move). This is much less time than it sounds like.
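Here is the depth-limited change in full, as a sketch. The Position and Move interfaces are hypothetical stand-ins for the ones P3 provides; the depth parameter and the expanded base case are the only differences from the pseudocode above.

    import java.util.List;

    // Depth-limited negamax. Position and Move are hypothetical stand-ins
    // for P3's interfaces.
    interface Move {}

    interface Position {
        boolean isLeaf();          // is the game over here?
        int evaluate();            // heuristic score for the player to move
        List<Move> getMoves();
        void applyMove(Move m);
        void undoMove();
    }

    class DepthLimitedMinimax {
        static int minimax(Position p, int depth) {
            // New base case: stop when out of depth, not only at true leaves.
            if (depth == 0 || p.isLeaf()) {
                return p.evaluate();
            }
            // Start below any real evaluation, but small enough to negate safely.
            int bestValue = -1000000;
            for (Move move : p.getMoves()) {
                p.applyMove(move);
                int value = -minimax(p, depth - 1);
                p.undoMove();
                if (value > bestValue) {
                    bestValue = value;
                }
            }
            return bestValue;
        }
    }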

Parallel Minimax

The Algorithm

    int minimax(Position p) {
        if (p is a leaf) {
            return p.evaluate();
        }

        int bestValue = -∞;
        parallel (move in p.getMoves()) {
            Position copy = p.copy();   // each thread works on its own copy
            copy.applyMove(move);
            int value = -minimax(copy);
            if (value > bestValue) {
                bestValue = value;
            }
        }
        return bestValue;
    }

Minimax is a naturally parallelizable algorithm: each subtree of the game tree can be searched by an independent thread. Even though the algorithm is very similar, there are a couple of gotchas:

- Since different threads will be working at the same time, they can't share one position. This means you'll need to copy the position for each thread.
- As always, you'll want to have a cutoff. Your cutoff should be in terms of the depth remaining in the tree.
- Make sure you use divide and conquer to get the threads running as quickly as possible. (A fork/join sketch of this structure appears at the end of this handout.)

Alpha-Beta Pruning

Alpha-beta pruning is a more efficient version of Minimax that avoids considering branches of the game tree that are irrelevant. Before getting too deep into the algorithm, it is very important to note that a correct alpha-beta search will return the same answer as Minimax. In other words, it is not an approximation algorithm; it only ignores moves that cannot change the answer. What might such a move look like? Consider the following:

[figure: a Max node with two Min children; the first Min child is worth 10, and the second has leaves 6 and 3 plus one more leaf x that we have not evaluated yet]

Suppose that we've gone through most of the game tree and evaluated the first three leaves. The question that remains is: is it possible that x is a better move than the 10? If it is, then we have to evaluate x; otherwise, we don't have to waste the time. It turns out not to matter; here's why:

- If x ≥ 3, then Min would choose the 3; so, the value of that subtree is 3. But this is less than the 10 we can already get.
- If x < 3, then Min would choose x; so, the value of that subtree is less than 3. But this is less than the 10 we can already get.

More succinctly, because the value of the subtree is min(3, x), we know it is at most 3, which is less than another move we already found. It follows that we can ignore this last value.

This sort of bounding argument can be very powerful. Let's consider another game tree where we write down all of the bounds as we go:

[figure: a larger game tree rooted at Z with subtrees A, B, C, D, and E; as leaves (with values such as 3, 5, and 2) are evaluated, the bound each one implies is written next to its ancestors]

The idea is that as we fill in these values, if we find one that contradicts a bound, we can stop looking in that subtree. Before looking at the next page, try to figure out which leaves we don't need to evaluate.

We evaluate 3, 4, and 5; the 5 violates a bound we have already written down, so we cut off the rest of that subtree.

[figure: the tree redrawn with the first pruned subtree removed and the surviving bounds written in]

We evaluate the next two leaves; then, we notice that Min can force a 2 if we choose that subtree. So, we can cut off the rest of that subtree.

[figure: the tree redrawn again with the second pruned subtree removed]

Finally, we evaluate 10 and 2, and notice that this gives us a cutoff for the remaining subtree.

[figure: the final tree; only the leaves we actually evaluated remain]

Notice, again, that if we were to evaluate the whole tree (via minimax), we would get the same answer. Furthermore, we are able to make these cutoffs both as the Min player and as the Max player.

In code, a cleaner way of dealing with the inequalities is as a valid range: the Min player makes the upper bound smaller, and the Max player makes the lower bound bigger. Alpha-beta pruning gets its name from this idea: we call the lower bound α and the upper bound β, and we provide them as arguments. Whenever β ≤ α, we cut off. Notice how nodes on max levels only propagate up β and nodes on min levels only propagate up α:

[figure: the same tree annotated with [α, β] ranges; every node starts at [-∞, ∞], and evaluated leaves narrow the ranges to things like [-∞, 2], [3, ∞], and [5, ∞], with cutoffs marked where β = 2]
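Before writing the algorithm out in general, here is a self-contained toy version of that range bookkeeping, run on the min(3, x) example from earlier. The Node type is illustrative, not the P3 interface. The interesting output is that the search returns the same answer minimax would, after looking at only two of the four leaves: once the 6 is seen, the second Min node is worth at most 6, which is already less than the 10, so both the 3 and x are skipped.

    import java.util.List;

    // Alpha-beta on the min(3, x) example: a Max root whose first Min child
    // is worth 10 and whose second Min child has leaves 6, 3, and x. Leaf
    // values are written from the perspective of the player to move at the
    // leaf (Max, two plies down), which is what the negamax-style recursion
    // expects. Node is illustrative only.
    class AlphaBetaDemo {
        record Node(int value, List<Node> children) {}

        static int leavesEvaluated = 0;

        static int alphabeta(Node n, int alpha, int beta) {
            if (n.children().isEmpty()) {
                leavesEvaluated++;
                return n.value();
            }
            for (Node child : n.children()) {
                int value = -alphabeta(child, -beta, -alpha);
                if (value > alpha) {
                    alpha = value;      // new lower bound
                }
                if (alpha >= beta) {
                    return alpha;       // the valid range is empty: cut off
                }
            }
            return alpha;
        }

        static Node leaf(int v) { return new Node(v, List.of()); }

        public static void main(String[] args) {
            Node root = new Node(0, List.of(
                new Node(0, List.of(leaf(10))),          // first Min child: worth 10
                new Node(0, List.of(leaf(6), leaf(3),
                                    leaf(9999)))));      // 9999 plays the role of x
            System.out.println(alphabeta(root, -1000000, 1000000)); // 10, same as minimax
            System.out.println(leavesEvaluated);  // 2: the 3 and x are never looked at
        }
    }

Tracing this by hand, with the [α, β] range written at each node, is good preparation for implementing the real thing.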

Finally, we can describe the actual algorithm.

The Algorithm

    int alphabeta(Position p, int alpha, int beta) {
        if (p is a leaf) {
            return p.evaluate();
        }

        for (move in p.getMoves()) {
            p.applyMove(move);
            int value = -alphabeta(p, -beta, -alpha);
            p.undoMove();

            // If value is between alpha and beta, we've
            // found a new lower bound
            if (value > alpha) {
                alpha = value;
            }

            // If the value is bigger than beta, we won't
            // actually be able to get this move
            if (alpha >= beta) {
                return alpha;
            }
        }

        // Return the best achievable value
        return alpha;
    }

Again, we're using the special properties of min and max to make the code cleaner. This time, when we switch from min to max, we negate and swap the upper and lower bounds as well. It's also important to notice that the value of the best move is alpha; we're not keeping track of another value in addition to alpha. We strongly recommend running through the algorithm on your own on the tree above before attempting to code it up. Alpha-beta is deceptively complicated!

Move Ordering

Because alpha-beta attempts to prune as many nodes as possible based on the values it has already seen, the order in which you visit the moves matters substantially. The assignment does not require that you do any interesting move ordering, but in both alphabeta and jamboree (see the next section), if you apply move ordering, your performance will be substantially better. (A sketch of one simple ordering appears at the end of this handout.)

Parallel Alpha-Beta Pruning

After you have alphabeta working, you will write a parallel version. Unfortunately, unlike minimax, alphabeta is not naturally parallelizable. In particular, if we attempt to parallelize the loop, we will be unable to propagate the new alpha and beta values to each iteration. This results in evaluating unnecessary parts of the tree. In practice, however, it turns out that this is an acceptable loss, because the parallelism still gives us an overall benefit.

So, our general strategy (a variant of an algorithm called Jamboree) is the following:

- Evaluate some fraction of the moves sequentially to get reasonable alpha/beta values that will enable us to cut out large parts of the tree.
- Evaluate the remaining moves in parallel. This means we will evaluate some unnecessary moves, but, in practice, it's worth it.

Then, the algorithm looks something like the following:

The Algorithm

    PERCENTAGE_SEQUENTIAL = 0.5;

    int jamboree(Position p, int alpha, int beta) {
        if (p is a leaf) {
            return p.evaluate();
        }

        moves = p.getMoves();

        for (i = 0; i < PERCENTAGE_SEQUENTIAL * moves.length; i++) {
            p.applyMove(moves[i]);
            int value = -jamboree(p, -beta, -alpha);
            p.undoMove();

            if (value > alpha) {
                alpha = value;
            }
            if (alpha >= beta) {
                return alpha;
            }
        }

        parallel (i = PERCENTAGE_SEQUENTIAL * moves.length; i < moves.length; i++) {
            Position copy = p.copy();
            copy.applyMove(moves[i]);
            int value = -jamboree(copy, -beta, -alpha);

            if (value > alpha) {
                alpha = value;
            }
            if (alpha >= beta) {
                return alpha;
            }
        }

        return alpha;
    }

This algorithm has a lot of important constants to tweak, and they make a big difference:

- PERCENTAGE_SEQUENTIAL can make a big difference. You should play with the value until you find a good one.
- There will also be a sequential cutoff like normal, which is not the same thing as PERCENTAGE_SEQUENTIAL.
- As with all the other algorithms, you will need to choose a depth to go to. This algorithm should get further than any of the others.
- Make sure that your sequential cutoff does not recursively call the parallel version. If you accidentally do that, performance will degrade substantially.
- As with the other parallel algorithm, it is important to figure out when you should copy the board vs. just undoing the move.
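To make "divide and conquer" concrete, here is one possible fork/join realization of the parallel phase; the same structure works for parallel minimax and for jamboree's parallel loop. Everything here is a sketch under assumptions: Position and Move are the hypothetical interfaces from earlier (now with copy()), the per-move search is plain sequential alpha-beta, and alpha and beta are frozen when the tasks are created, which is exactly the propagation loss discussed above.

    import java.util.List;
    import java.util.concurrent.RecursiveTask;

    // A fork/join sketch of the parallel phase. Position and Move are the
    // same hypothetical stand-ins as before, extended with copy().
    interface Move {}

    interface Position {
        boolean isLeaf();
        int evaluate();
        List<Move> getMoves();
        void applyMove(Move m);
        void undoMove();
        Position copy();
    }

    class ParallelPhase extends RecursiveTask<Integer> {
        private final Position p;
        private final List<Move> moves;   // the slice of moves this task owns
        private final int lo, hi, depth, alpha, beta;

        ParallelPhase(Position p, List<Move> moves, int lo, int hi,
                      int depth, int alpha, int beta) {
            this.p = p; this.moves = moves; this.lo = lo; this.hi = hi;
            this.depth = depth; this.alpha = alpha; this.beta = beta;
        }

        @Override
        protected Integer compute() {
            if (hi - lo == 1) {
                // One move: search it sequentially on a private copy, since
                // threads must never share a board.
                Position copy = p.copy();
                copy.applyMove(moves.get(lo));
                return -alphabeta(copy, -beta, -alpha, depth - 1);
            }
            // Divide and conquer over the move range so threads start quickly.
            int mid = lo + (hi - lo) / 2;
            ParallelPhase left = new ParallelPhase(p, moves, lo, mid, depth, alpha, beta);
            ParallelPhase right = new ParallelPhase(p, moves, mid, hi, depth, alpha, beta);
            left.fork();
            int best = right.compute();
            return Math.max(best, left.join());
        }

        // Sequential depth-limited alpha-beta, as on the previous pages.
        static int alphabeta(Position p, int alpha, int beta, int depth) {
            if (depth == 0 || p.isLeaf()) {
                return p.evaluate();
            }
            for (Move m : p.getMoves()) {
                p.applyMove(m);
                int value = -alphabeta(p, -beta, -alpha, depth - 1);
                p.undoMove();
                if (value > alpha) { alpha = value; }
                if (alpha >= beta) { return alpha; }
            }
            return alpha;
        }
    }

After the sequential prefix of moves, jamboree would invoke this task on the remaining range (for example, via ForkJoinPool.commonPool().invoke(...)) and compare the result against alpha and beta as usual. In a real jamboree, the one-move case would call jamboree itself above the depth cutoff, so that deeper nodes also split their moves.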
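And here is one possible shape for the move ordering mentioned earlier: sort the move list with a cheap heuristic before searching it, so the likely-best moves raise alpha early and later siblings get cut off. The estimateValue heuristic is a hypothetical assumption (for chess, something like the value of the piece a move captures), not part of P3.

    import java.util.Comparator;
    import java.util.List;

    // Simple move ordering: search the most promising moves first.
    // estimateValue is a hypothetical cheap heuristic, not part of P3.
    class MoveOrdering {
        interface Move {
            int estimateValue(); // e.g., value of the captured piece, in chess
        }

        static void orderMoves(List<Move> moves) {
            moves.sort(Comparator.comparingInt(Move::estimateValue).reversed());
        }
    }

You would call orderMoves on the result of getMoves before the loops in alphabeta and jamboree; the heuristic must be cheap relative to the search it saves, or the ordering costs more than it prunes.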