More on games (Ch. 5.4-5.6)

Announcements Midterm next Tuesday: covers weeks 1-4 (Chapters 1-4) Takes the full class period Open book/notes (you can use the ebook) No programming/code, internet searches, or friends Exam is in this room I will provide paper

Alpha-beta pruning Let's solve this tree with alpha-beta pruning. The max root has actions L, F, R: L leads to a min node over leaves 1 and 3, F leads directly to the leaf value 2, and R leads to a min node over leaves 0 and 4.

Alpha-beta pruning max( min(1,3), 2, min(0,??) ) = 2, so we should pick F. Order explored: action F first, then the left min node, then the right min node. Once the right min node sees the leaf 0, its value is bounded above by 0 < 2, so its remaining child (4) is never considered.

Alpha-beta pruning Pruning rule in the alpha-beta algorithm: At a min node: stop expanding children if this node's current value is less than or equal to the best choice its max parent has already found. At a max node: stop if this node's current value is greater than or equal to the best choice its min parent has already found (i.e., prune whenever the parent would never choose this branch anyway).

Alpha-beta pruning Each node carries a range of values it could still take. I think the book is confusing about alpha-beta, especially Figure 5.5: alpha is (roughly) the best value the max player can already guarantee, and beta is (roughly) the best value the min player can already guarantee.

αβ pruning Solve this problem with alpha-beta pruning: [tree diagram: actions L, F, R at each level, with leaf values 2, 3, 1, 2, 4, 8, 10, 20, 4, 14, 5]

Alpha-beta pruning In the best case, alpha-beta pruning lets you search to depth 2d for the cost minimax pays to reach depth d. So where minimax examines O(b^m) nodes, alpha-beta with good move ordering examines O(b^(m/2)). This is exponentially better, but the worst case (bad move ordering) is the same as minimax.
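A minimal alpha-beta sketch; the `children`/`value` interfaces to the game tree are assumptions for illustration. On the earlier example tree, max(min(1,3), 2, min(0,4)), it returns 2 and never examines the leaf 4.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta pruning (sketch).
    children(state): successor states ([] at a leaf); value(state): leaf score."""
    kids = children(state)
    if depth == 0 or not kids:
        return value(state)
    if maximizing:
        best = -math.inf
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:  # a min ancestor already has a choice <= best: prune
                break
        return best
    best = math.inf
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, value))
        beta = min(beta, best)
        if alpha >= beta:      # a max ancestor already has a choice >= best: prune
            break
    return best
```

With the tree encoded as nested lists (`[[1, 3], 2, [0, 4]]`, leaves as numbers), the right min node cuts off after seeing 0, exactly as in the slide.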

Alpha-beta pruning Ideally you would put your best actions first (largest for max, smallest for min). This way you prune more of the tree, since min nodes cut off sooner when the max player's best-so-far is large. Obviously you do not know the best move in advance (otherwise why are you searching?), but some effort spent guessing goes a long way (i.e., exponentially fewer states).

Side note: In alpha-beta pruning, the heuristic used to guess which move is best can be complex, since better ordering can greatly affect pruning. For A* search, by contrast, the heuristic had to be very fast to be useful (otherwise computing the heuristic would take longer than the original search).

Alpha-beta pruning This rule of checking your parent's best/worst against the current value in the child only really works for two-player games... What about 3-player games?

3-player games For games with more than two players, each state needs a value for every player. When it is a player's turn, they pick the action that maximizes their own value. (We will assume each agent is greedy and only wants to increase its own score... more on this next time)
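This idea (often called max^n) can be sketched as follows: every node returns a tuple with one value per player, and the player to move picks the child that maximizes their own entry. The `children`/`values`/`next_player` helpers are assumed interfaces, not from the slides.

```python
def maxn(state, player, children, values, next_player):
    """max^n search sketch for n-player games.
    children(state): successors ([] at a terminal); values(state): payoff
    tuple, one entry per player; next_player(p): whose turn comes next."""
    kids = children(state)
    if not kids:
        return values(state)  # terminal: payoff tuple
    best = None
    for child in kids:
        v = maxn(child, next_player(player), children, values, next_player)
        # Greedy assumption: the mover only cares about its own component
        if best is None or v[player] > best[player]:
            best = v
    return best
```

For instance, if player 2 chooses below player 1, each player-2 node first maximizes its own (second) entry, and player 1 then maximizes the first entry of those results.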

3-player games (The number at a node shows which player is maxing) What should player 1 do? What can you prune? [tree diagram: player 1 at the root with value (4,3,3); player-2 and player-3 nodes below; leaf value triples (0,0,10), (1,8,1), (4,6,0), (7,1,2), (1,1,8), (4,1,5), (7,2,1), (4,2,4), (1,3,6), (3,3,4)]

3-player games How would you do alpha-beta pruning in a 3-player game?

3-player games How would you do alpha-beta pruning in a 3-player game? TL;DR: Not easily (in fact, you cannot prune at all if the values are unbounded, even in a zero-sum game). This is because one player might accept a very low score for the benefit of the other two.

Mid-state evaluation So far we have assumed that you must reach a terminal state and then propagate values backwards (possibly with pruning). In more complex games (Go or chess) it is hard to reach the terminal states, as they are so far down the tree (and the branching factor is large). Instead, we will estimate the value minimax would give without going all the way down.

Mid-state evaluation By using mid-state evaluations (at non-terminal states) the best action can be found quickly. These mid-state evaluations need to be: 1. Based on the current state only 2. Fast (and not just a recursive search) 3. Accurate (correlated with the true win/loss outcome) The quality of your final solution is highly correlated with the quality of your evaluation.

Mid-state evaluation For search problems, the heuristic only helps you find the goal faster (A* still finds the best solution as long as the heuristic is admissible). There is no concept of an admissible mid-state evaluation... and there is almost no guarantee that you will find the best/optimal solution. For this reason we only apply mid-state evaluations to problems we cannot solve optimally.

Mid-state evaluation A common mid-state evaluation adds features of the state together (we did this already for a heuristic...). For the 8-puzzle position shown, eval(state) = 20: we summed the distances of all numbers from their correct spots.
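This summed-distance feature can be sketched directly, assuming a hypothetical representation that maps each tile to its (row, col) position:

```python
def manhattan_eval(state, goal):
    """Sum of each tile's Manhattan distance from its goal square.
    state and goal map tile -> (row, col); tile 0 is the blank."""
    total = 0
    for tile, (r, c) in state.items():
        if tile == 0:          # the blank does not count
            continue
        gr, gc = goal[tile]
        total += abs(r - gr) + abs(c - gc)
    return total
```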

Mid-state evaluation We then run minimax (and prune) on these mid-state evaluations as if they were the correct values. You can also weight features (e.g., getting the top row right matters more in the 8-puzzle). A simple method in chess is to assign points to each piece: pawn=1, knight=4, queen=9... then sum over all pieces you have in play.
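A minimal sketch of this weighted material count, using the piece values from these slides (the list-of-piece-names representation is an assumption for illustration, not a real chess library):

```python
# Piece values from the slides (pawn=1, knight=4, queen=9) plus rook=5
# from the synergy example; a real engine would tune these weights.
PIECE_VALUE = {"pawn": 1, "knight": 4, "rook": 5, "queen": 9}

def material_eval(my_pieces, opponent_pieces):
    """Weighted-sum evaluation: my material total minus the opponent's."""
    mine = sum(PIECE_VALUE[p] for p in my_pieces)
    theirs = sum(PIECE_VALUE[p] for p in opponent_pieces)
    return mine - theirs
```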

Mid-state evaluation What assumptions do you make if you use a weighted sum?

Mid-state evaluation What assumptions do you make if you use a weighted sum? A: That the features are independent (non-linear combinations are common when interactions between features have a large effect). For example, a rook and queen get a synergy bonus for working together, which is non-linear: queen=9, rook=5... but queen & rook together = 16.

Mid-state evaluation There is also the issue of how deep we should look before making an evaluation.

Mid-state evaluation There is also the issue of how deep we should look before making an evaluation. A fixed depth? That causes problems if the child's evaluation is an overestimate while the parent's is an underestimate (or vice versa). Ideally you would stop at states where the mid-state evaluation is most accurate.

Mid-state evaluation Mid-state evaluations also favor actions that put off bad results (i.e., they like stalling). In Go this would make the computer use up ko threats rather than give up a dead group. By evaluating only to a limited depth, you reward the computer for pushing bad news beyond that depth (which does not stop the bad news from eventually happening).

Mid-state evaluation It is not easy to get around these limitations: 1. Pushing off bad news 2. Choosing how deep to evaluate A better mid-state evaluation can help compensate, but good ones are hard to find. They are normally built by mimicking what expert human players do; there is no good systematic way to find one.

Forward pruning You can also use mid-state evaluations for alpha-beta style pruning. However, as these evaluations are estimates, you might prune the optimal answer if the heuristic is imperfect (which it will be). In practice this forward pruning is useful, as it lets you spend more time exploring the promising parts of the search tree.

Forward pruning You can also save search time by using expert knowledge about the problem. For example, in both Go and chess the start of the game has been analyzed very heavily over the years. There is no reason to redo this search at the start of every game; instead we can just look up the best response.

Random games If we are playing a game of chance, we can add chance nodes to the search tree. Instead of a player picking max/min, a chance node takes the expected value of its children. This expected value is then passed up to the parent node, which can min/max over it as usual.
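This max/min/expectation recursion (expectiminimax) can be sketched with a simple dict-based node representation (an assumption for illustration):

```python
def expectiminimax(node):
    """Expectiminimax sketch. node is a dict with a "kind" key:
    "leaf" (has "value"), "max"/"min" (have "children" lists), or
    "chance" (children is a list of (probability, child) pairs)."""
    kind = node["kind"]
    if kind == "leaf":
        return node["value"]
    if kind == "max":
        return max(expectiminimax(c) for c in node["children"])
    if kind == "min":
        return min(expectiminimax(c) for c in node["children"])
    # chance node: probability-weighted average of the children
    return sum(p * expectiminimax(c) for p, c in node["children"])
```

For example, a max node choosing between a chance node over (0.9 → 1, 0.1 → 4) and one over (0.9 → 2, 0.1 → 2) compares expected values 1.3 and 2 and picks 2.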

Random games Here is a simple slot machine example: the actions are pull and don't pull; pull leads to a chance node over the possible payouts, and V(chance) is the probability-weighted average of those outcomes. [diagram of the slot machine's payouts omitted]

Random games You might need to modify your mid-state evaluation if you add chance nodes. Minimax only cares about which value is largest/smallest, but expected value is an implicit average, so magnitudes matter. Example: L leads to a chance node over 1 (p=.9) and 4 (p=.1), R to one over 2 and 2; R is better (1.3 vs 2). Change the 4 to 40 and L is better (4.9 vs 2), even though the ordering of the individual outcomes never changed.

Random games Some partially observable games (e.g., card games) can be searched with chance nodes. As there is a high degree of chance, it is often better to just assume full observability (i.e., that you know the order of cards in the deck), then find which actions perform best over all possible chance outcomes (i.e., all possible deck orderings).

Random games For example, in blackjack you can see which cards have been played and a few of the cards currently in play. You then enumerate all possible decks that could lead to the visible (and used) cards. Then find the value of each action (hit or stand) averaged over all those decks (assuming each possible deck is equally likely).

Random games If there are too many possible chance outcomes to average over them all, you can sample. This means you search the chance tree and randomly select an outcome (according to its probability) at each chance node. With a large number of samples, the estimate converges to the true average.
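A sketch of estimating a chance node's value by sampling; the `chance_outcomes` and `playout` interfaces are hypothetical:

```python
import random

def sample_value(chance_outcomes, playout, n_samples=1000):
    """Estimate a chance node's value by sampling outcomes instead of
    enumerating them. chance_outcomes: list of (probability, state) pairs;
    playout(state) -> score for that outcome."""
    weights = [p for p, _ in chance_outcomes]
    states = [s for _, s in chance_outcomes]
    total = 0.0
    for _ in range(n_samples):
        # Draw one outcome according to its probability
        state = random.choices(states, weights=weights)[0]
        total += playout(state)
    return total / n_samples
```

By the law of large numbers, the returned average approaches the exact expected value as `n_samples` grows.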

MCTS This idea of sampling a limited part of the tree to estimate values is common and powerful. In fact, in Monte Carlo tree search there are no mid-state evaluations, just samples of terminal states. This means you do not need to craft a good mid-state evaluation function; instead you assume sampling is effective (which it might not be).

MCTS MCTS has four steps: 1. Find the action that looks best (selection) 2. Add this new action sequence to the tree (expansion) 3. Play randomly until the game is over (rollout/simulation) 4. Update how good this choice was (backpropagation)

MCTS How do we find which actions are good? The Upper Confidence Bound applied to Trees (UCT) is commonly used: UCT = w_i / n_i + c * sqrt(ln(N) / n_i), where w_i is the number of wins recorded at child i, n_i is the number of visits to child i, N is the number of visits to the parent, and c is an exploration constant. This ensures a trade-off between checking branches you haven't explored much and exploiting promising branches. ( https://www.youtube.com/watch?v=fbs4lngls8m )
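As a small sketch, the UCT score for one child, with exploration constant c = sqrt(2) (which matches the UCB values shown on these slides):

```python
import math

def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 applied to trees: exploitation term plus exploration bonus.
    Unvisited children score infinity, so they are always tried first."""
    if visits == 0:
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

For example, a 1/1 child of a parent visited 3 times scores 1 + sqrt(2 ln 3) ≈ 2.5, as in the walkthrough.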

MCTS Worked example (a root with three actions; statistics shown as wins/visits):
1. Initialize: root 0/0 with three children 0/0, 0/0, 0/0. Compute UCB values for the children of the root (the parent); unvisited nodes score infinity, so pick any (I'll pick the left-most).
2. Random playout from the left child: lose. Update all the way to the root: root 0/1; children 0/1, 0/0, 0/0. Update UCB values (all nodes): the visited child is now 0.
3. Select the max-UCB child (an unvisited one, the middle) and roll out: win. Update statistics: root 1/2; children 0/1, 1/1, 0/0. Update UCB values: 1.1 for the 0/1 child, 2.1 for the 1/1 child.
4. Select the max-UCB child (the remaining unvisited one) and roll out: lose. Update statistics: root 1/3; children 0/1, 1/1, 0/1. Update UCB values: 1.4, 2.5, 1.4.
5. Select the max-UCB child (1/1, UCB 2.5), expand it with two children 0/0, 0/0, and roll out: win. Update statistics: root 2/4; children 0/1, 2/2, 0/1; grandchildren 1/1, 0/0. Update UCB values: 1.7 for the 0/1 children, 2.1 for the 2/2 child, 2.2 for the 1/1 grandchild.
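Putting the four steps together, a compact MCTS sketch. The `moves`/`result` game interface is an assumption, and the single-perspective reward is a simplification (a real two-player implementation would flip the reward at alternating plies):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.wins = 0
        self.visits = 0

def mcts(root_state, moves, result, n_iter=1000, c=math.sqrt(2)):
    """Four-step MCTS sketch. moves(state): successor states ([] if the
    game is over); result(state): 1 for a win, 0 for a loss, from the
    searching player's point of view."""
    root = Node(root_state)
    for _ in range(n_iter):
        # 1. Selection: descend by UCB while the node is fully expanded
        node = root
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one unexplored child (if any moves remain)
        untried = [s for s in moves(node.state)
                   if s not in [ch.state for ch in node.children]]
        if untried:
            child = Node(random.choice(untried), parent=node)
            node.children.append(child)
            node = child
        # 3. Rollout: play randomly to a terminal state
        state = node.state
        while moves(state):
            state = random.choice(moves(state))
        reward = result(state)
        # 4. Backpropagation: update statistics up to the root
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # Recommend the most-visited action at the root
    return max(root.children, key=lambda ch: ch.visits).state
```

Note the design choice at the end: the recommended move is the most-visited child rather than the one with the best win rate, since visit counts are the more robust statistic.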