CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH. Santiago Ontañón


CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH Santiago Ontañón so367@drexel.edu

Recall: Adversarial Search Idea: When there is only one agent in the world, we can solve problems using DFS, BFS, ID, A*, etc. When there is more than one agent, we need adversarial search (minimax). Last week we saw: Minimax: systematically expands a game tree (the depth can be bounded), applies an evaluation function at the leaves, and backs the values up to find the best action. Alpha-beta: an improvement over minimax that expands fewer nodes. Expectiminimax: extends minimax with chance nodes when there is nondeterminism; alpha-beta pruning can still be used.

Minimax in Practice Minimax (with alpha-beta pruning) can be used to create AI that plays games like Checkers or Chess. The problem is that there are games that humans can play that are too complex for minimax. For example: Go.

Go The board is 19x19. Branching factor: starts at 361, and decreases (more or less) by 1 after every move. Compare Go and Chess: Chess, branching factor around 35: searching to depth 6 expands 35^6 ≈ 1,838,265,625 nodes. Go, branching factor around 300: searching to depth 6 expands 300^6 = 729,000,000,000,000 nodes. What can we do?
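
A quick back-of-the-envelope check of these numbers (a sketch in Python; the branching factors 35 and 300 are the rough values from the slide, not exact figures):

    # Rough game-tree sizes at depth 6, using the approximate branching factors above.
    chess_b, go_b, depth = 35, 300, 6
    print(f"Chess: {chess_b ** depth:,} nodes")  # 1,838,265,625 (about 1.8 billion)
    print(f"Go:    {go_b ** depth:,} nodes")     # 729,000,000,000,000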

Monte Carlo Methods Algorithms that rely on random sampling to find approximate solutions. Example: Monte Carlo integration. Imagine that I ask you to compute the following value: A = ∫ from 1 to 3 of sin(x)·√(1 − 1/x²) dx

Monte Carlo Methods Method 1: Symbolic integration. You could fetch your calculus book, integrate the function, etc. But this method you'll have to do by hand (did you know that automatic symbolic integration is still unsolved?). Method 2: Numerical methods. Simpson's rule, etc. (recall from calculus?). Method 3: Monte Carlo.

Monte Carlo Methods Method 3: Monte Carlo. Repeat N times: pick a random x between 1 and 3, and evaluate f(x), where f(x) = sin(x)·√(1 − 1/x²) is the integrand. Now average the values and multiply by 2 (i.e. 3 − 1). Voilà! The larger N, the better the approximation.
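
A minimal sketch of this estimate in Python (assuming the integrand as reconstructed above; the function names are just illustrative):

    import math
    import random

    def f(x):
        # Integrand from the slide (as reconstructed): sin(x) * sqrt(1 - 1/x^2)
        return math.sin(x) * math.sqrt(1.0 - 1.0 / x ** 2)

    def monte_carlo_integral(a=1.0, b=3.0, n=100_000):
        # Average f at n random points in [a, b], then scale by the interval length (b - a).
        total = sum(f(random.uniform(a, b)) for _ in range(n))
        return (b - a) * total / n

    print(monte_carlo_integral())  # the approximation improves as n grows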

Monte Carlo Methods Idea: use random sampling to approximate the solution to complex problems. How can we apply this idea to adversarial search? The answer to this question is responsible for computer programs that can play Go at master level.

Minimax vs Monte Carlo Search [Figure: minimax tree vs. Monte-Carlo search, with utility values U at the leaves] Minimax opens the complete tree (all possible moves) up to a fixed depth. Then, an evaluation function is applied to the leaves.

Minimax vs Monte Carlo Search [Figure: minimax tree vs. Monte-Carlo search, with utility values U at the leaves]

Minimax vs Monte Carlo Search [Figure: for each move at the root, Monte-Carlo search plays out complete games] Monte-Carlo search runs, for each possible move at the root node, a fixed number K of random complete games. There is no need for a utility function (although one can be used).

Monte Carlo Search For each possible move: repeat n times: play a game until the end, selecting moves at random; count the percentage of wins. Select the action with the highest percentage of wins. Properties: Complete? Optimal? Time? Memory? Works much better than minimax for large games, but has many problems. We can do much better.

Monte Carlo Search For each possible move: repeat n times: play a game until the end, selecting moves at random; count the percentage of wins. Select the action with the highest percentage of wins. Properties: Complete: no. Optimal: no. Time: d*n. Memory: b. Works much better than minimax for large games, but has many problems. We can do much better.
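
A sketch of this "flat" Monte Carlo search in Python. The game-state interface (get_moves, apply, is_terminal, winner) is an assumption for illustration and is not part of the slides:

    import random

    def random_playout(state, player):
        # Play random moves until the game ends; return 1 if `player` won, 0 otherwise.
        while not state.is_terminal():
            state = state.apply(random.choice(state.get_moves()))
        return 1 if state.winner() == player else 0

    def monte_carlo_search(state, player, n=100):
        # For each move at the root, run n random playouts and keep the best win rate.
        best_move, best_rate = None, -1.0
        for move in state.get_moves():
            wins = sum(random_playout(state.apply(move), player) for _ in range(n))
            if wins / n > best_rate:
                best_move, best_rate = move, wins / n
        return best_move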

Monte Carlo Tree Search [Figure: tree search over the current state, with counts 0/0] The count w/t at a node records how many of the games starting from that state have been found to be won (w) out of the total games explored (t) so far in the current search.

Monte Carlo Tree Search [Figure: tree with counts 0/0, 0/0] At each iteration, one node of the tree (upper part) is selected and expanded (one node is added to the tree). From this new node, a complete game is played out at random (Monte-Carlo).

Monte Carlo Tree Search [Figure: tree with counts 0/1, 0/1; the random game ends in a loss] This random game is called a playout. At each iteration, one node of the tree (upper part) is selected and expanded (one node is added to the tree). From this new node, a complete game is played out at random (Monte-Carlo).

Monte Carlo Tree Search [Figure: tree with counts 1/2, 1/1, 0/1; this playout ends in a win] At each iteration, one node of the tree (upper part) is selected and expanded (one node is added to the tree). From this new node, a complete game is played out at random (Monte-Carlo).

Monte Carlo Tree Search [Figure: tree with counts 2/3, 2/2, 0/1, 1/1; this playout ends in a win] The counts w/t are used to determine which nodes to explore next (exploration/exploitation), e.g.: 1) some probability of expanding the best node, 2) some probability of expanding one at random.

Monte Carlo Tree Search [Figure: tree with counts 2/3, 2/2, 0/1, 1/1] The counts w/t are used to determine which nodes to explore next (exploration/exploitation), e.g.: 1) some probability of expanding the best node, 2) some probability of expanding one at random. As we will see, we want to expand the best node with higher probability than any of the others.

Monte Carlo Tree Search [Figure: tree with counts 2/4, 2/3, 0/1, 1/1, 0/1; this playout ends in a loss] The tree ensures all relevant actions are explored, which greatly alleviates the randomness that affects plain Monte-Carlo methods.

Monte Carlo Tree Search [Figure: tree with counts 2/4, 2/3, 0/1, 1/1, 0/1] The random games played from each node of the tree serve to estimate the utility function. They can be purely random, or use an opponent model (if available).

MCTS Algorithm

    MCTS(state, player):
        tree = new Node(state, player)
        Repeat until the computation budget is exhausted:
            node = treePolicy(tree)
            if node.isTerminal: child = node
            else: child = node.nextChild()
            R = playout(child)
            child.propagateReward(R)
        Return tree.bestChild()

Monte Carlo Tree Search The question is: how do we choose the next node to be added to the tree? Start at the root. Descend the tree choosing actions according to the current probability estimates (assume a uniform probability distribution for anything you haven't seen before). Add to the tree the first node that you reach that isn't already in it. Or we could use something other than just the current probability estimates. This looks like the multi-armed bandit problem.

Tree Policy: ε-greedy Given a list of children, which one do we explore in a given iteration of MCTS? Ideally, we want: to spend more time exploring good children (no point wasting time on bad children), but to spend some time exploring bad children, just in case they are actually good (since evaluation is stochastic, it might happen that we were just unlucky, and a child we thought was bad is actually good). Simplest idea: ε-greedy. With probability 1−ε: choose the current best. With probability ε: choose one at random.
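
A sketch of the ε-greedy choice in Python, assuming each child node keeps the w/t counts from the earlier figures as `wins` and `visits` (illustrative names, not from the slides):

    import random

    def epsilon_greedy_child(children, epsilon=0.1):
        # With probability epsilon: explore, i.e. pick any child uniformly at random.
        if random.random() < epsilon:
            return random.choice(children)
        # Otherwise: exploit, i.e. pick the child with the best win ratio so far.
        # Unvisited children get priority so that every child is tried at least once.
        unvisited = [c for c in children if c.visits == 0]
        if unvisited:
            return random.choice(unvisited)
        return max(children, key=lambda c: c.wins / c.visits)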

Monte Carlo Tree Search

Which is the best child? [Figure: tree with counts 60/100 at the root and children 55/85, 1/10, 4/5] The child with 55/85 only wins about 65% of the time, but we have sampled it 85 times. The child with 4/5 seems to win 80% of the time, but we have only sampled it 5 times.

Which is the best child? [Figure: tree with counts 60/100 at the root and children 55/85, 1/10, 4/5] The child with 55/85 is safer (we cannot be sure the other one is good unless we sample it more times): it only wins about 65% of the time, but we have sampled it 85 times. The child with 4/5 seems to win 80% of the time, but we have only sampled it 5 times.

Monte Carlo Tree Search After a fixed number of iterations K (or after the assigned time is over), MCTS analyzes the resulting tree, and the selected action is the one with the highest win ratio (or the one with the highest visit count). MCTS algorithms do not explore the whole game tree: they sample the game tree, and they spend more time on those moves that are more promising. They are any-time algorithms (they can be stopped at any time). It can be shown theoretically that as K goes to infinity, the values assigned to each action in the MCTS tree converge to those computed by minimax. MCTS algorithms are the standard algorithms for modern Go-playing programs.

Tree Policy: Can we do Better? We just learned the ε-greedy policy, but is there a way to do better? ε-greedy is robust, but, for example, if there are 3 children: A: 40/100, B: 39/100, C: 2/100, then ε-greedy gives B and C the same (small) share of the exploration budget, even though B is almost as good as A while C is clearly bad.

UCB1 Upper Confidence Bounds Better balance between exploration and exploitation. UCB value for a given arm i: v_i + C·√(ln N / n_i), where v_i is the expected (average) reward of arm i so far, N is the total number of coins used so far, and n_i is the number of coins used on this arm so far.

UCB1 Upper Confidence Bounds Better balance between exploration and exploitation. UCB value for a given arm i: v_i + C·√(ln N / n_i). The first term (v_i) is high for arms that we believe to be good. The second term is high for arms that we have not explored much yet.
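
The same selection step using UCB1 instead of ε-greedy (a sketch; C is the exploration constant, commonly √2, and the node attributes `wins`, `visits`, `children` are the same illustrative names as before):

    import math
    import random

    def ucb1_child(parent, C=math.sqrt(2)):
        # Children never tried before have an effectively infinite UCB1 value: try them first.
        unvisited = [c for c in parent.children if c.visits == 0]
        if unvisited:
            return random.choice(unvisited)
        N = parent.visits  # total number of "coins" spent at this node so far
        return max(parent.children,
                   key=lambda c: c.wins / c.visits + C * math.sqrt(math.log(N) / c.visits))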

UCT

    UCT(state, player):
        tree = new Node(state, player)
        Repeat until the computation budget is exhausted:
            node = selectNodeViaUCB1(tree)
            if node.isTerminal: child = node
            else: child = node.nextChild()
            R = playout(child)
            child.propagateReward(R)
        Return tree.bestChild()

UCT

    UCT(state, player):
        tree = new Node(state, player)
        Repeat until the computation budget is exhausted:
            node = selectNodeViaUCB1(tree)
            if node.isTerminal: child = node
            else: child = node.nextChild()
            R = playout(child)
            child.propagateReward(R)
        Return tree.bestChild()

Here, selectNodeViaUCB1 means using UCB1 at the root to select a child, and then recursively repeating, until we reach a leaf.
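
Putting the pieces together, a minimal UCT sketch in Python. The game-state interface (get_moves, apply, is_terminal, reward, player_just_moved) and all names are assumptions for illustration, not part of the slides; apply is assumed to return a new state rather than mutate the old one.

    import math
    import random

    class Node:
        def __init__(self, state, parent=None, move=None):
            self.state, self.parent, self.move = state, parent, move
            self.children = []
            self.untried_moves = list(state.get_moves())
            self.wins, self.visits = 0.0, 0            # the w/t counts from the figures
            self.player_just_moved = state.player_just_moved

        def ucb1(self, C=math.sqrt(2)):
            return (self.wins / self.visits
                    + C * math.sqrt(math.log(self.parent.visits) / self.visits))

    def uct(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node, state = root, root_state
            # 1. Selection: descend with UCB1 while the node is fully expanded.
            while not node.untried_moves and node.children:
                node = max(node.children, key=lambda c: c.ucb1())
                state = state.apply(node.move)
            # 2. Expansion: add one child for an untried move.
            if node.untried_moves:
                move = node.untried_moves.pop(random.randrange(len(node.untried_moves)))
                state = state.apply(move)
                child = Node(state, parent=node, move=move)
                node.children.append(child)
                node = child
            # 3. Playout: play random moves until the game ends.
            while not state.is_terminal():
                state = state.apply(random.choice(state.get_moves()))
            # 4. Backpropagation: update the w/t counts along the path to the root.
            while node is not None:
                node.visits += 1
                node.wins += state.reward(node.player_just_moved)  # 1 win, 0 loss, 0.5 draw
                node = node.parent
        # Recommend the most visited move at the root (or the one with the best win ratio).
        return max(root.children, key=lambda c: c.visits).move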

Monte Carlo Tree Search After a fixed number of iterations K (or after the assigned time is over), MCTS analyzes the resulting tree, and the selected action is the one with the highest win ratio (or the one with the highest visit count). MCTS algorithms do not explore the whole game tree: they sample the game tree, and they spend more time on those moves that are more promising. They are any-time algorithms (they can be stopped at any time). It can be shown theoretically that as K goes to infinity, the values assigned to each action in the MCTS tree converge to those computed by minimax. MCTS is the standard algorithm for modern Go-playing programs, and uses the UCT algorithm to decide the explore vs. exploit question.

Games using MCTS Variants Go playing programs: AlphaGo, MoGo, CrazyStone, Valkyria, Pachi, Fuego, The Many Faces of Go, Zen

Games using MCTS Variants Card Games: Prismata

Games using MCTS Variants Strategy Games: TOTAL WAR: ROME II: http://aigamedev.com/open/coverage/mcts-rome-ii/

AlphaGo Google's AlphaGo defeated Lee Sedol in 2016, and Ke Jie in May 2017. How? By integrating MCTS with deep convolutional neural networks. Data set of 30 million positions from the KGS Go Server. Train a collection of neural networks to predict the probability of each move in the dataset and the expected value of a given board. Use the neural networks to inform the MCTS search.

AlphaGo 4 Deep Neural Networks trained and integrated into MCTS: p_σ (supervised policy network): trained via supervised learning from 30 million positions from the KGS Go server; predicts expert moves with 57% accuracy. p_π (fast rollout policy): a simplification of p_σ (runs faster); predicts expert moves with 24.2% accuracy. p_ρ (reinforcement-learning policy network): starts with p_σ and improves it via self-play (reinforcement learning); 80% win ratio against p_σ; 85% win ratio against Pachi (MCTS with 100K rollouts). v_θ (value network): predicts the winner given a position; trained from 30 million positions generated using p_ρ; almost as accurate as rollouts with p_ρ, with 15,000 times less CPU.

AlphaGo The four networks above are integrated into MCTS as follows: p_σ is used to bias the sampling of children during the MCTS tree descent, and the evaluation of a leaf is the average of v_θ and the result of a rollout with p_π.
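
As a rough sketch of how this plugs into the search (Python; the selection formula and the mixing constant are simplified paraphrases of the AlphaGo approach, and all names are illustrative assumptions):

    import math

    def puct_score(child, parent_visits, c_puct=5.0):
        # Exploitation term: mean evaluation of this child so far.
        q = child.value_sum / child.visits if child.visits else 0.0
        # Exploration term: biased by the policy network's prior probability p_sigma(move).
        u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
        return q + u

    def evaluate_leaf(state, value_net, rollout, lam=0.5):
        # Mix the value network v_theta(state) with the outcome of a fast rollout using p_pi.
        return (1 - lam) * value_net(state) + lam * rollout(state)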

Adversarial Search Summary Useful when there is more than one agent in the world. Search on a game tree: max layers for our moves, min layers for the opponent's moves, average (chance) layers for chance elements (e.g. dice rolls). Algorithms: Minimax (or expectiminimax), Alpha-beta, Monte Carlo Search, Monte Carlo Tree Search.