Monte Carlo Tree Search. Simon M. Lucas


Outline MCTS: The Excitement! A tutorial: how it works Important heuristics: RAVE / AMAF Applications to video games and real-time control

The Excitement Game playing before MCTS MCTS and Go MCTS and General Game Playing

Conventional Game Tree Search Minimax with alpha-beta pruning, transposition tables Works well when: A good heuristic value function is known The branching factor is modest E.g. chess: Deep Blue, Rybka, etc.

Go Much tougher for computers High branching factor No good heuristic value function "Although progress has been steady, it will take many decades of research and development before world-championship calibre go programs exist." (Jonathan Schaeffer, 2001)

Monte Carlo Tree Search (MCTS) Revolutionised the world of computer Go Best GGP players (2008, 2009) use MCTS More CPU cycles lead to smarter play Typically linear in log(time): each doubling of CPU time adds a roughly constant amount of playing strength Uses statistics of deep look-ahead from randomised roll-outs Anytime algorithm

Fuego versus GnuGo (from the Fuego paper, IEEE T-CIAIG vol. 2, no. 4)

General Game Playing (GGP) and Artificial General Intelligence (AGI) The original goal of AI was to develop general-purpose machine intelligence Being good at a specific game is not a good test of this (it's narrow AI) But being able to play any game seems like a good test of AGI Hence general game playing (GGP)

GGP: How it works Games are specified in predicate logic Two phases: GGP agents are given time to teach themselves how to play the game Then play commences on a time-limited basis Wonderful stuff! A great challenge for machine learning, and interesting to see which methods work best... The current best players all use MCTS

MCTS Tutorial How it works: MCTS general concepts Algorithm UCT formula Alternatives to UCT RAVE / AMAF Heuristics

MCTS Builds and searches an asymmetric game tree to make each move The phases are: Tree search: select a node to expand using the tree policy Roll-out: play randomly from that node to the end of the game, where the true value is known Back-up: back the value up the tree

Sample MCTS Tree (fig. from CadiaPlayer, Björnsson and Finnsson, IEEE T-CIAIG)

MCTS Algorithm for Action Selection

    repeat N times {                            // N might be between 100 and 1,000,000
        visited = new List<Node>()              // records the line of play for this iteration
        node = root
        visited.add(node)
        while (node is not a leaf) {            // descend the tree to select a node to expand
            node = select(node, node.children)  // e.g. UCT selection
            visited.add(node)
        }
        newChild = expand(node)                 // add a new child to the tree
        visited.add(newChild)
        value = rollout(newChild)               // random roll-out from the new child to the end of the game
        for (node : visited)                    // update the statistics of the tree nodes traversed
            node.updateStats(value)
    }
    return the action that leads from the root node to its most valued child
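
A minimal sketch of the per-node statistics the pseudocode above relies on; the class and field names (Node, nVisits, totValue) are illustrative assumptions, not taken from the original slides.

    import java.util.ArrayList;
    import java.util.List;

    class Node {
        List<Node> children = new ArrayList<>();
        int nVisits = 0;       // number of iterations that have passed through this node
        double totValue = 0;   // sum of the roll-out values backed up through this node

        boolean isLeaf() { return children.isEmpty(); }

        void updateStats(double value) {   // called once per iteration during back-up
            nVisits++;
            totValue += value;
        }

        double meanValue() { return nVisits == 0 ? 0 : totValue / nVisits; }
    }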

MCTS Operation (fig. from CadiaPlayer, Björnsson and Finnsson, IEEE T-CIAIG) Each iteration starts at the root and follows the tree policy to reach a leaf node, then performs a random roll-out from there; the new node N is added to the tree and the roll-out value T is backpropagated up the tree

Upper Confidence Bounds on Trees (UCT) Node Selection Policy From Kocsis and Szepesvári (2006) Converges to the optimal policy given an infinite number of roll-outs Often not used in practice!
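
The formula itself did not survive the transcription; as a reconstruction, the standard UCT rule from Kocsis and Szepesvári selects, at each tree node, the child j that maximises

    \mathrm{UCT}(j) = \bar{X}_j + C \sqrt{\frac{\ln n}{n_j}}

where \bar{X}_j is the mean roll-out value of child j, n_j is the number of visits to child j, n is the number of visits to the parent, and C is an exploration constant (\sqrt{2} in the original analysis, usually tuned per game in practice).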

Tree Construction Example See Olivier Teytaud's slides from the AIGamesNetwork.org summer 2010 MCTS workshop

AMAF / RAVE Heuristic Strictly speaking, each iteration should only update the value of a single child of the root node That child of the root node is the first move to be played AMAF (All Moves As First) is a type of RAVE heuristic (Rapid Action Value Estimate); the terms are often used synonymously

How AMAF works Player A is the player to move During an iteration (tree search + roll-out), update the values in the AMAF table for all moves made by player A Add an AMAF term to the node selection policy Can this also be applied to the opponent's moves?
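
A sketch of one common way to blend the AMAF value into node selection; the beta schedule below is one published choice (from Gelly and Silver's RAVE work), not necessarily the one used in these slides, and the parameter names are illustrative.

    class RaveBlend {
        // beta shrinks towards 0 as the node accumulates direct visits, so the AMAF
        // estimate dominates early and the node's own statistics dominate later.
        // k is a tuning constant (e.g. k = 1000).
        static double combinedValue(double uctMean, int nVisits, double amafMean, double k) {
            double beta = Math.sqrt(k / (3.0 * nVisits + k));
            return (1 - beta) * uctMean + beta * amafMean;
        }
    }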

Should AMAF work? Yes: a move might be good irrespective of when it is played (e.g. playing in the corner in Othello is ALWAYS a good move) No: the value of a move can depend very much on when it is played E.g. playing next to a corner in Othello is usually bad, but might sometimes be very good Fact: it works very well in some games (Go, Hex) Challenge: how to adapt similar principles to other games (Pac-Man)?

Improving MCTS The default roll-out policy is to make uniformly random moves Can potentially improve on this by biasing move selection toward moves that players are more likely to make Can either program the heuristic (a knowledge-based approach) or learn it (e.g. Temporal Difference Learning) Some promising work has already been done on this
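
A minimal sketch of a biased roll-out policy, assuming some heuristic weight is available for each move: moves are sampled in proportion to their weight rather than uniformly. The Move type and heuristicWeight function are placeholders, not part of the original slides.

    import java.util.List;
    import java.util.Random;

    class BiasedRollout {
        interface Move {}  // placeholder move type (assumption)

        // Placeholder for hand-coded or learned move knowledge; returning 1.0 for every
        // move reduces this policy to the default uniform random roll-out.
        static double heuristicWeight(Move m) { return 1.0; }

        // Roulette-wheel selection: sample a move with probability proportional to its weight.
        static Move biasedRolloutMove(List<Move> legalMoves, Random rng) {
            double total = 0;
            for (Move m : legalMoves) total += heuristicWeight(m);
            double r = rng.nextDouble() * total;
            for (Move m : legalMoves) {
                r -= heuristicWeight(m);
                if (r <= 0) return m;
            }
            return legalMoves.get(legalMoves.size() - 1);  // guard against rounding error
        }
    }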

MCTS for Video Games and Real-Time Control Requirements: Need a fast and accurate forward model, i.e. taking action a in state s leads to state s' (or to a known probability distribution over a set of states) If no such model exists, then could we maybe learn it? How accurate does the model need to be? For games, such a model always exists But it may need to be simplified
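
One way to state the forward-model requirement as an interface, in the spirit of the slide; the names are illustrative, not from any particular framework.

    // Given a state and an action, return the next state (deterministic case).
    // For a stochastic game the method would instead sample from the known
    // distribution over next states.
    interface ForwardModel<S, A> {
        S next(S state, A action);
    }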

Sample Games

MCTS Real-Time Approaches State-space abstraction: Quantise the state space A mix of MCTS and Dynamic Programming Search a graph rather than a tree Temporal abstraction: Don't need to choose a different action 60 times per second! Instead, the current action is usually the same as (or predictable from) the previous one Action abstraction: Consider a higher-level action space
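
A small sketch of the temporal-abstraction idea: one tree decision covers k consecutive frames, so the search does not have to pick a fresh action 60 times per second. It reuses the ForwardModel sketch above; the choice of k is an assumption.

    class TemporalAbstraction {
        // Repeat the chosen low-level action for k frames and return the resulting state.
        static <S, A> S applyMacroAction(ForwardModel<S, A> model, S state, A action, int k) {
            for (int i = 0; i < k; i++) {
                state = model.next(state, action);
            }
            return state;
        }
    }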

Initial Results on Video Games Tron (Google AI challenge): MCTS worked OK Ms Pac-Man: Works brilliantly when given good ghost models Still works better than other techniques we've tried when the ghost models are unknown

MCTS and Learning Some work has already been done on this (Silver and Sutton, ICML 2008) An important step towards AGI (Artificial General Intelligence) An MCTS that never learns anything is clearly missing some tricks Can be integrated very neatly with TD Learning

Multi-objective MCTS Currently the value of a node is expressed as a scalar quantity Can MCTS be improved by making this multi-dimensional? E.g. for a line of play, balance effectiveness with variability / fun

Some Remarks MCTS: you have to get your hands dirty! The theory is not there yet (personal opinion) To work, roll-outs must be informative, i.e. they must return information How NOT to use MCTS: a planning domain where a long string of random actions is unlikely to reach the goal Would need to bias roll-outs in some way to overcome this

Some More Remarks MCTS: a crazy idea that works surprisingly well! How well does it work? If there is a more applicable alternative (e.g. standard game tree search on a fully enumerated tree), MCTS may be terrible by comparison Best for tough problems where other methods don't work