Monte Carlo Tree Search



By the end, you will know:
- Why we use Monte Carlo Tree Search
- The pros and cons of MCTS
- How it is applied to Super Mario Bros. and AlphaGo

Outline
I. Pre-MCTS Algorithms
II. Monte Carlo Tree Search
III. Applications

Motivation
- Want to create programs to play games
- Want to play optimally
- Want to do this in a reasonable amount of time

                       Deterministic          Nondeterministic (Chance)
Fully Observable       Chess, Checkers, Go    Backgammon, Monopoly
Partially Observable   Battleship             Card Games

Pre-MCTS Algorithms
Deterministic, fully observable games have perfect information: we can construct a tree that contains all possible outcomes, because everything is fully determined.

Minimax: minimize the maximum possible loss.

Minimax

Simple Pruning

Alpha-Beta Pruning: prunes away branches that cannot influence the final decision.

Alpha-Beta
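To make the pruning concrete, here is a minimal sketch of depth-limited minimax with alpha-beta pruning (not from the slides; `game` is a hypothetical interface providing `legal_moves`, `apply`, `is_terminal`, and `evaluate`):

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Depth-limited minimax with alpha-beta pruning.

    `game` is a hypothetical interface: legal_moves(state), apply(state, move),
    is_terminal(state), evaluate(state) -> score from the maximizer's view.
    """
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    if maximizing:
        value = float("-inf")
        for move in game.legal_moves(state):
            value = max(value, alphabeta(game.apply(state, move),
                                         depth - 1, alpha, beta, False, game))
            alpha = max(alpha, value)
            if alpha >= beta:   # remaining siblings cannot affect the result
                break           # beta cutoff: prune
        return value
    else:
        value = float("inf")
        for move in game.legal_moves(state):
            value = min(value, alphabeta(game.apply(state, move),
                                         depth - 1, alpha, beta, True, game))
            beta = min(beta, value)
            if alpha >= beta:
                break           # alpha cutoff: prune
        return value
```

Whenever alpha (the score the maximizer can already guarantee) meets beta (the score the minimizer can guarantee), the remaining siblings cannot influence the final decision, so the loop breaks early.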

2^4 vs. 2^250
[Figure © Macmillan Publishers Limited, part of Springer Nature. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.]

Outline
I. Pre-MCTS Algorithms
II. Monte Carlo Tree Search
III. Applications

Asymmetric Tree Exploration
From "Bandit Algorithms for Tree Search", Coquelin and Munos, 2007.

MCTS Outline
1. Descend through the tree
2. Create a new node
3. Simulate
4. Update the tree (propagate the simulation result Δ up the path)
Repeat!
5. When you're out of time, return the best child.

What do we store? For game state k:
n_k = number of games played involving k
w_{k,p} = number of games won by player p that involved k
[Tree diagram removed: each node is labeled with w_{k,1} / n_k, e.g. 3/4 at the root.]
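As a sketch, the per-node bookkeeping can be as small as this (field names are ours, not the slides'):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One game state k in the search tree."""
    state: object                  # the game position this node represents
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    untried_moves: list = field(default_factory=list)
    n: int = 0                     # n_k: games played involving k
    w: float = 0.0                 # w_k: games won that involved k
```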

1. Descending
We want to expand, but also to explore.
[Comic by Zach Weinersmith removed. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.]

1. Descending
Solution: Upper Confidence Bound. At each step, descend to the child k that maximizes

UCB1(k, p) = w_{k,p} / n_k  +  c * sqrt( ln(n_parent) / n_k )
             [expand]          [explore]

The first term favors children that have won often (expand); the second favors children that have been tried rarely (explore).
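A minimal selection step, assuming the `Node` fields sketched above (c = sqrt(2) is the classic UCB1 constant; in practice it is tuned per game):

```python
import math

def ucb1(child, c=math.sqrt(2)):
    """Exploitation term (w/n) plus exploration term."""
    if child.n == 0:
        return float("inf")        # always try unvisited children first
    exploit = child.w / child.n                                  # w_k / n_k
    explore = c * math.sqrt(math.log(child.parent.n) / child.n)
    return exploit + explore

def select_child(node):
    # Descend to the child that maximizes UCB1.
    return max(node.children, key=ucb1)
```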

2. Expanding
Not very complicated: make a new node! Set n_k = 0, w_k = 0 (displayed as 0/0).

3. Simulating
Simulating a real game is hard, so let's just play the game out randomly! If we win, Δ = +1. If we lose or tie, Δ = 0.
[Diagram removed: random tic-tac-toe playouts branching from one position (a lot of options), ending in X wins, X wins, O wins.]
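A random ("light") rollout is only a few lines; this sketch assumes the same hypothetical `game` interface as the alpha-beta example, plus a `winner` method:

```python
import random

def simulate(state, game, player):
    """Play the game out with uniformly random moves.

    Returns delta = 1 if `player` wins, 0 on a loss or tie,
    matching the convention on this slide.
    """
    while not game.is_terminal(state):
        state = game.apply(state, random.choice(game.legal_moves(state)))
    return 1 if game.winner(state) == player else 0
```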

4. Updating the Tree
Propagate recursively up the parents. Given simulation result Δ, for each node k on the path:
n_k ← n_k + 1
w_{k,1} ← w_{k,1} + Δ
[Diagram removed: with Δ = +1, counts along the path change from 3/4 to 4/5, 1/2 to 2/3, and the new 0/0 leaf to 1/1.]
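The update is a walk back up the path; a sketch using the `Node` fields above (for a two-player game you would track w_{k,p} per player or flip Δ at alternating levels, which the slides' per-player subscript p allows for):

```python
def backpropagate(node, delta):
    """Propagate one simulation result from the new leaf to the root."""
    while node is not None:
        node.n += 1        # n_k <- n_k + 1
        node.w += delta    # w_k <- w_k + delta
        node = node.parent
```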

5. Terminating
Return the best-ranked child of the root (your next move). What determines "best"?
- Highest E[win | k] = w_k / n_k
- Highest E[win | k] AND most visited
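One possible assembly of steps 1 through 5, reusing the helpers sketched above (here the most-visited child is returned as "best"):

```python
import time

def mcts(root, game, player, budget_seconds=1.0):
    """Run MCTS until the time budget expires, then return the best child."""
    deadline = time.time() + budget_seconds
    while time.time() < deadline:                          # 5. finish on demand
        node = root
        while not node.untried_moves and node.children:    # 1. descend via UCB1
            node = select_child(node)
        if node.untried_moves:                             # 2. create a new node
            move = node.untried_moves.pop()
            child_state = game.apply(node.state, move)
            node.children.append(Node(state=child_state, parent=node,
                                      untried_moves=game.legal_moves(child_state)))
            node = node.children[-1]
        delta = simulate(node.state, game, player)         # 3. simulate
        backpropagate(node, delta)                         # 4. update the tree
    return max(root.children, key=lambda ch: ch.n)         # most-visited child
```

The root would be created as `Node(state=s0, untried_moves=game.legal_moves(s0))`.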

[Worked example removed: several MCTS iterations on one tree, showing w/n counts (e.g. the root going 4/5 → 5/6 → 5/7) as nodes are expanded, simulated (Δ = +1 or 0), and updated, illustrating the expand/explore trade-off.]

Why use MCTS?
Pros:
- Grows the tree asymmetrically, balancing expansion and exploration
- Depends only on the rules of the game
- Easy to adapt to new games
- Heuristics are not required, but can be integrated
- Can finish on demand: CPU time is proportional to answer quality
- Complete: guaranteed to find a solution given enough time
- Trivially parallelizable
Cons:
- Can't handle extreme tree depth
- Requires ease of simulation and massive computational resources
- Relies on random play being (at least weakly) correlated with position quality
- Many variants; expertise is needed to tune them
- Theoretical properties are not yet well understood

Screenshots of video games removed due to copyright restrictions.

Outline
I. Pre-MCTS Algorithms
II. Monte Carlo Tree Search
III. Applications (wait for it...)

Part III: Applications

MCTS-based Mario Controller!
[Image © Nintendo Co., Ltd. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.]

MCTS modifications for Super Mario Bros:
- Single player
- Multi-simulation
- Domain knowledge
- 5-40 ms computation time

Problem Formulation
Nodes: the state (Mario's position, speed, direction, etc.; enemy positions, speeds, directions; locations of blocks; etc.) and its value.
Edges: Mario's possible actions (right, left, jump, etc.).
[Image © Nintendo Co., Ltd. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.]

Calculating Simulation Result
Domain knowledge enters as a multi-objective weighted sum over features of the simulated outcome:

Feature           Weight     Feature           Weight
distance          0.1        killsbystomp      12
timeleft          2          killsbyfire       4
mariomode         32         killsbyshell      17
mariostatus       1024       killstotal        42
coins             16         stomps            1
flower            64         hiddenblocks      24
mushrooms         58         hurts             -42
greenmushrooms    1
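A simulation's value is then just a dot product of the outcome's feature counts with these weights; a minimal sketch (the dict layout is ours, the names and weights are from the table):

```python
# Weights from the multi-objective sum above.
WEIGHTS = {
    "distance": 0.1,  "timeleft": 2,       "mariomode": 32,  "mariostatus": 1024,
    "coins": 16,      "flower": 64,        "mushrooms": 58,  "greenmushrooms": 1,
    "killstotal": 42, "killsbystomp": 12,  "killsbyfire": 4, "killsbyshell": 17,
    "stomps": 1,      "hiddenblocks": 24,  "hurts": -42,
}

def simulation_value(features):
    """Score a simulated outcome as a weighted sum of its feature counts."""
    return sum(WEIGHTS[name] * count for name, count in features.items())
```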

Simulation types: Regular, Best of N, Multi-Simulation

Demo

Results: outperforms A*.

AlphaGo
[Image by Saran Poroong. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.]

The Rules
- The board is 19x19 and starts empty.
- Players alternate placing one stone.
- Capture enemy stones by surrounding them.
- A player's territory is all the area they surround.
- Score = territory + captured stones

Go vs. Chess

                    GO             CHESS
Options per turn    250            35
Turns per game      150            80
Possible games      10^761         10^120

MCTS modifications for Go: combine neural networks with MCTS
- 2 policy networks (slow and fast)
- 1 value network

2 Policy Networks
Input: the game state, as an image. Output: a probability distribution over legal actions. Trained by supervised learning on 30 million positions from human expert games.

                 Slow Policy Network     Fast Policy Network
Accuracy         57%                     24%
Time per move    3,000 microseconds      2 microseconds

Policy Network Reinforcement Learning
Next step: predict winning moves, rather than expert human moves. The policy networks play against themselves! The best policy network was tested against Pachi, which relies on 100,000 MCTS simulations at each turn. AlphaGo's policy network won 85% of the games (at 3 ms per turn). Intuition tends to win over long reflection in Go?

Value Network
Trained on positions from the policy network's reinforcement learning. Similar to an evaluation function (as in Deep Blue), but learned rather than hand-designed. Its predictions get better toward the endgame.

Using Neural Networks with MCTS
The slow policy network guides tree search.
Value of a state = fast policy network simulation + value network output.
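According to the AlphaGo paper (linked in the references), this combination is a simple mix with a constant lambda, reported as 0.5; a sketch:

```python
def leaf_value(value_net_estimate, rollout_outcome, lam=0.5):
    """AlphaGo-style leaf evaluation: V(s) = (1 - lam) * v_theta(s) + lam * z.

    value_net_estimate: the value network's prediction for the state.
    rollout_outcome:    the result z of a fast policy network simulation.
    lam = 0.5 is the mixing constant reported in the paper.
    """
    return (1 - lam) * value_net_estimate + lam * rollout_outcome
```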

Why use Policy and Value Networks?
They work hand-in-hand: the value network learns from the policy network, and the policy network is improved by the value network.
Value network alone: it would have to exhaustively compare the values of all children. The policy network predicts the best move, narrowing the search space to the moves most likely to be victorious.
Policy network alone: it is unable to directly compare nodes in different parts of the tree. The value network gives an estimate of the winner as if the game were played out according to the policy network, and these values direct later searches toward moves that are actually evaluated to be better.

Why combine Neural Networks with MCTS?
How does MCTS improve a policy network? Recall that MCTS (Pachi) still beat the policy network in 15% of games. A policy network is just a prediction; MCTS and Monte Carlo rollouts help the policy adjust toward moves that are actually evaluated to be good.
How do neural networks improve MCTS? The slow policy network guides tree exploration more intelligently, the fast policy network guides simulations more intelligently, and the value network and simulation value are complementary.

AlphaGo vs. Other AI

AI name                       Elo rating
Distributed AlphaGo (2015)    3140
AlphaGo (2015)                2890
CrazyStone                    1929
Zen                           1888
Pachi                         1298
Fuego                         1148
GnuGo                         431

Distributed AlphaGo won 77% of games against single-machine AlphaGo, and 100% of games against the other AIs.

AlphaGo vs. Lee Sedol

          AlphaGo      Lee Sedol
Result    4 wins       1 win
Elo       3,586        3,520

Only one human had a higher Elo: Ke Jie (Elo 3,621).
[Photo © Reuters. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.]

Timeline
1952: a computer masters Tic-Tac-Toe
1994: a computer masters Checkers
1997: IBM's Deep Blue defeats Garry Kasparov in chess
2011: IBM's Watson defeats two Jeopardy! champions
2014: Google algorithms learn to play Atari games
2015: Wikipedia: "Thus, it is very unlikely that it will be possible to program a reasonably fast algorithm for playing the Go endgame flawlessly, let alone the whole Go game."
2015: Google's AlphaGo defeats Fan Hui (2-dan player) in Go
2016: Google's AlphaGo defeats Lee Sedol (9-dan player) 4-1 in Go

Conclusion
MCTS expands the search tree based on random sampling of the search space (the game board):
1. Descend
2. Create a new node
3. Simulate
4. Update

References
Mario: http://www.slideshare.net/ssuser7713a0/monte-carlo-tree-search-for-the-super-mario-bros
AlphaGo (full paper): http://airesearch.com/wp-content/uploads/2016/01/deepmind-mastering-go.pdf
AlphaGo (summary): https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/

Sample Tree
[Diagram removed: a tic-tac-toe game tree of boards branching from a single position.]

MIT OpenCourseWare
https://ocw.mit.edu
16.412J / 6.834J Cognitive Robotics, Spring 2016
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms