Recent Progress in Computer Go. Martin Müller, University of Alberta, Edmonton, Canada


40 Years of Computer Go
- 1960s: initial ideas
- 1970s: first serious program (Reitman & Wilcox)
- 1980s: first PC programs, competitions
- 1990s: slow progress, commercial successes
- 2000s: GNU Go, a strong open-source program
- Now: the Monte-Carlo and UCT revolution, strong 9x9 programs

Classical Go Programs
- Goliath (Mark Boon)
- Go Intellect (Ken Chen)
- Handtalk (Chen Zhixing)
- Go++ (Michael Reiss)
- KCC (North Korean team)
- Many Faces of Go (David Fotland)
- GNU Go (international team)

Monte-Carlo Simulation and UCT for Go
- 1993: Bernd Brügmann uses simulations for Go
- 200x: Bouzy and students revive simulations
- 2006: Kocsis and Szepesvári publish the UCT algorithm
- Sylvain Gelly, Yizao Wang: MoGo
- Rémi Coulom: Crazy Stone
- Don Dailey: CGOS server, new programs

Classic vs New Go Programs
Classic:
- Knowledge-intensive
- Problem: heuristic position evaluation
- Local goal search
New:
- Search-intensive
- No (!) heuristic evaluation
- Global search + simulations

How Strong?
- Almost perfect on 7x7
- Amateur dan level on 9x9
- 5 kyu on 19x19? Similar to the top classic programs

Dec. 2006 - My Wakeup Call
- Martin Müller vs Valkyria (by Magnus Persson), komi 7.5

Games vs Guo Juan 5 Dan
Aug. 2006 match: CrazyStone vs Guo Juan
- 7x7 board, 9 komi
- CrazyStone as White: always wins or jigo
- Guo as White: often wins
June 2007 match: MoGo vs Guo Juan
- 9x9 board, MoGo Black, 0.5 komi
- 9 wins : 5 losses for MoGo

Examples: guojuan-mogobot.sgf, guojuan-mogobot-9.sgf

Playing Style
- Monte-Carlo based programs play many strange moves, but they are very good at winning!
- They only care about winning, not the score
- Play safe when ahead
- Try invasions when behind

Cosmic Style Opening
- Ruky-MoGoBot-2.sgf, moves 16-31

Example: Random Play in Decided Games
- GNU-StoneCrazy.sgf, moves 122-132

How Does it Work?
- Monte-Carlo simulations: basic idea, refinements
- UCT (Upper Confidence bounds applied to Trees): building a game tree, evaluation

Simulations
- Monte-Carlo simulation is popular in physics: study the behavior of a complex system by running many random simulations
- Go: play random games from the current position

Simulation - Example
- Play random legal moves
- Do not fill one-point eyes
- Game over after both players pass
- Evaluate by Chinese rules: 1 for a win, 0 for a loss
- valkyria-exboss-randomgame.sgf
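The "do not fill one-point eyes" rule can be sketched as a cheap neighbourhood test. The board representation (a dict of stone coordinates) and the function name are illustrative; real playout policies also examine diagonal points to rule out false eyes.

```python
def is_simple_eye(board, size, pt, color):
    # board: dict mapping (x, y) -> 'b' or 'w' for occupied points.
    # A point counts as a one-point eye if every on-board orthogonal
    # neighbour holds one of our own stones. Checking diagonals to
    # detect false eyes is omitted here for brevity.
    x, y = pt
    for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= nx < size and 0 <= ny < size and board.get((nx, ny)) != color:
            return False
    return True
```

A playout policy would simply skip any legal move for which this test returns true for the moving colour.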

Simulation-Based Player
- Play many random games
- Keep win/loss statistics for each possible move
- Play the move with the highest win percentage
- Fast: over 1 million moves/sec., typically 100,000 simulations per move
- Weakness: loves to play threats
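The flat simulation-based player described above can be sketched as follows. To stay self-contained, each "random game" is abstracted as a coin flip with a hidden true win probability per move; in a real program, `simulate` would play out a full random game and score it by Chinese rules.

```python
import random

def simulate(p_win, rng):
    # Stand-in for one random playout: win (1) with probability p_win.
    return 1 if rng.random() < p_win else 0

def choose_move(moves, n_sims, rng):
    # moves: dict move -> hidden true win probability (drives the
    # abstract playouts; a real engine would not know these values).
    stats = {}
    for move, p in moves.items():
        wins = sum(simulate(p, rng) for _ in range(n_sims))
        stats[move] = wins / n_sims
    # Play the move with the highest observed win percentage.
    return max(stats, key=stats.get), stats

rng = random.Random(0)
best, stats = choose_move({"A": 0.3, "B": 0.6, "C": 0.45}, 1000, rng)
```

With enough simulations the observed win rates concentrate around the true ones, so the player reliably picks move "B" here.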

Example - Bad Threat
- C1 is a bad threat: if White captures on B1, Black cannot save the F1 stones
- In pure random simulations, C1 works very often!

Refinement of Simulations
- Add Go knowledge: capture/escape from capture, avoid self-atari, simple cutting/blocking patterns
- Play near the last move(s)
- Must be extremely fast to compute

The MoGo Patterns
- Hane/extend
- Cut/connect
- Edge of board

Example of Biased Simulation
- valkyria-exboss-biased-random-game.sgf

Adding Game Tree Search
- Pure simulation is limited: weak in tactics
- Classical game playing uses game tree search: minimax, alpha-beta
- New selective search method: UCT

UCT Idea
- Follow the best moves down the tree
- At a leaf, start a simulation
- Add the first new move to the tree
- Image by Sylvain Gelly

What is the Best Move?
- Where can we gain the most valuable information?
- A move that looks good so far, or a move that has not been analyzed much yet?
- UCT is a compromise: select the move where success rate + uncertainty is highest
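The "success rate + uncertainty" compromise is the UCB1 rule at the heart of UCT. A minimal sketch, with an illustrative exploration constant `c` (the value below is an assumption, not MoGo's tuned setting):

```python
import math

def ucb1_select(children, c=1.4):
    # children: list of (wins, visits) per move; the parent's visit
    # count is the sum of the children's visits.
    total = sum(n for _, n in children)
    def score(stat):
        w, n = stat
        if n == 0:
            return float("inf")  # unexplored moves are tried first
        # Success rate plus an uncertainty bonus that shrinks as the
        # move accumulates simulations.
        return w / n + c * math.sqrt(math.log(total) / n)
    return max(range(len(children)), key=lambda i: score(children[i]))
```

A move with a slightly better rate but far fewer visits can outrank a well-explored move, which is exactly the intended exploration/exploitation trade-off.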

UCT Evaluation
- Classical minimax: value = value of the position after the best move
- UCT: value = weighted average over moves, where the weight is the number of simulations for each move
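The contrast between the two backup rules can be shown directly; both functions below take `(wins, visits)` statistics per child and are illustrative names, not a particular engine's API.

```python
def minimax_value(children):
    # Classical backup: the value of the single best child.
    return max(w / n for w, n in children)

def uct_value(children):
    # UCT backup: the average over all simulations, i.e. each child's
    # mean weighted by the number of simulations it received.
    wins = sum(w for w, n in children)
    sims = sum(n for w, n in children)
    return wins / sims
```

As the search concentrates its simulations on the optimal move, the weighted average converges toward that move's value, matching the "approaches minimax" remark on the next slide.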

Example
- Very selective search: concentrates on a few promising moves
- Approaches the minimax value if the optimal move(s) get most of the simulations

Refinements to Tree Search
- RAVE (Gelly & Silver 2007)
- Add Go knowledge: patterns (Coulom 2007), reinforcement learning (Gelly & Silver 2007)

RAVE - Rapid Action Value Estimation
- UCT needs many samples for all moves - slow
- Idea: moves played later in a simulation are also important
- "All moves as first" (Brügmann 1993): win statistics for each move over all games
- Use at the beginning, phase out gradually
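The gradual phase-out can be sketched as a blend of the UCT and all-moves-as-first (AMAF) estimates. The schedule below, beta = sqrt(k / (3n + k)), is one published choice (Gelly & Silver 2007); the constant `k` is a tuning parameter and the default here is illustrative.

```python
def rave_value(uct_wins, uct_visits, amaf_wins, amaf_visits, k=1000):
    # Blend the slow-but-unbiased UCT estimate with the fast-but-biased
    # AMAF estimate. beta starts at 1 (all AMAF) and decays toward 0 as
    # the move accumulates real simulations, phasing AMAF out.
    beta = (k / (3 * uct_visits + k)) ** 0.5 if uct_visits else 1.0
    uct = uct_wins / uct_visits if uct_visits else 0.0
    amaf = amaf_wins / amaf_visits if amaf_visits else 0.0
    return (1 - beta) * uct + beta * amaf
```

With zero visits the value is pure AMAF; after many visits the AMAF term contributes little, so early guidance comes cheaply without biasing the final result.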

Using Go Knowledge
- Use Go knowledge to initialize the value of moves; also phased out gradually
- Use the RLGO evaluation function in MoGo (Gelly & Silver 2007); can be combined with RAVE
- Learn feature values for pruning and progressive widening of the tree (Coulom 2007)
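Initializing move values with knowledge amounts to seeding a node with "virtual" wins and simulations; real simulations then wash the prior out. The function names and constants below are illustrative, not any particular program's scheme.

```python
def new_node_with_prior(prior_mean, prior_sims):
    # Encode Go knowledge (e.g. a pattern or RL evaluation) as
    # prior_sims virtual simulations with mean value prior_mean.
    return prior_mean * prior_sims, prior_sims

def add_result(stats, result):
    # Fold in one real simulation result (1 = win, 0 = loss).
    wins, sims = stats
    return wins + result, sims + 1

stats = new_node_with_prior(0.7, 20)  # knowledge says: promising move
for _ in range(100):                  # ...but real playouts keep losing
    stats = add_result(stats, 0)
mean = stats[0] / stats[1]            # prior is largely washed out
```

The node starts at an optimistic 0.7 but ends near 0.12 after 100 losing simulations: the knowledge guides early selection without overriding the evidence.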

Why Does it Work so Well?
- No theoretical explanation, but excellent empirical results
- Simulations: a good move in random Go is often a good move in Go
- UCT: good moves in random Go are interesting moves to try in the search

Future - Scaling Up
- Scales well with increasing computer power; no limit in sight (Don Dailey's experiment)
- Challenge: parallel search on shared memory and on computer clusters
- Bottleneck: updating the tree, selecting the best line

Summary
- A revolution through Monte-Carlo simulations and UCT
- Strong 9x9 programs
- When will we see strong 19x19?