CSC321 Lecture 23: Go

Roger Grosse

Final Exam

- Friday, April 20, 9am-noon
- Last names A-Y: Clara Benson Building (BN) 2N
- Last names Z: Clara Benson Building (BN) 2S
- Covers all lectures, tutorials, homeworks, and programming assignments
  - 1/3 from the first half, 2/3 from the second half
  - If there's a question on Lectures 22 or 23, it will be easy
  - Emphasis on concepts covered in multiple of the above
- Similar in format and difficulty to the midterm, but about 3x longer
- Practice exams are posted

Overview

- Most of the problem domains we've discussed so far were natural application areas for deep learning (e.g. vision, language)
  - We know they can be done on a neural architecture (i.e. the human brain)
  - The predictions are inherently ambiguous, so we need to find statistical structure
- Board games are a classic AI domain which relied heavily on sophisticated search techniques with a little bit of machine learning
  - Full observations, deterministic environment: why would we need uncertainty?
- This lecture is about AlphaGo, DeepMind's Go playing system which took the world by storm in 2016 by defeating the human Go champion Lee Sedol
- Combines ideas from our last two lectures (policy gradient and value function learning)

Overview

Some milestones in computer game playing:

- 1949: Claude Shannon proposes the idea of game tree search, explaining how games could be solved algorithmically in principle
- 1951: Alan Turing writes a chess program that he executes by hand
- 1956: Arthur Samuel writes a program that plays checkers better than he does
- 1968: an algorithm defeats human novices at Go
- ...silence...
- 1992: TD-Gammon plays backgammon competitively with the best human players
- 1996: Chinook wins the US National Checkers Championship
- 1997: Deep Blue defeats world chess champion Garry Kasparov

After chess, Go was humanity's last stand.

Go

- Played on a 19×19 board
- Two players, black and white, each place one stone per turn
- Capture opponent's stones by surrounding them

Go

The goal is to control as much territory as possible.

Go

What makes Go so challenging:

- Hundreds of legal moves from any position, many of which are plausible
- Games can last hundreds of moves
- Unlike chess, endgames are too complicated to solve exactly (endgames had been a major strength of computer players for games like chess)
- Heavily dependent on pattern recognition

Game Trees

- Each node corresponds to a legal state of the game.
- The children of a node correspond to possible actions taken by a player.
- Leaf nodes are ones where we can compute the value, since a win/draw condition was met.

Figure from https://www.cs.cmu.edu/~adamchik/15-121/lectures/game%20trees/game%20trees.html

Game Trees

To label the internal nodes, take the max over the children if it's Player 1's turn, and the min over the children if it's Player 2's turn.

Figure from https://www.cs.cmu.edu/~adamchik/15-121/lectures/game%20trees/game%20trees.html
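
A minimal sketch of this labeling rule (illustrative, not from the original slides), using nested Python lists as a toy game tree where leaves hold the outcome for Player 1:

    def minimax(node, maximizing):
        # A node is either a leaf value (the game outcome for Player 1)
        # or a list of child nodes.
        if not isinstance(node, list):
            return node
        values = [minimax(child, not maximizing) for child in node]
        # Player 1 (maximizing) takes the max; Player 2 takes the min.
        return max(values) if maximizing else min(values)

    # Player 1 to move at the root; each child is a Player 2 decision.
    tree = [[+1, -1], [0, +1]]
    print(minimax(tree, maximizing=True))  # -> 0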

Game Trees

- As Claude Shannon pointed out in 1949, for games with finite numbers of states, you can solve them in principle by drawing out the whole game tree.
- Ways to deal with the exponential blowup:
  - Search to some fixed depth, and then estimate the value using an evaluation function
  - Prioritize exploring the most promising actions for each player (according to the evaluation function)
- Having a good evaluation function is key to good performance
  - Traditionally, this was the main application of machine learning to game playing
  - For programs like Deep Blue, the evaluation function would be a learned linear function of carefully hand-designed features
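
To make the depth-cutoff idea concrete, here is a hedged sketch (the toy game and heuristic are illustrative stand-ins, not Deep Blue's):

    def depth_limited_value(state, depth, maximizing, children, evaluate):
        # Search to a fixed depth; below the cutoff, fall back on the
        # evaluation function instead of searching further.
        kids = children(state)
        if depth == 0 or not kids:
            return evaluate(state)
        values = [depth_limited_value(k, depth - 1, not maximizing,
                                      children, evaluate)
                  for k in kids]
        return max(values) if maximizing else min(values)

    # Toy game: states are integers with two successors each; the evaluation
    # function stands in for a learned linear function of board features.
    children = lambda s: [2 * s, 2 * s + 1] if s < 8 else []
    evaluate = lambda s: s % 5 - 2
    print(depth_limited_value(1, depth=3, maximizing=True,
                              children=children, evaluate=evaluate))  # -> 1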

Monte Carlo Tree Search

- In 2006, computer Go was revolutionized by a technique called Monte Carlo Tree Search
- Estimate the value of a position by simulating lots of rollouts, i.e. games played randomly using a quick-and-dirty policy
- Keep track of the number of wins and losses for each node in the tree
- Key question: how to select which parts of the tree to evaluate?

Figure from Silver et al., 2016
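
To make the rollout idea concrete, here is a minimal sketch using a toy Nim game instead of Go (the game and policy are illustrative stand-ins):

    import random

    def rollout_value(stones, n_rollouts=2000):
        # Toy Nim: players alternately take 1-3 stones; taking the last
        # stone wins. Estimate the first player's win probability by
        # playing games to the end with a quick-and-dirty random policy
        # and counting wins, as MCTS does at each node.
        wins = 0
        for _ in range(n_rollouts):
            s, to_move = stones, 0                 # player 0 moves first
            while s > 0:
                s -= random.randint(1, min(3, s))  # random rollout policy
                to_move ^= 1                       # other player's turn
            wins += (to_move == 1)                 # last mover was player 0
        return wins / n_rollouts

    print(rollout_value(10))  # Monte Carlo estimate of player 0's win rate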

Monte Carlo Tree Search

- The selection step determines which part of the game tree to spend computational resources on simulating.
- This is an instance of the exploration-exploitation tradeoff from last lecture
  - Want to focus on good actions for the current player
  - But want to explore parts of the tree we're still uncertain about
- Upper Confidence Bound (UCB) is a common heuristic; choose the node which has the largest frequentist upper confidence bound on its value:

  \mu_i + \sqrt{\frac{2 \log N}{N_i}}

  where \mu_i = fraction of wins for action i, N_i = number of times we've tried action i, and N = total times we've visited this node
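
A sketch of UCB selection over a node's children (the statistics are made-up numbers for illustration):

    import math

    def ucb_select(wins, visits, total_visits):
        # Choose the child with the largest upper confidence bound
        # mu_i + sqrt(2 * log(N) / N_i); untried children go first.
        def score(i):
            if visits[i] == 0:
                return float("inf")
            mu = wins[i] / visits[i]                        # exploitation
            bonus = math.sqrt(2 * math.log(total_visits) / visits[i])
            return mu + bonus                               # + exploration
        return max(range(len(visits)), key=score)

    # Action 1 has the best win rate, but action 2 is barely explored,
    # so its exploration bonus wins out.
    print(ucb_select(wins=[6, 12, 1], visits=[10, 15, 2], total_visits=27))  # -> 2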

Monte Carlo Tree Search

Improvement of computer Go since MCTS (the plotted ratings all fall within the amateur range).

Now for DeepMind's computer Go player, AlphaGo...

Predicting Expert Moves

- Can a computer play Go without any search?
- Ilya Sutskever's argument: expert players can identify a set of good moves in half a second
  - This is only enough time for information to propagate forward through the visual system; not enough time for complex reasoning
  - Therefore, it ought to be possible for a conv net to identify good moves
- Input: a 19×19 ternary (black/white/empty) image, about half the size of MNIST!
- Prediction: a distribution over all (legal) next moves
- Training data: KGS Go Server, consisting of 160,000 games and 29 million board/next-move pairs
- Architecture: fairly generic conv net
- When playing for real, choose the highest-probability move rather than sampling from the distribution
- This network, which just predicted expert moves, could beat a fairly strong program called GnuGo 97% of the time. This was amazing: basically all strong game players had been based on some sort of search over the game tree
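
As a rough illustration (a much-simplified stand-in, not AlphaGo's actual architecture or input features), a policy network of this kind might look like the following PyTorch sketch:

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        # Input: the 19x19 board as 3 binary planes (black, white, empty).
        # Output: a distribution over the 361 board points.
        # (Masking of illegal moves is omitted for brevity.)
        def __init__(self, channels=64, n_layers=4):
            super().__init__()
            layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU()]
            for _ in range(n_layers - 1):
                layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
            layers.append(nn.Conv2d(channels, 1, 1))  # one logit per point
            self.net = nn.Sequential(*layers)

        def forward(self, board):                # board: (batch, 3, 19, 19)
            logits = self.net(board).flatten(1)  # (batch, 361)
            return torch.softmax(logits, dim=1)

    net = PolicyNet()
    probs = net(torch.zeros(1, 3, 19, 19))
    move = probs.argmax(dim=1)  # play the highest-probability move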

Self-Play and REINFORCE

- The problem with training on expert data: there are only 160,000 games in the database. What if we overfit?
- There is effectively infinite data from self-play
  - Have the network repeatedly play against itself as its opponent
  - For stability, it should also play against older versions of itself
- Start with the policy which samples from the predictive distribution over expert moves
- The network which computes the policy is called the policy network
- REINFORCE algorithm: update the policy to maximize the expected reward r at the end of the game (in this case, r = +1 for a win, -1 for a loss)
- If \theta denotes the parameters of the policy network, a_t is the action at time t, s_t is the state of the board, and z is the rollout of the rest of the game using the current policy:

  R = \mathbb{E}_{a_t \sim p_\theta(a_t \mid s_t)} \left[ \mathbb{E}[r(z) \mid s_t, a_t] \right]
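
A minimal sketch of the REINFORCE update for one self-play game, assuming the hypothetical PolicyNet sketch above (r is the final reward, +1 or -1):

    import torch

    optimizer = torch.optim.SGD(net.parameters(), lr=1e-4)

    def reinforce_update(states, actions, r):
        # states: (T, 3, 19, 19) boards; actions: (T,) move indices; r: +/-1.
        probs = net(states)                                   # (T, 361)
        log_probs = torch.log(probs[torch.arange(len(actions)), actions])
        # Gradient ascent on the expected reward: scale the log-probabilities
        # of the moves actually played by the game's outcome.
        loss = -(r * log_probs).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()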

Policy and Value Networks

- We just saw the policy network. But AlphaGo also has another network called a value network.
- This network tries to predict, for a given position, which player has the advantage.
  - This is just a vanilla conv net trained with least-squares regression.
  - Data comes from the board positions and outcomes encountered during self-play.

Figure from Silver et al., 2016
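
A hedged sketch of the value network and its least-squares training step (architecture and sizes are illustrative, not AlphaGo's):

    import torch
    import torch.nn as nn

    # Scalar-output conv net: predicts which player has the advantage,
    # as a value in [-1, 1].
    value_net = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(64 * 19 * 19, 1), nn.Tanh(),
    )
    opt = torch.optim.SGD(value_net.parameters(), lr=1e-3)

    def value_update(boards, outcomes):
        # boards: (B, 3, 19, 19) self-play positions; outcomes: (B,) +/-1.
        v = value_net(boards).squeeze(1)
        loss = ((v - outcomes) ** 2).mean()  # least-squares regression
        opt.zero_grad()
        loss.backward()
        opt.step()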

Policy and Value Networks

- AlphaGo combined the policy and value networks with Monte Carlo Tree Search
- Policy network used to simulate rollouts
- Value network used to evaluate leaf positions
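
In the Nature paper (Silver et al., 2016), a leaf position's score mixes the value network's prediction with the outcome of a fast rollout; a one-line sketch of that mixing (lam = 0.5 weights them equally):

    def leaf_value(value_net_estimate, rollout_outcome, lam=0.5):
        # (1 - lam) weights the value network; lam weights the rollout.
        return (1 - lam) * value_net_estimate + lam * rollout_outcome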

AlphaGo Timeline

- Summer 2014: start of the project (internship project for UofT grad student Chris Maddison)
- October 2015: AlphaGo defeats the European champion
  - The first time a computer Go player defeated a human professional without handicap; previously believed to be a decade away
- January 2016: publication of the Nature article "Mastering the game of Go with deep neural networks and tree search"
- March 2016: AlphaGo defeats grandmaster Lee Sedol
- October 2017: AlphaGo Zero far surpasses the original AlphaGo without training on any human data
- December 2017: it beats the best chess programs too, for good measure

AlphaGo

- Most of the Go world expected AlphaGo to lose 5-0 (even after it had beaten the European champion)
- It won the match 4-1
- Some of its moves seemed bizarre to human experts, but turned out to be really good
- Its one loss occurred when Lee Sedol played a move unlike anything in the training data

AlphaGo

Further reading:

- Silver et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature. http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
- Scientific American: https://www.scientificamerican.com/article/how-the-computer-beat-the-go-master/
- Talk by the DeepMind CEO: https://www.youtube.com/watch?v=aiwqsa_7ziq&list=PLqYmG7hTraZCGIymT8wVVIXLWkKPNBoFN&index=8