Andrei Behel AC-43И


History The game of Go originated in China more than 2,500 years ago. The rules of the game are simple: players take turns placing black or white stones on a board, trying to capture the opponent's stones or to surround empty space to make points of territory.

Introduction Go and chess are very popular board games, and they are similar in some respects: both are played by two players taking turns, and neither involves a random element.

In 1997, Garry Kasparov was defeated by Deep Blue, a chess program developed by IBM and running on a supercomputer. In March 2016, professional Go player Lee Sedol, one of the best Go players in the world, was beaten by AlphaGo, a computer program developed by Google DeepMind.

Chess vs. Go In chess, each player begins with 16 pieces of six different types, and each piece type moves differently. The goal of the game is to capture the opponent's king. Go starts with an empty board. At each turn, a player places a stone on the board, and all stones obey the same rules. The goal of the game is to capture as much territory as possible. It can therefore be argued that Go has simpler rules than chess. Nevertheless, the complexity of Go is higher than that of chess: at each game state, a player faces a greater number of possible moves than in chess (about 250 vs. 35), and a typical game of Go lasts about 150 moves vs. 80 in chess.
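These two figures combine into the standard back-of-the-envelope game-tree size estimate b^d (branching factor raised to the game length). A quick sketch using the approximate numbers above (the formula is the usual textbook estimate, not taken from the slides):

```python
import math

# Rough game-tree size b**d, using the approximate figures above:
# branching factor ~250 and ~150 moves for Go, ~35 and ~80 for chess.
def log10_tree_size(branching_factor, game_length):
    # Work in log10 to avoid astronomically large integers.
    return game_length * math.log10(branching_factor)

go_digits = log10_tree_size(250, 150)     # ~360 decimal digits
chess_digits = log10_tree_size(35, 80)    # ~124 decimal digits
```

So by this rough measure the Go game tree is not merely bigger than the chess tree, but bigger by over 200 orders of magnitude, which is why exhaustive search is hopeless.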

Structure AlphaGo relies on two different components: 1) a tree search procedure; 2) convolutional networks that guide the tree search procedure.

Network Structure In total, three convolutional networks are trained, of two different kinds: two policy networks and one value network. Both types of networks take as input the current game state, represented as an image.

The value network The value network provides an estimate of the value of the current state of the game: the probability that the black player will ultimately win, given the current state. The input to the value network is the whole game board, and the output is a single number representing the probability of a win.
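As a toy illustration of this input/output shape only (AlphaGo's real value network is a deep convolutional network; the single linear layer below is just a stand-in):

```python
import numpy as np

# Toy "value network": maps a 19x19 board to one win probability.
# Illustrative stand-in only, NOT AlphaGo's architecture: one linear
# layer plus a sigmoid instead of a deep convolutional network.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(19 * 19,))   # untrained weights
b = 0.0

def value(board):
    # board: 19x19 array with +1 (black), -1 (white), 0 (empty)
    logit = board.ravel() @ W + b
    return 1.0 / (1.0 + np.exp(-logit))       # squash into (0, 1)

value(np.zeros((19, 19)))   # empty board, zero bias -> exactly 0.5
```

The point of the sketch is the interface: whole board in, one scalar probability out.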

The policy network The policy networks provide guidance regarding which action to choose, given the current state of the game. The output is a probability value for each possible legal move (i.e. the output of the network is as large as the board). Actions (moves) with higher probability values have a higher chance of leading to a win.
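A sketch of what such an output head computes: one probability per board point via a softmax, with occupied points masked out so only legal moves get probability (the masking detail is an illustrative assumption, not taken from the slides):

```python
import numpy as np

# Toy "policy head": turn 361 raw scores into a probability
# distribution over board points, masking occupied points.
def policy(logits, board):
    # logits: float array of 361 scores; board: 19x19 of {-1, 0, +1}
    logits = logits.astype(float).copy()
    logits[board.ravel() != 0] = -np.inf      # occupied -> probability 0
    z = np.exp(logits - logits.max())         # stable softmax
    return z / z.sum()

board = np.zeros((19, 19))
board[0, 0] = 1                               # one occupied point
p = policy(np.zeros(361), board)              # uniform over 360 empty points
```

With uniform scores, each of the 360 empty points gets probability 1/360 and the occupied point gets 0, matching the "one value per board point" description above.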

Training results A policy network was trained on 30 million positions from games played by human experts, available on the KGS Go server. It achieved an accuracy of 57% on a withheld test set. A smaller policy network was trained as well; its accuracy is much lower (24.2%), but it is much faster (2 microseconds instead of 3 milliseconds).

The policy network The goal should not be to predict human moves as accurately as possible; the goal should be to have networks that are optimized to win the game. The policy networks were therefore improved by letting them play against each other, using the outcome of these games as a training signal. This is called reinforcement learning, or deep reinforcement learning, since the networks being trained are deep.

Performance The AlphaGo team then tested the performance of the policy networks. They played their best-performing policy network against Pachi, the strongest open-source Go program. AlphaGo's policy network won 85% of the games against Pachi: a convolutional network alone was able to outperform a system that relies extensively on search.

MCTS Monte Carlo Tree Search (MCTS) is used to search the game tree. The idea is to run many game simulations. Each simulation starts at the current game state and stops when the game is won by one of the two players. At first, the simulations are completely random: actions are chosen randomly at each state, for both players. During each simulation, statistics are stored, such as how often each node has been visited and how often a visit has led to a win. These numbers guide the later simulations in selecting actions.
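The simulation loop described above can be sketched end-to-end on a toy game. A small take-away game stands in for Go here, and UCB1 is used as the selection rule; both choices are illustrative assumptions, not details from the slides:

```python
import math, random

# Minimal MCTS sketch on a toy game: players alternately take 1 or 2
# stones from a pile; whoever takes the last stone wins. Each simulation
# runs the classic phases: selection (UCB1), expansion, random rollout,
# and backup of the result along the visited path.
class Node:
    def __init__(self, stones):
        self.stones = stones
        self.children = {}        # move -> Node
        self.visits = 0
        self.wins = 0.0           # wins for the player to move at this node

def legal(stones):
    return [m for m in (1, 2) if m <= stones]

def rollout(stones):
    # Random playout; +1 if the player to move at the start wins, else -1.
    last_mover_is_first, turn = False, 0
    while stones > 0:
        stones -= random.choice(legal(stones))
        last_mover_is_first = (turn == 0)
        turn ^= 1
    return +1 if last_mover_is_first else -1

def ucb(parent, child, c=1.4):
    if child.visits == 0:
        return float("inf")
    # Child stats are from the child mover's view, so flip the win rate.
    return (1 - child.wins / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def simulate(node):
    # Returns the result from the viewpoint of the player to move here.
    node.visits += 1
    if node.stones == 0:          # previous player took the last stone
        return -1
    if not node.children:         # expansion, then a random rollout
        for m in legal(node.stones):
            node.children[m] = Node(node.stones - m)
        result = rollout(node.stones)
    else:                         # selection, then recurse (negamax backup)
        child = max(node.children.values(), key=lambda ch: ucb(node, ch))
        result = -simulate(child)
    node.wins += (result + 1) / 2
    return result

def best_move(stones, n_sims=3000):
    root = Node(stones)
    for _ in range(n_sims):
        simulate(root)
    return max(root.children, key=lambda m: root.children[m].visits)
```

From a pile of 4 the search settles on taking 1 stone (leaving the opponent a lost position of 3), illustrating how the stored visit and win counts steer later simulations toward strong moves.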

Algorithm AlphaGo's tree search procedure is somewhat similar to MCTS, but is guided by all three types of networks in an innovative manner. AlphaGo uses a mixture of the output of the value network and the result of a self-play simulation of the fast policy network: value of a state = value network output + simulation result. This method suggests a mixture of intuition and reflection. 15

Performance Analysis

AI name                       Elo rating
Distributed AlphaGo (2015)    3140
AlphaGo (2015)                2890
CrazyStone                    1929
Zen                           1888
Pachi                         1298
Fuego                         1148
GnuGo                          431

The Elo rating system is used for comparing the strength of players. The difference in the ratings between two players serves as a predictor of the outcome of a match: higher ratings indicate a higher chance of winning.

Performance Analysis AlphaGo ran on 48 CPUs and 8 GPUs; the distributed version of AlphaGo ran on 1202 CPUs and 176 GPUs.

Performance Analysis On March 15, 2016, the distributed version of AlphaGo completed a 4-1 match victory over Lee Sedol, whose Elo rating is now estimated at 3520. The distributed version of AlphaGo is now estimated at 3586.

Conclusion The importance of AlphaGo is enormous. The same techniques could be applied not only to robotics and scientific research, but also to many other tasks, from Siri-like mobile digital assistants to financial investments.

Thank you for your attention