CMPUT 396. 3 hr, closed book. 6 pages, 7 marks/page.

1. [3 marks] For each person or program, give the label of its description.

Aja Huang ___   Cho Chikun ___   David Silver ___   Demis Hassabis ___
Fan Hui ___   Geoff Hinton ___   Lee Sedol ___   Michael Redmond ___

(a) former European Go champion who lost to a strong computer program in a 5-game match in 2015
(b) CEO of a leading AI research company
(c) lead programmer of AlphaGo
(d) former #1 Go player who lost to a strong computer program in a 5-game match in 2016
(e) 9-dan professional Go player and commentator
(f) neural net expert who wrote a paper on image classification
(g) first author of the Nature paper on AlphaGo
(h) former #1 Go player who defeated a strong computer program in a 3-game match in 2016

2. [4 marks] Fill in the blanks, and circle the correct answers.

AlphaGo integrates neural net calls into search, overcoming the slowness of net calls by (circle all that apply)
a) building shallow nets that reply quicker than the initial deeper versions
b) handling net calls with GPUs
c) having the master algorithm continue to operate while net calls are executing
d) having each net call distributed over parallel processes.

AlphaGo child-selection uses (circle all that apply)
a) a deep policy net
b) a deep value net
c) a shallow policy net
d) a shallow value net
e) simulations.

In AlphaGo, once a leaf is reached, using a (circle one) cpu / gpu, a call is made on a (circle all that apply)
a) deep policy net
b) deep value net
c) shallow policy net
d) shallow value net
e) simulation net.
Also, at the leaf, a simulation is performed on a (circle one) gpu / cpu. Then, using a fractional weighting of ___ for the call and ___ for the simulation (these 2 fractions sum to 1.0), the leaf score is backed up the search tree.
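For reference, the leaf backup described in the AlphaGo Nature paper (Silver et al., 2016) mixes the value net call and the simulation outcome as

    V(s_L) = (1 - \lambda) \, v_\theta(s_L) + \lambda \, z_L

where v_\theta(s_L) is the value net's output at leaf s_L, z_L is the simulation (rollout) result, and \lambda is the mixing weight; the paper reports \lambda = 0.5 performing best.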

3. [4 marks] This is a minimax tree. The root player is max. Each leaf label is the root player's score for that leaf.
i) On the diagram, beside each non-leaf node, write the root player's minimax value for that node.
ii) Assume that minimax values are found by (recursive) alphabeta search, with children of a node considered starting from the left. For this search, on the diagram, draw 2 short lines through each edge that is pruned, and draw a box around each leaf node that is examined.

[Diagram: minimax tree with leaves A 10, B 9, C 11, D 8, E 7, F 5, G 6, H 4, I 12, J 3, K 2, L 13]

4. [3 marks]

   a b c
1  . . .
2  . . x
3  o . .

For this tic-tac-toe position with x to move, the minimax value is (circle one) x-win / draw / o-win. A best move for x is ___ and a best reply for o is ___. A simple minimax search would consider about this many states: (circle one) 3, 30, 300, 3000, 30000, 300000, 7! = 5040, 8! = 40320, 9! = 362880.

(space for rough work)
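A minimal sketch of the alphabeta search used in question 3, assuming (hypothetically) that the tree is given as nested Python lists with numeric leaves:

    # Alphabeta over a tree given as nested lists: a leaf is a number,
    # an internal node is a list of children. The root player is max.
    import math

    def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
        if not isinstance(node, list):      # leaf: return its score
            return node
        if maximizing:
            best = -math.inf
            for child in node:
                best = max(best, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:           # beta cutoff: prune remaining children
                    break
            return best
        best = math.inf
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:               # alpha cutoff
                break
        return best

    # e.g. a depth-2 tree with three min nodes:
    print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # prints 3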

5. [4 marks] In MCTS, for a child with w wins and v visits, the function f(w, v) = (w + t)/(v + 2t) is used to measure win rate instead of g(w, v) = w/v because (circle all that apply)
a) f allows the true value to be estimated more quickly
b) f returns a value when v is 0, so never divides by 0
c) f increases the statistical significance of the simulations
d) f allows quicker recovery from initial unlucky simulations.

The MCTS UCB1 formula balances the exploitation of a search with the ___ of children which have received fewer ___ than their siblings. For each child j, the formula is (circle one)
a) f(w_j, v_j) + c sqrt(ln(v_1 + ... + v_t) / v_j)
b) f(w, v) + c sqrt((v_1 + ... + v_t) / v_j)
c) f(w_j, v_j) + c sqrt(v_j / (v_1 + ... + v_t))
d) f(w, v) + c sqrt(v_j / ln(v_1 + ... + v_t)).

MCTS can be improved by adding patterns to simulations: e.g. in Go, after each simulation move, if a move creates a match with a local (i.e. around that move) pattern, then (circle one)
a) a random move is performed
b) the reply move for that pattern is played
c) the appropriate player is designated the winner
d) the leaf node has its RAVE count increased by 1.
E.g. in Go, if a white simulation move is as shown in this local 2×2 pattern sequence [diagram], then what happens next is ___.

6. [3 marks] Before 2000, the strongest Go program was as strong as a human with rank (circle one)
a) 5 dan b) 9 dan c) 15 kyu d) 30 kyu.
MCTS was first used in Go programs around the year ___. Later, Clark and Storkey used records from professional games to build a deep ___ neural net that predicts the most popular move correctly with probability about ___. E.g. their net predicts that the first move on the empty 19×19 board will be at the (circle one)
a) 4-4 point b) 5-5 point c) 6-6 point d) 7-7 point.
Later, the company ___ (owned by Google) wrote AlphaGo, which is about as strong as a human with rank (circle one)
a) 5 dan b) 9 dan c) 15 kyu d) 30 kyu.
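A minimal sketch of UCB1 child selection using the smoothed win rate f above; the constants and the (wins, visits) representation are illustrative assumptions, not course code:

    import math

    T = 1.0   # prior strength t in f(w, v) = (w + t) / (v + 2t)
    C = 1.4   # exploration constant c

    def f(w, v, t=T):
        # smoothed win rate: defined even when v == 0 (returns 1/2)
        return (w + t) / (v + 2 * t)

    def ucb1_select(children):
        # children: list of (wins, visits) pairs; returns the index to visit next.
        total = sum(v for _, v in children)  # v_1 + ... + v_t
        def score(wv):
            w, v = wv
            if v == 0:
                return math.inf              # unvisited children are tried first
            return f(w, v) + C * math.sqrt(math.log(total) / v)
        return max(range(len(children)), key=lambda j: score(children[j]))

    print(ucb1_select([(3, 10), (0, 0), (5, 6)]))  # prints 1: the unvisited child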

7. [4 marks] The Tromp-Taylor rules use superko: a move cannot recreate any previous position. E.g. assume that from this 1×5 Go state [diagram] black moves to cell 5, resulting in [diagram]: now white cannot move to cell ___ because of superko. From this state [diagram: 1×5 board] with white to play, the minimax value for white is ___, with principal variation (circle one)
a) w5 pass pass
b) w5 pass w2 pass pass
c) w5 pass w2 b4 pass b1 w2 pass pass
d) w5 pass w2 b4 pass b1 w2 pass w3 pass pass.

For each n in the table, give the first-player minimax value for 1×n Go.

n      1   2   3   4   5   6
value  __  __  __  __  __  __

8. [3 marks] For a position in a 2-player game with players x, o and player-to-move x, here is an MCTS tree at some point in execution. Node labels show the associated move.

[Diagram: MCTS tree]

Now a simulation occurs at the leaf node whose path from the root is -c-e-b, playout -f-d-a, result: x win. For this extended playout, the moves made by x were ___ and by o were ___. In the table below, give the change (leave it blank if no change) to each node's wins, visits, rave-wins, rave-visits that happens during backup. Column a will be all blank.

        a  b  c  d  e  f  ca  cb  cd  ce  cf  cea  ceb  ced  cef
w
v
ravew
ravev
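A minimal sketch of the backup step in question 8, using a hypothetical Node class; for brevity it updates the RAVE counts of any child whose move occurs later in the extended playout, ignoring the by-player distinction that full RAVE makes:

    class Node:
        def __init__(self, move, parent=None):
            self.move, self.parent = move, parent
            self.children = {}                  # move -> Node
            self.w = self.v = 0                 # wins, visits
            self.rave_w = self.rave_v = 0       # RAVE (all-moves-as-first) counts

    def backup(leaf, playout_moves, x_won):
        # Walk from the leaf back to the root. Each node on the path gets a
        # visit (and a win if x won); each child of a path node whose move
        # occurs later in the extended playout gets a RAVE update.
        node, later_moves = leaf, set(playout_moves)
        while node is not None:
            node.v += 1
            node.w += 1 if x_won else 0
            for child in node.children.values():
                if child.move in later_moves:
                    child.rave_v += 1
                    child.rave_w += 1 if x_won else 0
            later_moves.add(node.move)          # this move is "later" for the parent
            node = node.parent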

9. [3 marks] For this Hex position [diagram: 5×5 board, columns a-e, rows 1-5], after black plays at b4, black has a winning virtual connection using cells {a5,b5} and {a1,a2,a3,b1,b2,c1,c2,d1}. Similarly, after black plays at c2, black has a winning virtual connection using cells {c1,d1} and {a4,a5,b4} and {e2,d3} and {e4,d5,e5}. So, for this position with white to play, white must play at one of the cells in { ___ }, otherwise black can win. For this position with white to play, the set of all winning white moves is { ___ }. So far, the largest Hex board size on which winning opening moves have been found is ___.

10. [2 marks] For the nim state with piles 10 9 8 5, list all winning moves below (if there is no winning move, leave the blank empty). On the side, show your work.
From the 10 pile, remove ___
From the 9 pile, remove ___
From the 8 pile, remove ___
From the 5 pile, remove ___

11. [2 marks]

5 2 3
1 8 4
7 6 _

The number of inversions of this sliding tile puzzle is ___ (an ___ number) and the number of columns is an odd number, so this puzzle (circle one) is / is not solvable.

Consider Python implementations of these algorithms that solve 5x5 (and smaller) sliding tile puzzles: the A* algorithm described in class, and a special-purpose (SP) algorithm using the method from the YouTube video discussed in class. (circle all that apply)
a) A* with the Manhattan heuristic is usually faster than A* with the misplaced tiles heuristic.
b) A* with the Manhattan heuristic is usually faster than SP, because it finds a shortest path through the state space.
c) A* with the Manhattan heuristic is usually slower than SP, because SP does not search the whole state space.
d) The runtime for SP on 5x5 puzzles is about 1.25 times the runtime for SP on 4x4 puzzles.
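Minimal sketches of the arithmetic behind questions 10 and 11: nim winning moves via the nim-sum (xor) rule, and the inversion count of a tile puzzle; the function names are illustrative:

    def nim_winning_moves(piles):
        # A position is losing for the player to move iff the xor of all
        # pile sizes is 0. A winning move lowers some pile p to p ^ x,
        # where x is the current nim-sum, whenever p ^ x < p.
        x = 0
        for p in piles:
            x ^= p
        return [(i, p - (p ^ x))          # (pile index, stones to remove)
                for i, p in enumerate(piles) if (p ^ x) < p]

    def inversions(tiles):
        # tiles: the puzzle read row by row with the blank omitted.
        return sum(1 for i in range(len(tiles))
                     for j in range(i + 1, len(tiles))
                     if tiles[i] > tiles[j])

    print(nim_winning_moves([10, 9, 8, 5]))      # the question-10 state
    print(inversions([5, 2, 3, 1, 8, 4, 7, 6]))  # question-11 layout, blank omitted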

12. [1 mark] Recall: in Go, a group of stones is unconditionally safe if the opponent cannot kill the group, even if the player always passes. The simplest kind of unconditionally safe group is one that has at least ___.

13. [6 marks] Consider these games: Hex on a 6×6 board, Go on a 1×18 board, and tic-tac-toe on a 6×6 board (where, to win, you need 4 in a row). For each game, you want to write a computer solver (so, an agent that finds a move with best minimax score).

For 6×6 Hex, (circle all that apply)
a) the game can be solved in a reasonable amount of time using only alphabeta search, since the game tree has only 36! ≈ 4×10^41 nodes
b) implementing a transposition table is not difficult, especially since there are no draws
c) the game can be solved in a reasonable amount of time using only alphabeta search, a transposition table, and symmetry pruning, since the solving dag then has about 10^30 states
d) the game can be solved in a reasonable amount of time using alphabeta search, a transposition table, and pruning with symmetry, mustplay, and inferior cells, since the solving dag then has about 10^15 states.

For 1×18 Go, (circle all that apply)
a) the game can be solved in a reasonable amount of time using only alphabeta search, since the game tree has only 18! ≈ 6×10^15 nodes
b) implementing a transposition table is not difficult, since the winner scores at most 18 points
c) the game can be solved in a reasonable amount of time using only alphabeta search and a transposition table, since the solving dag then has about 10^12 states
d) the solver can be improved by recognizing commonly occurring unconditionally safe groups.

For 6×6 tic-tac-toe, (circle all that apply)
a) the game can be solved in a reasonable amount of time using only alphabeta search, since the game tree has only 36! nodes
b) implementing a transposition table is not difficult, as there are only 3 possible outcome values
c) the game can be solved in a reasonable amount of time using only alphabeta search and a transposition table, since the game is likely to end in a draw
d) the solver can be improved using mustplay pruning (i.e., play here or lose).
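A minimal sketch of alphabeta combined with a transposition table, the pairing question 13 asks about; the `game` interface (terminal_value, moves, play) is a hypothetical stand-in for a real game implementation:

    import math

    def solve(state, game):
        # Negamax alphabeta with a transposition table. terminal_value(s)
        # returns the score for the player to move at a finished position
        # (else None); moves(s) lists legal moves; play(s, m) returns the
        # (hashable) successor position.
        table = {}  # position -> (flag, value); flag in {'exact','lower','upper'}

        def ab(s, alpha, beta):
            tv = game.terminal_value(s)
            if tv is not None:
                return tv
            if s in table:                   # position seen before: reuse it
                flag, val = table[s]
                if flag == 'exact':
                    return val
                if flag == 'lower':
                    alpha = max(alpha, val)
                else:
                    beta = min(beta, val)
                if alpha >= beta:
                    return val
            a0 = alpha
            best = -math.inf
            for m in game.moves(s):
                best = max(best, -ab(game.play(s, m), -beta, -alpha))
                alpha = max(alpha, best)
                if alpha >= beta:
                    break                    # cutoff: best is only a bound here
            # record whether the stored value is exact or just a bound
            flag = 'upper' if best <= a0 else ('lower' if best >= beta else 'exact')
            table[s] = (flag, best)
            return best

        return ab(state, -math.inf, math.inf)

Because Hex has no draws, the stored values for a Hex solver are just win/loss for the player to move, which is what makes the table easy to implement there.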