CS 387: GAME AI BOARD GAMES


CS 387: GAME AI BOARD GAMES 5/28/2015 Instructor: Santiago Ontañón santi@cs.drexel.edu Class website: https://www.cs.drexel.edu/~santi/teaching/2015/cs387/intro.html

Reminders Check the BBVista site for the course regularly. Also: https://www.cs.drexel.edu/~santi/teaching/2015/cs387/intro.html Today: Project 4 submission deadline

Outline Board Games Game Tree Search Portfolio Search Monte Carlo Methods UCT


Go Board is 19x19. Branching factor: starts at 361, and decreases (more or less) by 1 after every move. Compare Go and Chess: Chess, assuming a branching factor of 35: search at depth 4 explores 1,500,625 nodes. Go, assuming a branching factor of 300: search at depth 4 explores 8,100,000,000 nodes. What can we do?
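A minimal sketch of the arithmetic behind this comparison (the branching factors 35 and 300 are the rough values assumed on the slide):

import math  # not strictly needed; exponentiation is built in

# Rough game-tree sizes at search depth 4 for the assumed branching factors.
for game, b in [("Chess", 35), ("Go", 300)]:
    print(f"{game}: {b}^4 = {b ** 4:,} nodes")
# Chess: 35^4 = 1,500,625 nodes
# Go: 300^4 = 8,100,000,000 nodes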

Monte Carlo Methods Algorithms that rely on random sampling to find solution approximations. Example: Monte Carlo integration. Imagine that I ask you to compute the following value: A = ∫₁³ sin(x) · (1 − 1/x²) dx

Monte Carlo Methods Method 1: Symbolic integration. You could fetch your calculus book, integrate the function, etc. But this method you'll have to do by hand (did you know that automatic symbolic integration is still an unsolved problem?). Method 2: Numerical methods: Simpson's rule, etc. (recall from calculus?). Method 3: Monte Carlo

Monte Carlo Methods Method 3: Monte Carlo. Repeat N times: pick a random x between 1 and 3 and evaluate f(x), where f(x) = sin(x) · (1 − 1/x²). Now take the average and multiply by 2 (the length of the interval, 3 − 1). Voilà! The larger N, the better the approximation.
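A minimal Python sketch of this Monte Carlo estimate (the integrand below is the reconstruction used above; num_samples is an illustrative parameter):

import math
import random

def f(x):
    # Integrand from the slide: sin(x) * (1 - 1/x^2)
    return math.sin(x) * (1.0 - 1.0 / (x * x))

def monte_carlo_integral(a, b, num_samples):
    # Estimate the integral of f over [a, b]: average random samples of f
    # and multiply by the length of the interval.
    total = 0.0
    for _ in range(num_samples):
        x = random.uniform(a, b)  # pick a random x between a and b
        total += f(x)
    return (b - a) * total / num_samples

# The larger the number of samples, the better the approximation:
for n in (10, 1000, 100000):
    print(n, monte_carlo_integral(1.0, 3.0, n))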

Monte Carlo Methods Idea: Use random sampling to approximate the solution to complex problems. How can we apply this idea to adversarial search? The answer to this question is responsible for having computer programs that can play Go at master level. See http://en.wikipedia.org/wiki/computer_go#recent_results for recent results

Outline Board Games Game Tree Search Monte Carlo Methods UCT

Monte-Carlo Tree Search Upper Confidence Tree (UCT) is a state-of-the-art, simple variant of Monte-Carlo Tree Search, responsible for the recent success of Computer Go programs. Ideas: sample smartly (UCB); instead of opening the whole minimax tree or playing N random games, open only the upper part of the tree, and play random games from there.

Monte Carlo Tree Search Tree Search 0/0 Current State Monte-Carlo Search Current state. w/t records how many of the games started from this state have been found to be won (w) out of the total games explored (t) in the current search.

Monte Carlo Tree Search Tree Search 0/0 0/0 Monte-Carlo Search At each iteration, one node of the tree (upper part) is selected and expanded (one node added to the tree). From this new node, a complete game is played out at random (Monte-Carlo).

Monte Carlo Tree Search Tree Search 0/1 0/1 Monte-Carlo Search This is called a playout (this one ended in a loss). At each iteration, one node of the tree (upper part) is selected and expanded (one node added to the tree). From this new node, a complete game is played out at random (Monte-Carlo).

Monte Carlo Tree Search Tree Search 1/2 1/1 0/1 Monte-Carlo Search At each iteration, one node of the tree (upper part) is selected and expanded (one node added to the tree). From this new node, a complete game is played out at random (Monte-Carlo). (This playout ended in a win.)

Monte Carlo Tree Search Tree Search 2/3 2/2 0/1 1/1 Monte-Carlo Search The counts w/t are used to determine which nodes to explore next. Exploration/exploitation, e.g.: 1) some probability of expanding the best node; 2) some probability of expanding one at random. (This playout ended in a win.)

Monte Carlo Tree Search Tree Search 2/3 2/2 0/1 1/1 Monte-Carlo Search The counts w/t are used to determine which nodes to explore next. Exploration/exploitation, e.g.: 1) some probability of expanding the best node; 2) some probability of expanding one at random. (This playout ended in a win.) As we will see, we want to expand the best node with higher probability than any of the others.

Monte Carlo Tree Search Tree Search 2/4 2/3 0/1 1/1 0/1 Monte-Carlo Search The tree ensures all relevant actions are explored (this greatly alleviates the randomness that affects pure Monte-Carlo methods). (This playout ended in a loss.)

Monte Carlo Tree Search Tree Search 2/4 2/3 0/1 1/1 0/1 Monte-Carlo Search (This playout ended in a loss.) The random games played from each node of the tree serve to estimate the utility function. They can be purely random, or use an opponent model (if available).

MCTS Algorithm

MCTS(state, player):
    tree = new Node(state, player)
    Repeat until the computation budget is exhausted:
        node = treePolicy(tree)
        if (node.isTerminal): child = node
        else: child = node.nextChild()
        R = playout(child)
        child.propagateReward(R)
    Return tree.bestChild()

Tree Policy: ε-greedy Given a list of children, which one do we explore in a given iteration of MCTS? Ideally, we want: To spend more time exploring good children (no point wasting time on bad children) But spend some time exploring bad children, just in case they are actually good (since evaluation is stochastic, it might happen that we were just unlucky, and a child we thought was bad is actually good). Simplest idea: ε-greedy With probability 1-ε: choose the current best With probability ε: choose one at random
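A minimal Python sketch of this ε-greedy choice among the children of a node (the wins, visits, and children fields are hypothetical names, matching the w/t counts from the earlier slides):

import random

def epsilon_greedy_child(node, epsilon=0.1):
    # With probability 1 - epsilon: choose the current best child;
    # with probability epsilon: choose one at random.
    unvisited = [c for c in node.children if c.visits == 0]
    if unvisited:
        return random.choice(unvisited)  # make sure every child is sampled at least once
    if random.random() < epsilon:
        return random.choice(node.children)
    return max(node.children, key=lambda c: c.wins / c.visits)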

Which is the best child? Tree Search 60/100 55/85 1/10 4/5 Monte-Carlo Search

Which is the best child? Tree Search 60/100 55/85 1/10 4/5 Monte-Carlo Search This one only wins about 65% of the time, but we have sampled it 85 times. This one seems to win 80% of the time, but we have only sampled it 5 times.

Which is the best child? Tree Search 60/100 This one is safer (we cannot be sure the other one is good unless we sample it more times). 55/85 1/10 4/5 Monte-Carlo Search This one only wins about 65% of the time, but we have sampled it 85 times. This one seems to win 80% of the time, but we have only sampled it 5 times.

Monte Carlo Tree Search After a fixed number of iterations K (or after the assigned time is over), MCTS analyzes the resulting tree, and the selected action is the one with the highest win ratio (or the one with the highest visit count). MCTS algorithms do not explore the whole game tree: they sample the game tree, and they spend more time on those moves that are more promising. They are any-time algorithms (they can be stopped at any time). It can be shown theoretically that as K goes to infinity, the values assigned to each action in the MCTS tree converge to those computed by minimax. MCTS algorithms are the standard algorithms for modern Go playing programs.

Tree Policy: Can we do Better? We just learned the ε-greedy policy, but is there a way to do better? ε-greedy is robust, but, for example: If there are 3 children: A: 40/100 B: 39/100 C: 2/100

Tree Policy: Can we do Better? We just learned the ε-greedy policy, but is there a way to do better? ε-greedy is robust, but, for example: If there are 3 children: A: 40/100 B: 39/100 C: 2/100 B and C have the same probability of being chosen, while B is clearly better! After lots of iterations, when one child is already clearly better, ε-greedy still keeps sampling bad children with probability ε. So can we do better?

Multi-Armed Bandit (MAB) Imagine you go to Vegas, and you see a row of "bandits" (slot machines): You have no idea which machine gives you the best chance of winning. You have 100 tokens to use. How can you maximize your expected winnings?

Multi-Armed Bandit (MAB) Imagine you go to Vegas, and you see a row of "bandits" (slot machines): You have no idea which machine gives you the best chance of winning. You have 100 tokens to use. How can you maximize your expected winnings? In terms of game-tree search: you have a collection of moves, and you only have CPU time to sample them a few times. How do you spread your CPU resources to figure out which is the best one?

MAB: ε-greedy ε-greedy is the simplest strategy: Input: ε (a number between 0 and 1, usually small, e.g., 0.1). For each arm: keep track of the average earnings so far: earnings / (coins spent). Before spending the next coin: figure out which is the best arm. With probability 1 − ε: put the coin in the best arm. With probability ε: choose one at random.

MAB: ε-greedy How well does ε-greedy work? Imagine that there were 3 arms: A: expected reward $1 per coin B: expected reward $2 per coin C: expected reward $0.1 per coin Expected earnings per coin over time (ε = 0.1):

MAB: ε-greedy How well does ε-greedy work? Imagine that there were 3 arms: A: expected reward $1 per coin, B: expected reward $2 per coin, C: expected reward $0.1 per coin. Expected earnings per coin over time (ε = 0.1): [chart comparing the expected earnings of arms A, B, C and of the ε-greedy strategy]

MAB: ε-greedy How well does ε-greedy work? Imagine that there were 3 arms: A: expected reward $1 per coin, B: expected reward $2 per coin, C: expected reward $0.1 per coin. Even after many coins are spent, when we clearly know which is the best arm, we are still spending coins on the other arms with probability ε. Expected earnings per coin over time (ε = 0.1): [chart comparing the expected earnings of arms A, B, C and of the ε-greedy strategy]

MAB: UCB1 Upper Confidence Bounds: a better balance between exploration and exploitation. UCB value for a given arm i: vᵢ + C·sqrt(ln(N) / nᵢ), where vᵢ is the expected reward observed so far for this arm, nᵢ is the number of coins used on this arm so far, and N is the number of coins used so far in total.

MAB: UCB1 Upper Confidence Bounds: a better balance between exploration and exploitation. UCB value for a given arm i: vᵢ + C·sqrt(ln(N) / nᵢ). The first term (vᵢ) is high for arms that we believe to be good; the second term is high for arms that we have not explored much yet.

MAB: UCB1 Upper Confidence Bounds: a better balance between exploration and exploitation. UCB value for a given arm i: vᵢ + C·sqrt(ln(N) / nᵢ). The first term (vᵢ) is high for arms that we believe to be good; the second term is high for arms that we have not explored much yet. UCB1 spends fewer and fewer coins on suboptimal arms over time. [chart comparing the expected earnings of arms A, B, C and of UCB1]
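A minimal Python sketch of UCB1 arm selection, following the formula above (the value of the constant C is left as a parameter here; sqrt(2) is a common choice, not something specified on the slide):

import math

def ucb1_choice(values, counts, total, C=math.sqrt(2)):
    # values[i]: average reward observed so far for arm i (v_i)
    # counts[i]: number of coins used on arm i so far (n_i)
    # total:     number of coins used so far in total (N)
    best_arm, best_score = None, float("-inf")
    for i, (v, n) in enumerate(zip(values, counts)):
        if n == 0:
            return i  # try every arm at least once
        score = v + C * math.sqrt(math.log(total) / n)
        if score > best_score:
            best_arm, best_score = i, score
    return best_arm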

UCT

UCT(state, player):
    tree = new Node(state, player)
    Repeat until the computation budget is exhausted:
        node = selectNodeViaUCB1(tree)
        if (node.isTerminal): child = node
        else: child = node.nextChild()
        R = playout(child)
        child.propagateReward(R)
    Return tree.bestChild()

UCT

UCT(state, player):
    tree = new Node(state, player)
    Repeat until the computation budget is exhausted:
        node = selectNodeViaUCB1(tree)
        if (node.isTerminal): child = node
        else: child = node.nextChild()
        R = playout(child)
        child.propagateReward(R)
    Return tree.bestChild()

This means using UCB1 at the root to select a child, and then recursively repeating, until we reach a leaf.
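Putting the pieces together, a minimal self-contained Python sketch of UCT, assuming a hypothetical game-state interface (legal_moves(), apply(move), is_terminal(), and winner() are illustrative method names, not defined in these slides):

import math
import random

class Node:
    # One node of the UCT tree, storing the w/t counts from the slides.
    def __init__(self, state, player, parent=None, move=None):
        self.state, self.player = state, player
        self.parent, self.move = parent, move
        self.children = []
        self.untried_moves = list(state.legal_moves())
        self.wins, self.visits = 0.0, 0

    def ucb1(self, C=math.sqrt(2)):
        if self.visits == 0:
            return float("inf")
        return (self.wins / self.visits
                + C * math.sqrt(math.log(self.parent.visits) / self.visits))

def uct(root_state, player, budget=1000):
    root = Node(root_state, player)
    for _ in range(budget):
        node = root
        # 1) Selection: descend with UCB1 while nodes are fully expanded.
        while not node.untried_moves and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2) Expansion: add one new child to the tree (if not terminal).
        if node.untried_moves:
            move = node.untried_moves.pop(random.randrange(len(node.untried_moves)))
            child = Node(node.state.apply(move), player, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3) Playout: play a complete game at random from the new node.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(list(state.legal_moves())))
        reward = 1.0 if state.winner() == player else 0.0
        # (A full two-player implementation would flip the reward between plies.)
        # 4) Backpropagation: update the w/t counts up to the root.
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # Selected action: the move of the most visited child of the root.
    return max(root.children, key=lambda c: c.visits).move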

Games using MCTS Variants Go playing programs: MoGo, CrazyStone, Valkyria, Pachi, Fuego, The Many Faces of Go, Zen

Games using MCTS Variants Card Games: Prismata

Games using MCTS Variants Strategy Games: TOTAL WAR: ROME II: http://aigamedev.com/open/coverage/mcts-rome-ii/

Extensions What happens to Game Tree Search when a game has an element of chance? E.g., rolling a die? What happens to Game Tree Search when the state is not fully observable? (This is more complex.)

Extensions What happens to Game Tree Search when a game has an element of chance? E.g., rolling a die? Add additional averaging layers to the tree. What happens to Game Tree Search when the state is not fully observable? (This is more complex.) One possibility: sample the possible states, and then do game tree search as if the game were fully observable (see the sketch below).
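A minimal sketch of that second idea (sampling possible states, often called determinization): sample_consistent_state(observation) is a hypothetical helper that returns a random full state consistent with what the player can see, and uct() is the sketch shown earlier.

import collections

def determinized_move(observation, player, sample_consistent_state, num_samples=20):
    # Sample several possible full states, search each one as if it were
    # fully observable, and pick the move chosen most often.
    votes = collections.Counter()
    for _ in range(num_samples):
        state = sample_consistent_state(observation)
        votes[uct(state, player, budget=200)] += 1
    return votes.most_common(1)[0][0]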

Summary Minimax (+ alpha-beta): for perfect-information, deterministic games with a low branching factor. When the game is too complex (large branching factor): Portfolio Search, Monte Carlo Tree Search. Extensions for: stochastic games (e.g., dice rolling), partial information (e.g., players cannot see the cards of others).