Learning from Hints: AI for Playing Threes

Size: px
Start display at page:

Download "Learning from Hints: AI for Playing Threes"

Transcription

1 Learning from Hints: AI for Playing Threes Hao Sheng (haosheng), Chen Guo (cguo2) December 17, Introduction The highly addictive stochastic puzzle game Threes by Sirvo LLC. is Apple Game of the Year 2014, and also the ancestor of all the number merging game like However, Threes has more complicated rules and suffer less from random effects, hence needs more planning and strategy. Another big difference is that most of the sliding games are time homogeneous in the sense of incrementation: Although there might be some randomness after each action, the distribution of incoming blocks/numbers does not depend on action or current state. This classical property makes the problem easy to tackle (with the help of reinforcement learning) but hard to learn from when we try to move back to real world problems. The game we choose to work on, Threes, however, is adaptive as the game goes on (see the next section). It will change the transition distribution but also provides hints on it so a player can get a rough sense of what next step would look like. Ideally, not only current action but also some long-term policy/strategy should shift as the distribution of future changes. We would like to model the problem as MDP and compare Expetimax Algorithm, Monte Carlo Tree Search, and Q-learning. 2 Literature Review A lot of theoretical and practical research has been done on the similar game 2048, which is more popular because of simpler rules. Some algorithms are employed for 2048, such as minimax search with alpha-beta pruning and expectimax optimization, as well as other hard-coded solutions. In [4], temporal difference learning is also used to compute the value function as an intermediate step towards learning decision making. However, suffering more from randomness, it seems more difficult for Q-learning to outperform expetimax with glaring flaws of gigantic branching factor and the need for a human-programmed heuristic to evaluate any given board/state. Monte Carlo Tree Search is known for achieving successful results when playing finiteaction, finite-length, perfect information games. Although there s not much work related with using MCTS to play sliding games like Threes or 2048, the mechanic of games is 1

2 appropriate for it. In [5], the method is used to play Settlers of Catan. They made use of domain knowledge pretty well, not only by setting weights to actions, but also giving virtual wins to preferred states and virtual losses to bad states. Deep reinforcement learning plays a more and more important role in Q-Learning role recently. With enough amount of data, the neural network can help the agent learn well from only raw information from the game, compared with handcrafted features. For example, [6] chose to directly operate on RGB images. For our game, deep reinforcement learning might be helpful for evaluating the interactions of different tiles on the board. 3 Set-up Figure 1: The game Threes 3.1 Rules of Threes The rules of Threes can be described as follows (See?? for the mobile application): 1. Merging Rules: 1 and 2 can be merged to get to get a 3. Two same tiles with number larger than 3 can be merged. 2. Scoring Rules: The total score is the sum of the scores of every single tile, which is 3 log 2 (k/3)+1 with number k. 3. New Tile: 2

3 The new tile is drawn from a shuffled deck with 12 cards of 4 ones, 4 twos and 4 threes. And the deck would be regenerated after used up. When the largest tile on board m > 96, with probability 1/24, the new tile isn t drawn from deck. It s randomly generated in the range [6,..., m/4] instead. The new tile would only appear in the columns/rows moved in the last step. 3.2 Modeling as MDP We can model this problem as a Markov Decision Process. We introduce some notations here for convenience. D = [D 1, D 2, D 3 ] is the current deck, where D[k] is the number of ks in the deck. The tiles on board are a sequence of B where the tile on the position [i, j] is mapped to B[4(i 1) + j]. The hint for next tile is H. To make the game more intuitive, we simplify the hint mechanism from the original game. In original game, the hint wold be given as a list of three possible cards when the card isn t drawn from deck. Considering that this hardly happens, we simplify the hint to be determined. As the number distribution on the board after taking an action differs a lot and it s hard to describe in language due to the game mechanism, we define Succ(b, a, h) is the set of possible board situations after taking action a on b with new tile h, where every situations are equally possible. In practical implementation, it wouldn t be difficult to retrieve Succ(b, a, h), whose size is at most 4. Now we have: State: s = (b, d, h) Action(s) = {a {Up, Down, Left, Right} a is valid action} Trans Prob: T (s, a, s ) = 1 max{b} Succ(b, a, h) log 3 (max{b}/2) 2 for b Succ(b, a, h), d = [d i 1 h=i ] i=1,2,3, h [6,..., max{b} ] 4 = (1 1 max{b} 96 1 d [h ] ) 24 Succ(b, a, h) d for b Succ(b, a, h), d = [d i 1 h=i ] i=1,2,3, h {1, 2, 3} = 0 otherwise Reward:R(s, a, s ) = t b 3 log 2 (t/3)+1 t b 3 log 2 (t/3)+1 3

4 The MDP described above might look a little complicated. But it s much easier to actually implemented in programs since the sliding and merging mechanism would give all the possible successive states. 4 Approaches 4.1 Expectimax Algorithm We used Expectimax Algorithm to solve this problem at first. The value evaluation function is the key to optimizing the algorithm performance. By simply using the reward of every step R(s, a, s ), we can benchmark our performance. Notice that the reward has nothing to do with the random stuff in s like the new hint, this baseline is actually a greedy algorithm. We also tried a heuristic evaluation function as Value(s) =Score(s) merges merges 48 monotonicity + corner large + 50 (maxrank maxrank ) empty2 filled t t b where merges is the number of potentially mergeable tile pairs in next step, 1-2-merges is the number of mergable one-two pairs, corner is the indicator that if the maximum tile is in the corner, monotonicity is a special measure of monotonicity of rows and columns, maxrank is the log 2 maxtile/3 + 3, large is the indicator that if the maxrank is larger than 7, empty is the number of empty tiles while filled is the number of non-empty tiles. 4.2 Monte Carlo Tree Search The idea of using Monte Carlo Tree Search is from poster session. The method keeps track of a certain number of successive states with different actions given a starting state. The number of valid actions for Threes is pretty small, which makes things easier to keep track of more states. Running Monte Carlo Simulation on the afterward states keeps updating the values of the states we have in record. Practically, we keep track of different number of nodes and do experiments on them. When selecting nodes during recursive tree search, we consider Upper Confidence Bounds (UCB) formula log N v i + C n i where v i is the estimated value of the node, n i is the number of times the nodes has been visited. N is the number of times of its parent has been visited. This method actually finds a balance between exploration and exploitation. 4

5 4.3 Q-Learning We tried deep Q-Learning to both capture the information on the board and also the future value of the potential position of the hint. The structure of our model is shown in Figure 2. Figure 2: Deep Q-Learning 5 Experiments We did experiments for: Expectimax Algorithm with 2,000 examples separately for: baseline with d = 1 or d = 2, and evaluation function with d = 1 and d = 2. MCTS with different number of nodes in record: 5, 10, 20, 30, 40. Q-Learning with linear model with direct information and heuristic feature mapper. However, we didn t got the time to train neural network model yet. The basic statistics for these experiments are in Table 1. For Expectimax Algorithm, The histogram plots and log-scale ranking plot are in the Figure??. From the results, we can see that the evaluation function helps a lot in terms of performance, while simply increasing the depth from 1 to 2 helps a little. Also, noticing that the running time increases a lot with depth as 2, we probably don t want to increase it more. For MCTS Algorithm, we did experiments on different number of nodes in record. The results are shown in Figure 4. 5

6 Mean Min Max Time Per Run Baseline (d = 1) , s Baseline (d = 2) ,239 1s Expectimax (d = 1) , s Expectimax (d = 2) ,682 17s MCTS (n = 5) 2, ,401 5s MCTS (n = 10) 4, ,068 10, s MCTS (n = 20) 5, ,395 22, s MCTS (n = 30) 5, ,310 20, s MCTS (n = 40) 3, ,795 1m21s Q-Learning , s Table 1: Basic Statistics of Experiments Figure 3: Results of Experiments for Expectimax Algorithm To compare different methods, we can refer to Figure 5. We can see that MCTS performs much stabler. We can see that it deals with randomness pretty well. However, Q-Learning doesn t perform well given the limited training time we have. 6

7 (a) Histogram for MCTS Algorithm (b) Ranking for MCTS Algorithm Figure 4: Results of Experiments for MCTS Algorithm 6 Summary and Discussion Threes is a game with few actions and many states. Compared with other similar games, it gives more information and needs more strategies. We modeled the game as MDP and then tried different approaches. Figure 5: Results of Experiments for Different Algorithms The simple Expectimax Algorithm is short-sighted, and it is hard to transfer our domain knowledge into evaluation functions. Simply increasing searching depth provides limited improvement. MCTS performs well, though it fits better with the mechanism of the game and finds a 7

8 balance between randomness and strategy. It chooses the action that copes best with the future randomness. The result is better than an intermediate human player. Q-Learning takes more time to train the feature parameters. By now, we have not got the time to train the model thoroughly because of the huge number of all the possible states. The current performance is just mediocre, 50% better than the baseline. 8

9 References [1] Jason Lewis, Alec Powell & Jai Sajnani 2048: An AI Agent [2] Hadi Pouransari & Saman Ghili, AI algorithms for the game 2048, CS221 project [3] Prithvi RamakrishnanSolving 2048, CS221 project [4] Szubert, M., & Jakowski, W. (2014, August). Temporal difference learning of N-tuple networks for the game In 2014 IEEE Conference on Computational Intelligence and Games (pp. 1-8). IEEE. [5] Szita, I., Chaslot, G., & Spronck, P. (2009, May). Monte-carlo tree search in settlers of catan. In Advances in Computer Games (pp ). Springer Berlin Heidelberg. [6] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arxiv preprint arxiv:

Using Artificial intelligent to solve the game of 2048

Using Artificial intelligent to solve the game of 2048 Using Artificial intelligent to solve the game of 2048 Ho Shing Hin (20343288) WONG, Ngo Yin (20355097) Lam Ka Wing (20280151) Abstract The report presents the solver of the game 2048 base on artificial

More information

More on games (Ch )

More on games (Ch ) More on games (Ch. 5.4-5.6) Alpha-beta pruning Previously on CSci 4511... We talked about how to modify the minimax algorithm to prune only bad searches (i.e. alpha-beta pruning) This rule of checking

More information

More on games (Ch )

More on games (Ch ) More on games (Ch. 5.4-5.6) Announcements Midterm next Tuesday: covers weeks 1-4 (Chapters 1-4) Take the full class period Open book/notes (can use ebook) ^^ No programing/code, internet searches or friends

More information

Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage

Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage Richard Kelly and David Churchill Computer Science Faculty of Science Memorial University {richard.kelly, dchurchill}@mun.ca

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 Introduction So far we have only been concerned with a single agent Today, we introduce an adversary! 2 Outline Games Minimax search

More information

AI Plays Yun Nie (yunn), Wenqi Hou (wenqihou), Yicheng An (yicheng)

AI Plays Yun Nie (yunn), Wenqi Hou (wenqihou), Yicheng An (yicheng) AI Plays 2048 Yun Nie (yunn), Wenqi Hou (wenqihou), Yicheng An (yicheng) Abstract The strategy game 2048 gained great popularity quickly. Although it is easy to play, people cannot win the game easily,

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 AccessAbility Services Volunteer Notetaker Required Interested? Complete an online application using your WATIAM: https://york.accessiblelearning.com/uwaterloo/

More information

Game Playing for a Variant of Mancala Board Game (Pallanguzhi)

Game Playing for a Variant of Mancala Board Game (Pallanguzhi) Game Playing for a Variant of Mancala Board Game (Pallanguzhi) Varsha Sankar (SUNet ID: svarsha) 1. INTRODUCTION Game playing is a very interesting area in the field of Artificial Intelligence presently.

More information

CS221 Project Final Report Gomoku Game Agent

CS221 Project Final Report Gomoku Game Agent CS221 Project Final Report Gomoku Game Agent Qiao Tan qtan@stanford.edu Xiaoti Hu xiaotihu@stanford.edu 1 Introduction Gomoku, also know as five-in-a-row, is a strategy board game which is traditionally

More information

CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH. Santiago Ontañón

CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH. Santiago Ontañón CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH Santiago Ontañón so367@drexel.edu Recall: Adversarial Search Idea: When there is only one agent in the world, we can solve problems using DFS, BFS, ID,

More information

2048: An Autonomous Solver

2048: An Autonomous Solver 2048: An Autonomous Solver Final Project in Introduction to Artificial Intelligence ABSTRACT. Our goal in this project was to create an automatic solver for the wellknown game 2048 and to analyze how different

More information

Game-playing: DeepBlue and AlphaGo

Game-playing: DeepBlue and AlphaGo Game-playing: DeepBlue and AlphaGo Brief history of gameplaying frontiers 1990s: Othello world champions refuse to play computers 1994: Chinook defeats Checkers world champion 1997: DeepBlue defeats world

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search 1 By the end, you will know Why we use Monte Carlo Search Trees The pros and cons of MCTS How it is applied to Super Mario Brothers and Alpha Go 2 Outline I. Pre-MCTS Algorithms

More information

CS221 Project Final Report Deep Q-Learning on Arcade Game Assault

CS221 Project Final Report Deep Q-Learning on Arcade Game Assault CS221 Project Final Report Deep Q-Learning on Arcade Game Assault Fabian Chan (fabianc), Xueyuan Mei (xmei9), You Guan (you17) Joint-project with CS229 1 Introduction Atari 2600 Assault is a game environment

More information

Intuition Mini-Max 2

Intuition Mini-Max 2 Games Today Saying Deep Blue doesn t really think about chess is like saying an airplane doesn t really fly because it doesn t flap its wings. Drew McDermott I could feel I could smell a new kind of intelligence

More information

CS 229 Final Project: Using Reinforcement Learning to Play Othello

CS 229 Final Project: Using Reinforcement Learning to Play Othello CS 229 Final Project: Using Reinforcement Learning to Play Othello Kevin Fry Frank Zheng Xianming Li ID: kfry ID: fzheng ID: xmli 16 December 2016 Abstract We built an AI that learned to play Othello.

More information

Monte Carlo Tree Search. Simon M. Lucas

Monte Carlo Tree Search. Simon M. Lucas Monte Carlo Tree Search Simon M. Lucas Outline MCTS: The Excitement! A tutorial: how it works Important heuristics: RAVE / AMAF Applications to video games and real-time control The Excitement Game playing

More information

Reinforcement Learning Agent for Scrolling Shooter Game

Reinforcement Learning Agent for Scrolling Shooter Game Reinforcement Learning Agent for Scrolling Shooter Game Peng Yuan (pengy@stanford.edu) Yangxin Zhong (yangxin@stanford.edu) Zibo Gong (zibo@stanford.edu) 1 Introduction and Task Definition 1.1 Game Agent

More information

Implementation of Upper Confidence Bounds for Trees (UCT) on Gomoku

Implementation of Upper Confidence Bounds for Trees (UCT) on Gomoku Implementation of Upper Confidence Bounds for Trees (UCT) on Gomoku Guanlin Zhou (gz2250), Nan Yu (ny2263), Yanqing Dai (yd2369), Yingtao Zhong (yz3276) 1. Introduction: Reinforcement Learning for Gomoku

More information

mywbut.com Two agent games : alpha beta pruning

mywbut.com Two agent games : alpha beta pruning Two agent games : alpha beta pruning 1 3.5 Alpha-Beta Pruning ALPHA-BETA pruning is a method that reduces the number of nodes explored in Minimax strategy. It reduces the time required for the search and

More information

CS188 Spring 2014 Section 3: Games

CS188 Spring 2014 Section 3: Games CS188 Spring 2014 Section 3: Games 1 Nearly Zero Sum Games The standard Minimax algorithm calculates worst-case values in a zero-sum two player game, i.e. a game in which for all terminal states s, the

More information

Playing Atari Games with Deep Reinforcement Learning

Playing Atari Games with Deep Reinforcement Learning Playing Atari Games with Deep Reinforcement Learning 1 Playing Atari Games with Deep Reinforcement Learning Varsha Lalwani (varshajn@iitk.ac.in) Masare Akshay Sunil (amasare@iitk.ac.in) IIT Kanpur CS365A

More information

Last update: March 9, Game playing. CMSC 421, Chapter 6. CMSC 421, Chapter 6 1

Last update: March 9, Game playing. CMSC 421, Chapter 6. CMSC 421, Chapter 6 1 Last update: March 9, 2010 Game playing CMSC 421, Chapter 6 CMSC 421, Chapter 6 1 Finite perfect-information zero-sum games Finite: finitely many agents, actions, states Perfect information: every agent

More information

Experiments on Alternatives to Minimax

Experiments on Alternatives to Minimax Experiments on Alternatives to Minimax Dana Nau University of Maryland Paul Purdom Indiana University April 23, 1993 Chun-Hung Tzeng Ball State University Abstract In the field of Artificial Intelligence,

More information

Adversary Search. Ref: Chapter 5

Adversary Search. Ref: Chapter 5 Adversary Search Ref: Chapter 5 1 Games & A.I. Easy to measure success Easy to represent states Small number of operators Comparison against humans is possible. Many games can be modeled very easily, although

More information

CS-E4800 Artificial Intelligence

CS-E4800 Artificial Intelligence CS-E4800 Artificial Intelligence Jussi Rintanen Department of Computer Science Aalto University March 9, 2017 Difficulties in Rational Collective Behavior Individual utility in conflict with collective

More information

An AI for Dominion Based on Monte-Carlo Methods

An AI for Dominion Based on Monte-Carlo Methods An AI for Dominion Based on Monte-Carlo Methods by Jon Vegard Jansen and Robin Tollisen Supervisors: Morten Goodwin, Associate Professor, Ph.D Sondre Glimsdal, Ph.D Fellow June 2, 2014 Abstract To the

More information

Five-In-Row with Local Evaluation and Beam Search

Five-In-Row with Local Evaluation and Beam Search Five-In-Row with Local Evaluation and Beam Search Jiun-Hung Chen and Adrienne X. Wang jhchen@cs axwang@cs Abstract This report provides a brief overview of the game of five-in-row, also known as Go-Moku,

More information

CS 387: GAME AI BOARD GAMES

CS 387: GAME AI BOARD GAMES CS 387: GAME AI BOARD GAMES 5/28/2015 Instructor: Santiago Ontañón santi@cs.drexel.edu Class website: https://www.cs.drexel.edu/~santi/teaching/2015/cs387/intro.html Reminders Check BBVista site for the

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory AI Challenge One 140 Challenge 1 grades 120 100 80 60 AI Challenge One Transform to graph Explore the

More information

AI Approaches to Ultimate Tic-Tac-Toe

AI Approaches to Ultimate Tic-Tac-Toe AI Approaches to Ultimate Tic-Tac-Toe Eytan Lifshitz CS Department Hebrew University of Jerusalem, Israel David Tsurel CS Department Hebrew University of Jerusalem, Israel I. INTRODUCTION This report is

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 7: Minimax and Alpha-Beta Search 2/9/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein 1 Announcements W1 out and due Monday 4:59pm P2

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

CandyCrush.ai: An AI Agent for Candy Crush

CandyCrush.ai: An AI Agent for Candy Crush CandyCrush.ai: An AI Agent for Candy Crush Jiwoo Lee, Niranjan Balachandar, Karan Singhal December 16, 2016 1 Introduction Candy Crush, a mobile puzzle game, has become very popular in the past few years.

More information

Programming Project 1: Pacman (Due )

Programming Project 1: Pacman (Due ) Programming Project 1: Pacman (Due 8.2.18) Registration to the exams 521495A: Artificial Intelligence Adversarial Search (Min-Max) Lectured by Abdenour Hadid Adjunct Professor, CMVS, University of Oulu

More information

CS885 Reinforcement Learning Lecture 13c: June 13, Adversarial Search [RusNor] Sec

CS885 Reinforcement Learning Lecture 13c: June 13, Adversarial Search [RusNor] Sec CS885 Reinforcement Learning Lecture 13c: June 13, 2018 Adversarial Search [RusNor] Sec. 5.1-5.4 CS885 Spring 2018 Pascal Poupart 1 Outline Minimax search Evaluation functions Alpha-beta pruning CS885

More information

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur Module 3 Problem Solving using Search- (Two agent) 3.1 Instructional Objective The students should understand the formulation of multi-agent search and in detail two-agent search. Students should b familiar

More information

Automated Suicide: An Antichess Engine

Automated Suicide: An Antichess Engine Automated Suicide: An Antichess Engine Jim Andress and Prasanna Ramakrishnan 1 Introduction Antichess (also known as Suicide Chess or Loser s Chess) is a popular variant of chess where the objective of

More information

Playing Games. Henry Z. Lo. June 23, We consider writing AI to play games with the following properties:

Playing Games. Henry Z. Lo. June 23, We consider writing AI to play games with the following properties: Playing Games Henry Z. Lo June 23, 2014 1 Games We consider writing AI to play games with the following properties: Two players. Determinism: no chance is involved; game state based purely on decisions

More information

Playing Othello Using Monte Carlo

Playing Othello Using Monte Carlo June 22, 2007 Abstract This paper deals with the construction of an AI player to play the game Othello. A lot of techniques are already known to let AI players play the game Othello. Some of these techniques

More information

CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions

CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions Slides by Svetlana Lazebnik, 9/2016 Modified by Mark Hasegawa Johnson, 9/2017 Types of game environments Perfect

More information

Aja Huang Cho Chikun David Silver Demis Hassabis. Fan Hui Geoff Hinton Lee Sedol Michael Redmond

Aja Huang Cho Chikun David Silver Demis Hassabis. Fan Hui Geoff Hinton Lee Sedol Michael Redmond CMPUT 396 3 hr closedbook 6 pages, 7 marks/page page 1 1. [3 marks] For each person or program, give the label of its description. Aja Huang Cho Chikun David Silver Demis Hassabis Fan Hui Geoff Hinton

More information

COMP3211 Project. Artificial Intelligence for Tron game. Group 7. Chiu Ka Wa ( ) Chun Wai Wong ( ) Ku Chun Kit ( )

COMP3211 Project. Artificial Intelligence for Tron game. Group 7. Chiu Ka Wa ( ) Chun Wai Wong ( ) Ku Chun Kit ( ) COMP3211 Project Artificial Intelligence for Tron game Group 7 Chiu Ka Wa (20369737) Chun Wai Wong (20265022) Ku Chun Kit (20123470) Abstract Tron is an old and popular game based on a movie of the same

More information

MONTE-CARLO TWIXT. Janik Steinhauer. Master Thesis 10-08

MONTE-CARLO TWIXT. Janik Steinhauer. Master Thesis 10-08 MONTE-CARLO TWIXT Janik Steinhauer Master Thesis 10-08 Thesis submitted in partial fulfilment of the requirements for the degree of Master of Science of Artificial Intelligence at the Faculty of Humanities

More information

ARTIFICIAL INTELLIGENCE (CS 370D)

ARTIFICIAL INTELLIGENCE (CS 370D) Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-5) ADVERSARIAL SEARCH ADVERSARIAL SEARCH Optimal decisions Min algorithm α-β pruning Imperfect,

More information

game tree complete all possible moves

game tree complete all possible moves Game Trees Game Tree A game tree is a tree the nodes of which are positions in a game and edges are moves. The complete game tree for a game is the game tree starting at the initial position and containing

More information

Game Playing: Adversarial Search. Chapter 5

Game Playing: Adversarial Search. Chapter 5 Game Playing: Adversarial Search Chapter 5 Outline Games Perfect play minimax search α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Games vs. Search

More information

Using Neural Network and Monte-Carlo Tree Search to Play the Game TEN

Using Neural Network and Monte-Carlo Tree Search to Play the Game TEN Using Neural Network and Monte-Carlo Tree Search to Play the Game TEN Weijie Chen Fall 2017 Weijie Chen Page 1 of 7 1. INTRODUCTION Game TEN The traditional game Tic-Tac-Toe enjoys people s favor. Moreover,

More information

Creating an Agent of Doom: A Visual Reinforcement Learning Approach

Creating an Agent of Doom: A Visual Reinforcement Learning Approach Creating an Agent of Doom: A Visual Reinforcement Learning Approach Michael Lowney Department of Electrical Engineering Stanford University mlowney@stanford.edu Robert Mahieu Department of Electrical Engineering

More information

Programming an Othello AI Michael An (man4), Evan Liang (liange)

Programming an Othello AI Michael An (man4), Evan Liang (liange) Programming an Othello AI Michael An (man4), Evan Liang (liange) 1 Introduction Othello is a two player board game played on an 8 8 grid. Players take turns placing stones with their assigned color (black

More information

The Parameterized Poker Squares EAAI NSG Challenge

The Parameterized Poker Squares EAAI NSG Challenge The Parameterized Poker Squares EAAI NSG Challenge What is the EAAI NSG Challenge? Goal: a fun way to encourage good, faculty-mentored undergraduate research experiences that includes an option for peer-reviewed

More information

REINFORCEMENT LEARNING (DD3359) O-03 END-TO-END LEARNING

REINFORCEMENT LEARNING (DD3359) O-03 END-TO-END LEARNING REINFORCEMENT LEARNING (DD3359) O-03 END-TO-END LEARNING RIKA ANTONOVA ANTONOVA@KTH.SE ALI GHADIRZADEH ALGH@KTH.SE RL: What We Know So Far Formulate the problem as an MDP (or POMDP) State space captures

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 42. Board Games: Alpha-Beta Search Malte Helmert University of Basel May 16, 2018 Board Games: Overview chapter overview: 40. Introduction and State of the Art 41.

More information

CS 188: Artificial Intelligence. Overview

CS 188: Artificial Intelligence. Overview CS 188: Artificial Intelligence Lecture 6 and 7: Search for Games Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein 1 Overview Deterministic zero-sum games Minimax Limited depth and evaluation

More information

Set 4: Game-Playing. ICS 271 Fall 2017 Kalev Kask

Set 4: Game-Playing. ICS 271 Fall 2017 Kalev Kask Set 4: Game-Playing ICS 271 Fall 2017 Kalev Kask Overview Computer programs that play 2-player games game-playing as search with the complication of an opponent General principles of game-playing and search

More information

Unit-III Chap-II Adversarial Search. Created by: Ashish Shah 1

Unit-III Chap-II Adversarial Search. Created by: Ashish Shah 1 Unit-III Chap-II Adversarial Search Created by: Ashish Shah 1 Alpha beta Pruning In case of standard ALPHA BETA PRUNING minimax tree, it returns the same move as minimax would, but prunes away branches

More information

CS188: Artificial Intelligence, Fall 2011 Written 2: Games and MDP s

CS188: Artificial Intelligence, Fall 2011 Written 2: Games and MDP s CS88: Artificial Intelligence, Fall 20 Written 2: Games and MDP s Due: 0/5 submitted electronically by :59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators) but must be written

More information

Creating a Havannah Playing Agent

Creating a Havannah Playing Agent Creating a Havannah Playing Agent B. Joosten August 27, 2009 Abstract This paper delves into the complexities of Havannah, which is a 2-person zero-sum perfectinformation board game. After determining

More information

Artificial Intelligence Search III

Artificial Intelligence Search III Artificial Intelligence Search III Lecture 5 Content: Search III Quick Review on Lecture 4 Why Study Games? Game Playing as Search Special Characteristics of Game Playing Search Ingredients of 2-Person

More information

Announcements. Homework 1. Project 1. Due tonight at 11:59pm. Due Friday 2/8 at 4:00pm. Electronic HW1 Written HW1

Announcements. Homework 1. Project 1. Due tonight at 11:59pm. Due Friday 2/8 at 4:00pm. Electronic HW1 Written HW1 Announcements Homework 1 Due tonight at 11:59pm Project 1 Electronic HW1 Written HW1 Due Friday 2/8 at 4:00pm CS 188: Artificial Intelligence Adversarial Search and Game Trees Instructors: Sergey Levine

More information

Artificial Intelligence Adversarial Search

Artificial Intelligence Adversarial Search Artificial Intelligence Adversarial Search Adversarial Search Adversarial search problems games They occur in multiagent competitive environments There is an opponent we can t control planning again us!

More information

Lecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1

Lecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Lecture 14 Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Outline Chapter 5 - Adversarial Search Alpha-Beta Pruning Imperfect Real-Time Decisions Stochastic Games Friday,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Adversarial Search Vibhav Gogate The University of Texas at Dallas Some material courtesy of Rina Dechter, Alex Ihler and Stuart Russell, Luke Zettlemoyer, Dan Weld Adversarial

More information

2 person perfect information

2 person perfect information Why Study Games? Games offer: Intellectual Engagement Abstraction Representability Performance Measure Not all games are suitable for AI research. We will restrict ourselves to 2 person perfect information

More information

A Bandit Approach for Tree Search

A Bandit Approach for Tree Search A An Example in Computer-Go Department of Statistics, University of Michigan March 27th, 2008 A 1 Bandit Problem K-Armed Bandit UCB Algorithms for K-Armed Bandit Problem 2 Classical Tree Search UCT Algorithm

More information

Adversarial Search 1

Adversarial Search 1 Adversarial Search 1 Adversarial Search The ghosts trying to make pacman loose Can not come up with a giant program that plans to the end, because of the ghosts and their actions Goal: Eat lots of dots

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Adversarial Search Prof. Scott Niekum The University of Texas at Austin [These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

UMBC 671 Midterm Exam 19 October 2009

UMBC 671 Midterm Exam 19 October 2009 Name: 0 1 2 3 4 5 6 total 0 20 25 30 30 25 20 150 UMBC 671 Midterm Exam 19 October 2009 Write all of your answers on this exam, which is closed book and consists of six problems, summing to 160 points.

More information

AI Agent for Ants vs. SomeBees: Final Report

AI Agent for Ants vs. SomeBees: Final Report CS 221: ARTIFICIAL INTELLIGENCE: PRINCIPLES AND TECHNIQUES 1 AI Agent for Ants vs. SomeBees: Final Report Wanyi Qian, Yundong Zhang, Xiaotong Duan Abstract This project aims to build a real-time game playing

More information

Artificial Intelligence. 4. Game Playing. Prof. Bojana Dalbelo Bašić Assoc. Prof. Jan Šnajder

Artificial Intelligence. 4. Game Playing. Prof. Bojana Dalbelo Bašić Assoc. Prof. Jan Šnajder Artificial Intelligence 4. Game Playing Prof. Bojana Dalbelo Bašić Assoc. Prof. Jan Šnajder University of Zagreb Faculty of Electrical Engineering and Computing Academic Year 2017/2018 Creative Commons

More information

Othello/Reversi using Game Theory techniques Parth Parekh Urjit Singh Bhatia Kushal Sukthankar

Othello/Reversi using Game Theory techniques Parth Parekh Urjit Singh Bhatia Kushal Sukthankar Othello/Reversi using Game Theory techniques Parth Parekh Urjit Singh Bhatia Kushal Sukthankar Othello Rules Two Players (Black and White) 8x8 board Black plays first Every move should Flip over at least

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Adversarial Search Instructor: Stuart Russell University of California, Berkeley Game Playing State-of-the-Art Checkers: 1950: First computer player. 1959: Samuel s self-taught

More information

Success Stories of Deep RL. David Silver

Success Stories of Deep RL. David Silver Success Stories of Deep RL David Silver Reinforcement Learning (RL) RL is a general-purpose framework for decision-making An agent selects actions Its actions influence its future observations Success

More information

CS151 - Assignment 2 Mancala Due: Tuesday March 5 at the beginning of class

CS151 - Assignment 2 Mancala Due: Tuesday March 5 at the beginning of class CS151 - Assignment 2 Mancala Due: Tuesday March 5 at the beginning of class http://www.clubpenguinsaraapril.com/2009/07/mancala-game-in-club-penguin.html The purpose of this assignment is to program some

More information

Optimizing UCT for Settlers of Catan

Optimizing UCT for Settlers of Catan Optimizing UCT for Settlers of Catan Gabriel Rubin Bruno Paz Felipe Meneguzzi Pontifical Catholic University of Rio Grande do Sul, Computer Science Department, Brazil A BSTRACT Settlers of Catan is one

More information

An Empirical Evaluation of Policy Rollout for Clue

An Empirical Evaluation of Policy Rollout for Clue An Empirical Evaluation of Policy Rollout for Clue Eric Marshall Oregon State University M.S. Final Project marshaer@oregonstate.edu Adviser: Professor Alan Fern Abstract We model the popular board game

More information

6. Games. COMP9414/ 9814/ 3411: Artificial Intelligence. Outline. Mechanical Turk. Origins. origins. motivation. minimax search

6. Games. COMP9414/ 9814/ 3411: Artificial Intelligence. Outline. Mechanical Turk. Origins. origins. motivation. minimax search COMP9414/9814/3411 16s1 Games 1 COMP9414/ 9814/ 3411: Artificial Intelligence 6. Games Outline origins motivation Russell & Norvig, Chapter 5. minimax search resource limits and heuristic evaluation α-β

More information

Virtual Global Search: Application to 9x9 Go

Virtual Global Search: Application to 9x9 Go Virtual Global Search: Application to 9x9 Go Tristan Cazenave LIASD Dept. Informatique Université Paris 8, 93526, Saint-Denis, France cazenave@ai.univ-paris8.fr Abstract. Monte-Carlo simulations can be

More information

Computing Science (CMPUT) 496

Computing Science (CMPUT) 496 Computing Science (CMPUT) 496 Search, Knowledge, and Simulations Martin Müller Department of Computing Science University of Alberta mmueller@ualberta.ca Winter 2017 Part IV Knowledge 496 Today - Mar 9

More information

Swing Copters AI. Monisha White and Nolan Walsh Fall 2015, CS229, Stanford University

Swing Copters AI. Monisha White and Nolan Walsh  Fall 2015, CS229, Stanford University Swing Copters AI Monisha White and Nolan Walsh mewhite@stanford.edu njwalsh@stanford.edu Fall 2015, CS229, Stanford University 1. Introduction For our project we created an autonomous player for the game

More information

An Intelligent Agent for Connect-6

An Intelligent Agent for Connect-6 An Intelligent Agent for Connect-6 Sagar Vare, Sherrie Wang, Andrea Zanette {svare, sherwang, zanette}@stanford.edu Institute for Computational and Mathematical Engineering Huang Building 475 Via Ortega

More information

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel Foundations of AI 6. Adversarial Search Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard & Bernhard Nebel Contents Game Theory Board Games Minimax Search Alpha-Beta Search

More information

Artificial Intelligence. Minimax and alpha-beta pruning

Artificial Intelligence. Minimax and alpha-beta pruning Artificial Intelligence Minimax and alpha-beta pruning In which we examine the problems that arise when we try to plan ahead to get the best result in a world that includes a hostile agent (other agent

More information

CS 387/680: GAME AI BOARD GAMES

CS 387/680: GAME AI BOARD GAMES CS 387/680: GAME AI BOARD GAMES 6/2/2014 Instructor: Santiago Ontañón santi@cs.drexel.edu TA: Alberto Uriarte office hours: Tuesday 4-6pm, Cyber Learning Center Class website: https://www.cs.drexel.edu/~santi/teaching/2014/cs387-680/intro.html

More information

The game of Reversi was invented around 1880 by two. Englishmen, Lewis Waterman and John W. Mollett. It later became

The game of Reversi was invented around 1880 by two. Englishmen, Lewis Waterman and John W. Mollett. It later became Reversi Meng Tran tranm@seas.upenn.edu Faculty Advisor: Dr. Barry Silverman Abstract: The game of Reversi was invented around 1880 by two Englishmen, Lewis Waterman and John W. Mollett. It later became

More information

CS 331: Artificial Intelligence Adversarial Search II. Outline

CS 331: Artificial Intelligence Adversarial Search II. Outline CS 331: Artificial Intelligence Adversarial Search II 1 Outline 1. Evaluation Functions 2. State-of-the-art game playing programs 3. 2 player zero-sum finite stochastic games of perfect information 2 1

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence selman@cs.cornell.edu Module: Adversarial Search R&N: Chapter 5 1 Outline Adversarial Search Optimal decisions Minimax α-β pruning Case study: Deep Blue

More information

Adversarial Search. Human-aware Robotics. 2018/01/25 Chapter 5 in R&N 3rd Ø Announcement: Slides for this lecture are here:

Adversarial Search. Human-aware Robotics. 2018/01/25 Chapter 5 in R&N 3rd Ø Announcement: Slides for this lecture are here: Adversarial Search 2018/01/25 Chapter 5 in R&N 3rd Ø Announcement: q Slides for this lecture are here: http://www.public.asu.edu/~yzhan442/teaching/cse471/lectures/adversarial.pdf Slides are largely based

More information

Instability of Scoring Heuristic In games with value exchange, the heuristics are very bumpy Make smoothing assumptions search for "quiesence"

Instability of Scoring Heuristic In games with value exchange, the heuristics are very bumpy Make smoothing assumptions search for quiesence More on games Gaming Complications Instability of Scoring Heuristic In games with value exchange, the heuristics are very bumpy Make smoothing assumptions search for "quiesence" The Horizon Effect No matter

More information

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search CSE 473: Artificial Intelligence Fall 2017 Adversarial Search Mini, pruning, Expecti Dieter Fox Based on slides adapted Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Dan Weld, Stuart Russell or Andrew Moore

More information

CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II CS 5522: Artificial Intelligence II Adversarial Search Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu.]

More information

Adversarial Search. Rob Platt Northeastern University. Some images and slides are used from: AIMA CS188 UC Berkeley

Adversarial Search. Rob Platt Northeastern University. Some images and slides are used from: AIMA CS188 UC Berkeley Adversarial Search Rob Platt Northeastern University Some images and slides are used from: AIMA CS188 UC Berkeley What is adversarial search? Adversarial search: planning used to play a game such as chess

More information

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Review of Nature paper: Mastering the game of Go with Deep Neural Networks & Tree Search Tapani Raiko Thanks to Antti Tarvainen for some slides

More information

46.1 Introduction. Foundations of Artificial Intelligence Introduction MCTS in AlphaGo Neural Networks. 46.

46.1 Introduction. Foundations of Artificial Intelligence Introduction MCTS in AlphaGo Neural Networks. 46. Foundations of Artificial Intelligence May 30, 2016 46. AlphaGo and Outlook Foundations of Artificial Intelligence 46. AlphaGo and Outlook Thomas Keller Universität Basel May 30, 2016 46.1 Introduction

More information

CS 771 Artificial Intelligence. Adversarial Search

CS 771 Artificial Intelligence. Adversarial Search CS 771 Artificial Intelligence Adversarial Search Typical assumptions Two agents whose actions alternate Utility values for each agent are the opposite of the other This creates the adversarial situation

More information

Algorithms for Data Structures: Search for Games. Phillip Smith 27/11/13

Algorithms for Data Structures: Search for Games. Phillip Smith 27/11/13 Algorithms for Data Structures: Search for Games Phillip Smith 27/11/13 Search for Games Following this lecture you should be able to: Understand the search process in games How an AI decides on the best

More information

Temporal Difference Learning for the Game Tic-Tac-Toe 3D: Applying Structure to Neural Networks

Temporal Difference Learning for the Game Tic-Tac-Toe 3D: Applying Structure to Neural Networks 2015 IEEE Symposium Series on Computational Intelligence Temporal Difference Learning for the Game Tic-Tac-Toe 3D: Applying Structure to Neural Networks Michiel van de Steeg Institute of Artificial Intelligence

More information

Nested Monte-Carlo Search

Nested Monte-Carlo Search Nested Monte-Carlo Search Tristan Cazenave LAMSADE Université Paris-Dauphine Paris, France cazenave@lamsade.dauphine.fr Abstract Many problems have a huge state space and no good heuristic to order moves

More information

Artificial Intelligence Lecture 3

Artificial Intelligence Lecture 3 Artificial Intelligence Lecture 3 The problem Depth first Not optimal Uses O(n) space Optimal Uses O(B n ) space Can we combine the advantages of both approaches? 2 Iterative deepening (IDA) Let M be a

More information

CS188 Spring 2010 Section 3: Game Trees

CS188 Spring 2010 Section 3: Game Trees CS188 Spring 2010 Section 3: Game Trees 1 Warm-Up: Column-Row You have a 3x3 matrix of values like the one below. In a somewhat boring game, player A first selects a row, and then player B selects a column.

More information