Building a Computer Mahjong Player Based on Monte Carlo Simulation and Opponent Models

Naoki Mizukami and Yoshimasa Tsuruoka, The University of Tokyo

Introduction
Imperfect-information games are a challenging research area: contract bridge [Ginsberg 2001], Skat [Buro et al. 2009], and Texas Hold'em [Bowling et al. 2015]. We focus on Japanese Mahjong, a multiplayer, imperfect-information game with an enormous number of information sets (Mahjong: about 10^60; Texas Hold'em: about 10^18).

Related work
Computer poker: Nash equilibrium strategies; the CFR+ method has solved heads-up limit hold'em poker [Bowling et al. 2015]. Opponent modeling: opponent modeling combined with Monte Carlo tree search for exploitation [Van der Kleij 2010]; a program that updates a hand-rank distribution for the current game state whenever a showdown occurs [Aaron 2002].

Japanese Mahjong
Rules: the game is played by four players. A player wins a round by completing a winning hand from the 13 tiles they hold. One game of mahjong consists of 4 or 8 rounds.
Terms: "Waiting" means a player's hand needs only one more tile to win. "Folding" means a player gives up on winning and only tries to avoid discarding a winning tile for the opponents; folding is not a single action but a strategy.

One-player mahjong [Mizukami et al. 2014]
In previous work we implemented a one-player mahjong player and a folding system. The one-player mahjong player only tries to win; it is trained by supervised learning on game records, and it plays an important role in our Monte Carlo simulation. Recognizing folding situations is also done by supervised learning, but the positions in the game records must be annotated manually. Result: performance beyond average human players. Problem: it is difficult to annotate the required data.

Proposed method: overview
Starting from the original game, we build opponent models by supervised learning and abstract each opponent into three quantities: waiting, winning tile, and hand score. Monte Carlo simulation then decides moves in the abstracted game. Advantage: it is not necessary to predict opponents' specific hands, and the models can be trained using game records alone.

Training setting
Game records: the Internet Mahjong site ``Tenhou''. Dataset: training data 1.7 × 10^7; test data 100.
Models:
  Waiting: logistic regression
  Winning tile: logistic regression
  Hand score: linear regression
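For concreteness, a minimal sketch of this training setup using scikit-learn; the file names and the pre-extracted feature matrices are hypothetical placeholders, not the authors' pipeline:

```python
# Hypothetical sketch of the training setup above (not the authors' code).
# Assumes feature matrices were already extracted from Tenhou game records.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# X_* are feature matrices (discarded tiles, revealed melds, ...);
# y_* are labels taken from the game records. File names are placeholders.
X_wait, y_wait = np.load("wait_X.npy"), np.load("wait_y.npy")      # waiting or not
X_tile, y_tile = np.load("tile_X.npy"), np.load("tile_y.npy")      # winning tile or not
X_score, y_score = np.load("score_X.npy"), np.load("score_y.npy")  # hand score

waiting_model = LogisticRegression().fit(X_wait, y_wait)       # P(opponent is waiting)
winning_tile_model = LogisticRegression().fit(X_tile, y_tile)  # P(tile is a winning tile)
hand_score_model = LinearRegression().fit(X_score, y_score)    # expected hand score
```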

Waiting
The model predicts whether an opponent is waiting or not. Input: the opponent's discarded tiles and revealed melds; the label is whether the opponent was waiting. Output: a probability, e.g. P(opponent = waiting) = 0.8.
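A minimal sketch of the waiting model's prediction step, assuming the features are binary indicators over discarded tiles and revealed melds (using the 6,888-dimensional encoding mentioned in the appendix slides):

```python
import numpy as np

def p_waiting(w: np.ndarray, x: np.ndarray) -> float:
    """P(opponent = waiting) = 1 / (1 + exp(-w^T x)) for feature vector x."""
    return float(1.0 / (1.0 + np.exp(-(w @ x))))

# Illustrative only: x is a binary feature vector over discarded tiles
# and revealed melds; w is the trained weight vector.
w = np.zeros(6888)      # placeholder weights
x = np.zeros(6888)      # placeholder features of the current opponent state
print(p_waiting(w, x))  # 0.5 with all-zero weights; e.g. 0.8 for a likely wait
```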

Evaluation and result
Evaluation metric: Area Under the Curve (AUC).

Player                                        AUC
Expert player                                 0.778
Prediction model                              0.777
Prediction model − discarded tiles            0.772
Prediction model − number of revealed melds   0.770

The prediction model has the same prediction ability as an expert player (expert player: top 0.1% of players).

Winning tiles
The model predicts the opponents' winning tiles. In general there are one or more winning tiles, so we build a prediction model for every kind of tile. Input: the opponent's discarded tiles and revealed melds. Output: a winning-tile probability for each tile kind (e.g. 0.00, 0.10, 0.15, ...).

Evaluation method
1: Feed the opponents' information (e.g. their winning tiles) to the model.
2: Arrange the tiles the player holds in ascending order of their predicted probability of being a winning tile for an opponent.
The evaluation value is the position of the actual winning tile in this ranking, normalized to [0, 1]; e.g. 6 / (14 − 2) = 0.5.
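A sketch of this metric, under the assumption that the evaluation value is the winning tile's 0-based position in the ascending ranking divided by (number of tiles − 2), which reproduces the 6 / (14 − 2) = 0.5 example:

```python
def ranking_eval(probs: list[float], winning_index: int) -> float:
    """Position of the actual winning tile when the player's tiles are sorted
    in ascending order of predicted winning probability, normalized by
    (number of tiles - 2). With 14 tiles, position 6 gives 6/12 = 0.5."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return order.index(winning_index) / (len(probs) - 2)

# Example: 14 tiles; the true winning tile (index 3) is ranked 7th lowest.
probs = [0.01, 0.02, 0.03, 0.07, 0.04, 0.05, 0.06,
         0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14]
print(ranking_eval(probs, 3))  # 0.5
```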

Result
Random baseline: tiles are arranged randomly.

Player                              Evaluation value
Expert player                       0.744
Prediction model                    0.676
Prediction model − revealed melds   0.675
Prediction model − discarded tiles  0.673
Random                              0.502

Hand Score (HS)
The model predicts the score the player would have to pay to the opponent. Input: the opponent's discarded tiles and revealed melds. Output: a predicted score, e.g. 2,000 for an actual hand score of 2,600.

Evaluation method and result
Evaluation metric: Mean Squared Error (MSE); lower is better.

Player                                 MSE
Prediction model                       0.37
Prediction model − revealed melds      0.38
Prediction model − revealed fan value  0.38
Expert player                          0.40

The prediction model performs better than an expert player.

Overview of proposed method
Each opponent in the abstracted game is reduced to three quantities:
  Waiting:      P(p = waiting) = 1 / (1 + exp(−w^T x_p))
  Winning tile: P(Tile = winning) = 1 / (1 + exp(−w^T x_p))
  Hand score:   HS = w^T x

Application of opponent models
The three prediction models are combined to estimate an expected value:
  LP (losing probability): LP(p, Tile) = P(p = waiting) × P(Tile = winning)
  EL (expected loss):      EL(p, Tile) = LP(p, Tile) × HS(p, Tile)
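These two definitions transcribe directly into code (a sketch; the model outputs are passed in as plain numbers):

```python
def losing_probability(p_waiting: float, p_winning: float) -> float:
    """LP(p, Tile) = P(p = waiting) * P(Tile = winning)."""
    return p_waiting * p_winning

def expected_loss(lp: float, hand_score: float) -> float:
    """EL(p, Tile) = LP(p, Tile) * HS(p, Tile)."""
    return lp * hand_score

# Example: an opponent waiting with probability 0.8, a tile that is winning
# with probability 0.15, and a predicted hand score of 2,000 points.
lp = losing_probability(0.8, 0.15)  # 0.12
print(expected_loss(lp, 2000))      # 240.0
```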

Monte Carlo simulation
The program calculates Score(Tile) for each tile in its hand and discards the tile with the highest score:

Score(Tile) = sim(Tile) × ∏_{p ∈ opponents} (1 − LP(p, Tile)) − Σ_{p ∈ opponents} EL(p, Tile)

Procedure of sim(Tile):
1: Discard the tile.
2: Opponents' turns.
3: Program's turn.
4: Repeat steps 2 and 3.
5: Obtain the reward.
[Figure: sim(Tile_1), sim(Tile_2), ... are run for each candidate tile in the hand.]
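Putting the pieces together, a sketch of the discard decision under the Score(Tile) reconstruction above; `simulate`, `lp`, and `el` are placeholders for the Monte Carlo rollout and the two opponent-model quantities:

```python
import math

def score_tile(sim_value: float, lps: list[float], els: list[float]) -> float:
    """Score(Tile) = sim(Tile) * prod_p (1 - LP(p, Tile)) - sum_p EL(p, Tile)."""
    p_safe = math.prod(1.0 - lp for lp in lps)  # chance no opponent wins off this tile
    return sim_value * p_safe - sum(els)

def choose_discard(hand, simulate, lp, el, opponents):
    """Discard the tile with the highest Score(Tile)."""
    return max(hand, key=lambda t: score_tile(
        simulate(t),
        [lp(p, t) for p in opponents],
        [el(p, t) for p in opponents]))
```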

Evaluation setting
Compared to our previous work, moves are computed within one second and the length of a game is four rounds.
VS a state-of-the-art program: Mattari Mahjong; its duplicate mode can generate the same tile sequences, so results can be compared directly.
VS human players: on the Internet Mahjong site ``Tenhou''.

Result
VS Mattari Mahjong:

                   1st(%)  2nd(%)  3rd(%)  4th(%)  Average rank  Games
Proposed method    25.2    25.6    24.7    24.5    2.48±0.07     1000
Mattari Mahjong    24.8    24.7    25.0    25.5    2.51±0.07     1000
[Mizukami+ 2014]   24.3    22.6    22.2    30.9    2.59±0.07     1000

VS human players:

                   1st(%)  2nd(%)  3rd(%)  4th(%)  Average rank  Games
Proposed method    24.1    28.1    24.8    23.0    2.46±0.04     2634
[Mizukami+ 2014]   25.3    24.8    25.1    24.8    2.49±0.07     1441

Conclusion and future work
Conclusion: the three prediction models perform well, and our program outperforms a state-of-the-art program through Monte Carlo simulation.
Future work: take the final rank into account, and improve the players' actions in the simulation.

Training of one-player mahjong players
The weight vector is updated so that the player makes the same moves as the expert players in the game records; we use the averaged perceptron. When the move chosen by the current weights differs from the expert's recorded move, the weights are updated as

w ← w + x_expert − x_selected

where x is the feature vector of a move and w is the weight vector.
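A minimal sketch of this training loop, assuming a feature function that maps each legal move in a recorded position to a feature vector; the averaging bookkeeping is simplified:

```python
import numpy as np

def perceptron_step(w, move_features, expert_move):
    """One structured-perceptron update: if the highest-scoring move under w
    is not the expert's move, shift w toward the expert move's features:
        w <- w + x_expert - x_selected"""
    selected = max(move_features, key=lambda m: float(w @ move_features[m]))
    if selected != expert_move:
        w = w + move_features[expert_move] - move_features[selected]
    return w

def train_averaged_perceptron(records, dim, epochs=3):
    """Averaged perceptron: return the mean of w over all updates.
    `records` yields (move_features, expert_move) pairs per position."""
    w, w_sum, steps = np.zeros(dim), np.zeros(dim), 0
    for _ in range(epochs):
        for move_features, expert_move in records:
            w = perceptron_step(w, move_features, expert_move)
            w_sum += w
            steps += 1
    return w_sum / max(steps, 1)
```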

Recognizing folding situations
We train a classifier for folding situations using a machine learning approach. This approach requires training data: positions in game records are annotated manually. Human annotators inspect the records and mark, before each discard, whether the position is a folding situation (e.g. "the player folded here because of the tiles discarded so far").

Setting (waiting model)
Dataset: training data 1.77 × 10^7; test data 100.
Features: discarded tiles, number of revealed melds, and so on; 6,888 dimensions.
Model: logistic regression, P(p = waiting) = 1 / (1 + exp(−w^T x_p)).

Setting (winning-tile model)
Dataset: training data 1.77 × 10^7; test data 100.
Features: discarded tiles, number of revealed melds, and so on; 31,416 dimensions.
Model: logistic regression, P(Tile = winning) = 1 / (1 + exp(−w^T x_p)).

Setting (hand-score model)
Dataset: training data 5.92 × 10^7; test data 100.
Features: revealed melds, revealed fan value, and so on; 26,889 dimensions.
Model: linear regression, HS = w^T x.


Flowchart of the program's turn
1: Pick up a tile.
2: Win check; if the hand wins, the round ends.
3: Otherwise, decide among one-player mahjong moves: discard a tile and compute its ODEV, or fold. ODEV (one-depth expected value) is an expected value calculated by searching game trees until the program's next turn. Folding means the player picks up a tile and discards no tiles.
4: After a discard, a win check for the opponents is performed; if an opponent wins, the round ends. Otherwise, play passes to the next player.
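The flowchart translates to roughly the following control flow (a sketch; `state` is a placeholder object and its methods are illustrative names, not the authors' API):

```python
def program_turn(state):
    """One turn of the program, following the flowchart above (sketch)."""
    state.draw_tile()                    # pick up a tile
    if state.is_winning_hand():          # win check
        return "win"
    # Decide the move: evaluate the ODEV of every legal discard by searching
    # until the program's next turn; folding means discarding nothing.
    candidates = state.legal_discards() + ["fold"]
    best = max(candidates, key=state.odev)
    if best == "fold":
        return "next player"             # fold: keep the drawn tile, no discard
    state.discard(best)
    if state.opponent_wins_on(best):     # win check for opponents
        return "opponent wins"
    return "next player"
```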

Flowchart of an opponent's turn
Each opponent has two binary parameters indicating whether he is waiting and whether he is folding.
1: Pick up a tile.
2: Win check; if the opponent wins, the round ends.
3: Otherwise, update the two parameters (fold and waiting), then discard a tile or fold.
4: After a discard, a win check for the other players is performed; if someone wins, the round ends. Otherwise, play passes to the next player.