Estimation of Rates Arriving at the Winning Hands in Multi-Player Games with Imperfect Information


2016 4th Intl Conf on Applied Computing and Information Technology / 3rd Intl Conf on Computational Science/Intelligence and Applied Informatics / 1st Intl Conf on Big Data, Cloud Computing, Data Science & Engineering
978-1-5090-4871-7/16 $31.00 © 2016 IEEE. DOI 10.1109/ACIT-CSII-BCD.2016.25

Kenshi Yoshimura, Teruhisa Hochin, Hiroki Nomiya
Graduate School of Information Science, Kyoto Institute of Technology, Kyoto, Japan
e-mail: m6622049@edu.kit.ac.jp, {hochin, nomiya}@kit.ac.jp

Abstract - Multi-player games such as Poker and Mahjong involve imperfect information, unlike Shogi and Reversi. It is therefore difficult to decide optimal moves in them. In Mahjong, folding is very important, and a player must judge whether their own hand or the other players' hands have the advantage. To do so, the rate at which a hand arrives at a winning hand must be estimated. This paper proposes methods for estimating this rate. The proposed methods use random simulation that takes the discarded tiles into consideration, and tabu search, without using game records. In the experiments, the rate could be estimated to some extent, depending on the state, by using tabu search. This result indicates that tabu search may be an effective way to estimate the rate of tenpai. Because the methods do not use game records, they are expected to be applicable to other games with imperfect information.

Index Terms - meta-heuristics; multi-player games with imperfect information; Mahjong; tabu search; winning hands

I. INTRODUCTION

Multi-player games such as Poker [1] and Mahjong [2]-[6] involve imperfect information, unlike Shogi and Reversi. It is therefore difficult to decide optimal moves in them. Here, a multi-player game is a game with more than two players. Mahjong is a four-player, zero-sum game with imperfect information, so it is a multi-player game. In these games, the move that yields the maximum profit is not always the best one, which is different from two-player games. Because of these properties, it is hard to decide optimal moves in Mahjong.

A method for deciding optimal moves when discarding tiles and calling naki has been proposed [2]. This method uses tabu search [7]. That research shows that deciding moves by tabu search is presumably effective.

In Mahjong, folding is very important, and a player must judge whether their own hand or the other players' hands have the advantage. Several studies on folding have been conducted [3]-[6], but no successful approach that avoids using game records has been reported yet. To decide whether to fold, it is important to estimate the rates at which the opponents arrive at winning hands. These rates, however, have not been estimated well so far.

This paper proposes methods for estimating the rate of arriving at a winning hand (hereinafter called the rate of tenpai), with the aim of implementing a Mahjong player that exceeds top human players. The proposed methods use random simulation that takes the discarded tiles into consideration, and tabu search. In the experiments, the rate could be estimated to some extent, depending on the state, by using tabu search. This result indicates that tabu search may be an effective way to estimate the rate of tenpai. Because the methods do not use game records, they are expected to be applicable to other games with imperfect information.

The remainder of the paper is organized as follows. Section 2 describes Mahjong. Section 3 surveys related work. Sections 4 and 5 each propose a method for estimating the rate of tenpai and evaluate it. Finally, Section 6 concludes the paper.

II. MAHJONG

Mahjong [8] is a four-player, zero-sum game with imperfect information. A snapshot of a Mahjong game is shown in Fig. 1. Mahjong uses 136 tiles. There are three types of suited tiles, each numbered from one to nine, and two types of honor tiles: winds (east, south, west, and north) and dragons (red, green, and white). Each tile has four identical copies.

Every player can take the following actions:
draw: getting one tile on the player's turn.
discard: throwing away one tile on the player's turn.
naki: using a tile that another player has discarded.

Using these actions, a player scores by arriving at a winning hand. The winning actions are as follows:
ron: winning by picking up a discarded tile.
tsumo: winning by drawing the tile.

A winning hand consists of four melds (a specific pattern of 3 pieces) and the eyes (a pair of two identical pieces). Melds are as follows:
pung: a set of three identical tiles.
kong: a set of four identical tiles.
chow: a set of three suited tiles in sequence.

The terminology of Mahjong used in this paper is as follows:
tenpai: the state in which a hand is one tile short of a winning hand.
shanten: the number of tiles needed to reach tenpai.
fold: to give up on winning oneself and avoid discarding tiles that are effective for the opponents.

Fig. 1. A snapshot of a Mahjong game.

naki: using a tile that another player has discarded in order to complete a meld.
effective tile: a tile that decreases shanten.
drawn game: a round that finishes without any player winning. A player in tenpai receives points from a player not in tenpai.
yama: the set of remaining tiles that have not been drawn yet. In Fig. 1, the yama is the set of blind tiles.
river: the set of discarded tiles. In Fig. 1, the river appears at the center of the board.
flush: a winning hand consisting of one type of suited tiles, or of one type of suited tiles and honor tiles.
dora: a tile that increases the score. Dora is not part of the special patterns. Red fives are treated the same as dora.
dealer: the player who discards a tile first.
pre-meld: a set of tiles that is one tile short of a meld.
isolation tile: a tile that is not part of a meld, the eyes, or a pre-meld.
tedashi: discarding a tile other than the tile drawn in this turn. In Fig. 1, tedashi tiles are shown as bright tiles in the river.
tsumo-giri: discarding the tile drawn in this turn. In Fig. 1, tsumo-giri tiles are shown as dark tiles in the river.

III. RELATED WORK

Yoshimura et al. [2] proposed a method that searches for optimal moves using tabu search. The method was applied to states involving discarding tiles and naki, and the move was decided. In an experiment evaluating how often the discarded tile agreed with the reference choice, the maximum rate of concordance reached 83%. This means that effective winning hands could be found from the initial states. In an experiment playing games against benchmark players, the proposed method was shown to be better than the benchmark players. These results indicate that tabu search may be able to search for optimal solutions effectively.

Mizukami et al. proposed a player, built by adding naki and fold to a one-player Mahjong player, that is better than average human players. Moreover, a prediction model was created by using an opponent model consisting of tenpai, finishing tiles, and winning score [3].
By combining this model with a one-player Mahjong player [3] and Monte Carlo simulation, a player with performance similar to intermediate players was implemented [4]. Here, the tenpai model was built with a binary logistic regression model. As a result, the prediction accuracy is comparable to that of advanced players. However, this player does not adjust its play to the current rank and the number of rounds. Moreover, in the final round, this player may complete a hand that fixes its own rank as the lowest. To improve this, a model predicting the final rank was created from the score situations appearing in game records. Using the predicted final rank as the reward in Monte Carlo simulation lets the player act based on the final rank. This player achieves higher performance than intermediate players [5].

Nemoto et al. [6] estimated an opponent's hand using CRF (Conditional Random Fields), a recognition model for the sequence labeling problem. In their experiment, the rate of concordance of the hand obtained by the proposed method was 42%, and the rate of concordance increased as the game proceeded. It was also shown that the shanten of the estimated hand is lower than that of the real player.

IV. THE RANDOM SIMULATION TAKING THE DISCARDED TILES IN CONSIDERATION

In Mahjong, folding is very important. A player must judge whether their own hand or the other players' hands have the advantage. Therefore, the rates at which the opponents arrive at winning hands need to be estimated in order to decide whether to fold.

A. Proposed Method

The first method searches by focusing on the opponent's tedashi tiles. When a discarded tile is a tedashi tile, and the opponent is assumed to be heading for a winning hand optimally, a better tile is considered to have been added to the opponent's hand. Therefore, as the number of tedashi tiles increases, the hand is considered to be approaching a winning hand. The search uses the following algorithm.
1) The naki tiles and the tedashi tiles are added to the opponent's initial hand. Next, tiles in the yama are randomly selected and added until the hand has thirteen tiles.
2) The initial hand becomes the best hand. Set the index of the tedashi tile k = 0.
3) Set the number of iterations i = 0. Select the tedashi tile with the current index, select a possible tile in the yama, exchange those tiles, and evaluate the new hand.
4) Update the best hand if the current hand is better than the best hand. Set i = i + 1. If the number of iterations has not reached the value decided a priori, return to Step 3. Otherwise, set the new hand as the current hand.
5) If there is no tedashi tile left, or the execution time limit is reached, stop. Otherwise, return to Step 1.

The maximum number of tedashi tiles is set to six, for the following reason. Although a tedashi tile was held in the opponent's hand for one or more turns, it was eventually discarded, so the search should not depend on it excessively. However, if the number is too small, arriving at a winning hand is difficult, because the number of changes equals the number of tedashi tiles. Therefore, the maximum number is set to six, which is equal to the maximum depth used in the previous work [2]. If the number of tedashi tiles exceeds six, the six most recent tiles are used. The number of iterations is seven. Only shanten is used as the evaluation criterion.

B. Experiment

To evaluate the proposed method, the rate of tenpai is examined on states taken from game records. Records of top players in Tenhou [9] are used. These players are in the top 0.1%. Because the moves of such human players are considered to be mostly optimal, the accuracy of the estimation can be checked against them. The rate of tenpai R is calculated by Eq. (1).

\[ R = \frac{\text{the number of hands arriving at tenpai}}{\text{the number of initial hands generated}} \times 100\ [\%] \tag{1} \]

Records using red fives are excluded because this research does not consider them. In this experiment, the input is the chosen state, and the output is the rate of tenpai. The search is done at the first turn, the sixth turn, and the twelfth turn. Every search takes one second. Ten runs are conducted. It is assumed that the opponent player does not fold.

1) Experiment 1: Tiles in the yama are randomly added to the hand one by one until the hand has thirteen tiles. The result is shown in TABLE I. The number in each test data name is the number of tiles discarded by each player (A, B, C). The rate of tenpai does not improve as the game progresses. This result is considered poor.
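As a rough illustration of how Eq. (1) relates to the random completion used in Experiment 1, the following Python sketch generates hands from the unseen tiles and computes R. The tile encoding (integers 0 to 33 for the 34 tile kinds), all function names, and the `reaches_tenpai` callback are assumptions made for this illustration and do not come from the paper. The callback stands in for the exchange search and the shanten evaluation, which are not implemented here.

```python
import random
from collections import Counter

NUM_TILE_KINDS = 34   # assumed encoding: 27 suited kinds + 7 honor kinds
COPIES_PER_KIND = 4   # 34 * 4 = 136 tiles in total


def build_yama(visible):
    """List the unseen tiles, given a count of visible tiles per kind."""
    yama = []
    for kind in range(NUM_TILE_KINDS):
        remaining = COPIES_PER_KIND - visible.get(kind, 0)
        if remaining < 0:
            raise ValueError(f"more than four visible copies of kind {kind}")
        yama.extend([kind] * remaining)
    return yama


def complete_hand(known_tiles, yama, rng, hand_size=13):
    """Add random unseen tiles to the known tiles until the hand is full."""
    need = max(0, hand_size - len(known_tiles))
    return list(known_tiles) + rng.sample(yama, need)


def rate_of_tenpai(num_tenpai, num_generated):
    """Eq. (1): hands arriving at tenpai / initial hands generated * 100 [%]."""
    if num_generated == 0:
        return 0.0
    return 100.0 * num_tenpai / num_generated


def estimate_rate(known_tiles, visible, reaches_tenpai, trials, seed=0):
    """Generate `trials` initial hands and report the rate of tenpai.

    `reaches_tenpai` is a hypothetical callback (hand -> bool) that hides
    the search and evaluation steps of the method.
    """
    rng = random.Random(seed)
    yama = build_yama(visible)
    hits = 0
    for _ in range(trials):
        hand = complete_hand(known_tiles, yama, rng)
        if reaches_tenpai(hand):
            hits += 1
    return rate_of_tenpai(hits, trials)
```

In this sketch the caller is expected to count the known tiles among the visible ones, so that they are not drawn again from the yama.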
TABLE I
RATE OF TENPAI OF EXPERIMENT 1

test data   rate of tenpai R [%]   number of tedashi tiles   number of naki
1-A         0.035                  1                         0
1-B         0.040                  1                         0
1-C         0.021                  1                         0
6-A         0.905                  5                         0
6-B         0.243                  3                         0
6-C         0.222                  3                         0
12-A        3.426                  6                         1
12-B        0.448                  5                         0
12-C        1.081                  6                         0

2) Experiment 2: In Experiment 1, the rate of tenpai is very low. To improve it, isolation tiles, the eyes and pre-melds, and melds are randomly added as units until the hand has thirteen tiles. Here, we conduct the experiment by varying the ratio of the generation probabilities of an isolation tile, a pre-meld or the eyes, and a meld from 1:1:1 to 5:5:5. These values are chosen because they are needed to satisfy the condition that the probability of generating a pre-meld or the eyes is at least the probability of generating an isolation tile, and the probability of generating a meld is at least the probability of generating a pre-meld or the eyes. The result is shown in TABLE II. Although the minimum rate of tenpai rises as the game progresses, the maximum rate of tenpai falls as the game progresses. The expected result could not be obtained.

TABLE II
RATE OF TENPAI OF EXPERIMENT 2

test data   minimum rate of tenpai R [%] (proportion)   maximum rate of tenpai R [%] (proportion)
1-A         1.895 (5:1:1)                               50.800 (1:1:5)
1-B         1.775 (5:1:1)                               51.411 (1:1:5)
1-C         1.754 (5:1:1)                               51.270 (1:1:5)
6-A         5.070 (5:1:1)                               19.103 (1:3:5)
6-B         3.301 (5:1:1)                               22.942 (1:1:5)
6-C         3.372 (5:2:1)                               24.983 (1:1:5)
12-A        6.494 (5:2:1)                               11.308 (1:1:5)
12-B        3.624 (5:2:1)                               14.415 (1:2:5)
12-C        4.474 (5:1:1)                               14.007 (1:1:5)

C. Discussion

TABLE I shows that the rate of tenpai tends to be affected by the number of tedashi tiles. Therefore, the number of tedashi tiles is considered to be related to the rate of tenpai. For 12-A, the only player who did naki, a high rate of tenpai could be estimated. This is considered to be because doing naki decreases the number of states.
The result of 6-A is better than those of 6-B and 6-C because the number of tedashi tiles is five, so fewer tiles are chosen randomly. Although the numbers of tedashi tiles of 12-A and 12-C are similar to that of 12-B, the rates of tenpai differ greatly. Hence, the search is considered to take into account which tiles the opponent discarded.

TABLE II shows that the rate of tenpai improves when tiles are added as units rather than one by one. Moreover, the result shows that if the proportion of isolation tiles generated is high, the rate of tenpai becomes low, and if the proportion of melds generated is high, the rate of tenpai becomes high. Therefore, if the rate of tenpai rises as the game progresses, the probability of tenpai must also rise as the game progresses. The maximum rate of tenpai, however, falls as the game progresses. This is because the number of melds increases when the number of tedashi tiles and naki decreases. Hence, the maximum rate of tenpai at the twelfth turn is only slightly over 10%, and the rate of tenpai of the player doing naki is lower than that of the players not doing naki. For these reasons, this proposed method may be difficult to use.

V. ESTIMATION OF THE RATE OF TENPAI USING TABU SEARCH

In the previous proposed method, the rate of tenpai decreases as the game progresses. This is because pre-melds, the eyes, and melds are formed using tedashi tiles that are revealed only later, and because isolation tiles increase as the number of tedashi tiles increases. Therefore, in the second method, all tiles of an initial hand are chosen randomly from the tiles in the yama. As a result, tabu search can be used to estimate the rate of tenpai. See the Appendix for the details of tabu search. Tabu search is effective to some extent for move decision in Mahjong [2]. We propose a method of estimating the rate of tenpai by using tabu search.

A. Evaluation Function

The evaluation function, which is important for tabu search, consists of four evaluation criteria: shanten, the number of effective tiles, the degree of similarity, and the score. They are applied in order of priority. After all evaluation criteria are applied, the evaluation value of the current solution is compared with that of the best solution. The best solution is updated if the current solution is better than the best solution on some criterion and the values of all higher-priority criteria are the same.

In this paper, the degree of similarity is defined as how similar the winning hand is to the initial hand. It is obtained by comparing the winning hand with the initial hand and multiplying the probabilities that the changed tiles are drawn. The probability P(i, x) that player i draws a tile x is given by Eq. (2), where N(i, x) is the number of tiles x that player i has, and T(x) is the number of tiles x visible to all players.

\[ P(i, x) = \frac{4 - N(i, x) - T(x)}{136 - \sum_{x} N(i, x) - \sum_{x} T(x)} \tag{2} \]

If no change occurs, the degree of similarity takes its maximum value, 1.

B. Method Using Tabu Search for Mahjong

To use tabu search, the number of iterations, the neighborhood values, the number of moves in the tabu list, and how to discard tiles need to be decided [2]. The number of iterations is set to the number of tedashi tiles, so that a player who has discarded many tedashi tiles is favored.
Hence, the maximum number of iterations is eighteen. A neighborhood value should be similar to the previous value. In Mahjong, neighborhood values can be obtained by exchanging one tile for another. Several neighborhood values are generated and evaluated, and the solution is moved to the best one. The current hand and the best neighborhood value are recorded in the tabu list, and moves to these are prohibited. In Mahjong, the number of effective tiles decreases as the current hand gets closer to a winning hand. If the number of neighborhood values generated is small, the probability of finding a winning hand by search becomes low. So, the number of neighborhood values generated needs to be large in order to arrive at a winning hand. Therefore, in this method, the number of neighborhood values is set to the number of tiles the target player has discarded. The number of moves in the tabu list is set to six, the same value as in the previous work [2]. Moreover, the effective size of a tabu list is said to be in the range from 5 to 12, with 7 being the best [7]. Unlike the first method, all tiles of the initial hand are chosen randomly from the blind tiles.

C. Experiment

The experimental method is the same as that of the first method.

1) Experiment 3: Blind tiles are randomly added one by one until the hand has thirteen tiles. The tedashi tiles are not added. The result is shown in TABLE III. The rate of tenpai of the player doing naki is higher than that of the players not doing naki. Moreover, the result is similar to that of Experiment 2. The estimation is possible even if blind tiles are randomly added one by one.

TABLE III
RATE OF TENPAI OF EXPERIMENT 3

test data   rate of tenpai R [%]
1-A         0
1-B         0
1-C         0
6-A         0.613
6-B         0.128
6-C         0
12-A        11.546
12-B        2.790
12-C        4.474

2) Experiment 4: This experiment changes the ratio of the generation probabilities.
Here, we conduct the experiment by setting the ratio of the generation probabilities of an isolation tile, a pre-meld or the eyes, and a meld to $\max(5 - D, 1) : T : D$. The value D is the number of tiles the target player has discarded, and the value T is the number of tsumo-giri tiles. The result is shown in TABLE IV. The number appended to each test data name is the serial number of the game. No data exist for 12-F-2 because the game had already ended. The rate of tenpai tends to rise as the game progresses.

TABLE IV
RATE OF TENPAI OF EXPERIMENT 4

test data   rate of tenpai R [%]   number of tedashi tiles   number of shanten   number of naki
1-A-1       0.014                  1                         5                   0
1-B-1       0.028                  1                         2                   0
1-C-1       0.028                  1                         6                   0
1-D-2       0.029                  1                         6                   0
1-E-2       0.014                  1                         2                   0
1-F-2       0                      0                         3                   0
1-G-3       0                      0                         3                   0
1-H-3       0                      1                         5                   0
1-I-3       0                      1                         3                   0
1-J-4       0.029                  1                         3                   0
1-K-4       0                      1                         3                   0
1-L-4       0                      1                         3                   0
6-A-1       10.617                 5                         4                   0
6-B-1       7.078                  3                         1                   0
6-C-1       4.267                  3                         5                   0
6-D-2       7.082                  4                         4                   0
6-E-2       0                      0                         2                   0
6-F-2       12.664                 5                         2                   0
6-G-3       4.902                  3                         2                   0
6-H-3       6.712                  6                         2                   0
6-I-3       13.788                 6                         2                   0
6-J-4       8.791                  5                         1                   0
6-K-4       8.575                  4                         2                   0
6-L-4       17.981                 6                         2                   0
12-A-1      29.818                 6                         2                   1
12-B-1      17.764                 5                         1                   0
12-C-1      25.659                 6                         6                   0
12-D-2      45.293                 8                         3                   1
12-E-2      7.916                  3                         1                   0
12-F-2      -                      -                         -                   -
12-G-3      33.333                 7                         3                   0
12-H-3      40.836                 8                         1                   0
12-I-3      52.756                 10                        0                   0
12-J-4      33.427                 7                         0                   0
12-K-4      29.147                 6                         0                   0
12-L-4      75.352                 12                        3                   1

D. Discussion

Comparing TABLE I with TABLE III, the result at the sixth turn was not improved, while the result at the twelfth turn was much improved. The improvement arises because the more iterations are performed, the higher the probability of arriving at a winning hand. At the first turn, the result is considered good because the probability of tenpai is very low.

For a one-player Mahjong player that only advances toward a winning hand, the rates of tenpai have been calculated as 0.08% at the first turn, 6.36% at the sixth turn, and 34.76% at the twelfth turn [9]. Comparing TABLE IV with these values, the tendency of the rates is quite similar, so the rates of tenpai are considered to be estimated well. The rate of tenpai could thus be estimated from a state represented only by the proportion of the number of tedashi tiles and that of tsumo-giri tiles, where the latter is obtained by subtracting the number of tedashi tiles from the number of drawn tiles. Moreover, the rate of tenpai tends to be high when the player does naki, so with tabu search the probability of arriving at a winning hand increases as the number of states decreases. For these reasons, the proposed method using tabu search is effective to some extent in estimating the rate of tenpai.

However, an issue remains: a hand whose shanten is low yields a low rate of tenpai, and a hand whose shanten is high yields a high rate of tenpai. This is because the search depends only on the numbers of tedashi tiles and tsumo-giri tiles, and does not consider the opponent's initial hand or the number of effective tiles the opponent draws. For example, if the opponent's initial hand is extremely good, as for Player E, shanten becomes low while the number of tedashi tiles is still small. To improve this, when the number of tedashi tiles is small in the middle stage of a game, the shanten of the initial hand needs to be made low.

The proposed method is similar to the previous work [2]. The major difference is the initial hand of every search. In the previous work, every initial hand is the hand of the target player. In contrast, in the proposed method, an initial hand consists of tiles chosen randomly from the yama, because the hand of the target player (the opponent) is blind. Because of this difference, it is considered difficult to estimate the optimal move of the opponent, while the estimation of the rate of arriving at a winning hand is effective to some extent.

VI. CONCLUSION

Multi-player games such as Poker and Mahjong involve imperfect information, unlike Shogi and Reversi. It is therefore difficult to decide optimal moves in them. In Mahjong, folding is very important, and a player must judge whether their own hand or the other players' hands have the advantage. To do so, the rate of arriving at a winning hand must be estimated. This paper proposed methods for estimating the rate of tenpai with the aim of implementing Mahjong players that exceed top human players. The proposed methods used random simulation that takes the discarded tiles into consideration, and tabu search, without using game records. Experiments were carried out to evaluate the proposed methods. In the experiments, the rate of tenpai could be estimated to some extent, depending on the state, by using tabu search. This result indicated that tabu search may be an effective way to estimate the rate of tenpai.

The proposed method does not consider the shanten of the opponent's hand. Estimating the shanten of the opponent's initial hand is left for future work. Adding test data and experimenting on many more states are also future work. Estimating dangerous tiles and proposing a folding method that uses the rate of tenpai are further future issues.

REFERENCES

[1] N. A. Risk and D. Szafron, Using counterfactual regret minimization to create competitive multiplayer poker agents, Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, Volume 1, International Foundation for Autonomous Agents and Multiagent Systems, pp. 159-166, 2010.
[2] K. Yoshimura, T. Hochin, H. Nomiya, Searching Optimal Movements in Multi-Player Games with Imperfect Information, Proceedings of 15th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2016), pp. 795-800, 2016.
[3] N. Mizukami, R. Nakahari, A. Ura, M. Miwa, Y. Tsuruoka, T. Chikayama, Adapting One-Player Mahjong Players to Four-Player Mahjong by Recognizing Folding Situations, Proceedings of the 18th Game Programming Workshop, pp. 1-7, 2013. (in Japanese)
[4] N. Mizukami, Y. Tsuruoka, Building computer mahjong players by modeling opponent players using game records and a Monte Carlo method, Proceedings of the 19th Game Programming Workshop, pp. 48-55, 2014. (in Japanese)
[5] N. Mizukami, Y. Tsuruoka, Building Computer Mahjong Players Based on Expected Final Ranks, Proceedings of the 20th Game Programming Workshop, pp. 179-186, 2015. (in Japanese)
[6] Y. Nemoto, K. Komiya, Y. Kotani, Estimation of Imperfect Information using CRF in Mahjong, Proceedings of the 17th Game Programming Workshop, pp. 155-158, 2012. (in Japanese)
[7] F. Glover, Tabu Search - Part I, ORSA Journal on Computing, Vol. 1, No. 3, pp. 190-206, 1989.
[8] European Mahjong Association, http://mahjong-europe.org/, 2016.
[9] S. Tsunoda, Tenhou, http://tenhou.net/, 2016.
[10] http://mahjong.ara.black/sim/shanjun.htm, 2016.

APPENDIX

A. Tabu Search

1) Outline: Tabu search is a meta-heuristic for solving combinatorial optimization problems, devised by Glover in 1989 [7]. The traveling salesman problem, graph coloring, and scheduling problems are examples of combinatorial optimization. The algorithm examines several neighbors of the current state and moves to the best one. At the same time, this move is recorded in a set called the tabu list. By prohibiting the moves in the tabu list, the search can follow a path while avoiding loops. Tabu search can keep a solution from converging to a local optimum because, no matter how bad things get, all moves except those in the tabu list can be used.

2) Notation: Combinatorial optimization is represented as Eq. (3).

\[ \text{Minimize } c(x) : x \in X. \tag{3} \]

The objective function c(x) may be linear or nonlinear, and x is a discrete value and a member of X. To solve this problem, a move s that leads from x to another solution needs to be defined. A move s is defined as a mapping on a subset X(s) of X, as shown in Eq. (4).

\[ s : X(s) \rightarrow X. \tag{4} \]

For the moves s in S that can be applied to x, the set S(x) is defined as Eq. (5).

\[ S(x) = \{ s \in S : x \in X(s) \}. \tag{5} \]

The set S(x) is referred to as a neighborhood function.

3) Algorithm: The algorithm of tabu search consists of the following four steps.
1) Select an initial $x \in X$ and let $x^* := x$. Set the iteration counter $k = 0$ and begin with $T$ empty.
2) If $S(x) - T$ is empty, go to Step 4. Otherwise, set $k := k + 1$ and select $s_k \in S(x) - T$ such that $s_k(x) = \mathrm{OPTIMUM}(s(x) : s \in S(x) - T)$.
3) Let $x := s_k(x)$. If $c(x) < c(x^*)$, where $x^*$ denotes the best solution currently found, let $x^* := x$.
4) If a chosen number of iterations has elapsed, either in total or since $x^*$ was last improved, or if $S(x) - T = \emptyset$ upon reaching this step directly from Step 2, stop. Otherwise, update $T$ and return to Step 2.

Here, s(x) is obtained from a move s and x. OPTIMUM and the tabu list T remain to be defined. Using $s_k(x)$, OPTIMUM is defined as Eq. (6).

\[ c(s_k(x)) = \mathrm{Minimum}(c(s(x)) : s \in S(x) - T). \tag{6} \]

The update of T is defined as Eq. (7).

\[ T := T - s^{-1}_{k-t} + s^{-1}_{k} \tag{7} \]

where $s^{-1}$ is the inverse of the move s, i.e., $s^{-1}(s(x)) = x$. The variable t is the number of moves in the tabu list. When $k \le t$, the reference to $s^{-1}_{k-t}$ is disregarded. From Eq. (7), T is represented as Eq. (8).

\[ T = \{ s^{-1} : s = s_h \text{ for } h > k - t \} \tag{8} \]

The effective number of moves in a tabu list has been obtained experimentally. It ranges from 5 to 12, with 7 in particular working well.
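The four steps of the appendix can be sketched in Python as follows. This is a minimal generic illustration, not the implementation used in the paper: the function signature, the callback names (`cost`, `moves`, `apply_move`, `inverse`), the default parameters, and the toy one-dimensional problem are all assumptions made here. The tabu list holds the inverses of the last t moves, as in Eq. (7), and a bounded deque drops the oldest entry automatically.

```python
from collections import deque


def tabu_search(x0, cost, moves, apply_move, inverse, tabu_size=7, max_iters=100):
    """Generic tabu search following Steps 1 to 4 of the appendix."""
    x = x0
    best = x0                          # x*: best solution found so far
    tabu = deque(maxlen=tabu_size)     # T: inverses of the last t moves
    for _ in range(max_iters):         # Step 4: stop after a chosen number of iterations
        candidates = [s for s in moves(x) if s not in tabu]   # S(x) - T
        if not candidates:             # Step 2: no admissible move is left
            break
        # OPTIMUM, Eq. (6): the admissible move with the lowest resulting cost
        s_k = min(candidates, key=lambda s: cost(apply_move(x, s)))
        x = apply_move(x, s_k)         # Step 3: move even if the cost gets worse
        if cost(x) < cost(best):
            best = x
        tabu.append(inverse(s_k))      # Eq. (7): add s_k^{-1}, oldest entry is dropped
    return best


# Toy problem (hypothetical): positions 0..10 with a local minimum at 2
# and the global minimum at 8. A move is a step of +1 or -1.
COSTS = [5, 4, 3, 4, 5, 4, 2, 1, 0, 1, 2]


def cost(x):
    return COSTS[x]


def moves(x):
    return [s for s in (1, -1) if 0 <= x + s < len(COSTS)]


def apply_move(x, s):
    return x + s


def inverse(s):
    return -s


if __name__ == "__main__":
    print(tabu_search(0, cost, moves, apply_move, inverse))
```

Starting from position 0, plain descent would stop at the local minimum at position 2. Because stepping back is tabu, the search keeps moving forward through worse positions and eventually records position 8 as the best solution.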