Intelligent Gaming Techniques for Poker: An Imperfect Information Game
Samisa Abeysinghe and Ajantha S. Atukorale
University of Colombo School of Computing, 35, Reid Avenue, Colombo 07, Sri Lanka
Tel: [+94 77], samisa_abeysinghe@yahoo.com

Abstract

This is a review paper on research done on imperfect information games. It focuses on the card game poker, on which much research has been done over the years. The paper reviews the solutions researchers have proposed for various problems in the game of poker. The main challenges in this imperfect information game are searching, learning, opponent modeling and deception. Machine learning, pattern recognition, adaptive evolutionary techniques, statistics and simulation are some of the technologies explored in this highly complex problem domain. This paper introduces the rationale for research on imperfect information games, describes the techniques used in solving the problems, reviews some successful programs built to solve such problems, and finally discusses research opportunities in this area.

Key words: Bayesian learning, Imperfect information, Intelligent gaming, Machine learning, Pattern recognition

1. Introduction

Research on imperfect information games has come to prominence in the recent past. With the advent of the Internet and the new forms of business opportunities made possible by ICT, there is elevated interest and activity in research on dealing with imperfect information. Many situations in the Internet era call for it: e-commerce applications such as online auctions, online gaming systems, and online business applications requiring negotiation or bargaining are some of the applications that would benefit from an automated capability for dealing with imperfect information. While humans are highly skilled at dealing with imperfect information, it will take some more time before computers are made equally capable.
This very reason has drawn many researchers to explore the possibilities of this field. Research on imperfect information often focuses on a particular case rather than taking up the challenge in general. The main reason is that dealing with imperfect information effectively is often problem specific; it makes little sense to build a generic model for context-specific situations across different problems. Poker has been one of the most researched problems for exploring techniques to deal with imperfect information. Poker is a card game that involves probability (some call it luck: the chance of getting a good card hand), imperfect information (hidden cards) and deception (bluffing). One's success depends not only on the cards that he or she gets, but also on the ability to predict the opponents' actions, often called opponent modeling [3]. There have been numerous research efforts, many of them from 1998 to date, on how to make computer programs play effective, world-class poker, and various researchers have adopted different approaches to the problem.

1.1 The Poker Game

There are several variants of the game of poker, and various researchers have taken up different variants for their research. Texas Hold'em and Five Card Stud are two variants that researchers often pick. In Texas Hold'em, multiple players can play the game and each player is dealt two hidden cards; then five open cards are dealt face up, shared among the players. In a Stud game each player gets his or her own cards, and there are no shared cards; some of the cards are open and some are hidden. Billings et al, 2001 [3], provide
comprehensive details on the Texas Hold'em game, including the rules of the game (dealing cards and betting rounds). Korb et al, 1999 [12], provide details on the Five Card Stud game and its hand-evaluation rules. As the aforementioned papers give ample detail on these forms of the game, we do not repeat it in this paper.

2. Research

The Department of Computer Science of the University of Alberta, Canada, has a poker research group. They have done much research in this area and have produced several valuable research papers on the subject. Their well-known poker program Poki [3] uses learning techniques to build statistical models of each opponent and is capable of dynamically adapting itself to exploit observed patterns and techniques. They use Texas Hold'em as their test bed. There has also been research at the School of Computer Science and Software Engineering, Monash University, Australia. Their poker program, the Bayesian Poker Program (BPP) [12], uses a Bayesian network to model a poker player; BPP is built to play Five Card Stud. Researchers from many other institutes have done a considerable amount of research on poker. However, none of the programs developed so far is yet capable of challenging the best human players. The programs play effectively against humans, but there are still areas to be explored.

3. Techniques

Billings et al, 2001 [3], in their paper titled The Challenge of Poker, explain why card games like poker are harder to solve than games like chess. Traditional methods like deep search can solve the problems faced in games such as chess because those are perfect information games: one can generate and evaluate the combinations of possibilities from one state of the game to the next with reasonable computing power and within acceptable time limits. In games like poker, however, where some or all of the opponents' cards are hidden, it is not possible to do a deep search.
There are several approaches to artificial intelligence research on poker. One is to study simplified variants; another is to look at a subset of the game and address sub-problems in isolation [3]. Game theory and mathematical techniques have been used with simplified variants of poker [11], and there have been efforts to apply game theory to full-scale poker [2]. Machine learning techniques have been used with a particular subset of poker [16]. Many studies look at two-player games. Such simplifications, while reducing complexity, may also destroy the most challenging aspects of the problem; however, given how complex the problem is, it is often acceptable to focus on a simpler version of the game. As research matures and the initial problems are solved, researchers will take up the real game without simplifications. The Poki program by Billings et al, 2001 [3], uses an approach that tackles the entire problem, choosing a real variant of poker and addressing all the necessary considerations. This minimizes the risk of losing vital challenging aspects of the problem. Koller and Pfeffer, 1997 [11], investigated poker from a game-theoretic point of view. The algorithm used in their system, named Gala, avoids the exponential blow-up that occurs when converting the problem into normal form. However, the translated problem they generate is still proportional to the size of the game tree, which makes it prohibitively large for most common variants of poker. In his partition search, Ginsberg, 1997 [9], defined equivalence classes. The rationale was to use abstraction, collecting many instances of similar problems into a single class; he used this technique to solve the game of Bridge. There are many states in poker that are isomorphic to each other, and hence the same concept can be used effectively. Another approach is to construct a shallower game tree.
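Ginsberg's abstraction idea maps directly onto poker's isomorphic states. For example, the 1,326 possible two-card Texas Hold'em starting hands collapse into 169 equivalence classes once suits are abstracted away, since only the ranks and whether the cards share a suit matter before the flop. A minimal sketch (the representation and function names are ours, not Ginsberg's):

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def canonical(hole):
    """Map a two-card Hold'em starting hand to its suit-abstracted
    equivalence class: (high rank, low rank, suited?)."""
    (r1, s1), (r2, s2) = hole
    hi, lo = sorted([r1, r2], key=RANKS.index, reverse=True)
    return (hi, lo, s1 == s2)

# Bucket all 1,326 two-card combinations into equivalence classes.
deck = [(r, s) for r in RANKS for s in SUITS]
classes = {canonical(h) for h in combinations(deck, 2)}
```

Treating every member of a class identically shrinks the pre-flop state space by a factor of almost eight before any search or simulation begins.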
However, unlike in perfect information games, the states of an imperfect information game tree are not independent. It is often possible to take a move that looks good at shallow depth but is revealed as bad at greater depth [13]. The alpha-beta framework is effective for two-player deterministic games; the search is done in full breadth but limited depth. For imperfect information games, full-depth but limited-breadth search is more appropriate, as it is not possible to examine the full game tree. Even though this technique does not consider every possibility in the game tree, for non-deterministic games it helps to reach a decision with a sufficient confidence level. Repeated trials on the search tree have diminishing returns beyond a certain point [14]. Billings et al, 1998 [5], defined five requirements for a world-class poker player: hand strength, hand potential, bluffing, unpredictability and opponent modeling. Unpredictability ensures that the other players cannot use the imperfect information available to them to make successful predictions. Opponent modeling is the reverse of unpredictability: a player tries to utilize the available imperfect information to the maximum to predict the opponents successfully.
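The first of these requirements, hand strength, is computed by enumerating every opponent holding consistent with the visible cards and counting how often one's own hand is ahead. The counting scheme below follows that definition; the toy evaluator is our own stand-in for a real poker hand ranker, used only to keep the sketch self-contained:

```python
from itertools import combinations

def hand_strength(our_hole, board, deck, rank):
    """Immediate hand strength: enumerate every opponent holding
    consistent with the visible cards, count how often we are
    ahead, and count ties as half a win."""
    unseen = [c for c in deck if c not in our_hole and c not in board]
    ours = rank(our_hole, board)
    ahead = tied = behind = 0
    for opp_hole in combinations(unseen, 2):
        theirs = rank(list(opp_hole), board)
        if ours > theirs:
            ahead += 1
        elif ours == theirs:
            tied += 1
        else:
            behind += 1
    return (ahead + tied / 2) / (ahead + tied + behind)

# Toy evaluator standing in for a real hand ranker: a hand is
# scored simply by its highest card identifier.
def toy_rank(hole, board):
    return max(hole)

deck = list(range(52))  # abstract card identifiers 0..51
```

With two hole cards and three board cards visible, the loop enumerates the C(47, 2) = 1,081 possible opponent holdings; hand potential extends the same idea by also rolling out future board cards.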
Korb et al, 1999 [12], researched the use of a Bayesian network for solving problems related to poker. The rationale is that if the accurate probability of winning can be computed, then the next action can be chosen with confidence. However, probability alone cannot ensure success in the game, as success also depends on opponents' actions and their hand strength.

4. Poki

The Poki program [3] has a dealer and multiple players, either human or computer. In the design of Poki, the betting strategy is divided into two parts: pre-flop strategy and post-flop strategy. The flop is the three shared cards dealt in the second round of Texas Hold'em.

4.1 Betting Strategy

In pre-flop betting there is little information, as all the cards are hidden. A technique called roll-out simulation [3] can be used here: since little information is available, a formula-based expert system that uses information from roll-out simulations can drive the pre-flop stage. A future enhancement to pre-flop betting is to make the program autonomous, adapting to observed game conditions and making context-sensitive decisions on its own. A refinement of roll-out simulation is to use repeated iterations of the technique, where previous results govern the betting decision. After the flop, more information is available. In Poki, the post-flop betting decision is taken in three steps: (1) compute the effective hand strength (EHS); (2) translate the EHS into an action probability; (3) pick an action randomly from the probability distribution (for unpredictability). In order to make betting decisions based on the current game context and historical information, a simulation-based betting strategy is used, in which an estimate is established for each betting action. To work around the impossibility of exhaustive game-tree search, a selective sampling technique is used [6].
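The three post-flop steps can be sketched as follows. The thresholds and probability triples here are illustrative assumptions, not Poki's actual rule base; only the shape of the pipeline (EHS in, randomized action out) follows the description above:

```python
import random

def ehs_to_probs(ehs):
    """Step 2: translate effective hand strength into a fold/call/
    raise probability triple (thresholds are illustrative)."""
    if ehs > 0.85:
        return {"fold": 0.00, "call": 0.10, "raise": 0.90}
    if ehs > 0.50:
        return {"fold": 0.05, "call": 0.75, "raise": 0.20}
    return {"fold": 0.70, "call": 0.25, "raise": 0.05}

def choose_action(ehs, rng=random.random):
    """Step 3: sample from the distribution rather than always
    taking the best action, so play stays unpredictable."""
    probs = ehs_to_probs(ehs)
    r, acc = rng(), 0.0
    for action, p in probs.items():
        acc += p
        if r < acc:
            return action
    return "call"  # numerical fallback

action = choose_action(0.72)
```

Because the final choice is sampled, an observer cannot infer the EHS from any single action, which is exactly the unpredictability requirement listed earlier.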
4.2 Opponent Modeling

Opponent modeling is based on an opponent's actions and is useful in two ways: first, to deduce the strength of the opponent's hand, and second, to predict the opponent's actions in a given situation. Generic opponent modeling (GOM) is a technique in which one assumes the opponents would use a betting strategy similar to one's own. Another technique is to assume that an opponent's actions will be similar to his or her actions in the past; this is called specific opponent modeling (SOM) [15]. However, when multiple players are involved in a game, blended with the many complexities of many possible contexts, opponent modeling becomes one of the most difficult problems in machine learning [3]. Billings et al, 1998 [4], describe a statistical model for opponent modeling: they estimated initial weights based on behavior at the start of game play and used a reweighting mechanism based on EHS. Poki uses a standard feed-forward neural network for opponent modeling, trained on contextual data. The input layer takes values corresponding to properties of the game context, and the network has three outputs corresponding to the three actions: fold, call and raise. The outputs of the network are cross-validated against data collected from past games with each opponent [8]. Decision trees and expert systems could also be used for predicting opponents; however, it has been shown empirically that the neural network based solution outperforms those techniques [7]. There have also been efforts to use evolutionary strategies for opponent modeling. Kendall and Willdig, 2001 [10], looked into simple adaptation strategies to test the use of adaptation in poker. They used four playing styles (loose passive, loose aggressive, tight passive and tight aggressive) and tested the adaptive player against those. The evolution algorithm they used is very simple: if the player wins, the learning value is increased, and if the player loses, it is decreased.
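A much-simplified stand-in for specific opponent modeling (Poki's actual predictor is the trained neural network described above) is a table of observed action frequencies per game context, smoothed so that unseen contexts fall back to uniform behaviour. The class below is our own illustrative sketch:

```python
from collections import defaultdict

class FrequencyOpponentModel:
    """Specific opponent modeling (SOM) by counting: record each
    observed (context, action) pair and predict future actions by
    relative frequency, with Laplace (+1) smoothing."""
    ACTIONS = ("fold", "call", "raise")

    def __init__(self):
        self.counts = defaultdict(lambda: {a: 0 for a in self.ACTIONS})

    def observe(self, context, action):
        self.counts[context][action] += 1

    def predict(self, context):
        c = self.counts[context]
        total = sum(c.values()) + len(self.ACTIONS)  # +1 per action
        return {a: (c[a] + 1) / total for a in self.ACTIONS}

model = FrequencyOpponentModel()
for _ in range(8):
    model.observe("raised-pot", "fold")  # opponent folds to raises...
model.observe("raised-pot", "call")      # ...but called once
```

A neural network generalizes across contexts where this table cannot, which is one reason the network was found to outperform simpler predictors [7].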
Barone and While, 1999 [1], used a more advanced evolutionary strategy. Their poker player consists of a hypercube of populations of candidates (i.e. possible actions). The hypercube is segmented along two dimensions, one representing the position of play (early, middle or late) and the other representing risk management (possible actions representing conservative or aggressive play). Each element of the hypercube holds a population of N candidates representing functions corresponding to the likelihood of an action. The player's success on a given hand is used as feedback, based on which N/2 candidates are kept as parents of the next generation and the rest are discarded; the discarded candidates are replaced with mutated copies of the selected parents.
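One generation of this selection scheme, applied to a single cell of the hypercube, can be sketched as below. The fitness and mutation functions in the toy instantiation are placeholder assumptions; only the keep-half, mutate-half structure follows Barone and While's description:

```python
import random

def evolve_cell(population, fitness, mutate, rng):
    """One generation for one hypercube cell: keep the best half of
    the candidates as parents and replace the worse half with
    mutated copies of randomly chosen parents."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: len(ranked) // 2]
    children = [mutate(rng.choice(parents), rng) for _ in parents]
    return parents + children

# Toy instantiation: candidates are single numbers, fitness is the
# value itself, mutation adds Gaussian noise (placeholder choices).
rng = random.Random(0)
pop = [rng.uniform(-1, 1) for _ in range(8)]
for _ in range(30):
    pop = evolve_cell(pop, fitness=lambda x: x,
                      mutate=lambda x, r: x + r.gauss(0, 0.1), rng=rng)
```

In the real player, fitness would come from the candidate action's contribution to winnings on played hands rather than from a closed-form function.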
5. BPP

The Bayesian Poker Program (BPP) by Korb et al, 1999 [12], is built on a Bayesian Poker Network grounded in Bayesian network theory. The program is limited to two players. It incorporates the opponent's actions and known cards to model the opponent. Some further assumptions in BPP simplify the problem to be solved: final hand types are independent (this ensures the network structure is a simple polytree), and the opponent's action depends only on hand type (this reduces the number of nodes in the tree). To be successful, the nodes representing hand types should consider not only the main hand categories but also sub-categories within them. For this, BPP subdivides the main hand categories into several sub-categories. For example, a hand with a pair can fall into one of the sub-categories 9 or lower, 10 to J, and Q to A; other hand categories can be subdivided similarly. BPP uses action matrices to represent rounds of betting. The matrices use conditional probabilities to represent the possible actions per round given the opponent's current hand type, and they are adjusted over time using the relative frequencies of opponent behavior. Belief updating is done by standard Bayesian network propagation rules. BPP uses a concept called betting curves to reach an action decision. These curves plot the probability of winning versus the probability of a given action; each action has its own probability curve for each round of play. By moving a probability curve relative to the other curves, one can manage the nature of play, either aggressive or conservative [12]. Bluffing, too, is incorporated with a probability parameter: once an action is selected using the probability curves, BPP over-represents its action with some probability. In this way, if the action suggested by the probability curves is to pass, the actual action taken may be to bet.

6. Discussion

Searching, opponent modeling, deception strategies and machine learning are the challenging issues to be addressed in building programs that can play effective poker. There are numerous alternative solutions to the searching problem; full-depth but limited-breadth search has been used successfully. Despite the research already done, understanding the opponent is still challenging. Programs become slower to adjust as more information is collected on an opponent, a phenomenon called build-up of inertia [3]. It is the general understanding that opponent modeling is impossible to solve perfectly; better players always change gears. A program playing poker must be deceptive, and at the same time it must be able to recognize if and when its opponents are being deceptive. This is difficult to handle with past data and pattern recognition; a historical decay function has been proposed to deal with this situation [3]. Identifying which features of the gathered data to focus on is challenging, and learning in noisy domains has been proposed [3]. Neural networks have been applied effectively for learning; it would be interesting to experiment with the combination of a neural network and an evolutionary strategy (genetic algorithm). The BPP by Korb et al, 1999 [12], can only deal with a two-player game; its Bayesian network could be extended to cater for multiple players. Even for two players, there are numerous combinations of improvements that could be tried out. In particular, improvements could be made to the technique used to decide when to be aggressive or conservative: rather than being purely random, this decision could be made based on learning outcomes. The Poki program described earlier uses a neural network for learning; combining a Bayesian network with the learning capabilities of a neural network is a possible research area. When using a Bayesian network, it would be useful to model the conservativeness or aggressiveness of the opponent.
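The historical decay idea mentioned above can be sketched as exponentially decayed action counts, so that recent observations outweigh the opponent's distant past and a change of gears is picked up quickly. The decay constant and class shape are our own illustrative assumptions:

```python
class DecayedFrequency:
    """Exponentially decayed action counts: before each new
    observation, existing counts are multiplied by `decay`, so the
    model tracks an opponent who changes gears. The decay constant
    0.95 is an illustrative assumption."""
    def __init__(self, actions=("fold", "call", "raise"), decay=0.95):
        self.decay = decay
        self.counts = {a: 0.0 for a in actions}

    def observe(self, action):
        for a in self.counts:
            self.counts[a] *= self.decay
        self.counts[action] += 1.0

    def probabilities(self):
        total = sum(self.counts.values())
        return {a: c / total for a, c in self.counts.items()}

m = DecayedFrequency()
for _ in range(50):
    m.observe("call")   # a long conservative phase...
for _ in range(5):
    m.observe("raise")  # ...then the opponent changes gears
```

After the five raises, the decayed model assigns raising a noticeably higher probability than the raw frequency 5/55 would, which is the behaviour the decay function is meant to provide.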
An opponent bluffing node could be used for this purpose [12]. BPP uses a distinct Bayesian network for each round of play; a dynamic Bayesian network could instead be used to model the interrelation between rounds of play. The two main programs that have come out of research on poker have used very different approaches: Poki tries to solve the real problem with minimal simplifications, while BPP uses many simplifications, so it would be unfair to compare them directly. Poki has never tried Bayesian techniques; however, it has incorporated many useful techniques for opponent modeling, searching, bluffing and machine learning. It would be worthwhile to combine those concepts with the Bayesian approach and evaluate the outcomes.

7. Experiments

We carried out research using a Bayesian network with the Texas Hold'em poker game (BPP [12] used Five Card Stud). Experiments were done using the Bayesian Poker Network by Korb et al, 1999 [12], as the basis. A few improvements were made to the network, as discussed in the earlier sections of this paper.
The improved Bayesian network used for the experiments is shown in Figure 1.

[Figure 1: Proposed Bayesian network for poker. Node labels recovered from the figure include Up Cards, Current Hand, Final Hand, Bluff and Action for the opponent, and Program Current Hand and Program Final Hand.]

The probabilities required for the conditional probability tables of the Bayesian network were estimated by playing 1,000,000 games between two always-call players (players who stay in the game until its end). The network was then used by one of the players in a two-player game, with the other player using only hand-evaluation logic to select an action. 6,000 hands were played out, and the results indicate a noticeable improvement over using mere hand-evaluation logic.

Figure 2 shows the aggregate results for the games played.

[Figure 2: Results of play between the Bayesian player and the logical player; aggregate amount won versus games played for each player.]

After 6,000 hands of play, the Bayesian network based player had earned nearly 25,000 points more than its opponent. (The lower bet limit was 10 points and the upper bet limit 20 points; hence, according to the rules of play, the least amount a player could win was 20 points.) This clearly indicates the effectiveness of the Bayesian player. The success can be attributed to the ability to predict the opponent using the Bayesian network, as well as the ability to adjust the nature of play to the current game conditions.

8. Further Work

The network could be improved to relate the opponent's bluffing to the current hand of the program; this could be done by adding a link from the Program Current Hand node to the Bluff node. The program currently initializes the network for each round, that is, for each round of play in a single game it uses a new Bayesian network. Sensitivity to the game context could be improved by keeping the same network and updating it through the rounds of play based on the results of previous rounds; however, this would require a dynamic Bayesian network.
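The conditional probability tables referred to above are filled exactly by counting: play many self-play games, record the joint outcomes, and normalize the counts into conditional probabilities. A sketch of that counting step follows; the two-state hand abstraction and the 0.8/0.2 bet rates are our own placeholder assumptions standing in for the 1,000,000 always-call deals:

```python
import random
from collections import Counter

def estimate_cpt(samples):
    """Estimate P(child | parent) by relative frequency from a list
    of (parent_value, child_value) observations."""
    joint = Counter(samples)
    parent_totals = Counter(p for p, _ in samples)
    return {pc: n / parent_totals[pc[0]] for pc, n in joint.items()}

# Simulated deals: a hidden hand class and an action whose
# distribution depends on it (both abstractions are illustrative).
rng = random.Random(42)
samples = []
for _ in range(100_000):
    hand = "strong" if rng.random() < 0.3 else "weak"
    bet_rate = 0.8 if hand == "strong" else 0.2
    samples.append((hand, "bet" if rng.random() < bet_rate else "pass"))

cpt = estimate_cpt(samples)
```

With enough simulated deals the relative frequencies converge to the true conditionals, which is why a large number of always-call games suffices to initialize the tables before any learning from real opponents takes place.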
The experiment was done only against a logical player. There are plans to experiment with Bayesian networks of varying structures to identify their effectiveness. In particular, it would be interesting to explore the memory usage and execution time of different networks and the trade-offs between performance and resource utilization. The Bayesian network could be improved to learn the game context and update the conditional probability tables with experience. An open research area is how to use a neural network together with the Bayesian network to improve learning capabilities. We also expect to explore the possibilities of extending this network to games with multiple players.

References

[1] Barone, Luigi and While, Lyndon, 1999, An Adaptive Learning Model for Simplified Poker Using Evolutionary Algorithms, Proceedings of the Congress on Evolutionary Computation
[2] Billings, Darse, Burch, Neil, Davidson, Aaron, Holte, Robert, Schaeffer, Jonathan, Schauenberg, Terence and Szafron, Duane, 2003, Approximating Game-Theoretic Optimal Strategies for Full-scale Poker, Proceedings of IJCAI-03 (Eighteenth International Joint Conference on Artificial Intelligence)
[3] Billings, Darse, Davidson, Aaron, Schaeffer, Jonathan, and Szafron, Duane, June 2001, The Challenge of Poker, Artificial Intelligence Journal, vol 134(1-2)
[4] Billings, Darse, Papp, Denis, Schaeffer, Jonathan, and Szafron, Duane, 1998, Opponent Modeling in Poker, Proceedings of AAAI-98 (15th National AAAI Conference)
[5] Billings, Darse, Papp, Denis, Schaeffer, Jonathan, and Szafron, Duane, 1998, Poker as a Testbed for Machine Intelligence Research, Proceedings of AI'98 (Canadian Society for Computational Studies of Intelligence)
[6] Billings, Darse, Peña, Lourdes, Schaeffer, Jonathan, and Szafron, Duane, 1999, Using Probabilistic Knowledge and Simulation to Play Poker, Proceedings of AAAI-99 (Sixteenth National Conference of the American Association for Artificial Intelligence)
[7] Davidson, Aaron, 2002, Opponent Modeling in Poker: Learning and Acting in a Hostile Environment, M.Sc. thesis, Department of Computing Science, University of Alberta
[8] Davidson, Aaron, Billings, Darse, Schaeffer, Jonathan, and Szafron, Duane, 2000, Improved Opponent Modeling in Poker, Proceedings of the 2000 International Conference on Artificial Intelligence (ICAI'2000), Las Vegas, Nevada
[9] Ginsberg, Matthew L., 1997, Partition Search, AAAI National Conference
[10] Kendall, Graham and Willdig, Mark, 2001, An Investigation of an Adaptive Poker Player, Australian Joint Conference on Artificial Intelligence
[11] Koller, Daphne and Pfeffer, Avi, July 1997, Representations and Solutions for Game-Theoretic Problems, Artificial Intelligence, 94(1)
[12] Korb, Kevin B., Nicholson, Ann E. and Jitnah, Nathalie, 1999, Bayesian Poker, Uncertainty in Artificial Intelligence
[13] Lincke, Thomas, 1994, Perfect Play using Nine Men's Morris as an Example
[14] Peña, Lourdes, 1999, Probabilities and Simulations in Poker, M.Sc. thesis, School of Computer Science and Software Engineering, Monash University
[15] Schaeffer, Jonathan, Billings, Darse, Peña, Lourdes, and Szafron, Duane, 1999, Learning to Play Strong Poker, Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99)
[16] Smith, Stephen F., August 1983, Flexible Learning of Problem Solving Heuristics Through Adaptive Search, IJCAI-83
More informationCS221 Final Project Report Learn to Play Texas hold em
CS221 Final Project Report Learn to Play Texas hold em Yixin Tang(yixint), Ruoyu Wang(rwang28), Chang Yue(changyue) 1 Introduction Texas hold em, one of the most popular poker games in casinos, is a variation
More informationPOKER AGENTS LD Miller & Adam Eck April 14 & 19, 2011
POKER AGENTS LD Miller & Adam Eck April 14 & 19, 2011 Motivation Classic environment properties of MAS Stochastic behavior (agents and environment) Incomplete information Uncertainty Application Examples
More informationComp 3211 Final Project - Poker AI
Comp 3211 Final Project - Poker AI Introduction Poker is a game played with a standard 52 card deck, usually with 4 to 8 players per game. During each hand of poker, players are dealt two cards and must
More informationAdversarial Search (Game Playing)
Artificial Intelligence Adversarial Search (Game Playing) Chapter 5 Adapted from materials by Tim Finin, Marie desjardins, and Charles R. Dyer Outline Game playing State of the art and resources Framework
More informationThe Evolution of Blackjack Strategies
The Evolution of Blackjack Strategies Graham Kendall University of Nottingham School of Computer Science & IT Jubilee Campus, Nottingham, NG8 BB, UK gxk@cs.nott.ac.uk Craig Smith University of Nottingham
More informationFoundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel
Foundations of AI 6. Adversarial Search Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard & Bernhard Nebel Contents Game Theory Board Games Minimax Search Alpha-Beta Search
More informationAchieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters
Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters Scott Watson, Andrew Vardy, Wolfgang Banzhaf Department of Computer Science Memorial University of Newfoundland St John s.
More informationAlgorithms for Data Structures: Search for Games. Phillip Smith 27/11/13
Algorithms for Data Structures: Search for Games Phillip Smith 27/11/13 Search for Games Following this lecture you should be able to: Understand the search process in games How an AI decides on the best
More informationAn evaluation of how Dynamic Programming and Game Theory are applied to Liar s Dice
An evaluation of how Dynamic Programming and Game Theory are applied to Liar s Dice Submitted in partial fulfilment of the requirements of the degree Bachelor of Science Honours in Computer Science at
More informationGame Playing: Adversarial Search. Chapter 5
Game Playing: Adversarial Search Chapter 5 Outline Games Perfect play minimax search α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Games vs. Search
More informationUsing Sliding Windows to Generate Action Abstractions in Extensive-Form Games
Using Sliding Windows to Generate Action Abstractions in Extensive-Form Games John Hawkin and Robert C. Holte and Duane Szafron {hawkin, holte}@cs.ualberta.ca, dszafron@ualberta.ca Department of Computing
More informationSpeeding-Up Poker Game Abstraction Computation: Average Rank Strength
Computer Poker and Imperfect Information: Papers from the AAAI 2013 Workshop Speeding-Up Poker Game Abstraction Computation: Average Rank Strength Luís Filipe Teófilo, Luís Paulo Reis, Henrique Lopes Cardoso
More informationCS 440 / ECE 448 Introduction to Artificial Intelligence Spring 2010 Lecture #5
CS 440 / ECE 448 Introduction to Artificial Intelligence Spring 2010 Lecture #5 Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk Undergrad TAs: Sam Johnson, Nikhil Johri Topics Game playing Game trees
More informationUnderstanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search
Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search Jeffrey Long and Nathan R. Sturtevant and Michael Buro and Timothy Furtak Department of Computing Science, University
More informationBetaPoker: Reinforcement Learning for Heads-Up Limit Poker Albert Tung, Eric Xu, and Jeffrey Zhang
Introduction BetaPoker: Reinforcement Learning for Heads-Up Limit Poker Albert Tung, Eric Xu, and Jeffrey Zhang Texas Hold em Poker is considered the most popular variation of poker that is played widely
More informationApplying Equivalence Class Methods in Contract Bridge
Applying Equivalence Class Methods in Contract Bridge Sean Sutherland Department of Computer Science The University of British Columbia Abstract One of the challenges in analyzing the strategies in contract
More informationCreating a New Angry Birds Competition Track
Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference Creating a New Angry Birds Competition Track Rohan Verma, Xiaoyu Ge, Jochen Renz Research School
More informationCS 1571 Introduction to AI Lecture 12. Adversarial search. CS 1571 Intro to AI. Announcements
CS 171 Introduction to AI Lecture 1 Adversarial search Milos Hauskrecht milos@cs.pitt.edu 39 Sennott Square Announcements Homework assignment is out Programming and experiments Simulated annealing + Genetic
More informationMyPawns OppPawns MyKings OppKings MyThreatened OppThreatened MyWins OppWins Draws
The Role of Opponent Skill Level in Automated Game Learning Ying Ge and Michael Hash Advisor: Dr. Mark Burge Armstrong Atlantic State University Savannah, Geogia USA 31419-1997 geying@drake.armstrong.edu
More informationThe Evolution of Knowledge and Search in Game-Playing Systems
The Evolution of Knowledge and Search in Game-Playing Systems Jonathan Schaeffer Abstract. The field of artificial intelligence (AI) is all about creating systems that exhibit intelligent behavior. Computer
More informationTowards Strategic Kriegspiel Play with Opponent Modeling
Towards Strategic Kriegspiel Play with Opponent Modeling Antonio Del Giudice and Piotr Gmytrasiewicz Department of Computer Science, University of Illinois at Chicago Chicago, IL, 60607-7053, USA E-mail:
More informationV. Adamchik Data Structures. Game Trees. Lecture 1. Apr. 05, Plan: 1. Introduction. 2. Game of NIM. 3. Minimax
Game Trees Lecture 1 Apr. 05, 2005 Plan: 1. Introduction 2. Game of NIM 3. Minimax V. Adamchik 2 ü Introduction The search problems we have studied so far assume that the situation is not going to change.
More informationAdversarial Search and Game Playing. Russell and Norvig: Chapter 5
Adversarial Search and Game Playing Russell and Norvig: Chapter 5 Typical case 2-person game Players alternate moves Zero-sum: one player s loss is the other s gain Perfect information: both players have
More informationGame Mechanics Minesweeper is a game in which the player must correctly deduce the positions of
Table of Contents Game Mechanics...2 Game Play...3 Game Strategy...4 Truth...4 Contrapositive... 5 Exhaustion...6 Burnout...8 Game Difficulty... 10 Experiment One... 12 Experiment Two...14 Experiment Three...16
More informationOpponent Modeling in Texas Holdem with Cognitive Constraints
Carnegie Mellon University Research Showcase @ CMU Dietrich College Honors Theses Dietrich College of Humanities and Social Sciences 4-23-2009 Opponent Modeling in Texas Holdem with Cognitive Constraints
More informationFive-In-Row with Local Evaluation and Beam Search
Five-In-Row with Local Evaluation and Beam Search Jiun-Hung Chen and Adrienne X. Wang jhchen@cs axwang@cs Abstract This report provides a brief overview of the game of five-in-row, also known as Go-Moku,
More informationAutomatic Public State Space Abstraction in Imperfect Information Games
Computer Poker and Imperfect Information: Papers from the 2015 AAAI Workshop Automatic Public State Space Abstraction in Imperfect Information Games Martin Schmid, Matej Moravcik, Milan Hladik Charles
More informationComputer Poker Research at LIACC
Computer Poker Research at LIACC Luís Filipe Teófilo, Luís Paulo Reis, Henrique Lopes Cardoso, Dinis Félix, Rui Sêca, João Ferreira, Pedro Mendes, Nuno Cruz, Vitor Pereira, Nuno Passos LIACC Artificial
More informationA Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold em Poker
A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold em Poker Fredrik A. Dahl Norwegian Defence Research Establishment (FFI) P.O. Box 25, NO-2027 Kjeller, Norway Fredrik-A.Dahl@ffi.no
More informationCreating a Dominion AI Using Genetic Algorithms
Creating a Dominion AI Using Genetic Algorithms Abstract Mok Ming Foong Dominion is a deck-building card game. It allows for complex strategies, has an aspect of randomness in card drawing, and no obvious
More informationSelecting Robust Strategies Based on Abstracted Game Models
Chapter 1 Selecting Robust Strategies Based on Abstracted Game Models Oscar Veliz and Christopher Kiekintveld Abstract Game theory is a tool for modeling multi-agent decision problems and has been used
More informationATHABASCA UNIVERSITY CAN TEST DRIVEN DEVELOPMENT IMPROVE POKER ROBOT PERFORMANCE? EDWARD SAN PEDRO. An essay submitted in partial fulfillment
ATHABASCA UNIVERSITY CAN TEST DRIVEN DEVELOPMENT IMPROVE POKER ROBOT PERFORMANCE? BY EDWARD SAN PEDRO An essay submitted in partial fulfillment Of the requirements for the degree of MASTER OF SCIENCE in
More informationarxiv: v1 [cs.gt] 23 May 2018
On self-play computation of equilibrium in poker Mikhail Goykhman Racah Institute of Physics, Hebrew University of Jerusalem, Jerusalem, 91904, Israel E-mail: michael.goykhman@mail.huji.ac.il arxiv:1805.09282v1
More informationLearning a Value Analysis Tool For Agent Evaluation
Learning a Value Analysis Tool For Agent Evaluation Martha White Michael Bowling Department of Computer Science University of Alberta International Joint Conference on Artificial Intelligence, 2009 Motivation:
More informationReflections on the First Man vs. Machine No-Limit Texas Hold 'em Competition
Reflections on the First Man vs. Machine No-Limit Texas Hold 'em Competition Sam Ganzfried Assistant Professor, Computer Science, Florida International University, Miami FL PhD, Computer Science Department,
More informationGame theory and AI: a unified approach to poker games
Game theory and AI: a unified approach to poker games Thesis for graduation as Master of Artificial Intelligence University of Amsterdam Frans Oliehoek 2 September 2005 Abstract This thesis focuses on
More informationAnalysis For Hold'em 3 Bonus April 9, 2014
Analysis For Hold'em 3 Bonus April 9, 2014 Prepared For John Feola New Vision Gaming 5 Samuel Phelps Way North Reading, MA 01864 Office: 978 664-1515 Fax: 978-664 - 5117 www.newvisiongaming.com Prepared
More informationCSC 396 : Introduction to Artificial Intelligence
CSC 396 : Introduction to Artificial Intelligence Exam 1 March 11th - 13th, 2008 Name Signature - Honor Code This is a take-home exam. You may use your book and lecture notes from class. You many not use
More informationCS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH. Santiago Ontañón
CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH Santiago Ontañón so367@drexel.edu Recall: Problem Solving Idea: represent the problem we want to solve as: State space Actions Goal check Cost function
More informationProgramming Project 1: Pacman (Due )
Programming Project 1: Pacman (Due 8.2.18) Registration to the exams 521495A: Artificial Intelligence Adversarial Search (Min-Max) Lectured by Abdenour Hadid Adjunct Professor, CMVS, University of Oulu
More informationTraining a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente
Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente Valentijn Muijrers 3275183 Valentijn.Muijrers@phil.uu.nl Supervisor: Gerard Vreeswijk 7,5 ECTS
More informationARTIFICIAL INTELLIGENCE (CS 370D)
Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-5) ADVERSARIAL SEARCH ADVERSARIAL SEARCH Optimal decisions Min algorithm α-β pruning Imperfect,
More informationA Brief Introduction to Game Theory
A Brief Introduction to Game Theory Jesse Crawford Department of Mathematics Tarleton State University April 27, 2011 (Tarleton State University) Brief Intro to Game Theory April 27, 2011 1 / 35 Outline
More informationCS 229 Final Project: Using Reinforcement Learning to Play Othello
CS 229 Final Project: Using Reinforcement Learning to Play Othello Kevin Fry Frank Zheng Xianming Li ID: kfry ID: fzheng ID: xmli 16 December 2016 Abstract We built an AI that learned to play Othello.
More informationCS 2710 Foundations of AI. Lecture 9. Adversarial search. CS 2710 Foundations of AI. Game search
CS 2710 Foundations of AI Lecture 9 Adversarial search Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square CS 2710 Foundations of AI Game search Game-playing programs developed by AI researchers since
More informationSet 4: Game-Playing. ICS 271 Fall 2017 Kalev Kask
Set 4: Game-Playing ICS 271 Fall 2017 Kalev Kask Overview Computer programs that play 2-player games game-playing as search with the complication of an opponent General principles of game-playing and search
More informationLecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1
Lecture 14 Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Outline Chapter 5 - Adversarial Search Alpha-Beta Pruning Imperfect Real-Time Decisions Stochastic Games Friday,
More informationLearning to Play like an Othello Master CS 229 Project Report. Shir Aharon, Amanda Chang, Kent Koyanagi
Learning to Play like an Othello Master CS 229 Project Report December 13, 213 1 Abstract This project aims to train a machine to strategically play the game of Othello using machine learning. Prior to
More informationAr#ficial)Intelligence!!
Introduc*on! Ar#ficial)Intelligence!! Roman Barták Department of Theoretical Computer Science and Mathematical Logic So far we assumed a single-agent environment, but what if there are more agents and
More informationOpponent Modeling in Texas Hold em
Opponent Modeling in Texas Hold em Nadia Boudewijn, student number 3700607, Bachelor thesis Artificial Intelligence 7.5 ECTS, Utrecht University, January 2014, supervisor: dr. G. A. W. Vreeswijk ABSTRACT
More informationA PROGRAM FOR PLAYING TAROK
190 ICGA Journal September 2003 A PROGRAM FOR PLAYING TAROK Mitja Luštrek 1, Matjaž Gams 1 and Ivan Bratko 1 Ljubljana, Slovenia ABSTRACT A program for playing the three-player tarok card game is presented
More informationArtificial Intelligence Search III
Artificial Intelligence Search III Lecture 5 Content: Search III Quick Review on Lecture 4 Why Study Games? Game Playing as Search Special Characteristics of Game Playing Search Ingredients of 2-Person
More informationSimple Poker Game Design, Simulation, and Probability
Simple Poker Game Design, Simulation, and Probability Nanxiang Wang Foothill High School Pleasanton, CA 94588 nanxiang.wang309@gmail.com Mason Chen Stanford Online High School Stanford, CA, 94301, USA
More informationArtificial Intelligence. Minimax and alpha-beta pruning
Artificial Intelligence Minimax and alpha-beta pruning In which we examine the problems that arise when we try to plan ahead to get the best result in a world that includes a hostile agent (other agent
More informationOutline. Game Playing. Game Problems. Game Problems. Types of games Playing a perfect game. Playing an imperfect game
Outline Game Playing ECE457 Applied Artificial Intelligence Fall 2007 Lecture #5 Types of games Playing a perfect game Minimax search Alpha-beta pruning Playing an imperfect game Real-time Imperfect information
More informationComparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage
Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage Richard Kelly and David Churchill Computer Science Faculty of Science Memorial University {richard.kelly, dchurchill}@mun.ca
More informationGame playing. Chapter 6. Chapter 6 1
Game playing Chapter 6 Chapter 6 1 Outline Games Perfect play minimax decisions α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Chapter 6 2 Games vs.
More informationBayesChess: A computer chess program based on Bayesian networks
BayesChess: A computer chess program based on Bayesian networks Antonio Fernández and Antonio Salmerón Department of Statistics and Applied Mathematics University of Almería Abstract In this paper we introduce
More informationCS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH. Santiago Ontañón
CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH Santiago Ontañón so367@drexel.edu Recall: Adversarial Search Idea: When there is only one agent in the world, we can solve problems using DFS, BFS, ID,
More informationEvolution of Counter-Strategies: Application of Co-evolution to Texas Hold em Poker
Evolution of Counter-Strategies: Application of Co-evolution to Texas Hold em Poker Thomas Thompson, John Levine and Russell Wotherspoon Abstract Texas Hold em Poker is similar to other poker variants
More informationGame Playing AI Class 8 Ch , 5.4.1, 5.5
Game Playing AI Class Ch. 5.-5., 5.4., 5.5 Bookkeeping HW Due 0/, :59pm Remaining CSP questions? Cynthia Matuszek CMSC 6 Based on slides by Marie desjardin, Francisco Iacobelli Today s Class Clear criteria
More informationOptimal Unbiased Estimators for Evaluating Agent Performance
Optimal Unbiased Estimators for Evaluating Agent Performance Martin Zinkevich and Michael Bowling and Nolan Bard and Morgan Kan and Darse Billings Department of Computing Science University of Alberta
More informationMonte Carlo Tree Search. Simon M. Lucas
Monte Carlo Tree Search Simon M. Lucas Outline MCTS: The Excitement! A tutorial: how it works Important heuristics: RAVE / AMAF Applications to video games and real-time control The Excitement Game playing
More information