BetaPoker: Reinforcement Learning for Heads-Up Limit Poker
Albert Tung, Eric Xu, and Jeffrey Zhang


Introduction

Texas Hold'em Poker is the most popular variation of poker, played widely online and in casinos. As an imperfect-information game, it remains challenging to play well: for most of the game the opponent's cards are unknown, and the future board cards are unpredictable. Even in Heads-Up Limit poker, a variant with fixed betting and only two players, building an agent is difficult: it must account for the large number of possible states, model the opponent accurately without cheating (i.e. without seeing his cards), and understand risk taking.

The goal of this project is to create an AI that approximates a strong heads-up (one opponent versus another) fixed-limit Texas Hold'em strategy using techniques taught in class. We define a strong strategy as one that models the opponent and the game well enough to make a profit from other players.

Rules of the Game

The version of the game we use to test our bots is Texas Hold'em Heads-Up Limit Poker with a reverse-blinds format. In heads-up limit poker, the two players each begin with two hole cards dealt from a standard 52-card deck. In reverse blinds, the second player is designated as the dealer, who posts the small blind, while the first player posts the big blind. The second player acts first on the pre-flop and the first player acts first on the remaining stages. The blinds are fixed amounts of money; for our purposes the small blind is $5 and the big blind is $10.

The game consists of multiple "rounds" or hands, and each hand has four stages: the pre-flop, the flop, the turn, and the river. At the flop, three cards are placed on the board; at the turn and the river, one additional card is placed on the board at each stage. At each stage, each player can choose to raise (increase the bet), call (match the opponent's bet or continue the game), or fold (withdraw from the game). Although betting and checking also exist, we simplify the terminology to allow for an easier explanation. The actions for a stage continue until the maximum number of raises for the stage has been reached, both players have called, or one player has folded. For our game, the maximum number of raises allowed per stage is 4, with the exception of 3 for the pre-flop. A player wins the money bet and placed in the pot if the opposing player folds, or if the game reaches the river stage with both players having called, at which point a showdown occurs. At the showdown, the player with the best five-card hand formed from the two hole cards and the five board cards takes the entire pot, or the pot is split in case of a tie. These parameters are summarized in the sketch below.
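For concreteness, the fixed parameters of this variant can be written down as a small configuration block. The constant names below are ours and purely illustrative; they are not part of the match infrastructure.

```python
# Fixed parameters of the heads-up limit variant described above.
SMALL_BLIND = 5                      # posted by the dealer in the reverse-blinds format
BIG_BLIND = 10
STAGES = ("preflop", "flop", "turn", "river")
MAX_RAISES = {"preflop": 3, "flop": 4, "turn": 4, "river": 4}   # raise cap per stage
BOARD_CARDS = {"preflop": 0, "flop": 3, "turn": 4, "river": 5}  # board cards dealt by each stage
```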
Previous Work

Many different approaches to building bots for heads-up limit poker have been taken over the years. In 2004, Billings et al. implemented a version of Expectimax search, dubbed Miximax and Miximix, that computed expected values at nodes using probabilities over the opponent's possible hands. They chose a mixed strategy based on the weighted sums of each action, under the assumption that all chance events occur uniformly at random [Billings et al., 2004].

In 2007, CMU researchers Gilpin and Sandholm used new approximation methods that provided a heuristic for evaluating the state space of the board, together with simulation to estimate the rewards at the end of the game. Their approach could efficiently take a large portion of the game tree into account when computing the best possible action [Gilpin and Sandholm, 2007].

The most important development in recent years, however, has been counterfactual regret minimization (CFR), used extensively by researchers at the University of Alberta. In 2015, the group claimed to have solved heads-up limit poker with Cepheus, a bot that used a variant of CFR (dubbed CFR+) to compute a Nash equilibrium of the game [Bowling et al., 2015]. Most of this prior work did not use Q-learning, instead exploring the game tree with game-theoretic methods, so these approaches are complementary to ours.

Infrastructure

To get two poker bots to play against each other, we used an existing open-source infrastructure found online. It provided the following abilities: playing heads-up limit matches between two poker players built from its example template, examining the game state, and logging match information for inspection. We revised the existing code to play games with random hand deals, construct our own custom states, and create a separate player for each approach. The majority of the code for online Q-learning was written in C, while the offline Q-learning (which generated a state-to-policy mapping) was written in Python.

Baseline and Oracle

Our baseline is a bot that randomly chooses to raise, check, or fold among its valid actions. The random bot folds 6% of the time (when folding is valid) and splits the remaining probability between raising and checking (when those are valid); if one or more actions is invalid, we normalize the probabilities accordingly.

Our oracle is a pure CFR (counterfactual regret minimization) bot, the basis for many of today's top poker AIs including Cepheus; we use the open-source implementation open-pure-cfr. We trained the pure CFR bot for an hour, over 289 million iterations. Although pure CFR may not be as challenging an opponent as Cepheus, its open-source code is one of the few poker bots that can interface with our infrastructure. Furthermore, it beat our baseline consistently over the roughly 20,000 hands we played between them.

Model and Algorithm

Modeling as a Markov Decision Process

Our approach models the game as a Markov Decision Process (MDP) in which each game position is a state. We then use Q-learning to learn which action to choose in each state. The state of the game consists of the following information:

T - betting stage, i.e. T ∈ {pre-flop, flop, turn, river}
P - player whose turn it is (0 for small blind, 1 for big blind)
S - sequence of checks, calls, and raises that have occurred in the current game
H - the two cards we hold
B - the cards on the board

At each turn, the player can choose among the actions fold, check/call, and raise. Suppose it is our turn and we are at state s. Taking one of the three possible actions from s leads to a hidden probability distribution that determines the new state s'. This distribution accounts for the opponent's subsequent move as well as the next card(s) that appear on the board. Since we do not know how the opponent chooses his actions, we do not know this probability distribution over the states we could end up in. This motivates learning the transition probabilities and rewards with reinforcement learning, and we chose the Q-learning algorithm. A minimal sketch of the state representation appears below.
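The sketch below shows one way such a state could be represented. The field names are ours and cards are assumed to be written as strings such as "Ah"; this is illustrative only, not the infrastructure's actual data structure.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class GameState:
    """One decision point in the MDP described above (illustrative sketch)."""
    stage: str                 # T: "preflop", "flop", "turn", or "river"
    player: int                # P: 0 for small blind, 1 for big blind
    history: Tuple[str, ...]   # S: checks, calls, and raises so far this game
    hole: Tuple[str, str]      # H: our two hole cards, e.g. ("Ah", "Kd")
    board: Tuple[str, ...]     # B: the 0, 3, 4, or 5 cards on the board

ACTIONS = ("fold", "call", "raise")   # "call" also covers checking
```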

Q-learning

For every state s we want to estimate the value of being in that state; denote this value $V(s)$. For each state-action pair $(s, a)$ we learn the expected reward of taking action $a$ from state $s$; denote this estimate $Q(s, a)$. Then

$V(s) = \max_{a \in \mathrm{Actions}(s)} Q(s, a).$

We learn $Q(s, a)$ by first extracting features from the state-action pair. Let $\phi(s, a)$ be the feature vector. We approximate $Q(s, a)$ as a linear combination of these features, $Q(s, a) = w \cdot \phi(s, a)$, where $w$ is the weight vector. On the other hand, if taking action $a$ from state $s$ leads to state $s'$, then $Q(s, a) \approx r + \gamma V(s')$, where $r$ is the reward and $\gamma$ is the discount factor; we take $\gamma = 1$ for this game. Our goal is therefore to minimize

$\sum_{(s, a, r, s')} \big( Q(s, a) - (r + V(s')) \big)^2,$

i.e. to learn the weights $w$ that minimize this quantity. Taking the gradient with respect to $w$, we find that for every tuple $(s, a, r, s')$ we can update $w$ as

$w \leftarrow w - \eta \big[ Q(s, a) - (r + V(s')) \big] \phi(s, a),$

where $\eta$ is the step size. For this game we let $\eta$ decrease over time, setting it to the reciprocal of the number of occurrences of the feature vector $\phi(s, a)$.

To choose an action at state $s$, we can take the action with the greatest estimated reward, $\arg\max_{a \in \mathrm{Actions}(s)} Q(s, a)$. However, as we will see later, we can also choose the action with an epsilon-greedy approach: with probability $\epsilon$ we pick an action uniformly at random, and otherwise we take the action with the greatest estimated reward. This lets us explore more of the state space.

Data Mining for Offline Q-learning

To train the bot in our initial approaches, we fed the Q-learning algorithm data in the form (state, action, reward, new state). The data was obtained from past games of the Annual Computer Poker Competition (ACPC). For every action each player made, we created a state object storing the current state of the game (betting phase, number of raises in the current phase, number of times the opponent raised, and hand strength); to determine hand strength, we used the external library Deuces to evaluate the poker hand. To produce the $(s, a, r, s')$ tuples, we looped through every state in a game and found the immediately following state in which the same player is to act. All states have reward 0 except when the action is fold or both players show down; in those two cases the reward is the net earning from the pot. After processing all of the data provided by the ACPC, we ended up with approximately 161 million $(s, a, r, s')$ tuples, which we fed to the Q-learning algorithm to compute the optimal policy for each state.
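The training loop that consumes these tuples can be sketched as follows. This is an illustrative reimplementation under our own names (train_offline, feature_extractor), not the exact C/Python code we ran; terminal tuples are assumed to carry next_state = None.

```python
import collections

def train_offline(samples, feature_extractor, actions=("fold", "call", "raise"), gamma=1.0):
    """Fit Q(s, a) = w . phi(s, a) from (state, action, reward, next_state) tuples.

    `feature_extractor(state, action)` returns a dict mapping feature -> value.
    Terminal transitions are assumed to have next_state = None (no future value).
    Illustrative sketch only."""
    w = collections.defaultdict(float)        # weight vector, one entry per feature
    counts = collections.defaultdict(int)     # occurrences of each feature vector

    def q(state, action):
        return sum(w[f] * x for f, x in feature_extractor(state, action).items())

    def v(state):
        return 0.0 if state is None else max(q(state, a) for a in actions)

    for s, a, r, s_next in samples:
        phi = feature_extractor(s, a)
        counts[frozenset(phi.items())] += 1
        eta = 1.0 / counts[frozenset(phi.items())]   # step size: 1 / #occurrences of phi(s, a)
        residual = q(s, a) - (r + gamma * v(s_next))
        for f, value in phi.items():                 # gradient step on the squared error
            w[f] -= eta * residual * value
    return dict(w)
```

The fixed policy used by the offline bots is then simply the argmax over actions of the learned Q-values in each state, written out to a text file.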

Approaches, Results and Analysis

Throughout the project we took multiple approaches to improving and enhancing our poker bot. Our goal evolved from minimizing our losses against the oracle to maximizing the amount of money we could win from it. For each approach, we ran at least 250,000 rounds (hands) against both our oracle and our baseline.

Approach 1: Offline Q-learning with Hand Categorization

For the first approach, we selected the following features from the state:

T - betting stage, i.e. T ∈ {pre-flop, flop, turn, river}
X - total raises in this stage
Y - total opponent raises
H - categorization of our hand

The hand categorization is a string describing the best hand we currently have among our two hole cards and the cards on the board. Valid categorization strings are high card, pair, two pair, triple, straight, flush, full house, four of a kind, and straight flush. Assuming there can be at most 4 raises in a stage and that the opponent raises at most 20 times, our feature space contains roughly 2880 states. Computing the features takes roughly linear time, since we only need to run through the cards once to compute the hand categorization. We then bundled these characteristics into one feature (T, X, Y, H). An example feature vector φ(s, a) is {(1, 1, 2, pair, call) : 1}: we are at the flop stage, there has been one raise in this stage, the opponent has raised twice in the game, our best hand is a pair, and we take the action call (a sketch of this extractor appears after the results below).

With this feature extractor, we ran Q-learning on the ACPC data set to compute the values of Q(s, a). We then ran through all states, obtained the optimal action for each, and printed them to a text file. Our poker bot reads this file and plays the game by taking the action specified in the file. If a specific state does not exist in our policy, the poker bot calls/checks.

Results for Approach 1 (see Appendix A.i for plots)

Approach 1 vs Baseline and Oracle
Our Earnings vs. Baseline: -$4,080
Our Earnings vs. Oracle: -$5,617,610

This first approach showed that simply classifying the best hand we currently hold leads to large losses against the oracle, while neither consistently beating nor losing to the random player. Upon examination, we found that for many states our policy was too aggressive: we would continue to raise even from bad positions. For example, holding 3-5 offsuit as our hole cards, we kept raising when a normal, conservative player would have checked or folded. We attribute this to the fact that a simple classification does not accurately capture our hand's potential.
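A sketch of this extractor and of the call/check fallback follows. The function names are ours, and hand_category is assumed to be precomputed (e.g. with Deuces) as one of the strings listed above.

```python
def phi_approach1(stage, raises_this_stage, opponent_raises, hand_category, action):
    """Approach 1 feature vector: a single indicator keyed by (T, X, Y, H, action),
    e.g. {(1, 1, 2, "pair", "call"): 1} for the flop, one raise this stage,
    two opponent raises, a pair as our best made hand, and the action call."""
    return {(stage, raises_this_stage, opponent_raises, hand_category, action): 1}


def act_from_policy(policy, state_key):
    """Play the policy computed offline; states missing from the policy
    default to call/check, as described above."""
    return policy.get(state_key, "call")
```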

Approach 2: Offline Q-learning with a Hand Potential Heuristic

Rather than considering only the best hand we currently hold, we can consider how good our hand could become as the game evolves. This led to a different feature extractor with the following features:

T - betting stage, i.e. T ∈ {pre-flop, flop, turn, river}
P - player whose turn it is (0 for small blind, 1 for big blind)
X - total raises in this stage
H - representation of hand potential

If the stage is pre-flop, H is the two ranks of the cards in our hand followed by a character indicating whether the cards share a suit (no indicator if the hand is a pair); e.g. 97s corresponds to 9-7 suited, 88 corresponds to pocket 8s, and AKo corresponds to Ace-King offsuit. Otherwise, H is a tuple (x1, x2, x3, x4), where:

x1 is the number of board cards whose rank matches at least one of the ranks in our hand.
x2 is the highest power of a matched card in our hand, or of any card in our hand if there is no matched card; the power of a card is defined as the number of board cards whose rank is less than or equal to that card's rank.
x3 categorizes a flush draw: 1 if exactly one card in our hand matches suit with 3 cards on the board, 2 if the player has a flush and there is no flush on the board, 3 if the player has suited hole cards and there are 2 board cards of that suit, and 0 otherwise.
x4 is 1 if there is a straight draw, 2 if there is a straight, and 0 otherwise.

We can bound the hand potential heuristic by x1 ≤ 5, x2 ≤ 5, x3 ≤ 4, and x4 ≤ 3, so our overall feature space contains roughly 9600 states. The heuristic can be computed in linear time by sweeping through the cards once. As in the first approach, we grouped these characteristics into one feature (T, P, X, H), ran Q-learning on the same ACPC data set to compute Q(s, a), and extracted the optimal policy. The bot again plays according to the fixed policy, and if a game state does not exist in our policy, the bot calls/checks.

Results for Approach 2 (see Appendix A.ii for plots)

Approach 2 vs Baseline and Oracle
Our Earnings vs. Baseline: $1,951,400
Our Earnings vs. Oracle: -$2,621,670

With the larger state space we were better able to categorize the potential of our hand, taking into account the rank of our cards and the possible future draws. The feature space is now larger (9600 states compared to 2880), and with this additional information we consistently beat the random bot and mitigated our losses to the oracle compared to the first approach. However, when playing against the CFR bot we found many instances in which the state we reached had no policy assigned to it.

Approach 3: Generalizing the Hand Potential Heuristic

From the previous approach, we found that approximately 500 states were not covered by the optimal policy. This may be a limitation of our training data: some states that arose in play were not present in the training data, so we performed a sub-optimal action instead. As a result, we created a feature extractor that generalizes across states. Taking an action a and the previous features (T, P, X, H) with H = (x1, x2, x3, x4), we build the feature vector

{(T, a, P) : 1, (T, a, X) : 1, (T, a, x1) : 1, (T, a, x2) : 1, (T, a, x3) : 1, (T, a, x4) : 1}.

Essentially, for a given stage and action, each characteristic (player, total raises, and each component of the hand potential) becomes its own feature. This allows us to compute an optimal policy for every state. We again followed the offline Q-learning approach, first training on the ACPC data set to compute the optimal policy for all states and then playing according to the fixed policy. A sketch of this generalized extractor is given below.
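The only liberty taken in the sketch is tagging each component ("P", "X", "x1", ...) inside the key so that, for example, P = 1 and X = 1 remain distinct features; the function name is ours.

```python
def phi_generalized(stage, player, raises_this_stage, hand_potential, action):
    """Approach 3 feature vector: for a given (stage, action), emit one indicator
    per characteristic instead of bundling everything into a single key.
    `hand_potential` is the tuple (x1, x2, x3, x4) described above."""
    x1, x2, x3, x4 = hand_potential
    T, a = stage, action
    return {(T, a, "P", player): 1,
            (T, a, "X", raises_this_stage): 1,
            (T, a, "x1", x1): 1, (T, a, "x2", x2): 1,
            (T, a, "x3", x3): 1, (T, a, "x4", x4): 1}
```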

Results for Approach 3 (see Appendix A.iii for plots)

Approach 3 vs Baseline and Oracle
Our Earnings vs. Baseline: $1,493,040
Our Earnings vs. Oracle: -$206,975

With this feature extractor we still consistently beat our baseline, but more importantly we reduced our losses to the oracle. By writing a feature extractor that generalizes to unseen states, we are able to compute a better policy for the states that were missing in our previous approach. However, we still performed poorly against the CFR bot; we believe this was caused by the limitations of offline training, which cannot actively model the strategy of the current opponent.

Approach 4: Online Q-learning with the Generalized Hand Potential Heuristic

Since different poker players play differently, it would be better for the poker bot to counter and adapt to the opponent's strategy. We therefore implemented an online Q-learning bot that learns the rewards for each state while playing the opponent. For each game, we store the features (using the same feature extractor as in the third approach) for every state we were in, along with the action we took in that state. At the end of each game, we obtain the reward for that game and incorporate the feedback from the (s, a, r, s') tuples into the weight vector. However, we are now playing the opponent from scratch, so in the first few thousand games we cannot yet determine the optimal action at each state. We therefore use epsilon-greedy exploration: initially we set ε = 0.1, so with probability 0.1 we choose an action uniformly at random and with probability 0.9 we choose the action with the maximum expected reward. This encourages exploration in the beginning. As the bot plays more games, we decrease epsilon by 10^-9 per game, gradually shifting from exploration to exploitation (sketched after the results below).

Results for Approach 4 (see Appendix A.iv for plots)

Approach 4 vs Baseline and Oracle
Our Earnings vs. Baseline: $815,995
Our Earnings vs. Oracle: $916,895

With online Q-learning, we actively learn from the games we play and earn similar rewards against both our baseline and our oracle. We appear to learn the optimal weights within the first fifty to seventy thousand hands played with the epsilon-greedy approach, since that is when our average reward per thousand hands becomes positive; after that, we exploit the opponent's strategies.
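The action selection used by the online learner can be sketched as follows; q_value stands for the learned linear approximation w · φ(s, a), and the names are ours.

```python
import random

def choose_action(q_value, state, valid_actions, epsilon):
    """Epsilon-greedy selection: with probability epsilon pick a valid action
    uniformly at random (exploration), otherwise pick the action with the
    highest estimated Q-value (exploitation). Illustrative sketch."""
    if random.random() < epsilon:
        return random.choice(valid_actions)
    return max(valid_actions, key=lambda a: q_value(state, a))

# After every hand, epsilon is decayed by a small constant so the bot shifts
# from exploration toward exploitation over hundreds of thousands of hands:
#     epsilon = max(0.0, epsilon - 1e-9)
```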

Approach 5: Online Q-learning with the TwoPlusTwo Evaluator

In this approach we replace the hand potential heuristic with the TwoPlusTwo evaluator, which lets us move away from an imperfect evaluation of our hand's potential. The TwoPlusTwo evaluator is a look-up table that ranks any seven-card hand in constant time, returning the hand's rank relative to all other hands. For the flop, turn, and river, we can therefore compute the probability that we win over all possible hands the opponent could have and all possible remaining board cards. For the pre-flop stage, we cannot brute-force over all possible configurations, so we used another pre-computed look-up table with approximate win probabilities [Teofilo et al., 2013]. We also added back the count of opponent raises as a feature, because it gives a rough estimate of how good the opponent thinks his or her hand is, and a measure of his or her aggressiveness. Our feature selection consists of the following:

T - betting stage, i.e. T ∈ {pre-flop, flop, turn, river}
P - player whose turn it is (0 for small blind, 1 for big blind)
X - total raises in this stage
Y - total opponent raises
H - probability of winning over all possible opponent hands and remaining board cards

Bucketing the probability to the nearest integer, our feature space now contains roughly 64000 states. Computing the features is also more expensive: in the flop stage we must loop over every possible pair of remaining board cards as well as every possible pair of opponent cards to compute our win probability. Our feature extractor then generalizes by separating the opponent-raise count from the win probability: if we perform action a in a state with features (T, P, X, Y, H), the feature vector is

{(T, P, X, a, Y) : 1, (T, P, X, a, H) : 1}.

We then ran the same online Q-learning algorithm with the same epsilon-greedy approach as in the previous section.

Results for Approach 5 (see Appendix A.v for plots)

Approach 5 vs Baseline and Oracle
Our Earnings vs. Baseline: $1,599,390
Our Earnings vs. Oracle: $1,691,505

The earnings with the TwoPlusTwo evaluator almost double those of the original online Q-learning algorithm. The win probability is a better estimate of a hand's value than the hand potential heuristic, and we are now also modeling the opponent through the number of times he or she has raised, which is another indicator of how we fare against the opponent's hand. The drawback is that we now store a larger feature space (64000 states vs. 9600) and feature computation is far slower than the roughly 50 operations per turn required previously.

Approach 6: Online Q-learning with the TwoPlusTwo Evaluator and the Chen Formula

While studying the results of Approach 5, we found that the bot occasionally makes sub-optimal moves during the pre-flop stage by immediately folding decent cards (such as Jack-8). To fix this problem, we added a separate heuristic for the pre-flop cards, the Chen formula, which approximates the relative value of all pocket hands (sketched after the results below). Keeping the rest of the features the same, we then ran the online Q-learning algorithm with epsilon-greedy exploration as in the previous approach [Teofilo et al., 2013].

Results for Approach 6 (see Appendix A.vi for plots)

Approach 6 vs Baseline and Oracle
Our Earnings vs. Baseline: $1,788,160
Our Earnings vs. Oracle: $1,961,315

With a better pre-flop feature, our bot has a better understanding of the strength of its hand before the flop and is less likely to fold good hands. As a result, we avoid the earlier mistakes and improve our earnings by not conceding blinds to the opponent, without changing the feature space or time complexity of the previous approach.
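For reference, the Chen formula as it is commonly stated looks roughly like the sketch below; details such as the final rounding vary between descriptions, so treat this as an approximation rather than the exact scoring we used.

```python
# Chen-formula base scores for the high card: A=10, K=8, Q=7, J=6; lower ranks score rank / 2.
CHEN_HIGH_CARD = {14: 10.0, 13: 8.0, 12: 7.0, 11: 6.0}

def chen_score(rank1, rank2, suited):
    """Approximate pre-flop hand value; ranks are 2..14 with 14 = Ace."""
    hi, lo = max(rank1, rank2), min(rank1, rank2)
    score = CHEN_HIGH_CARD.get(hi, hi / 2.0)
    if hi == lo:                                   # pocket pair: double, minimum 5
        return max(5.0, 2 * score)
    if suited:                                     # suited cards get a bonus
        score += 2
    gap = hi - lo - 1                              # 0 for connectors
    score -= {0: 0, 1: 1, 2: 2, 3: 4}.get(gap, 5)  # penalty grows with the gap
    if gap <= 1 and hi < 12:                       # straight potential below a Queen
        score += 1
    return score
```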

Experimenting with Non-linear Features

All of the state features in the first six approaches are binary or linear, but we also ran tests with non-linear features. In particular, we created a quadratic feature equal to the sum of the squares of the number of opponent raises in each betting stage, which lets the opponent's raising have a bigger impact on the estimated reward of each state. Unfortunately, this feature extractor paired with online Q-learning did not succeed, losing approximately $5,000 per thousand hands. Reflecting on the results, we found that the player often folded immediately in the pre-flop stage. This suggests we weighted the opponent's raises too heavily, causing our bot to be bluffed out by the opponent's actions.

Approach 7: Online Q-learning with Hidden States

In the previous approaches we can determine exactly which state we are in, because all the information that defines a state is available to us during the game. One shortcoming of these approaches is that storing all possible information leads to a large state space, which in turn causes a much slower learning rate. In particular, we could not afford to store the history of the opponent's bets during the round and instead had to store only the total number of raises in the current phase. Knowing the opponent's bets mainly serves to give a better estimate of what hand he actually has. This motivates an approach with hidden states that store all of the information in the game, i.e. which hand each player has. In other words, if we knew the opponent's cards we would have another reinforcement learning problem (note that it is not optimal simply to bet based on expectation; in fact, that serves as a lower bound on the reward we could obtain). We define a state in this reinforcement learning problem as the tuple (T, P, X, (B1, B, B2)), where:

T - betting stage, i.e. T ∈ {pre-flop, flop, turn, river}
P - player whose turn it is (0 for small blind, 1 for big blind)
X - total raises in this stage
B1 - the bucketed probability that our hand wins, assuming all possible pairs of cards the opponent could have are equally likely (the probability we think we will win)
B - the bucketed probability that our hand wins against the opponent's actual hand (the true probability that we will win)
B2 - the bucketed probability that the opponent's hand wins, assuming all possible pairs of cards we could have are equally likely (the probability the opponent thinks he will win)

Total reward: $4,447,175 (Figure 1: result of 250,000 hands with known opponent cards). The algorithm converges extremely quickly, in fewer than 1,000 hands, and maintains a steady gain of about $18,000 per 1,000 hands (180 big blinds per 100 hands, or 180bb/100). Of course, we do not actually know the opponent's cards, so the states are hidden. Thus 180bb/100 is the best we could do (if we somehow guessed the opponent's exact cards every time), and the performance of this algorithm depends on how well we can estimate the true state we are in.

Our approach uses a hidden Markov model for this estimation. Let x_1, x_2, ..., x_n be the actions the opponent took, where x_i is the action on his i-th turn. Suppose we know the emission probability distribution p(x | s). Then on the n-th turn we can compute p(s | x_1, x_2, ..., x_n), the probability that the true state is s, using an algorithm like the forward-backward algorithm (a single forward step is sketched below).
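One forward step of that estimation might look like the following sketch, with the belief, transition, and emission models kept as plain dictionaries; all names are ours and this is not the infrastructure's API.

```python
def forward_step(belief, transition, emission, observed_action):
    """One forward-algorithm step over the hidden states: propagate the belief
    through the transition model (e.g. a new board card changing what each
    possible opponent holding is worth), weight by how likely the observed
    opponent action is under each state, and renormalize."""
    predicted = {}
    for s, p in belief.items():
        for s_next, p_t in transition.get(s, {s: 1.0}).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * p_t
    posterior = {s: p * emission.get((s, observed_action), 1e-6)
                 for s, p in predicted.items()}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()} if total > 0 else predicted
```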

Q-learning gives us a distribution p(a | s), the probability distribution from which we choose an action in a state. Combining this with p(s | x_1, x_2, ..., x_n) and marginalizing out s, we obtain p(a | x_1, x_2, ..., x_n), the distribution from which we choose an action given the opponent's actions. This procedure is described in [Ivanov et al., 2000].

The question is how to find the emission probability distribution p(x | s). We make the assumption that at the end of each hand both players reveal their cards. We can then simply count how many times the opponent took action x from state s and obtain a maximum likelihood estimate of p(x | s) directly. Without this assumption (in reality, most versions of heads-up limit do not have any such showing), we could instead use an EM algorithm that uses the rewards to estimate the states in hindsight, also described in [Ivanov et al., 2000]. We did not attempt this variant, but we believe it would perform well. Either way, we end up with a distribution p(a | x_1, x_2, ..., x_n), from which we sample an action and report the reward back to our Q-learning algorithm.

We need to store the Q-table, which requires O(SA) space, where S is the number of states and A the number of actions. In our case there are only 3 actions, and the number of states is a constant factor times B^3, where B is the number of buckets we choose to use for Q-learning. We also need a table of emission probabilities, with one entry per state-action pair, so this also requires O(SA) = O(B^3) space. Each time a new board card appears, we consider all possible pairs of cards the opponent could have and update the transition probabilities of our HMM, p(s_{i+1} | s_i). Storing these transition probabilities naively requires O(S^2) space. Normally this would be O(B^6) memory, but the dimension corresponding to the bucketed probability that we think we win does not affect the transition, so we can reduce the memory to O((B^2)^2) = O(B^4). This dominates the space complexity, so the total space complexity is O(B^4).

For the time complexity, we not only have to populate this transition table, which takes O(B^4) time (and can actually be precomputed), but we also have to compute the probability that we win and the probability that the opponent thinks he wins. We use Monte Carlo simulation for this: running X simulations costs O(X) for every pair of cards the opponent could hold, and there are O(D^2) such pairs with D = 52. The total time complexity is therefore O(B^4 + N X D^2), where N is the number of rounds we play. We ran into some trouble implementing this algorithm, so its results have been omitted.
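Putting the estimation pieces of this section together, the marginalization and the count-based emission estimate can be sketched as follows (illustrative only; names are ours).

```python
def action_distribution(belief, policy):
    """Marginalize out the hidden state:
    p(a | x_1..x_n) = sum_s p(s | x_1..x_n) * p(a | s).
    `belief` maps state -> probability; `policy` maps state -> {action: prob}."""
    dist = {}
    for s, p_s in belief.items():
        for a, p_a in policy[s].items():
            dist[a] = dist.get(a, 0.0) + p_s * p_a
    return dist


def ml_emissions(showdown_counts):
    """Maximum-likelihood estimate of p(x | s) from hands where the opponent's
    cards (and hence the true state s) were revealed: count how often action x
    was taken from state s and normalize."""
    totals = {}
    for (s, _x), c in showdown_counts.items():
        totals[s] = totals.get(s, 0) + c
    return {(s, x): c / totals[s] for (s, x), c in showdown_counts.items()}
```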

Future Work

There is always room to explore different state spaces and feature extractors for the standard online Q-learning algorithms described in this paper. Opponent-modeling features beyond the number of times the opponent raised may allow us to measure when the opponent is playing aggressively or conservatively. Different non-linear functions may also help if opponents dynamically change their strategy in an attempt to adapt.

For the hidden-state Q-learning algorithm, there is a lot of work to be done in finding the best weight to assign to the opponent's actions when guessing what hand he has. A lower weight means we treat the opponent's hand as closer to uniformly distributed (which works well against aggressive players), while a higher weight means the opponent's hand is treated as a more direct reflection of his actions (which works well against conservative players). This weight can be adjusted through the weight of the Laplace prior in our maximum likelihood estimate of the emission probabilities (see the example below). Also, as mentioned, an EM algorithm can be used for games in which players do not show their cards at the end of every hand.

We also look forward to testing our bot against other oracles. Our current oracle, pure-cfr, is a GTO (game-theoretically optimal) style bot and thus does not always play like most humans or other bots. Furthermore, pure-cfr, like many other open-source bots, is a fixed-distribution bot: it has a fixed probability distribution over the actions it chooses in a particular state and does not incorporate information from outside the immediate current round. In particular, it does not adapt to opponents over time, which explains why it loses so badly to many of the approaches we tried. For example, our standard online Q-learning bot was able to win at a rate of 80bb/100 (80 big blinds per 100 hands), which is better than what we would win if the opponent simply folded every hand (75bb/100).
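As a concrete instance of this smoothing knob, an add-$\alpha$ (Laplace-smoothed) estimate of the emission probabilities would take a form like

$\hat{p}(x \mid s) = \dfrac{\mathrm{count}(s, x) + \alpha}{\mathrm{count}(s) + \alpha \, |X|},$

where $|X|$ is the number of actions: a large $\alpha$ keeps the opponent model close to uniform, while a small $\alpha$ trusts the observed actions more ($\alpha$ is our notation and is not fixed by anything above).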

Appendix A

(i) Figure 2: Result of 250,000 hands for Approach 1 played against the baseline (left) and the oracle (right).
(ii) Figure 3: Result of 250,000 hands for Approach 2 played against the baseline (left) and the oracle (right).
(iii) Figure 4: Result of 250,000 hands for Approach 3 played against the baseline (left) and the oracle (right).

(iv) Figure 5: Result of 250,000 hands for Approach 4 played against the baseline (left) and the oracle (right).
(v) Figure 6: Result of 250,000 hands for Approach 5 played against the baseline (left) and the oracle (right).
(vi) Figure 7: Result of 250,000 hands for Approach 6 played against the baseline (left) and the oracle (right).

Appendix B: References

D. Billings, M. Bowling, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Game tree search with adaptation in stochastic imperfect information games. In Proceedings of the 4th International Conference on Computers and Games, pages 21-34, 2004.

M. Bowling, N. Burch, M. Johanson, and O. Tammelin. Heads-up limit hold'em poker is solved. Preprint, 2015.

A. Gilpin and T. Sandholm. Better automated abstraction techniques for imperfect information games, with application to Texas hold'em poker. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 2007.

Y. Ivanov, B. Blumberg, and A. Pentland. EM for perceptual coding and reinforcement learning tasks. Preprint, 2000.

L. F. Teófilo, L. P. Reis, and H. Lopes Cardoso. Estimating the odds for Texas hold'em poker agents. In 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology, 2013.


More information

Chapter 3 Learning in Two-Player Matrix Games

Chapter 3 Learning in Two-Player Matrix Games Chapter 3 Learning in Two-Player Matrix Games 3.1 Matrix Games In this chapter, we will examine the two-player stage game or the matrix game problem. Now, we have two players each learning how to play

More information

An Empirical Evaluation of Policy Rollout for Clue

An Empirical Evaluation of Policy Rollout for Clue An Empirical Evaluation of Policy Rollout for Clue Eric Marshall Oregon State University M.S. Final Project marshaer@oregonstate.edu Adviser: Professor Alan Fern Abstract We model the popular board game

More information

A. Rules of blackjack, representations, and playing blackjack

A. Rules of blackjack, representations, and playing blackjack CSCI 4150 Introduction to Artificial Intelligence, Fall 2005 Assignment 7 (140 points), out Monday November 21, due Thursday December 8 Learning to play blackjack In this assignment, you will implement

More information

A Rule-Based Learning Poker Player

A Rule-Based Learning Poker Player CSCI 4150 Introduction to Artificial Intelligence, Fall 2000 Assignment 6 (135 points), out Tuesday October 31; see document for due dates A Rule-Based Learning Poker Player For this assignment, teams

More information

CS 188 Fall Introduction to Artificial Intelligence Midterm 1

CS 188 Fall Introduction to Artificial Intelligence Midterm 1 CS 188 Fall 2018 Introduction to Artificial Intelligence Midterm 1 You have 120 minutes. The time will be projected at the front of the room. You may not leave during the last 10 minutes of the exam. Do

More information

Accelerating Best Response Calculation in Large Extensive Games

Accelerating Best Response Calculation in Large Extensive Games Accelerating Best Response Calculation in Large Extensive Games Michael Johanson johanson@ualberta.ca Department of Computing Science University of Alberta Edmonton, Alberta, Canada Michael Bowling bowling@ualberta.ca

More information

Creating a Poker Playing Program Using Evolutionary Computation

Creating a Poker Playing Program Using Evolutionary Computation Creating a Poker Playing Program Using Evolutionary Computation Simon Olsen and Rob LeGrand, Ph.D. Abstract Artificial intelligence is a rapidly expanding technology. We are surrounded by technology that

More information

Game Theory and Randomized Algorithms

Game Theory and Randomized Algorithms Game Theory and Randomized Algorithms Guy Aridor Game theory is a set of tools that allow us to understand how decisionmakers interact with each other. It has practical applications in economics, international

More information

Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents

Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents Nick Abou Risk University of Alberta Department of Computing Science Edmonton, AB 780-492-5468 abourisk@cs.ualberta.ca

More information

Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters

Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters Scott Watson, Andrew Vardy, Wolfgang Banzhaf Department of Computer Science Memorial University of Newfoundland St John s.

More information

Poker AI: Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games Tuomas Sandholm

Poker AI: Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games Tuomas Sandholm Poker AI: Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games Tuomas Sandholm Professor Carnegie Mellon University Computer Science Department Machine Learning Department

More information

An Introduction to Poker Opponent Modeling

An Introduction to Poker Opponent Modeling An Introduction to Poker Opponent Modeling Peter Chapman Brielin Brown University of Virginia 1 March 2011 It is not my aim to surprise or shock you-but the simplest way I can summarize is to say that

More information

Creating a New Angry Birds Competition Track

Creating a New Angry Birds Competition Track Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference Creating a New Angry Birds Competition Track Rohan Verma, Xiaoyu Ge, Jochen Renz Research School

More information

Refining Subgames in Large Imperfect Information Games

Refining Subgames in Large Imperfect Information Games Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Refining Subgames in Large Imperfect Information Games Matej Moravcik, Martin Schmid, Karel Ha, Milan Hladik Charles University

More information

SUPPOSE that we are planning to send a convoy through

SUPPOSE that we are planning to send a convoy through IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART B: CYBERNETICS, VOL. 40, NO. 3, JUNE 2010 623 The Environment Value of an Opponent Model Brett J. Borghetti Abstract We develop an upper bound for

More information

Strategy Purification

Strategy Purification Strategy Purification Sam Ganzfried, Tuomas Sandholm, and Kevin Waugh Computer Science Department Carnegie Mellon University {sganzfri, sandholm, waugh}@cs.cmu.edu Abstract There has been significant recent

More information

Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization

Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, and Michael Bowling University of Alberta Edmonton,

More information

Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold em Agent

Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold em Agent Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold em Agent Noam Brown, Sam Ganzfried, and Tuomas Sandholm Computer Science

More information

Ar#ficial)Intelligence!!

Ar#ficial)Intelligence!! Introduc*on! Ar#ficial)Intelligence!! Roman Barták Department of Theoretical Computer Science and Mathematical Logic So far we assumed a single-agent environment, but what if there are more agents and

More information

Computer Poker Research at LIACC

Computer Poker Research at LIACC Computer Poker Research at LIACC Luís Filipe Teófilo, Luís Paulo Reis, Henrique Lopes Cardoso, Dinis Félix, Rui Sêca, João Ferreira, Pedro Mendes, Nuno Cruz, Vitor Pereira, Nuno Passos LIACC Artificial

More information

CS188 Spring 2011 Written 2: Minimax, Expectimax, MDPs

CS188 Spring 2011 Written 2: Minimax, Expectimax, MDPs Last name: First name: SID: Class account login: Collaborators: CS188 Spring 2011 Written 2: Minimax, Expectimax, MDPs Due: Monday 2/28 at 5:29pm either in lecture or in 283 Soda Drop Box (no slip days).

More information

Intelligent Gaming Techniques for Poker: An Imperfect Information Game

Intelligent Gaming Techniques for Poker: An Imperfect Information Game Intelligent Gaming Techniques for Poker: An Imperfect Information Game Samisa Abeysinghe and Ajantha S. Atukorale University of Colombo School of Computing, 35, Reid Avenue, Colombo 07, Sri Lanka Tel:

More information