CS221 Project Final Report Learning to play bridge

Conrad Grobler (conradg) and Jean-Paul Schmetz (jschmetz), Autumn

1 Introduction

We investigated the use of machine learning in bridge playing. Bridge presents an interesting challenge as it combines adversarial game play with uncertainty about which cards are in which hands. It differs from many other games of chance in that the randomness occurs only once, during the deal at the start of a hand; in games like backgammon the randomness occurs at every move. We investigated two different machine learning approaches to bridge play. The first is based on policy gradient reinforcement learning and tries to learn the rules of bridge from scratch by playing many random games. The second uses supervised learning, where it learns from games played by oracle players. We present the results of these investigations and a survey of the literature on computer bridge play, specifically the use of neural networks.

2 Bridge Game and Setup

A bridge game consists of two distinct phases: an auction phase, where players bid to define the contract and trump suit, and a playing phase, where the players compete to win tricks. Each trick consists of 4 cards being played, one from each player's hand. A game is played between two teams (North/South and East/West). The players continue playing until all the cards from all the hands have been played. The winner of a trick plays first on the next trick; the other players follow clockwise. On the very first trick, the player to the left of the declarer (the winner of the auction) plays the first card. As soon as the first card is played, the cards held by the partner of the declarer are placed on the table - the so-called dummy hand. It is visible to all players. Play from the dummy hand is also controlled by the declarer, and declarer's partner takes no further part in the rest of the game.

The focus of our investigation was on the playing phase, but valuable information about the distributions of the different hands can be communicated during the bidding phase. We implemented a simplified bidding simulation to extract some of this distributional information, but it does not form part of the competitive play between agents. It is done as a separate preprocessing step on each board, and the results are provided to the agents during the initialisation of each hand.

Each player is represented as an agent. The agent is given the current state of the game, which consists of the cards in each hand, the cards played in the trick so far and who the next player is. The agent must then select a legal card to play from the cards in its hand, and in doing so try to win as many tricks as possible.

3 Infrastructure

The training is handled differently for the two agents. For the RL agent we built a gym (compatible with OpenAI Gym [1]) where it can train over many hands against a different agent. The initial training is done against a random player to explore as many different hands as possible.
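To make the training setup concrete, a Gym-compatible bridge environment has roughly the shape sketched below. The class name, reward magnitudes, observation layout and the helper methods are hypothetical placeholders rather than our actual implementation; the illegal-move handling mirrors the behaviour described later in Section 4.1.

    import gym
    import numpy as np
    from gym import spaces

    class BridgeEnv(gym.Env):
        """Minimal sketch of a Gym-compatible bridge environment (names are hypothetical)."""

        def __init__(self, opponent_agent):
            super().__init__()
            self.opponent = opponent_agent            # e.g. a random player for initial training
            self.action_space = spaces.Discrete(52)   # one action per card in the deck
            # 5 x 52 observation layout, as in the simplified feature set of Section 3.2
            self.observation_space = spaces.Box(0.0, 1.0, shape=(260,), dtype=np.float32)

        def reset(self):
            self._deal_new_board()
            return self._observe()

        def step(self, action):
            if not self._is_legal(action):
                # illegal card: large negative reward, let the learner try again
                return self._observe(), -100.0, False, {}
            reward = self._play_card(action)          # small reward, larger if the trick is won
            done = self._board_finished()
            return self._observe(), reward, done, {}

        # The stubs below stand in for the dealing, legality and trick-keeping logic.
        def _deal_new_board(self): raise NotImplementedError
        def _observe(self): raise NotImplementedError
        def _is_legal(self, action): raise NotImplementedError
        def _play_card(self, action): raise NotImplementedError
        def _board_finished(self): raise NotImplementedError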

The supervised learning agent uses the oracle result as the correct training data. It either tries to learn to predict the optimal number of tricks that can be won by a specific action from a specific state, or to learn the best action directly from a given state. The supervised training system follows the play of optimal oracle players to explore the game tree. At a specific state it uses that state as the input and information from the oracle as the desired output. The trainer saves the trained models periodically to disk to be used by the agent during play.

The playing capability of each agent type is evaluated by a controller that manages the game play. The controller takes as input how many cards to deal per hand (in case smaller hands are required for debugging), the number of boards to play and the agent type to use for each player. Our standard evaluation process is to play a team made up of the agents being tested against a team made up of oracle players. They play a set of boards and the controller keeps track of the number of tricks each team wins. The average number of tricks won by the team of agents against the oracle team is used as the benchmark for performance.

The controller performs the bidding process as a preprocessing step on every game. The bidding is implemented as a simplified version of the basic convention used by Bridge Base Online agents [2]. This provides initial information on the distribution of the hands and is also used to determine the trump suit and the starting player. The agent is instantiated with some initial data on each board: distributional information from the bidding process and the trump suit. When it is the turn of a particular player, the controller passes the agent the current state of the game (cards remaining in the player's hand, cards remaining in dummy, cards already played in this trick) and the agent returns the action to take, i.e. the next card to play. The controller uses this to calculate the next state. The agent is responsible for maintaining its own internal state. At the end of each trick the controller calls each agent and passes the 4 cards that were played during the trick so that the agent can track the playing history. The controller also renders a visualisation of each trick played when it is used in verbose mode.

3.1 Oracle and baseline

The oracle is implemented as a double-dummy solver. The oracle is able to cheat by looking at all the cards; it can then compute the optimal play through search, as there is no uncertainty left. We used a very efficient open source C++ implementation of a double-dummy solver [3]. We also implemented two different double-dummy solvers in Python for testing and debugging purposes, but these are only usable on small hands (6-7 tricks or fewer) due to their speed. The first is a modified minimax implementation with αβ-pruning. The second is based on a zero-window search approach suggested by Chang [4].

The baseline is implemented as a simple greedy agent (i.e. it tries to win the current trick as cheaply as possible), but we added a few additional rules that try to match some of the maxims taught to new bridge players (e.g. second hand low, third hand high). The baseline does not keep track of any internal state, apart from the trump suit, and looks only at the current trick.
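To make the baseline concrete, the sketch below shows the core of such a greedy rule: find the cheapest card that still wins the current trick, otherwise throw the lowest legal card. The card representation and function names are our own illustrative assumptions; the real baseline also applies the position-dependent maxims mentioned above.

    # Cards are (rank, suit) tuples with rank 2..14 (14 = ace); suits are 'C', 'D', 'H', 'S'.

    def beats(card, best, trump):
        """True if `card` would beat the currently winning card of the trick."""
        rank, suit = card
        best_rank, best_suit = best
        if suit == best_suit:
            return rank > best_rank
        return suit == trump            # otherwise only a trump beats a card of another suit

    def greedy_choice(hand, trick, led_suit, trump):
        """Win the current trick as cheaply as possible, otherwise play the lowest legal card."""
        followers = [c for c in hand if c[1] == led_suit]
        legal = followers if followers else list(hand)   # must follow suit when possible
        if trick:
            best = trick[0]
            for c in trick[1:]:
                if beats(c, best, trump):
                    best = c
            winners = [c for c in legal if beats(c, best, trump)]
            if winners:
                return min(winners)     # cheapest card that still wins the trick
        return min(legal)               # no way to win (or we are leading): play low

    # Example: holding the ace and five of spades plus the three of hearts, with hearts led
    # and spades as trumps, the agent must follow suit and plays the three of hearts.
    print(greedy_choice([(14, 'S'), (5, 'S'), (3, 'H')],
                        trick=[(10, 'H'), (12, 'H')], led_suit='H', trump='S'))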
3.2 Feature extraction

The feature extractor used by the supervised learning agent takes the current game state (the cards left in each hand, the cards played so far in the trick, the index of the next player and the trump suit) and creates a feature vector that can be used by the learning algorithms. The feature extractor represents each hand, and each card that was played, using a batch of 65 values.

The first 52 features of a 65-feature batch represent the 52 cards in a deck. Because trumps are special, there is a separate block of 13 values for the trump suit. When playing no-trumps these values are all 0. When a trump contract is played, the values of the trump suit are mapped into these special 13 values and the values corresponding to the original suit are all zero: 0-12: clubs if clubs are not trumps, 13-25: diamonds if diamonds are not trumps, 26-38: hearts if hearts are not trumps, 39-51: spades if spades are not trumps, 52-64: trumps. Within each suit, cards are represented in order of increasing rank.

The first 65 values represent the next hand to play; each feature is 1 if the card is in the hand and 0 if not. The next three batches of 65 values represent the other hands in playing order. If it is a visible hand (the current player's hand or dummy), 1 indicates that a card is in the hand and 0 that it is not. For a non-visible hand the value of each feature is the probability that the card is in that specific hand; at the moment the feature extractor assumes a uniform distribution. The next three batches of 65 features represent the cards on the table (played so far in the trick), following the same convention for trumps as above. If a card was played it is represented as a 1; all other values are 0.

We also experimented with including the full history of play in the feature vector, with up to 48 additional batches of 65 features each. Each group of 4 batches represented the cards played in a completed trick, providing the full history of the board so far. This did not seem to improve the performance of the agent and quadrupled the size of the trained models, so we are not currently including this information in the feature vector.

We also implemented optional hand normalisation to reduce the required training time. This takes advantage of the equivalence of many different hands, especially when only a few tricks remain: the winner of a trick in a specific suit depends only on the relative ranks of the cards, not the absolute ranks. Consider the example where each player has only one card left (say a heart) with West leading: a play sequence such as K Q J 10 is conceptually equivalent to any other sequence with the same relative order of ranks. The hand normalisation removes all cards that have already been played from the state and reduces the ranks of the cards left in the hands to fill the gaps, so that all equivalent states map to the same normalised sequence. This makes different states appear more similar to the training algorithm, and we managed to reduce the number of training examples required to achieve the same performance by a factor of approximately 5.

We also implemented an option that allows the supervised learning algorithm to cheat by looking at the opponents' hands, to support additional analysis. This provides some insight into which errors are caused by the uncertainty and which by the general approach.

The policy gradient agent was tried with 3 different sets of features. The first is exactly the same as used by the supervised learning agent. The second is a simplified version in which the game state is represented by 5 rows of 52 cards (collapsed into a 260-value vector): the first row represents the cards held by the player and the other rows represent the state of the dummy hand, the table, the played cards and the suits (incl. trumps). We also implemented a simpler game ("minibridge"), which is essentially a 2-player version of bridge. This was used to test the policy gradient learning agent as it only requires a few hundred thousand games to train effectively. The third version of our feature extractor represents the state of minibridge in 5 rows of 52 cards.
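As an illustration of the 65-value batch described above, the sketch below encodes a single visible hand. The card representation and function name are our own; the real feature extractor builds the full vector by concatenating such batches for the four hands and the cards on the table, using probabilities instead of 0/1 for hidden hands.

    import numpy as np

    SUITS = ['C', 'D', 'H', 'S']          # clubs, diamonds, hearts, spades

    def encode_hand(cards, trump=None):
        """Encode a set of (rank, suit) cards, rank 2..14, into a 65-value batch.

        Indices 0-51 hold the four suits in increasing rank; indices 52-64 hold the
        trump suit. In a no-trump contract the last 13 values stay zero, and the
        positions of the original trump suit are left at zero when trumps are remapped.
        """
        batch = np.zeros(65, dtype=np.float32)
        for rank, suit in cards:
            if suit == trump:
                index = 52 + (rank - 2)                     # trumps go into the special block
            else:
                index = SUITS.index(suit) * 13 + (rank - 2)
            batch[index] = 1.0
        return batch

    # Example: ace of spades and two of clubs, with spades as trumps
    print(encode_hand([(14, 'S'), (2, 'C')], trump='S').nonzero())  # indices 0 and 64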
4 Approach

4.1 Policy Gradient Agent

The policy gradient agent tries to learn an appropriate policy for playing bridge from scratch by playing many random hands against other agents. The initial training was done against a random player to allow it to see as many different hands and states as possible. The agent receives a large negative reward if it tries to perform an illegal action. If it performs a legal action that does not lead to a trick being won, it receives a small positive reward. If the legal action leads to winning a trick, it receives a larger positive reward.
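A reward shaped along these lines might look as follows; the specific magnitudes are assumptions for illustration, not the values used in our experiments.

    def reward(action_is_legal, trick_won):
        """Shaped reward for a single card play (magnitudes are illustrative assumptions)."""
        if not action_is_legal:
            return -100.0    # strongly discourage illegal cards
        if trick_won:
            return 10.0      # larger bonus for winning the trick
        return 1.0           # small bonus for any legal card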

The policy gradient agent is implemented based on the explanation of deep reinforcement learning by Andrej Karpathy [5]. Unlike Pong, the difficulty is to learn the probability of playing one of 52 cards instead of a binary decision (UP/DOWN). We decided to learn a policy for each card. The learner plays one of the 52 cards randomly, based on their normalised probabilities, and collects the rewards until the game is finished (an episode is a 13-trick game). It then discounts the rewards and back-propagates the episode to the neural networks of the cards that were played in the episode (back-propagating to all cards proved to be harmful). Doing so ensures that the learner explores a large set of possibilities while still learning at a reasonable, albeit slow, rate. It does mean that the learner requires more than a million games before it consistently learns to play by the rules. The agent is a collection of 52 neural networks, each with one hidden layer of relatively small size (we tried between 5 and 50 units) with a ReLU nonlinearity and a single output value squeezed between 0 and 1 by a sigmoid function.

In order to test the learning agent, we created a simplified version of bridge ("minibridge") in which 2 players both receive 13 cards chosen randomly from a deck of 52 and essentially play bridge: the first player (chosen randomly) plays a card and the other player must follow suit (if possible). The winner is the player with the largest card in the suit declared, and the winner starts the next trick. This game is close enough to bridge to test whether the policy learning agent would be able to learn a more complicated game. The gym that exercises the player is patient and allows the player to try many cards until a legal one is played - each illegal card tried, however, receives a relatively large negative reward. Stopping the round at every illegal move proved to be inefficient. Once the learner proved able to play minibridge (see results below), we deployed a large number of these learners on the more complete sets of features representing the real game of bridge. The learning time was expected to be much larger than for minibridge. Except for the structure of the features (which to a human is a representation of multiple rows of cards) and the structure of 52 cards, the learning agent has no concept of bridge. It is just trying to learn the optimal decision out of 52 possible actions based on a sparse vector of observations. This is important, as we really wanted the learner to learn from scratch. In fact it seems that it could learn any card game which involves putting a card on the table at every turn.
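The sketch below illustrates the main ingredients under the assumptions just described: discounting the episode rewards as in Karpathy's Pong example, sampling a card in proportion to the normalised network outputs, and a REINFORCE-style update applied only to the network of a card that was actually played. The class layout, discount factor, hidden size, learning rate and the exact form of the update are illustrative assumptions rather than a description of the code we ran.

    import numpy as np

    def discount_rewards(rewards, gamma=0.99):
        """Discounted return for every step of an episode, as in Karpathy's Pong example."""
        rewards = np.asarray(rewards, dtype=np.float64)
        out = np.zeros_like(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            out[t] = running
        return out

    class CardNet:
        """One tiny network per card: a single ReLU hidden layer and a sigmoid output."""
        def __init__(self, n_features, n_hidden=20, lr=1e-3, rng=None):
            rng = rng or np.random.default_rng(0)
            self.w1 = rng.normal(0.0, 0.1, (n_hidden, n_features))
            self.w2 = rng.normal(0.0, 0.1, n_hidden)
            self.lr = lr

        def forward(self, x):
            h = np.maximum(0.0, self.w1 @ x)                 # ReLU hidden layer
            p = 1.0 / (1.0 + np.exp(-(self.w2 @ h)))         # sigmoid "playability" score
            return p, h

        def reinforce_update(self, x, h, p, advantage):
            """Gradient ascent on advantage * log sigmoid(logit), for a card that was played."""
            dlogit = (1.0 - p) * advantage
            dh = dlogit * self.w2 * (h > 0.0)                # backprop through the ReLU
            self.w2 += self.lr * dlogit * h
            self.w1 += self.lr * np.outer(dh, x)

    def sample_card(nets, x, rng):
        """Sample one of the 52 cards in proportion to the normalised outputs (rng: numpy Generator)."""
        scores = np.array([net.forward(x)[0] for net in nets])
        return rng.choice(len(nets), p=scores / scores.sum())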
4.2 Supervised Learning Agent

The main supervised learning algorithm uses a multilayer perceptron regressor (specifically sklearn.neural_network.MLPRegressor [6], [7]) to learn to predict the number of tricks that can be won from a specific state. It uses 3 hidden layers with 400, 300 and 200 nodes; these numbers seemed to provide the most robust performance over a number of training attempts. It uses a sigmoid activation function for the hidden layers and a linear model for the output layer. Conceptually it tries to use the hidden layers to learn useful features and then feeds these into the final layer to do linear regression.

One thing we noticed early on is that the learner was struggling to cope with the different roles related to the position in a trick. The approach when leading the first card of a new trick is very different from the approach when playing the last card of the trick. To help with this we created 4 neural networks for each agent; training and playing use the network corresponding to the number of cards already played in the trick. This caused a significant boost in the performance of the agent.

The training is done on many random hands. For a specific hand it randomly explores the search tree based on the play of the oracle players. From a specific state it uses the double-dummy solver (oracle) to get the actual number of tricks that could be won with optimal play for each of the possible next states. It aggregates the training data into batches; when a batch of training samples is ready it performs additional training on the model with this batch.
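A minimal sketch of this setup is shown below, with one regressor per trick position and incremental updates via partial_fit, which is one way to realise the batch-by-batch training described above. The data-handling details and function names are illustrative assumptions; the play-time selection anticipates the procedure described in the next section.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # One regressor per trick position (0-3 cards already played in the trick).
    models = [
        MLPRegressor(hidden_layer_sizes=(400, 300, 200),
                     activation='logistic',       # sigmoid hidden units, linear output layer
                     solver='adam')
        for _ in range(4)
    ]

    def train_on_batch(position, features, oracle_tricks):
        """Incrementally fit the network for this position on one batch of
        (state features, oracle trick count) pairs."""
        models[position].partial_fit(features, oracle_tricks)

    def best_action(position, legal_cards, candidate_states):
        """At play time: choose the legal card whose resulting state has the highest
        predicted number of tricks."""
        scores = models[position].predict(np.asarray(candidate_states))
        return legal_cards[int(np.argmax(scores))]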

During play, the agent uses the learnt model to estimate the number of tricks for each of the possible next states that result from the possible actions at that point. It then chooses the action with the highest score.

Another approach we tried was to learn the best action directly, rather than deducing it from a predicted number of tricks. For this we used a multilayer perceptron classifier (specifically sklearn.neural_network.MLPClassifier [6], [7]). The activation and the number and sizes of the hidden layers match those used in the MLPRegressor implementation. The output was an array of 52 values, each one corresponding to a card. The generation of training data followed the same process as above, except that it used the optimal cards calculated by the oracle as the training targets rather than the predicted number of tricks. During play the agent plays the card from the list of legal cards with the highest probability predicted by the classifier.

We also implemented a hybrid agent option. This allows the supervised learning agent to switch over to using search to find the optimal card to play once the hand becomes tractable (in our experiments, when there were 6 or fewer tricks left). The search process enumerates all possible states that could exist based on the unknown hands. The agent keeps track of the minimum and maximum possible cards in each suit in each hand as the play unfolds, and evaluates each of these possible states against this information. If a state satisfies the constraints, the agent uses the oracle to determine the optimal card to play from that state. It then plays the card that was returned as the optimal card most often over these test states.
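A sketch of this voting step is shown below. The deal enumeration and the oracle call are represented by callables passed in as parameters, and the constraint format is an assumption for illustration.

    from collections import Counter

    def hybrid_choice(candidate_deals, suit_constraints, oracle_best_card):
        """Play the card most often returned as optimal by the oracle over the deals
        that are consistent with what has been learned about the hidden hands.

        `candidate_deals` enumerates possible placements of the unseen cards,
        `suit_constraints` maps (player, suit) to (min_count, max_count), and
        `oracle_best_card` is a callable wrapping the double-dummy solver."""
        def consistent(deal):
            for (player, suit), (lo, hi) in suit_constraints.items():
                count = sum(1 for card in deal[player] if card[1] == suit)
                if not lo <= count <= hi:
                    return False
            return True

        votes = Counter(oracle_best_card(deal) for deal in candidate_deals if consistent(deal))
        return votes.most_common(1)[0][0]   # assumes at least one consistent deal exists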
5 Results

5.1 Reinforcement learning

The minibridge agent is able to learn a policy that allows it to score around 200 points against a random opponent when playing 1000 games in a row. This is better than the expected average of 180, which means that the agent learned not only to play correctly but also to be good at winning the game. Both learner and player agents (with a learned policy) can be found in a CodaLab worksheet [8]. The training usually requires about 1 million episodes (an episode is defined as a 13-trick round played to the end, including all the illegal moves tried by the learner).

The real bridge agent takes many more episodes to learn to play (and playing one episode also takes longer). The learning agent still has a substantially negative running average reward (over 100 episodes) after half a million episodes. This essentially means that the agent is still trying roughly 150 illegal moves per episode (this is not a problem, as each illegal move decreases the probability of that move by a tiny fraction every time). This compares to minibridge, where the agent improves its running reward relatively quickly and reaches a plateau of about +160 (it is always exploring and therefore collecting negative rewards while learning) before often diverging again (see Error Analysis below). After 5 days of learning and playing almost 2,000,000 episodes, the learned policy is able to reach a running reward that is only slightly negative (in the range of -100 to -500) while still learning (i.e. still collecting negative points). The player (which always plays the most probable action in the learned policy) wins consistently against the random player (by nearly the same margin as the baseline player) and makes few mistakes (defined as playing an illegal card). Both learner and player for the more fully featured bridge can be found in a CodaLab worksheet [9].

We were unable to fully train an agent using the full feature set (which would allow the player to use the knowledge of the order of all the moves in the episode). It does seem possible, but would require a much longer training time (most likely measured in weeks).

5.2 Supervised Learning Agent

Below is a summary of the average results from running the supervised training experiments. More details, and two example runs of the experiments, can be found in our CodaLab worksheet [10].

The results were produced by running the controller over a set of games, each with 13 tricks. In all cases the agent under test played North and South and oracle players played East and West.

    Agent        Average tricks won
    Random       3.8
    Baseline     4.5
    Normalised   4.0
    Direct       3.8
    Cheating     4.3
    Hybrid       4.7
    Oracle       6.5

The results show that the baseline performs surprisingly well. The default agent using a normalised hand performed better than random but did not match the baseline. The cheating agent (which removes the uncertainty) performed better than the default agent, but not by as big a margin as we expected. The best performing agent (apart from the oracle) was the hybrid agent that switched over to search once the hand became tractable.

As part of testing we found a curious phenomenon: the training process was surprisingly volatile. When we trained a model with exactly the same parameters, including the same number of training examples, the performance of the agent sometimes fluctuated wildly. The only difference between runs was the random hands that were generated; the number of hands, number of training examples and all other parameters were the same. We initially thought it was randomness in the evaluation process, but the differences remained over different assessments of the same trained model. For example, different training runs of the default normalised agent usually produced results ranging from 3.9 to 4.2; most of the test results were close to 4.1 and a few were worse. One striking example performed extremely well, with a result of 4.8. At the time we believed it was due to hyperparameter tuning, but repeated further attempts with the same parameters never reproduced that result. It seems that the training is very sensitive to the details of the training samples.

Another interesting observation was that the performance seemed to vary with the number of training samples in a cyclical manner. As more training samples were added the performance would increase up to a point, then start decreasing, and at some later point start increasing again. This is likely due to hands that look very similar but have very different results confusing the learning process. Due to the random nature of the training data the process is bound to encounter such hands at some point. As even more samples are added, they probably drown out the problems caused by the conflicting information until other new conflicting information is found.

6 Error Analysis

6.1 Reinforcement Learning

On minibridge, it is clear that the player has really learned the game (i.e. it makes very few mistakes). It still has moments where it plays an illegal card, but they are relatively rare. The hidden feature vector looks like an indicator of the playability of each card. The puzzling thing is that the probability of playing is always extremely low, regardless of whether the card is legal or illegal; it turns out that the very low probability of playing a legal card is still higher than the very low probability of playing an illegal card. In some observations both probabilities being compared were vanishingly small. This of course is a problem, as the training will eventually underflow even for the right action. On the more realistic game of bridge, the learner plays more illegal moves, but consistently beats a random player. More details and a few sample runs can be found on the CodaLab page [9].
6.2 Supervised Learning

For the supervised learning agent we compared the choices of the agent with those of the oracle in the same circumstances over a number of hands, to understand where the agent diverged from optimal play. For the default supervised agent we looked at the predicted number of tricks, the actual optimal number of tricks and how this affected the card that was chosen. In general it was quite good at predicting the number of tricks that could be won from a specific state.

If the purpose had been purely to train a neural network to predict the number of tricks that could be made by a double-dummy solver, the results would have been pretty good. The problem is that the predictions failed at crucial points in the play. The most significant problem we found was that the agent never seemed to learn the "third hand plays high" maxim. Whenever the agent played third in a trick it almost always played low. It was trying to protect its high cards for the future, but that allowed the player in the fourth seat to win cheap tricks. The whole point of the third hand high maxim is to force high cards from the opponents and thereby make the cards in partner's hand good. In spite of training a separate network for each playing position and running many attempts, it never learnt a good strategy for the third hand. The playing results look much more in line with expectations for the other positions.

Another more subtle problem was that the network was not good at differentiating between states with only minor differences. When two cards were of similar value but had significant implications later in the game (e.g. one allowing an entry into partner's hand and the other not), the agent was not able to predict the difference in outcomes and quite often chose the wrong one to play. It never seemed to learn the value of strategies like protecting entries into the partner's hand or protecting long suits until no trumps are left in the opponents' hands. In the same way, it also did not learn to appreciate the value of trying to knock entries out of the opponents' hands.

For the direct-prediction agent we compared the predicted probability associated with each card to the optimal card choices suggested by the oracle. The network seemed to be pretty good at predicting which card would have been good to play. The problem is that the best card to play was almost never in the hand of the agent, or was not legal to play. The agent chose the legal card with the highest predicted probability, but quite often this was a very low value. In general this agent did not seem to perform better than a random player. It would likely require a significantly different design, in terms of feature extraction and network architecture, to predict this correctly while conforming to the rules of bridge.

7 Survey of the Literature

Using Monte Carlo simulations combined with an efficient double-dummy solver to play bridge was first proposed by Ginsberg [11]. The basic idea is to simulate many random deals matching the currently known distributional constraints. On each simulated deal an efficient double-dummy solver is used to determine the optimal play, and the best card in expectation over all the simulations is then chosen. This approach has been refined since then, but still seems to form the basis of the best automated bridge playing agents [12]. Our hybrid agent uses this approach once fewer than 7 tricks remain, except that we perform an exhaustive search of possible deals rather than a Monte Carlo simulation. The significant boost in performance this provides to the hybrid agent highlights the power of this approach.

Machine learning has been used in a number of papers to handle the bridge bidding process. The most interesting of these was the use of deep reinforcement learning for automated bidding [13]. As we focused on playing rather than bidding, this was not directly applicable to our work.
It does, however, present an interesting approach and outperformed the state of the art in terms of bidding the optimal contracts. We could not find any research that tried to use reinforcement learning on the actual playing phase of bridge.

Mossakowski et al. used feedforward neural networks to predict the optimal number of tricks found by a double-dummy solver [14] and to estimate hand strength [15]. Their approach to predicting the number of tricks is very similar to our supervised learning agent; they also used a multi-layer feedforward network. The feature extractor was quite different: it used two numbers to represent each card in a hand (the rank and the suit) rather than indicator features. The other major difference is that they only used the model to estimate the number of tricks at the start of a game, not at each point during the game to guide game play. They mentioned plans to investigate the use of their model in game play in future research, but as far as we could find that research has not been published.
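To make the Monte Carlo approach described by Ginsberg concrete, the sketch below samples deals consistent with the known constraints, asks a double-dummy solver for the best card on each, and plays the card that comes out best most often. The sampling routine and solver call are placeholders for whatever implementation is available (for example a wrapper around the DDS library [3]).

    import random
    from collections import Counter

    def monte_carlo_card(my_hand, known_constraints, solver_best_card,
                         sample_deal, n_samples=100, rng=None):
        """Ginsberg-style card selection: sample hidden deals, solve each double dummy,
        and return the card chosen most often.

        `sample_deal(known_constraints, rng)` and `solver_best_card(deal, my_hand)` are
        placeholders for a constrained deal generator and a double-dummy solver."""
        rng = rng or random.Random(0)
        votes = Counter()
        for _ in range(n_samples):
            deal = sample_deal(known_constraints, rng)
            votes[solver_best_card(deal, my_hand)] += 1
        return votes.most_common(1)[0][0]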

8 Conclusion

Bridge represents a difficult challenge for machine learning. For reinforcement learning, just learning the rules of the game was a huge challenge: the rules for which cards are legal to play in a specific situation are complicated, and learning to choose a legal card from an arbitrary state was hard. Using domain knowledge to encode more information about each card into the feature vector could help with the learning, but this is not quite the same as learning from scratch based purely on the raw information about the cards.

The supervised learning agent performed worse than anticipated. Analysing the cards it chose to play, compared to what was suggested by the oracle, showed a number of weaknesses in the approach. The main issue seemed to be the inability to learn a successful strategy when playing third in a trick. Another significant issue was that the attributes that are typically useful in neural networks (such as being resilient to minor changes in the input) are actually counter-productive for bridge playing. A single change in the hand, such as whether a minor card is a 3 or a 4, or whether there are 1 or 0 trumps left in the opponents' hands, can in some cases lead to huge swings in the possible number of tricks. Even with three hidden layers, the networks were unable to learn these subtleties. Perhaps a more complex architecture, such as a deep CNN that uses the structure of the bridge hand, or encoding more domain knowledge into the feature vectors, could produce better results.

The most surprising result is probably how well the baseline player performed, even though it is based on a very simple strategy. It required a lot of effort to surpass it. In our opinion this shows how simple rules of thumb (maxims) can contain very compact representations of important strategies in game play, and it shows the value of domain knowledge when approaching a problem. This could be part of the reason that new players can learn bridge and become relatively competitive quite quickly, while real expert-level play takes many years or decades to develop.

The current state of the art in computer bridge playing is to use Monte Carlo methods to explore the game tree and then use an efficient double-dummy solver to determine the best card for each of the sampled states; the card that is optimal in most of these cases is then played. This is very similar to the search part of our hybrid agent. The difficulty is that the largest number of possible states exists at the start of the hand, which is also the time when the least information can be deduced from the cards played so far. This is probably where the largest margin for improvement lies for future research. Perhaps combining machine learning techniques with canned opening plays could improve performance during the first trick or two. After that, the most value from machine learning could probably come from trying to detect (or even employ) deceptive plays, or from learning signalling systems to obtain more distributional information earlier in the game, so that the Monte Carlo process can be better focused.

References

[1] G. Brockman et al., OpenAI Gym.
[2] Bridge Base Online, GIB system notes.
[3] B. Haglund, DDS double dummy solver.
[4] M.-S. Chang, Building a fast double-dummy bridge solver, tech. rep., New York University.
[5] A. Karpathy, Deep reinforcement learning: Pong from pixels.
[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[7] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, API design for machine learning software: experiences from the scikit-learn project, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013.
[8] C. Grobler and J.-P. Schmetz, Learning to play bridge - minibridge. CodaLab worksheet.
[9] C. Grobler and J.-P. Schmetz, Learning to play bridge - policy gradient learner. CodaLab worksheet.
[10] C. Grobler and J.-P. Schmetz, Learning to play bridge - supervised agent. CodaLab worksheet.
[11] M. L. Ginsberg, GIB: Steps toward an expert-level bridge-playing program, tech. rep., University of Oregon.
[12] P. M. Bethe, The state of automated bridge play, tech. rep., New York University.
[13] C.-K. Yeh and H.-T. Lin, Automatic Bridge Bidding Using Deep Reinforcement Learning, arXiv e-prints, July 2016.
[14] K. Mossakowski and J. Mańdziuk, Artificial Neural Networks for Solving Double Dummy Bridge Problems. Berlin, Heidelberg: Springer Berlin Heidelberg.
[15] K. Mossakowski and J. Mańdziuk, Neural Networks and the Estimation of Hands Strength in Contract Bridge. Berlin, Heidelberg: Springer Berlin Heidelberg.


More information

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel Foundations of AI 6. Adversarial Search Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard & Bernhard Nebel Contents Game Theory Board Games Minimax Search Alpha-Beta Search

More information

GLOSSARY OF BRIDGE TERMS

GLOSSARY OF BRIDGE TERMS GLOSSARY OF BRIDGE TERMS Acol A bidding system popular in the UK. Balanced Hand A balanced hand has cards in all suits and does not have shortages (voids, singletons) and/or length in any one suit. More

More information

Pass, Bid or Double Workshop

Pass, Bid or Double Workshop Pass, Bid or Double Workshop PASS, BID OR DOUBLE DETERMINING FACTORS In competitive auctions (both sides bidding), the make or break decision is whether or not to PASS, BID or DOUBLE? This Workshop is

More information

BRIDGE is a card game for four players, who sit down at a

BRIDGE is a card game for four players, who sit down at a THE TRICKS OF THE TRADE 1 Thetricksofthetrade In this section you will learn how tricks are won. It is essential reading for anyone who has not played a trick-taking game such as Euchre, Whist or Five

More information

Mastering Chess and Shogi by Self- Play with a General Reinforcement Learning Algorithm

Mastering Chess and Shogi by Self- Play with a General Reinforcement Learning Algorithm Mastering Chess and Shogi by Self- Play with a General Reinforcement Learning Algorithm by Silver et al Published by Google Deepmind Presented by Kira Selby Background u In March 2016, Deepmind s AlphaGo

More information

LESSON 3. Responses to 1NT Opening Bids. General Concepts. General Introduction. Group Activities. Sample Deals

LESSON 3. Responses to 1NT Opening Bids. General Concepts. General Introduction. Group Activities. Sample Deals LESSON 3 Responses to 1NT Opening Bids General Concepts General Introduction Group Activities Sample Deals 58 Bidding in the 21st Century GENERAL CONCEPTS Bidding The role of each player The opener is

More information

LESSON 2. Opening Leads Against Suit Contracts. General Concepts. General Introduction. Group Activities. Sample Deals

LESSON 2. Opening Leads Against Suit Contracts. General Concepts. General Introduction. Group Activities. Sample Deals LESSON 2 Opening Leads Against Suit Contracts General Concepts General Introduction Group Activities Sample Deals 40 Defense in the 21st Century General Concepts Defense The opening lead against trump

More information

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play NOTE Communicated by Richard Sutton TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play Gerald Tesauro IBM Thomas 1. Watson Research Center, I? 0. Box 704, Yorktozon Heights, NY 10598

More information

5.4 Imperfect, Real-Time Decisions

5.4 Imperfect, Real-Time Decisions 5.4 Imperfect, Real-Time Decisions Searching through the whole (pruned) game tree is too inefficient for any realistic game Moves must be made in a reasonable amount of time One has to cut off the generation

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search 1 By the end, you will know Why we use Monte Carlo Search Trees The pros and cons of MCTS How it is applied to Super Mario Brothers and Alpha Go 2 Outline I. Pre-MCTS Algorithms

More information

POINTS TO REMEMBER Planning when to draw trumps

POINTS TO REMEMBER Planning when to draw trumps Planning the Play of a Bridge Hand 6 POINTS TO REMEMBER Planning when to draw trumps The general rule is: Draw trumps immediately unless there is a good reason not to. When you are planning to ruff a loser

More information

Bidding Over Opponent s 1NT Opening

Bidding Over Opponent s 1NT Opening Bidding Over Opponent s 1NT Opening A safe way to try to steal a hand. Printer friendly version Before You Start The ideas in this article require partnership agreement. If you like what you read, discuss

More information

Automated Suicide: An Antichess Engine

Automated Suicide: An Antichess Engine Automated Suicide: An Antichess Engine Jim Andress and Prasanna Ramakrishnan 1 Introduction Antichess (also known as Suicide Chess or Loser s Chess) is a popular variant of chess where the objective of

More information

LESSON 3. Developing Tricks the Finesse. General Concepts. General Information. Group Activities. Sample Deals

LESSON 3. Developing Tricks the Finesse. General Concepts. General Information. Group Activities. Sample Deals LESSON 3 Developing Tricks the Finesse General Concepts General Information Group Activities Sample Deals 64 Lesson 3 Developing Tricks the Finesse Play of the Hand The finesse Leading toward the high

More information

CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH. Santiago Ontañón

CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH. Santiago Ontañón CS 380: ARTIFICIAL INTELLIGENCE MONTE CARLO SEARCH Santiago Ontañón so367@drexel.edu Recall: Adversarial Search Idea: When there is only one agent in the world, we can solve problems using DFS, BFS, ID,

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 Introduction So far we have only been concerned with a single agent Today, we introduce an adversary! 2 Outline Games Minimax search

More information

Artificial Intelligence Adversarial Search

Artificial Intelligence Adversarial Search Artificial Intelligence Adversarial Search Adversarial Search Adversarial search problems games They occur in multiagent competitive environments There is an opponent we can t control planning again us!

More information

Unit-III Chap-II Adversarial Search. Created by: Ashish Shah 1

Unit-III Chap-II Adversarial Search. Created by: Ashish Shah 1 Unit-III Chap-II Adversarial Search Created by: Ashish Shah 1 Alpha beta Pruning In case of standard ALPHA BETA PRUNING minimax tree, it returns the same move as minimax would, but prunes away branches

More information

LESSON 9. Jacoby Transfers. General Concepts. General Introduction. Group Activities. Sample Deals

LESSON 9. Jacoby Transfers. General Concepts. General Introduction. Group Activities. Sample Deals LESSON 9 Jacoby Transfers General Concepts General Introduction Group Activities Sample Deals 226 Lesson 9 Jacoby Transfers General Concepts This chapter covers the use of the Jacoby transfer for the major

More information

Generalized Game Trees

Generalized Game Trees Generalized Game Trees Richard E. Korf Computer Science Department University of California, Los Angeles Los Angeles, Ca. 90024 Abstract We consider two generalizations of the standard two-player game

More information