CS221 Project Final Report Learning to play bridge
Conrad Grobler (conradg) and Jean-Paul Schmetz (jschmetz)
Autumn
1 Introduction
We investigated the use of machine learning in bridge playing. Bridge presents an interesting challenge because it combines adversarial game play with uncertainty about which cards are in which hands. It differs from many other games of chance in that the randomness happens only once, during the deal at the start of a hand; in other games, such as backgammon, randomness enters at every move. We investigated two different machine learning approaches to bridge playing. The first is based on policy gradient reinforcement learning and tries to learn the rules of bridge from scratch by playing many random games. The second uses supervised learning and learns from games played by oracle players. We present the results of these investigations and a survey of the literature on computer bridge playing, specifically the use of neural networks for it.
2 Bridge Game and Setup
A bridge game consists of two distinct phases: an auction phase, in which players bid to define the contract and the trump suit, and a playing phase, in which the players compete to win tricks. A game is played between two teams (North/South and East/West). Each trick consists of 4 cards, one played from each player's hand, and play continues until all the cards from all the hands have been played. The winner of a trick plays first on the next trick, and the other players follow clockwise. The declarer is the winner of the auction; on the very first trick, the player to the left of the declarer plays the first card. As soon as the first card is played, the cards held by the declarer's partner are placed face up on the table as the so-called dummy hand, which is visible to all players. Play from the dummy hand is controlled by the declarer, and the declarer's partner takes no further part in the game.
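The play rules above reduce to two checks that both the controller and every agent need: which cards are legal, and who wins a completed trick. The sketch below shows both. It assumes cards are (suit, rank) tuples, with suits drawn from "CDHS" and ranks 2 to 14 (ace high). It also relies on the standard bridge rule that a player must follow the suit led if able. The representation and the function names are illustrative and are not taken from the report.

```python
from typing import List, Optional, Tuple

Card = Tuple[str, int]  # (suit, rank): suit in "CDHS", rank 2..14 with 14 = ace


def legal_cards(hand: List[Card], trick: List[Card]) -> List[Card]:
    """Cards that may legally be played: follow the led suit if possible."""
    if not trick:
        return list(hand)  # the leader may play any card
    led_suit = trick[0][0]
    following = [c for c in hand if c[0] == led_suit]
    return following if following else list(hand)


def trick_winner(trick: List[Card], trump: Optional[str]) -> int:
    """Index (in playing order) of the card that wins a completed trick."""
    led_suit = trick[0][0]

    def strength(card: Card) -> Tuple[int, int]:
        suit, rank = card
        if trump is not None and suit == trump:
            return (2, rank)  # any trump beats any non-trump
        if suit == led_suit:
            return (1, rank)  # otherwise the highest card of the led suit wins
        return (0, rank)      # discards in a third suit can never win

    return max(range(len(trick)), key=lambda i: strength(trick[i]))
```

The three-tier strength key captures the whole ranking rule: trumps first, then the led suit, and discards never win.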
The focus of our investigations was on the playing phase, but valuable information about the distribution of the different hands can be communicated during the bidding phase. We implemented a simplified bidding simulation to extract some of this distributional information, but it does not form part of the competitive play between agents. It is run as a separate preprocessing step on each board, and the results are provided to the agents when each hand is initialised.
Each player is represented as an agent. The agent is given the current state of the game: the cards in the hands visible to it, the cards played in the trick so far, and who plays next. It must then select a legal card from its hand, trying to win as many tricks as possible.
3 Infrastructure
The training is handled differently for the two agents. For the RL agent we built a gym (compatible with OpenAI Gym [1]) in which it can train over many hands against a different agent. The initial training is done against a random player in order to explore as many different hands as possible.
The supervised learning agent uses the oracle result as the correct training data. It tries either to predict the optimal number of tricks that can be won by a specific action from a specific state, or to learn the best action from a given state directly. The supervised training system follows the play of optimal oracle players to explore the game tree. At a specific state it uses that state as the input and information from the oracle as the desired output. The trainer periodically saves the trained models to disk for use by the agent during play.
The playing capability of each agent type is evaluated by a controller that manages the game play. The controller takes as input the number of cards to deal per hand (in case smaller hands are required for debugging), the number of boards to play, and the agent type to use for each player. Our standard evaluation process pits a team made up of the agents being tested against a team of oracle players. They play boards while the controller keeps track of the number of tricks each team wins. The average number of tricks won by the agent team against the oracle team is used as the performance benchmark.
The controller performs the bidding process as a preprocessing step on every game. The bidding is implemented as a simplified version of the basic convention used by Bridge Base Online agents [2]. This provides initial information on the distribution of the hands and also determines the trump suit and the starting player. The agent is instantiated with some initial data on each board: distributional information from the bidding process and the trump suit. When it is a particular player's turn, the controller passes the agent the current state of the game (the cards remaining in the player's hand, the cards remaining in dummy, and the cards already played in this trick). The agent returns the action to take, i.e. the next card to play.
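The controller/agent contract described above can be sketched as a small abstract class. The names GameState, Agent and play, and the bidding_info dictionary, are hypothetical; the report does not publish its interface. A random agent is included because a random player is the first training opponent mentioned above.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Card = Tuple[str, int]  # (suit, rank) with rank 2..14


@dataclass
class GameState:
    """What the controller hands an agent on its turn (hypothetical field names)."""
    hand: List[Card]   # cards remaining in the hand that is to play
    dummy: List[Card]  # cards remaining in the dummy, visible to everyone
    trick: List[Card]  # cards already played in the current trick


class Agent(ABC):
    def __init__(self, trump: Optional[str], bidding_info: Dict[str, object]):
        # Per-board initial data: trump suit and distributional hints from bidding.
        self.trump = trump
        self.bidding_info = bidding_info

    @abstractmethod
    def play(self, state: GameState) -> Card:
        """Return the next card to play."""


class RandomAgent(Agent):
    """Plays a uniformly random legal card, following suit when it can."""

    def __init__(self, trump: Optional[str], bidding_info: Dict[str, object], seed: int = 0):
        super().__init__(trump, bidding_info)
        self.rng = random.Random(seed)

    def play(self, state: GameState) -> Card:
        candidates = state.hand
        if state.trick:
            led = state.trick[0][0]
            same_suit = [c for c in state.hand if c[0] == led]
            candidates = same_suit or state.hand
        return self.rng.choice(candidates)
```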
The controller uses this action to calculate the next state. The agent is responsible for maintaining its own internal state. At the end of each trick, the controller calls each agent with the 4 cards played during the trick so that the agent can track the playing history. In verbose mode, the controller also renders a visualisation of each trick played.
3.1 Oracle and baseline
The oracle is implemented as a double-dummy solver. It is able to cheat by looking at all the cards. Because no uncertainty is left, it can compute the optimal play through search. We used a very efficient open-source C++ implementation of a double-dummy solver [3]. We also implemented two double-dummy solvers in Python for testing and debugging, but because of their speed these are only usable on small hands (6-7 tricks or fewer). The first is a modified minimax implementation with αβ-pruning; the second is based on the zero-window search approach suggested by Chang [4].
The baseline is implemented as a simple greedy agent (i.e. it tries to win the current trick as cheaply as possible). We added a few extra rules that try to match some of the maxims taught to new bridge players (e.g. second hand low, third hand high). The baseline does not keep track of any internal state apart from the trump suit, and looks only at the current trick.
3.2 Feature extraction
The feature extractor used by the supervised learning agent takes the current game state: the cards left in each hand, the cards played so far in the trick, the index of the next player and the trump suit. From this it creates a feature vector that the learning algorithms can use. The feature extractor represents each hand, or each card that was played, using 65 values. The first 52 features of a 65-feature batch represent the 52 cards in a deck. Because trumps are special, there is a separate group of 13 values for the trump suit. In no-trumps these values are all 0. In a trump contract, the trump cards are mapped to these 13 special values and the values for their original suit are all zero:
0-12: clubs, if clubs are not trumps
13-25: diamonds, if diamonds are not trumps
26-38: hearts, if hearts are not trumps
39-51: spades, if spades are not trumps
52-64: trumps
Within each group, cards are ordered by increasing rank.
The first 65 values represent the next hand to play; each feature is 1 if the card is in the hand and 0 if not. The next three batches of 65 values represent the other hands in playing order. For a visible hand (the current player's hand or dummy), 1 indicates that a card is in the hand and 0 that it is not. For a non-visible hand, the value of each feature is the probability that the card is in that specific hand; at the moment the feature extractor assumes a uniform distribution. The next three batches of 65 features represent the cards on the table (played so far in the trick) and follow the same convention for trumps: a card that was played is represented as a 1 and all other values are 0.
We also experimented with including the full history of play in the feature vector, with up to 48 additional batches of 65 features. Each group of 4 batches represented the cards played in one completed trick, giving the full history of the board so far. This did not seem to improve the performance of the agent and quadrupled the size of the trained models, so we do not currently include this information in the feature vector.
We also implemented optional hand normalisation to reduce the required training time. This takes advantage of the equivalence of many different hands, especially when only a few tricks remain.
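The 65-value batches and the full 7-batch vector can be sketched as follows. The sketch uses the same (suit, rank) card tuples as elsewhere and marks a hidden hand with None. The helper names are hypothetical. Giving each unseen card probability 1/(number of hidden hands) is one reading of the uniform assumption, and it ignores any difference in the hidden hands' sizes.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Card = Tuple[str, int]  # (suit, rank), rank 2..14
SUITS = "CDHS"          # order of the four 13-value plain-suit groups
BATCH = 65              # 52 plain-suit slots + 13 trump slots


def card_index(card: Card, trump: Optional[str]) -> int:
    suit, rank = card
    offset = rank - 2                       # increasing rank: 2 -> 0, ..., ace -> 12
    if suit == trump:
        return 52 + offset                  # trumps move to slots 52-64
    return 13 * SUITS.index(suit) + offset  # plain suits use slots 0-51


def encode_cards(cards: Iterable[Card], trump: Optional[str], value: float = 1.0) -> List[float]:
    batch = [0.0] * BATCH
    for card in cards:
        batch[card_index(card, trump)] = value
    return batch


def table_batches(trick: Sequence[Card], trump: Optional[str]) -> List[float]:
    """Three batches, one per card already played in the trick (zeros if not yet played)."""
    out: List[float] = []
    for i in range(3):
        out += encode_cards(trick[i:i + 1], trump)
    return out


def feature_vector(hands_in_order: Sequence[Optional[List[Card]]], unseen: List[Card],
                   trump: Optional[str], trick: Sequence[Card]) -> List[float]:
    """hands_in_order starts with the player to move; None marks a hidden hand."""
    n_hidden = sum(1 for hand in hands_in_order if hand is None)
    out: List[float] = []
    for hand in hands_in_order:
        if hand is None:
            # Uniform assumption: each unseen card is equally likely in any hidden hand.
            out += encode_cards(unseen, trump, value=1.0 / n_hidden)
        else:
            out += encode_cards(hand, trump)
    return out + table_batches(trick, trump)
```

With four hands and three table slots, the vector has 7 × 65 = 455 values.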
The winner of a trick in a specific suit depends only on the relative ranks of the cards, not their absolute ranks. Consider the case where each player has only one card left (say, a heart), with West leading. A play sequence such as K Q J 10 is then conceptually equivalent to any other sequence in which the four cards have the same relative order. The hand normalisation removes all cards that have been played from the state and lowers the ranks of the cards left in the hands to fill the gaps. In this example every such sequence is mapped to the same normalised sequence; K Q J 10, for instance, becomes 5 4 3 2. This makes different states look more similar to the training algorithm. It reduced the number of training examples required to reach the same performance by a factor of approximately 5.
We also implemented an option that lets the supervised learning algorithm cheat by looking at the opponents' hands, to support further analysis. This gives some insight into which errors stem from the uncertainty and which from the general approach.
The policy gradient agent was tried with 3 different sets of features. The first is exactly the same as for the supervised learning agent. The second is a simplified version in which the game state is represented by 5 rows of 52 cards (collapsed into a 260-value vector). The first row represents the cards held by the player, and the other rows represent the dummy hand, the table, the played cards and the suits (including trumps). We also implemented a simpler game, minibridge, which is essentially 2-player bridge. We used it to test the policy gradient learning agent because it needed only a few hundred thousand games to train effectively. The third version of our feature extractor represented the state of minibridge in 5 rows of 52 cards.
4 Approach
4.1 Policy Gradient Agent
The policy gradient agent tries to learn an appropriate policy for playing bridge from scratch by playing many random hands against other agents.
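Returning to the hand normalisation of Section 3.2, the re-ranking can be sketched per suit. The remaining cards of each suit keep their order but are renumbered from 2 upwards. This assumes that played cards have already been removed from the hands. For simplicity it also ignores cards lying on the table in the current trick, which a full version would re-rank together with the hands. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Card = Tuple[str, int]  # (suit, rank), rank 2..14


def normalise(hands: List[List[Card]]) -> List[List[Card]]:
    """Renumber the remaining cards of each suit as 2, 3, 4, ... preserving their order."""
    remaining: Dict[str, List[int]] = {}
    for hand in hands:
        for suit, rank in hand:
            remaining.setdefault(suit, []).append(rank)
    new_rank: Dict[Card, int] = {}
    for suit, ranks in remaining.items():
        for i, rank in enumerate(sorted(ranks)):
            new_rank[(suit, rank)] = i + 2  # lowest remaining card becomes a 2
    return [[(suit, new_rank[(suit, rank)]) for suit, rank in hand] for hand in hands]
```

Both the last-heart example (K, Q, J, 10 in the four hands) and any other deal with the same relative order collapse onto one state.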
The initial training was done against a random player so that the agent could see as many different hands and states as possible. The agent receives a large negative reward if it tries an illegal action. A legal action that does not win a trick earns a small positive reward, and a legal action that wins a trick earns a larger positive reward.
The policy gradient agent follows Andrej Karpathy's explanation of deep reinforcement learning [5]. Unlike in Pong, the difficulty is to learn the probability of playing one of 52 cards rather than making a binary decision (UP/DOWN). We decided to learn a policy for each card. The learner plays one of the 52 cards at random according to their normalised probabilities and collects rewards until the game is finished (an episode is a 13-trick game). It then discounts the rewards and back-propagates the episode to the neural networks of the cards played in that episode (back-propagating to all cards proved harmful). This ensures that the learner explores a large set of possibilities while still learning at a reasonable, albeit slow, rate. It also means that the learner needs more than a million games before it consistently plays by the rules. The agent is a collection of 52 neural networks. Each has one relatively small hidden layer (we tried between 5 and 50 units) with a ReLU nonlinearity, and a single output squeezed between 0 and 1 by a sigmoid function.
To test the learning agent, we created a simplified version of bridge (minibridge) in which 2 players each receive 13 cards chosen at random from a 52-card deck and play essentially as in bridge. The first player (chosen at random) plays a card, and the other player must follow suit if possible. The winner is the player with the highest card in the declared suit, and the winner starts the next trick. This game is close enough to bridge to test whether the policy learning agent can learn a more complicated game. The gym that exercises the player is patient: it lets the player try many cards until a legal one is played, but each illegal card tried receives a relatively large negative reward. Stopping the round at every illegal move proved to be inefficient.
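The reward scheme and the discounting step of Section 4.1 can be sketched as below. The report gives only the relative sizes of the rewards, so the three constants are hypothetical, as is the default discount factor. The backwards discounted sum is the standard policy-gradient recipe from the Pong tutorial [5]. The final standardisation is a common variance-reduction step; whether the authors used it is not stated.

```python
from typing import List

# Hypothetical values: the report says only "large negative", "small positive", "larger positive".
ILLEGAL_REWARD = -1.0
LEGAL_REWARD = 0.1
TRICK_WON_REWARD = 1.0


def reward(legal: bool, won_trick: bool) -> float:
    """Reward for one attempted card."""
    if not legal:
        return ILLEGAL_REWARD
    return TRICK_WON_REWARD if won_trick else LEGAL_REWARD


def discount_rewards(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return for each step, G_t = r_t + gamma * G_{t+1}, over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def standardise(returns: List[float]) -> List[float]:
    """Zero-mean, unit-variance returns."""
    n = len(returns)
    mean = sum(returns) / n
    std = (sum((g - mean) ** 2 for g in returns) / n) ** 0.5
    return [(g - mean) / std if std > 0 else 0.0 for g in returns]
```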
Once the learner had proved able to play minibridge (see the results below), we deployed a large number of these learners on the more complete feature sets representing the real game of bridge. We expected the learning time to be much longer than for minibridge. Apart from the structure of the features (which to a human look like multiple rows of cards) and the structure of the 52 cards, the learning agent has no concept of bridge. It is simply trying to learn the optimal decision out of 52 possible actions from a sparse observation vector. This matters because we wanted the learner to learn from scratch. In fact, it seems it could learn any card game that involves putting a card on the table at every turn.
4.2 Supervised Learning Agent
The main supervised learning algorithm uses a multilayer perceptron regressor (specifically sklearn.neural_network.MLPRegressor [6], [7]) to predict the number of tricks that can be won from a specific state. It uses 3 hidden layers with 400, 300 and 200 nodes; these sizes gave the most robust performance over a number of training attempts. The hidden layers use a sigmoid activation function, and the output layer is linear. Conceptually, the hidden layers learn useful features, which are then fed into the final layer to perform linear regression.
Early on, we noticed that the learning struggled to cope with the different roles that depend on the position in a trick. The approach when leading the first card of a new trick is very different from the approach when playing the last card. To help with this, we created 4 neural networks for each agent. Training and play use the network corresponding to the number of cards already played in the trick. This significantly boosted the performance of the agent.
The training is done on many random hands.
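The four per-position networks can be combined with the play-time rule of Section 4.2: predict the tricks for the state reached after each legal card, then play the best one. The models here are plain callables standing in for trained regressors (for example, a fitted MLPRegressor's predict wrapped to take one vector). The featurise and apply_action hooks and the class name are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Card = Tuple[str, int]
State = Dict[str, object]                   # hypothetical state representation
Model = Callable[[Sequence[float]], float]  # predicts tricks from a feature vector


class PositionalAgent:
    """One trick-count regressor per seat in the trick (0-3 cards already played)."""

    def __init__(self, models: List[Model],
                 featurise: Callable[[State], List[float]],
                 apply_action: Callable[[State, Card], State]):
        if len(models) != 4:
            raise ValueError("expected one model per position in the trick")
        self.models = models
        self.featurise = featurise
        self.apply_action = apply_action

    def choose(self, state: State, legal: List[Card], cards_in_trick: int) -> Card:
        model = self.models[cards_in_trick]  # network for this seat in the trick

        def predicted_tricks(card: Card) -> float:
            # Score the state reached after playing `card`.
            return model(self.featurise(self.apply_action(state, card)))

        return max(legal, key=predicted_tricks)
```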
For a specific hand, the trainer randomly explores the search tree by following the play of the oracle players. From a specific state, it uses the double-dummy solver (oracle) to find the actual number of tricks that optimal play could win from each possible next state. It aggregates this training data in batches; when a batch of training samples is ready, it further trains the model on that batch.
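The data-generation loop of Section 4.2 can be sketched with the oracle and the game mechanics passed in as functions. Everything here is hypothetical scaffolding. oracle_best_actions and oracle_tricks stand in for calls to the double-dummy solver. partial_fit stands in for an incremental training call such as scikit-learn's MLPRegressor.partial_fit. The walk reads "randomly explores the search tree based on the play of the oracle players" as following oracle-optimal cards and choosing among equally good ones at random.

```python
import random
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # game state (hypothetical representation)
A = TypeVar("A")  # action, i.e. a card


def oracle_walk(start: S, is_terminal: Callable[[S], bool],
                oracle_best_actions: Callable[[S], Sequence[A]],
                apply_action: Callable[[S, A], S], rng: random.Random) -> Iterator[S]:
    """Visit the states along one path of oracle play, breaking ties at random."""
    state = start
    while not is_terminal(state):
        yield state
        state = apply_action(state, rng.choice(list(oracle_best_actions(state))))


def labelled_samples(states: Iterable[S], legal_actions: Callable[[S], Sequence[A]],
                     apply_action: Callable[[S, A], S], featurise: Callable[[S], List[float]],
                     oracle_tricks: Callable[[S], int]) -> Iterator[Tuple[List[float], int]]:
    """Label every possible next state with the tricks the oracle says can still be won."""
    for state in states:
        for action in legal_actions(state):
            nxt = apply_action(state, action)
            yield featurise(nxt), oracle_tricks(nxt)


def train_in_batches(samples: Iterable[Tuple[List[float], int]], batch_size: int,
                     partial_fit: Callable[[List[List[float]], List[int]], None]) -> int:
    """Feed samples to the model in fixed-size batches; a trailing partial batch is dropped."""
    X: List[List[float]] = []
    y: List[int] = []
    batches = 0
    for features, target in samples:
        X.append(features)
        y.append(target)
        if len(X) == batch_size:
            partial_fit(X, y)
            X, y = [], []
            batches += 1
    return batches
```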
During play, the agent uses the learnt model to estimate the number of tricks for each possible next state, i.e. the state produced by each available action. It then chooses the action with the highest score.
We also tried to learn the best action directly, rather than deducing it from a predicted number of tricks. For this we used a multilayer perceptron classifier (specifically sklearn.neural_network.MLPClassifier [6], [7]), with the same activation and the same number and sizes of hidden layers as the MLPRegressor implementation. The output was an array of 52 values, one per card. The training data was generated by the same process as above, except that the training targets were the optimal cards calculated by the oracle rather than the number of tricks. During play, the agent plays the legal card with the highest probability predicted by the classifier.
We also implemented a hybrid agent option. It lets the supervised learning agent switch over to search to find the optimal card once the hand becomes tractable (in the experiments we did this when 6 or fewer tricks were left). The search process enumerates all the states that could exist given the unknown hands. As the play unfolds, the agent keeps track of the minimum and maximum possible cards in each suit in each hand, and it checks each candidate state against this information. If a state satisfies the constraints, the agent uses the oracle to determine the optimal card to play from it. It then plays the card returned as optimal most often over these states.
5 Results
5.1 Reinforcement learning
The minibridge agent learns a policy that lets it score around 200 points against a random opponent (playing 1000 games in a row), which is better than the expected average of 180.
This means that the agent learned not only to play correctly but also to be good at winning the game. Both the learner and the player agents (with a learned policy) can be found in a CodaLab worksheet [8]. Training usually requires about 1 million episodes (an episode is a 13-trick round played to the end, including all the illegal moves tried by the learner).
The real bridge agent needs many more episodes to learn to play, and each episode also takes longer to play. After half a million episodes, the learning agent's running average reward (over 100 episodes) is still negative. Essentially, the agent is still trying roughly 150 illegal moves per episode. This is not a problem in itself, because each illegal move lowers the probability of that move by a tiny fraction every time. In comparison, on minibridge the agent's running average reward rises relatively quickly to a plateau of about +160 (it is always exploring and therefore collecting negative rewards while learning), before often diverging again (see Error Analysis below).
After 5 days of learning and almost 2,000,000 episodes, the learned policy reaches a slightly negative running reward (in the range of -100 to -500) while still learning (i.e. still collecting negative rewards). The player (which always plays the most probable action in the learned policy) consistently wins against the random player, by nearly the same margin as the baseline player. It also makes few mistakes (i.e. it rarely plays an illegal card). Both the learner and the player for the more fully featured bridge can be found in a CodaLab worksheet [9].
We were unable to fully train an agent on the full feature set, which would let the player use knowledge of the order of all the moves in the episode. This does seem possible, but it would most likely require a much longer training time, measured in weeks.
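The 52-network policy of Section 4.1 can be sketched in plain Python. Each card has its own tiny network (one ReLU hidden layer, one sigmoid output), and the scores are normalised into a distribution. The learner samples from this distribution, while the player always takes the most probable card. The layer size, the initialisation, the omitted biases and the 260-value input (the second feature set) are assumptions for illustration. Training, i.e. back-propagating discounted rewards into the networks of the cards played, is left out.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CardNet:
    """Scores how playable one particular card is: input -> ReLU hidden layer -> sigmoid."""

    def __init__(self, n_inputs: int, n_hidden: int, rng: random.Random):
        self.w1 = [[rng.gauss(0.0, 1.0) / math.sqrt(n_inputs) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.gauss(0.0, 1.0) / math.sqrt(n_hidden) for _ in range(n_hidden)]

    def score(self, x: List[float]) -> float:
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)))


def card_probabilities(nets: List[CardNet], x: List[float]) -> List[float]:
    scores = [net.score(x) for net in nets]  # one score per card, 52 in total
    total = sum(scores)
    return [s / total for s in scores]


def learner_action(probs: List[float], rng: random.Random) -> int:
    # Exploration: sample a card index in proportion to its probability.
    return rng.choices(range(len(probs)), weights=probs)[0]


def player_action(probs: List[float]) -> int:
    # Exploitation: always play the most probable card.
    return max(range(len(probs)), key=probs.__getitem__)
```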
5.2 Supervised Learning Agent
Below is a summary of the average results from running the supervised training experiment. More details, and two example runs of the experiments, can be found in our CodaLab worksheet [10].
The results were produced by running the controller over a set of games, each with 13 tricks. In all cases the agent played North and South, and oracle players played East and West.
Agent        Average tricks won
Random       3.8
Baseline     4.5
Normalised   4.0
Direct       3.8
Cheating     4.3
Hybrid       4.7
Oracle       6.5
The results show that the baseline performs surprisingly well. The default agent using a normalised hand performed better than random but did not match the baseline. The cheating agent (with the uncertainty removed) performed better than the default agent, but not by as large a margin as we expected. The best-performing agent (apart from the oracle) was the hybrid agent, which switched over to search once the hand became tractable.
During testing we found a curious phenomenon: the training process was surprisingly volatile. When we trained a model with exactly the same parameters, including the same number of training examples, the performance of the agent sometimes fluctuated wildly. The only difference between runs was the random hands that were generated; the number of hands, the number of training examples and all other parameters were the same. We initially attributed this to randomness in the evaluation process, but the differences persisted across repeated assessments of the same trained model. For example, different training runs of the default normalised agent usually produced results ranging from 3.9 to 4.2. Most results were close to 4.1 and a few were worse. One striking run performed extremely well, with a result of 4.8. At the time we attributed this to hyperparameter tuning, but repeated attempts with the same parameters never reproduced that result. It seems that the training is very sensitive to the details of the training samples.
Another interesting observation was that the performance seemed to vary cyclically with the number of training samples.
As more training samples were added, the performance would increase up to a point, then start decreasing, and at some later point increase again. This is likely because hands that look very similar but have very different results confuse the learning process. Because the training data is random, the process is bound to encounter such hands at some point. As even more samples are added, they probably drown out the problems caused by the conflicting information, until new conflicting information is encountered.
6 Error Analysis
6.1 Reinforcement Learning
On minibridge, it is clear that the player has really learned the game (i.e. the learner makes very few mistakes). It still occasionally plays an illegal card, but this is relatively rare. The hidden feature vector looks like an indicator of the card's playability. The puzzling thing is that the probability of playing is always extremely low, regardless of whether the card is legal or illegal. It turns out that the very low probability of playing a legal card is merely higher than the very low probability of playing an illegal card. In some observations the comparison was between two vanishingly small values. This is of course a problem, because the training will eventually underflow even for the right action.
On the more realistic game of bridge, the learner plays more illegal moves, but it consistently beats a random player. More details and a few sample runs can be found on the CodaLab page [9].
6.2 Supervised Learning
For the supervised learning agent, we compared the agent's choices with the oracle's in the same circumstances over a number of hands, to understand where the agent diverged from optimal play. For the default supervised agent, we looked at the predicted number of tricks, the actual optimal number of tricks, and how these affected the card that was chosen. In general the agent was quite good at predicting the number of tricks that could be won from a specific state. Had the goal been purely to train a neural network to predict a double-dummy solver's trick count, the results would have been quite good. The problem is that its predictions were wrong at crucial points in the play.
The most significant problem we found was that the agent never seemed to learn the "third hand high" maxim. Whenever the agent played third in a trick, it almost always played low. It was trying to protect its high cards for later, but this let the player in the fourth seat win cheap tricks. The whole point of the third-hand-high maxim is to force high cards out of the opponents so that the cards in the partner's hand become good. Despite training a separate network for each playing position and running many attempts, it never learnt a good strategy for the third hand. The results for the other positions look much more in line with expectations.
Another, more subtle problem was that the network was not good at differentiating between states with only minor differences. Two cards could be of similar value yet have very different implications later in the game (e.g. one allowing an entry into partner's hand and the other not). In such cases the agent could not predict the difference in outcomes and quite often chose the wrong card. It never seemed to learn the value of strategies such as protecting entries into partner's hand, or protecting long suits until the opponents have no trumps left. Likewise, it did not learn to appreciate the value of knocking out entries from the opponents' hands.
For the direct-prediction agent, we compared the probability predicted for each card with the optimal card choices suggested by the oracle. The network seemed to be quite good at predicting which card would have been good to play.
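The direct-prediction agent's play rule amounts to masking the classifier's 52 outputs down to the legal cards. The sketch below assumes the outputs are ordered clubs, diamonds, hearts, spades with ranks ascending; the report does not give its ordering. The function names are hypothetical. The sketch also returns the total probability mass on legal cards, a simple diagnostic for the failure discussed in this section.

```python
from typing import List, Sequence, Tuple

Card = Tuple[str, int]  # (suit, rank), rank 2..14
SUITS = "CDHS"


def card_id(card: Card) -> int:
    """Map a card to an output index 0-51 (assumed order: suits C, D, H, S; ranks ascending)."""
    suit, rank = card
    return 13 * SUITS.index(suit) + (rank - 2)


def choose_legal(probs: Sequence[float], legal: List[Card]) -> Tuple[Card, float, float]:
    """Return the legal card with the highest predicted probability, that probability,
    and the total probability mass the classifier put on legal cards."""
    best = max(legal, key=lambda c: probs[card_id(c)])
    legal_mass = sum(probs[card_id(c)] for c in legal)
    return best, probs[card_id(best)], legal_mass
```

A small legal_mass means the classifier's favourite card was not playable, which is exactly the situation in which the chosen card's probability is very low.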
The problem is that the card the network rated best was almost never in the agent's hand, or was not legal to play. The agent chose the legal card with the highest predicted probability, but quite often this probability was very low. In general, the agent did not seem to perform better than a random player. Predicting this correctly while conforming to the rules of bridge would likely require a significantly different feature extraction and network architecture.
7 Survey of the Literature
Ginsberg [11] first proposed playing bridge with Monte Carlo simulations combined with an efficient double-dummy solver. The basic idea is to simulate many random deals that match the currently known distributional constraints. On each simulated deal, an efficient double-dummy solver determines the optimal play, and the card that is best in expectation over all the simulations is chosen. This approach has since been refined, but it still seems to form the basis of the best automated bridge-playing agents [12]. Our hybrid agent uses this approach once fewer than 7 tricks remain, except that we search the possible deals exhaustively rather than running a Monte Carlo simulation. The significant boost this gives the hybrid agent highlights the power of the approach.
Machine learning has been used in a number of papers to handle the bridge bidding process. The most interesting of these uses deep reinforcement learning for automated bidding [13]. Because we focused on playing rather than bidding, this was not directly applicable to our work. It is nonetheless an interesting approach, and it outperformed the state of the art in bidding the optimal contracts. We could not find any research that applied reinforcement learning to the playing phase of bridge.
Mossakowski et al. used feedforward neural networks to predict the optimal number of tricks computed by a double-dummy solver [14] and hand strength [15].
Their approach to predicting the number of tricks is very similar to our supervised learning agent, and they also used a multi-layer feedforward network. The feature extractor, however, was quite different: it represented each card in a hand with two numbers (the rank and the suit) rather than with indicator features. The other major difference is that they used the network only to estimate the number of tricks at the start of a game, not at each point during the game to guide play. They mentioned plans to investigate the use of their model in game play in future research, but as far as we could find, that research has not been published.
8 Conclusion
Bridge represents a difficult challenge for machine learning. For reinforcement learning, just learning the rules of the game was a huge challenge. The rules that determine which cards are legal in a given situation are complicated, and learning to choose a legal card from an arbitrary state was difficult. Using domain knowledge to encode more information about each card into the feature vector could help the learning, but this is not quite the same as learning from scratch from the raw information about the cards.
The supervised learning agent performed worse than anticipated. Comparing the cards it chose to play with those suggested by the oracle revealed a number of weaknesses in the approach. The main issue seemed to be the inability to learn a successful strategy when playing third in a trick. Another significant issue was that attributes that are usually useful in neural networks, such as resilience to minor changes in the input, are actually counter-productive for bridge playing. A single change in the hand can in some cases cause huge swings in the possible number of tricks, for example whether a minor card is a 3 or a 4, or whether the opponents have 1 or 0 trumps left. Even with three hidden layers, the networks were unable to learn these subtleties. Better results might come from a more complex architecture, such as a deep CNN that exploits the structure of the bridge hand, or from encoding more domain knowledge into the feature vectors.
The most surprising result is probably how well the baseline player performed, even though it is based on a very simple strategy; it took a lot of effort to surpass it. In our opinion, this shows that simple rules of thumb (maxims) can be very compact representations of important game-play strategies. It also shows the value of domain knowledge when approaching a problem.
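A rule-based player of the kind described in Section 3.1 shows how much strategy a few maxims can carry. The sketch below encodes "second hand low" and "third hand high", plus the greedy "win as cheaply as possible" rule for the last seat. The leading rule (play the highest card), the fourth-seat partner check and the tie-breaking are assumptions, not the report's actual baseline. As with the report's baseline, the only state used is the trump suit and the current trick.

```python
from typing import List, Optional, Tuple

Card = Tuple[str, int]  # (suit, rank), rank 2..14


def _strength(card: Card, led: str, trump: Optional[str]) -> Tuple[int, int]:
    suit, rank = card
    if suit == trump:
        return (2, rank)
    return (1, rank) if suit == led else (0, rank)


def _winner(trick: List[Card], trump: Optional[str]) -> int:
    led = trick[0][0]
    return max(range(len(trick)), key=lambda i: _strength(trick[i], led, trump))


def _legal(hand: List[Card], trick: List[Card]) -> List[Card]:
    if not trick:
        return list(hand)
    same = [c for c in hand if c[0] == trick[0][0]]
    return same or list(hand)


def baseline_play(hand: List[Card], trick: List[Card], trump: Optional[str]) -> Card:
    legal = _legal(hand, trick)

    def cheapest(c: Card) -> Tuple[bool, int]:
        return (c[0] == trump, c[1])  # low non-trumps before any trump

    if not trick:  # leading (assumed rule): play the highest card
        return max(legal, key=lambda c: c[1])
    if len(trick) == 1:  # second hand low
        return min(legal, key=cheapest)
    if len(trick) == 2:  # third hand high
        return max(legal, key=lambda c: _strength(c, trick[0][0], trump))
    # Fourth hand: if partner (second to play) is winning, play low;
    # otherwise win as cheaply as possible, or play low if we cannot win.
    if _winner(trick, trump) == 1:
        return min(legal, key=cheapest)
    winners = [c for c in legal if _winner(trick + [c], trump) == 3]
    return min(winners, key=cheapest) if winners else min(legal, key=cheapest)
```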
The compactness of such maxims could be part of the reason that new players can learn bridge and become relatively competitive quite quickly, whereas real expert-level play takes many years or decades to develop.
The current state of the art in computer bridge playing is to use Monte Carlo methods to sample possible deals and then use an efficient double-dummy solver to determine the best card for each of these states. The card that was optimal in most of these cases is then played. This is very similar to the search part of our hybrid agent. The difficulty is that the number of possible states is largest at the start of the hand, which is also when the least information can be deduced from the cards played so far. This is probably where future research has the most room for improvement. Combining machine learning techniques with canned opening plays could perhaps improve performance during the first trick or two. After that, machine learning might add the most value in two ways. It could detect (or even employ) deceptive plays, or it could learn signalling systems that reveal more distributional information earlier in the game, so that the Monte Carlo process is better focused.
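The deal-search idea discussed above can be sketched in both its forms: exhaustive (as in our hybrid agent) and sampled (Monte Carlo). The constraint check ok and the double-dummy call solve are hypothetical stand-ins supplied by the caller. The voting rule follows the hybrid agent's "most often optimal" choice. Ginsberg's method, as described in Section 7, instead scores each card by its expected result over the simulations.

```python
import itertools
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Card = Tuple[str, int]
Deal = Tuple[Tuple[Card, ...], Tuple[Card, ...]]  # cards of the two hidden hands


def consistent_deals(unseen: Sequence[Card], size_a: int,
                     ok: Callable[[Deal], bool]) -> List[Deal]:
    """Exhaustive search: every split of the unseen cards that satisfies the constraints."""
    deals = []
    for hand_a in itertools.combinations(unseen, size_a):
        hand_b = tuple(c for c in unseen if c not in hand_a)
        if ok((hand_a, hand_b)):
            deals.append((hand_a, hand_b))
    return deals


def sampled_deals(unseen: Sequence[Card], size_a: int, ok: Callable[[Deal], bool],
                  n: int, seed: int = 0, max_tries: int = 10_000) -> List[Deal]:
    """Monte Carlo: random splits, rejecting those that violate the constraints."""
    rng = random.Random(seed)
    cards = list(unseen)
    deals: List[Deal] = []
    for _ in range(max_tries):
        if len(deals) == n:
            break
        rng.shuffle(cards)
        deal = (tuple(cards[:size_a]), tuple(cards[size_a:]))
        if ok(deal):
            deals.append(deal)
    return deals


def choose_card(deals: Sequence[Deal], solve: Callable[[Deal], Card]) -> Card:
    """Play the card that the double-dummy solver returns as optimal most often."""
    votes = Counter(solve(deal) for deal in deals)
    return votes.most_common(1)[0][0]
```

Exhaustive enumeration is only affordable late in the hand. With 13 unseen cards in each of two hidden hands there are C(26, 13) = 10,400,600 splits, which is why sampling is used earlier in the play.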
References
[1] G. B. et al., OpenAI Gym.
[2] B. B. Online, GIB system notes. system notes.php.
[3] B. Haglund, DDS double dummy solver.
[4] M.-S. Chang, Building a fast double-dummy bridge solver, tech. rep., New York University.
[5] A. Karpathy, Deep reinforcement learning: Pong from pixels.
[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol. 12.
[7] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, API design for machine learning software: experiences from the scikit-learn project, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning.
[8] C. Grobler and J.-P. Schmetz, Learning to play bridge - minibridge.
[9] C. Grobler and J.-P. Schmetz, Learning to play bridge - policy gradient learner.
[10] C. Grobler and J.-P. Schmetz, Learning to play bridge - supervised agent.
[11] M. L. Ginsberg, GIB: Steps toward an expert-level bridge-playing program, tech. rep., University of Oregon.
[12] P. M. Bethe, The state of automated bridge play, tech. rep., New York University.
[13] C.-K. Yeh and H.-T. Lin, Automatic Bridge Bidding Using Deep Reinforcement Learning, ArXiv e-prints, July.
[14] K. Mossakowski and J. Mańdziuk, Artificial Neural Networks for Solving Double Dummy Bridge Problems, Berlin, Heidelberg: Springer Berlin Heidelberg.
[15] K. Mossakowski and J. Mańdziuk, Neural Networks and the Estimation of Hands Strength in Contract Bridge, Berlin, Heidelberg: Springer Berlin Heidelberg.