Learning to Play Love Letter with Deep Reinforcement Learning

Madeleine D. Dawson* (MIT), Robert X. Liang* (MIT), Alexander M. Turner* (MIT)

Abstract

Recent advancements in artificial intelligence have shown dramatic success in playing games. In particular, the application of deep learning algorithms to reinforcement learning has enabled substantially better performance. Most notably, DeepMind's AlphaGo was able to defeat the world's best human Go players [1]. These algorithms found strategies with minimal knowledge of the domain by learning to play from experience. The majority of recent advancements in reinforcement learning have been in single-agent environments or in two-player environments with perfect information [1]. In this paper, we investigate the performance of a similar approach on Love Letter, a multiplayer incomplete-information card game. We use a modified version of Neural Fictitious Self-Play [2] to train competitive agents in this environment, avoiding the need for Nash equilibrium computations. We additionally investigate the feasibility of learning not only strategies but also the game rules themselves, and we propose a method of initial training using random play to aid this process.

* The authors contributed equally to this work.

1 Introduction

In the past few years, neural networks have become very popular due to their breakthrough success in complex, non-linear tasks like image classification and speech recognition. This was in large part due to the use of graphics processing units (GPUs), which allow the training of much larger models than previously possible [3]. The technique is known as deep learning because of the use of multiple layers that can generate increasingly abstract features of the input.

Neural networks have also been used for playing games, with substantial success. Artificial intelligence has achieved expert-level play in Backgammon, Chess, and Go [2]. In a widely publicized series, DeepMind's AlphaGo was able to defeat the world's best human players of Go [1]. Additionally, OpenAI has trained bots to play Atari games and to perform motor control [4]. These algorithms learned how to perform these tasks with minimal knowledge of the domain of interest by learning to play from experience. The fundamental algorithms used are standard reinforcement learning algorithms, but by adapting them to take advantage of deep learning, they have solved many classes of learning tasks. In Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, Heinrich and Silver use competing neural networks to approach a Nash equilibrium through self-play of Leduc poker [2].

In this paper, we investigate the performance of these algorithms on Love Letter, a multiplayer incomplete-information card game. This is a challenging domain because the majority of recent advancements in reinforcement learning have been in single-agent environments or in two-player environments with perfect information [2]. Previous approaches to multiplayer incomplete-information games have focused on computing the Nash equilibrium strategy [2]. We show that it is still possible to train agents that are competitive in this environment. Furthermore, we investigate the ability of agents not only to learn strategies without domain knowledge, but even to learn the game rules themselves. We propose a novel method for initially training agents using a random player. This initialization not only encourages the action-value network to learn which moves are illegal but also may allow for more efficient development of agents, especially for games with complex rules. We show that our player can implicitly learn the rules of Love Letter using this method.

2 The Love Letter Game Environment

In Love Letter, players use a deck of cards numbered 1 to 8 and take turns until either the deck runs out or there is only one player left [5]. To begin, one secret card is removed from the deck and placed face-down. Each player is dealt one card, also face-down. Players may observe their own hand but not others'. This setup results in an incomplete-information game. On a player's turn, they draw a card and then choose to play one of the two cards in their hand, triggering the chosen card's effect. This effect is determined by the card's number. After the effect is processed, play proceeds to the next player. The game continues until the deck runs out or only one player remains. The cards have the following numbers, names, and effects (a small representation sketch follows the list):

1. (Guard) The active player chooses another player and guesses a type of card other than a 1 (Guard). If the chosen player has the guessed card in their hand, the chosen player is eliminated.
2. (Priest) The active player chooses another player, and the chosen player privately shows their hand to the active player.
3. (Baron) The active player chooses another player. The two players privately compare their hands. The player with the lower-numbered card is eliminated and reveals their hand publicly. If there is a tie, no player is eliminated and no cards are revealed.
4. (Handmaid) Until the currently active player's next turn, that player cannot be targeted by any card's effect.
5. (Prince) The active player chooses any player and forces them to discard their hand. If the discarded card is the 8 (Princess), the chosen player is immediately eliminated. If the chosen player is not eliminated, they draw a new card immediately. If no cards are left in the deck, the chosen player picks up the secret card (set aside at the beginning of the game) and the game ends.
6. (King) The active player chooses another player. The active and chosen players switch their hands without revealing their cards to other players.
7. (Countess) The active player must discard this card if the other card in their hand is a 5 (Prince) or 6 (King).
8. (Princess) If a player discards this card for any reason, they are eliminated.
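The card list above fixes both the action space and the legality constraints that the agents must learn. As a rough illustration only (this is not the authors' game engine, and all names here are hypothetical), the card set and the forced-Countess rule from item 7 might be encoded as follows:

```python
# Hypothetical sketch of the Love Letter card set and one legality rule.
# Not the authors' engine; names and structure are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    value: int
    name: str

CARDS = {c.value: c for c in [
    Card(1, "Guard"), Card(2, "Priest"), Card(3, "Baron"), Card(4, "Handmaid"),
    Card(5, "Prince"), Card(6, "King"), Card(7, "Countess"), Card(8, "Princess"),
]}

def legal_discards(hand):
    """Return the card values a player may legally discard from a two-card hand.

    Encodes the Countess rule above: a 7 must be discarded if the other card
    is a 5 (Prince) or 6 (King). Discarding the Princess (8) remains legal,
    even though it eliminates the player.
    """
    if 7 in hand and (5 in hand or 6 in hand):
        return [7]
    return sorted(set(hand))
```

For example, `legal_discards((7, 5))` would return `[7]`, while `legal_discards((3, 8))` would return `[3, 8]`.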

We used Python [6] to write our game engine and players, using the NumPy [7] library extensively. We used Keras [8] to build the action-value network and TensorFlow [9] to build the average-policy network. Our random player, used for initialization and testing, chooses a random legal move and can optionally choose a random illegal move with some small probability. Our trained AI agent was allowed to return any action at each point, meaning it could choose to discard a card it did not possess. Whenever an agent performed an illegal action, it was immediately eliminated.

3 Methods

3.1 Q-learning

Q-learning is a model-free form of reinforcement learning in which a network is trained on action-value pairs [10]. A state s_t and action a_t are given as input and a value v_t is predicted. Typically this value is the expected reward for executing a_t in state s_t. If R is the reward function and \lambda is the factor used to discount future rewards relative to current rewards, Q aims to learn the following function, i.e. the expected total of discounted rewards:

    Q(s_t, a_t) = E[R_t + \lambda R_{t+1} + \lambda^2 R_{t+2} + \cdots]

The network thus learns a greedy policy in which the action of highest expected reward is taken at every step. Traditional Q-learning uses the following update step:

    Q(s_t, a_t) \leftarrow (1 - \alpha) Q(s_t, a_t) + \alpha (r_t + \lambda \max_{a'} Q(s_{t+1}, a'))

Here, Q(s, a) is the prediction of total discounted reward, r_t is the reward observed for taking action a_t at state s_t, \alpha is the learning rate, and \max_{a'} Q(s_{t+1}, a') is the estimate of discounted future reward.

Deep Q-learning is a modified form of Q-learning that can be used instead of hand-crafting features for high-dimensional states [11]. Instead of storing an estimated action-value pair for every possible state, a deep neural network is used to predict values. This has the potential to generalize value estimates to states that have not been encountered before.

3.2 Neural Fictitious Self-Play

Fictitious play is a method of learning to play games by choosing the best response to opponents' previous strategies [2]. Fictitious self-play extends this concept to an agent playing against itself. The agent stores all state transitions, actions, and rewards earned in one memory; in another memory, it stores its own previous actions. We approach the Love Letter game using a modified form of Neural Fictitious Self-Play [2] in which neural networks are also used to approximate the agent's historical probability of taking each action in each state.

3.3 Training

The training procedure is outlined in Algorithm 1, which is adapted from Heinrich and Silver's paper [2]. The rewards used in training were simply +1 on victory and -1 on loss or illegal action.

Initial Training Using Random Play

We modify the original Neural Fictitious Self-Play procedure by adding initial training using a random-play policy. This random-play policy plays any of the legal moves available to it with uniform probability; a small amount of the time (10%), it is also allowed to play an illegal move. This aims to help the action-value network learn which moves are legal, as this knowledge is not provided a priori. If an agent plays an illegal move, it simply loses automatically. A minimal sketch of such a policy is given below.
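To make the random-play initialization concrete, the following is a minimal sketch of such a policy. The exact mixing rule and the helper arguments `legal_actions` and `all_actions` are assumptions for illustration; they are not taken from the paper.

```python
import random

def random_init_policy(legal_actions, all_actions, illegal_prob=0.10):
    """Random-play policy used for initial training (sketch).

    With probability `illegal_prob`, deliberately pick an illegal action so
    the action-value network observes (and is penalized for) illegal moves;
    otherwise pick uniformly among the legal actions.
    """
    illegal = [a for a in all_actions if a not in legal_actions]
    if illegal and random.random() < illegal_prob:
        return random.choice(illegal)
    return random.choice(list(legal_actions))
```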

Algorithm 1: Training Procedure

Initialize the game and the array of AI agents, each running TrainAgent() for training, first using random training, then resetting M_RL and using self-play.

Function TrainAgent():
    Initialize memories M_RL and M_SL
    Initialize average-policy network Π
    Initialize action-value network Q
    Initialize target action-value network Q_T from Q's weights
    for each game do
        Set the agent's policy using SetPolicy()
        while game is not over do
            Observe state s_t at current time t
            Perform action a_t in the game, according to the policy
            Observe reward r_{t+1} and following state s_{t+1}
            Store (s_t, a_t, r_{t+1}, s_{t+1}) in M_RL
            if policy is ε-greedy then
                Store (s_t, a_t) in M_SL
            end if
            Train Π using SGD on cross-entropy loss on M_SL data
            Train Q using SGD on mean-squared-error loss, using target network Q_T, on M_RL data
            every k steps do
                Update Q_T's weights to Q's
            end every
        end while
    end for
end function

Function SetPolicy():
    if in initial random training then
        Set policy to the random player
    else
        Set policy to ε-greedy with probability η; otherwise, set it to Π
    end if
end function

While this random player did require an understanding of the game rules, the game framework does not require players to choose a legal move. Our goal was for the trained player to infer the rules of Love Letter through this random initialization.

Neural Network Architectures

For the action-value (Q) network, two fully-connected hidden layers of 256 units with ReLU activation were used. The output layer was sized according to the number of possible actions, and no activation function was applied to it. For the average-policy network, two fully-connected hidden layers of 512 units with ReLU activation were used. The output layer was again sized according to the number of possible actions, with softmax activation. We experimented briefly with differently sized architectures using half or double the number of units, or an additional layer. We settled on these sizes because the training speed of larger architectures was too slow for our purposes and smaller architectures did not achieve equally good performance.

Target Q Network

The naive approach to training the deep Q network is to minimize the mean squared error between the predicted total discounted reward, Q(s_t, a_t), and the observed reward plus the predicted future discounted reward under a greedy policy, r_t + \lambda \max_{a'} Q(s_{t+1}, a'). This, however, resulted in unstable predictions, with some weights in the Q network increasing or decreasing without bound. To train stably, we kept track of a target Q network, Q_T, which was initialized as a copy of Q. Q was then trained on the mean squared error between Q(s_t, a_t) and r_t + \lambda \max_{a'} Q_T(s_{t+1}, a'), i.e. the observed reward plus the target network's prediction of future discounted reward under a greedy policy. The weights of Q_T were then periodically set equal to Q's; we updated them every 10 game steps. This procedure damped the oscillations, resulting in stable training. A minimal sketch of the two networks and the target update follows.
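The sketch below builds the two architectures and the target-network training step entirely in tf.keras for brevity (the paper built the average-policy network directly in TensorFlow). The state and action dimensions, the numerical discount value, and the terminal-state flag are assumptions for illustration; the paper does not specify them.

```python
# Minimal sketch of the Q, average-policy, and target networks described above.
# STATE_DIM, NUM_ACTIONS, LAMBDA, and the `done` flag are assumed values.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

STATE_DIM, NUM_ACTIONS = 64, 16   # hypothetical encoding sizes
LAMBDA = 0.95                     # discount factor (the paper's lambda); value assumed

def build_q_network():
    # Action-value network: 2 x 256 ReLU hidden layers, linear output per action.
    return models.Sequential([
        layers.Input(shape=(STATE_DIM,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(NUM_ACTIONS),
    ])

def build_policy_network():
    # Average-policy network: 2 x 512 ReLU hidden layers, softmax over actions.
    return models.Sequential([
        layers.Input(shape=(STATE_DIM,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(NUM_ACTIONS, activation="softmax"),
    ])

q_net = build_q_network()
q_target = build_q_network()
q_target.set_weights(q_net.get_weights())   # Q_T starts as a copy of Q
q_net.compile(optimizer="sgd", loss="mse")  # SGD on mean squared error, as in the paper

def train_q_on_batch(states, actions, rewards, next_states, done):
    """One MSE training step on a batch from M_RL, using the target network.

    `done` is 1.0 where the next state is terminal, else 0.0 (an assumption;
    the paper does not describe terminal handling explicitly).
    """
    q_next = q_target.predict(next_states, verbose=0).max(axis=1)
    targets = q_net.predict(states, verbose=0)
    targets[np.arange(len(actions)), actions] = rewards + LAMBDA * q_next * (1 - done)
    q_net.train_on_batch(states, targets)

def sync_target():
    # Called every k steps; the paper used k = 10 game steps.
    q_target.set_weights(q_net.get_weights())
```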

Memories and Batching

Two memory stores were used to hold past experience: M_RL and M_SL. M_RL is used to train the action-value network and consists of past observed transitions and rewards; a circular buffer of size 2000 was used to store these. M_SL is used to train the average-policy network and consists of all states and actions taken by the agent when following the ε-greedy policy, described below. The action-value network was trained using a batch size of 32, and the average-policy network using a batch size of 300. For each training step, that number of random memories was selected from the corresponding memory to form the batch.

Policies

After the initial training using the random-play policy described earlier, at the start of each game the agent being trained chooses between two policies: ε-greedy or Π (average policy). The agent chooses ε-greedy with some small probability η; we used η = 0.1.

An ε-greedy policy usually picks the action that Q predicts will maximize total discounted reward. Given a state as input, Q outputs a predicted total discounted reward for each possible action, and the chosen action corresponds to the argmax of this output. However, with probability ε, an agent following an ε-greedy policy will simply act randomly. We used an exponentially decaying ε that begins at 0.1 and is multiplied by a constant decay factor every training step; the decay stops once ε reaches a minimum allowed value.

The agent chooses Π (average policy) with probability 1 - η. In this case, the agent acts according to its average previous behavior in this state, as estimated by the average-policy network. Given a state as input, this network outputs an estimated historical probability of taking each action in that state; the softmax layer normalizes these probabilities, so an action is simply sampled according to the outputted probabilities. After training, evaluation is performed solely using Π (average policy).

The action selected specifies not only the card chosen but also (if relevant) the target of the card and (if the Guard is chosen) the card guessed. A sketch of this policy-selection logic follows.
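As a small illustration of the policy mixing just described, the sketch below selects between the ε-greedy and average policies; `q_net` and `policy_net` are assumed to be Keras-style models like the ones sketched earlier, and the ε decay schedule is left out.

```python
import numpy as np

ETA = 0.1  # probability of using the epsilon-greedy policy for a whole game

def choose_policy(rng=np.random):
    """At the start of a game: epsilon-greedy with probability eta, else average policy."""
    return "greedy" if rng.random() < ETA else "average"

def epsilon_greedy_action(q_net, state, epsilon, num_actions, rng=np.random):
    """Pick the argmax of Q's predictions; with probability epsilon, act randomly."""
    if rng.random() < epsilon:
        return int(rng.randint(num_actions))
    q_values = q_net.predict(state[None, :], verbose=0)[0]
    return int(np.argmax(q_values))

def average_policy_action(policy_net, state, rng=np.random):
    """Sample an action from the average-policy network's softmax output."""
    probs = policy_net.predict(state[None, :], verbose=0)[0]
    probs = probs / probs.sum()  # guard against floating-point drift
    return int(rng.choice(len(probs), p=probs))
```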

4 Results

4.1 Game length

The typical game length was investigated as a function of the number of games played, both for the initial training using random play and for the training using self-play. The network was trained using 6000 games for each training stage. The change in game length over time is shown in Figure 1.

Figure 1: Variation of typical game length with additional training. The vertical axis represents the game length, smoothed using an exponential moving average (α = 0.99). The horizontal axis represents the number of games used to train the agent. The agent is first trained using a random-play policy, during which game length is constant. Then, as it is trained using self-play, game lengths are initially much shorter but rapidly improve to a roughly constant value.

During initial random-play training, the average game length remained roughly constant. This is expected, as the policy is unchanged. When training proceeds to self-play, gameplay is initially almost random (with illegal moves allowed), because the average-policy network, which is used the majority of the time, has not yet started training. The average game length is thus very short, as many invalid moves are played, causing the training agent to rapidly lose. However, this performance rapidly improves as the average-policy network is trained. It is trained on the saved ε-greedy policy actions, which are based on the predictions of the action-value network, which had already been trained on random play. As the number of memorized state transitions increases and the average-policy network is further trained, game length increases very rapidly as the network learns not to play invalid moves. After a few thousand games, it reaches a roughly constant value.

4.2 Illegal moves

To further understand the rate at which the agent learns not to perform illegal moves, we investigated how this rate varies with additional training. The network was first trained using 5000 games of random play. After each hundred rounds of self-play training, the agent was evaluated by playing 1000 rounds against itself (without the actions being saved or the agent being trained). The number of illegal moves the agent made was recorded for each 100 games of training, up to 5000 games. The results are shown in Figure 2.

Figure 2: The rate of illegal moves made by the trained agent with varying amounts of self-play training. In all cases, the agent was first trained using 5000 rounds of random play. The horizontal axis is the number of rounds of self-play training performed (in 100s). The vertical axis is the number of illegal moves observed per 1000 games played. The rate of decrease is initially very high and then slows down; however, even after 5000 rounds of self-play training, the agent continues to improve.

Initially the number of illegal moves was very high, but it reduced rapidly over the first 1000 games played. After this, the rate of improvement slowed, but the rate of illegal actions continued to fall even up to 5000 games, albeit somewhat noisily. This implies that additional training may be able to reduce this rate further. After 5000 rounds of training, the trained player lost only 5% of games by making an illegal move. Note that, as the game lengths become substantially longer with additional training, the rate of illegal moves is much lower than 5% of moves.

4.3 Win rate vs random player

We tested our trained player against a player that makes uniformly random valid moves. Before the trained player has learned not to make illegal actions, its win rate is very low: it almost instantly loses every time by making an illegal action. As the player trains and learns not to break the rules, the win rate quickly rises and then levels out around 65% after 3000 rounds of training. This trend is shown in Figure 3. The evaluation protocol shared by these two experiments is sketched below.
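The two experiments above alternate short blocks of training with frozen evaluation games in which nothing is stored or trained. A rough sketch of that protocol follows; `play_eval_game` is a hypothetical stand-in for the paper's game engine and is assumed to report whether the evaluated agent won and whether it made an illegal move.

```python
def evaluate(play_eval_game, agent, opponent, num_games=1000):
    """Play frozen evaluation games (no memory writes, no training).

    `play_eval_game(agent, opponent)` is a hypothetical callable expected to
    return a pair (agent_won: bool, agent_made_illegal_move: bool).
    Returns the win rate and the total count of illegal moves observed.
    """
    wins = illegal = 0
    for _ in range(num_games):
        won, made_illegal = play_eval_game(agent, opponent)
        wins += int(won)
        illegal += int(made_illegal)
    return wins / num_games, illegal

# Alternate 100 games of self-play training with a 1000-game evaluation, as in
# Sections 4.2 and 4.3 (train_self_play and random_opponent are likewise
# hypothetical stand-ins):
# for step in range(50):
#     train_self_play(agent, num_games=100)
#     win_rate, illegal_moves = evaluate(play_eval_game, agent, random_opponent)
```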

Figure 3: The win rate of the trained agent against a random agent constrained to legal moves. In all cases, the agent was first trained using 5000 rounds of random play. The horizontal axis is the number of rounds of self-play training performed (in 100s). The vertical axis is the win rate of the trained player observed over 1000 games. Initially, the trained agent lost the vast majority of games due to a high number of illegal actions. With additional training, it learns to play mostly valid moves, and the win rate rises to a roughly constant value of around 65%.

There is a large degree of inherent randomness involved in Love Letter games, which means that no player will be able to win all of the time. Additionally, in Love Letter the number of valid actions is relatively constrained (due to the two cards present in the active player's hand), and good strategies involve substantial random play. Nevertheless, it appears likely that the optimal player would win more than two-thirds of games against a random player. It seems possible we have reached this limit, as constraining the agent in both training and evaluation to only play valid moves results in a similar win rate of 65%.

Because of this randomness, Love Letter in the two-player setting requires winning 7 rounds before winning the match. With a 65% win rate per round, our trained player would win 87% of matches against the random player.

It is also possible that the levelling out seen in Figure 3 is not due to the agent reaching a fundamental limit, but due to the average-policy network fitting well to the M_SL memory. With additional training, the action-value network could learn more complicated relationships between the state and the expected total discounted reward, resulting in more complicated strategies.

5 Contributions

This marks the first time that self-play with Q-learning has been applied to the incomplete-information game Love Letter. We demonstrate the applicability of Neural Fictitious Self-Play to this game. Further, we investigate the feasibility of learning not only a strategy but also the game rules themselves through this process, without a priori knowledge. We propose initially using random-play training to encourage the network to learn which moves are legal.

Additionally, we have provided a new initialization algorithm for training reinforcement learning agents. By using records from a random player, our agent converged on a winning strategy much more quickly than if it had learned from scratch. Furthermore, we did not need logs of high-level human play in order to train our agent, which was a method used to train the original AlphaGo player.

Each member of the team contributed a roughly equal amount to the project. In the initial implementation, Madeleine Dawson wrote the initial game infrastructure, Robert Liang wrote the action-value network implementation, and Alexander Turner wrote the average-policy network implementation. All authors then contributed to the integration of these different modules. When investigating the research questions, each student took on one research question: win rate vs random player, illegal moves, and game length, respectively.

Finally, in writing the milestones and the report, each student contributed the writing for their section, and all students contributed approximately equally to the other sections.

6 Future Work

Future work in this domain includes further experimentation with initialization. It is possible that a player who learns from human strategy is better initialized than one who learns from random actions. For example, a canonical Love Letter play is to play a Guard against another player who used the King to swap hands in the previous turn.

Another area for further work is to apply our novel initialization method, combined with Neural Fictitious Self-Play, to other two-player turn-based strategy games. For example, in Texas Hold'em, does this reinforcement learning method teach itself Nash-equilibrium strategies such as bluffing?

7 Conclusions

First, we have demonstrated a new method of initializing neural networks in reinforcement learning. This method uses history generated by a random player to prepare players for standard Neural Fictitious Self-Play. Secondly, we have developed an AI agent for Love Letter using this method which wins the majority of games against a player that plays random legal moves. This agent was developed using a very simple reward function, with no records of high-level play, and was trained without any a priori knowledge of the game rules. Finally, our implementation provides an environment on which future research into Love Letter and other imperfect-information games can easily be built.

References

[1] Silver, D. et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, pp. 484-489. doi:10.1038/nature16961.
[2] Heinrich, J. and Silver, D. (2016). Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. arXiv:1603.01121v2.
[3] Raina, R., et al. (2009). Large-scale deep unsupervised learning using graphics processors. In Proc. 26th Annual International Conference on Machine Learning (ICML '09), ACM Press, New York.
[4] OpenAI. (2017). openai/atari-py GitHub repository. (Accessed December 11, 2017.)
[5] Alderac Entertainment Group. (2012). Love Letter Rules Final.pdf. (Accessed December 11, 2017.)
[6] Python Software Foundation. Python Language Reference, version 3.6. Software available from python.org.
[7] Van der Walt, S., et al. (2011). The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering, 13, pp. 22-30. doi:10.1109/MCSE.2011.37.
[8] Chollet, F. (2015). keras GitHub repository. (Accessed December 11, 2017.)
[9] Abadi, M. et al. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org.
[10] Watkins, C. and Dayan, P. (1992). Q-learning. Machine Learning, 8, pp. 279-292.
[11] Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518, pp. 529-533. doi:10.1038/nature14236.
