CS221 Project Final Report Deep Q-Learning on Arcade Game Assault


Fabian Chan (fabianc), Xueyuan Mei (xmei9), You Guan (you17)
Joint project with CS229

1 Introduction

Atari 2600 Assault is a game environment provided on the OpenAI Gym platform; it is a top-down shoot 'em up game in which the player earns reward points for destroying enemy ships. The enemy consists of a mothership and smaller vessels that shoot at the player. The player can move and shoot in various directions, with a total of 7 actions available. Every time the player shoots, a heat meter tracks how hot the engine is; if the player shoots too frequently, the meter fills up and the player loses a life to overheating. The player can also lose a life upon taking fire from enemy ships. The game ends when the player runs out of lives. We create an AI agent that generates optimal actions by taking raw pixels as features and feeding them into a convolutional neural network (CNN), an approach known as deep Q-learning.

2 Literature Review

The first paper, Playing Atari with Deep Reinforcement Learning [1], addresses how convolutional neural networks and deep reinforcement learning can be combined to build a high-performance AI agent that plays Atari games. The paper analyzes how reinforcement learning (RL) provides a good solution to the game-playing problem, as well as the challenges RL poses for deep learning from a data-representation perspective. It proposes deep reinforcement learning for game-playing agents, which is similar to our goal. However, there are differences between our approach and theirs: their approach relies on heavily downsampling images before feeding them into a neural network, while ours tries to avoid this downsampling procedure in an attempt to produce better data layers for deep Q-learning.

The second related paper, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning [2], proposes another solution to game playing. Compared to the first paper, which uses a model-free Q-learning strategy, this paper tackles the problem with a combination of Monte-Carlo techniques and deep Q-learning, ending up with a much more sophisticated algorithm that adds extra assumptions and considerable complexity. Though we choose the model-free learning approach, this paper still provides insights into deep learning applied to games, such as the preprocessing of raw data and the architecture of convolutional neural networks.

3 Task Definition

The goal is to create an AI controller that tries to maximize the total game score. To achieve this, the problem is broken down into finding a score for each action (the Q-value) given the current state of the game; the optimal action to take is then the one that corresponds to the highest Q-value.
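As a minimal illustration of this action-selection rule (a sketch; the Q-values below are hypothetical placeholders for the network outputs described in Section 6):

```python
import numpy as np

# Hypothetical Q-values, one per available action (Assault exposes 7 actions).
q_values = np.array([0.1, 0.4, -0.2, 0.9, 0.0, 0.3, -0.5])

# Greedy policy: take the action whose estimated Q-value is highest.
best_action = int(np.argmax(q_values))  # -> 3
```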

We note that the scope of this project is limited to the Assault game; though our agent might work on other games with varying degrees of performance, we are not considering other games at this time. We will evaluate our agent by comparing its final scores against both the baseline and the oracle. The baseline corresponds to the performance of simple Q-learning with a simple feature extractor, with an average score of 670.7, while the oracle corresponds to the performance achieved by a top human player, with a score of 4153 [3]. This is explained in more detail in Section 5.3.

4 Infrastructure

OpenAI Gym is a platform for developing AI agents. It offers a Python API for interacting with the game, as well as providing the game environment represented as pixels along with the reward earned at every timestep. All of this allows us to spend less time writing the game itself and more time working on the AI agent. At each step, the agent takes an action and receives an observation and reward from the environment. An RL algorithm seeks to maximize some measure of the agent's total reward as the agent interacts with the environment, either online or in batches: the agent takes the tuple (observation, reward, done) as input at each timestep and either performs learning updates incrementally or collects the tuples for later use in a batch update.

Our infrastructure can be thought of as a processing pipeline and can be summarized as follows: game observations are collected from OpenAI Gym as raw pixel data, images drawn from a 128-color palette. The data is then processed and fed as input into a CNN; the outputs of the CNN are Q-values corresponding to the list of actions.

While there are many software libraries available for implementing neural networks, the tool we have chosen is TensorFlow, a popular open-source library used by various researchers and companies. With its flexible architecture and multidimensional data arrays, also called tensors, we can implement convolutional neural networks for reinforcement learning.

The final note regarding our infrastructure is that we used GPU computation. The reinforcement learning process is computationally heavy enough that we decided to leverage a GPU to perform the gradient-based updates on our neural network. Since TensorFlow supports computation on the GPU through NVIDIA's parallel computing API, CUDA, we gain a speedup of 2x-10x compared to relying on a CPU.

5 Approach

5.1 Modeling States and Actions

Our model defines its states and actions in a fairly straightforward manner: a state consists of the pixel values of the game screen taken over a window of k consecutive frames. This is a numpy.array of shape (k, 250, 160, 3), where the second dimension x is the height of the screen, the third dimension y is the width of the screen, and the fourth holds the RGB values of the pixel at coordinate (x, y). We want our state definition to contain not just the current frame but the last few frames, because Assault is a dynamic game and a single frame is not enough to determine the motion (direction and velocity) of the various game entities.
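A minimal sketch of this state construction, assuming the classic Gym API for Assault-v0 and k = 3; the helper below is illustrative rather than our exact implementation:

```python
from collections import deque

import gym
import numpy as np

K = 3  # number of consecutive frames kept in the state

env = gym.make("Assault-v0")
obs = env.reset()                    # raw RGB frame
frames = deque([obs] * K, maxlen=K)  # seed the window by repeating the first frame

def current_state(frames):
    """Stack the last K frames into a single array of shape (K, height, width, 3)."""
    return np.stack(frames, axis=0)

obs, reward, done, info = env.step(env.action_space.sample())
frames.append(obs)                   # sliding window: the oldest frame drops out
state = current_state(frames)
```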

Our assumption is that the most recent k frames provide sufficient information to calculate the optimal action(s) to take next. For our initial experiments we chose k = 3. The set of actions is simply the 7 actions made available to the player: moving and shooting in various directions, as well as a do-nothing action.

5.2 Challenges and Methods to Address Them

There are two major challenges we need to pay attention to in this project. First, to ensure reasonable progress, the ability to iterate quickly is crucial; in order to save training time and computing resources, we needed to simplify the state space and reduce the number of features before feeding them into our training algorithm. Second, because we expect training to take several days, we need to ensure that it converges to a good local minimum (if not a global one) within a reasonable amount of time. Experience replay is the technique we ended up using to help with convergence. Furthermore, a tradeoff needs to be made between exploration and exploitation, balancing running time against how well the result can be optimized. We decided to use simple epsilon-greedy exploration for this, and the main task in addressing this challenge is finding suitable value(s) for epsilon.

5.3 Baseline and Oracle

The baseline for our project is the performance of a simple Q-learning algorithm with a simple feature extractor. The reward for the algorithm is the score the player receives. The value of Q_opt is calculated as w · φ(s, a), where w is the weight vector, φ is the feature vector, s is the state, and a is the action. For this simple algorithm, we set k = 1, discount factor γ = 1, and ɛ = 0.3, and used a feature extractor that only indicates whether a pixel is black or not. We ran this simple Q-learning algorithm 5 times, each run lasting 3 hours. The average score over the 5 runs of this naive Q-learning approach is 670.7. The oracle for our project is the performance achieved by a top human player, which is 4153 [3]. The baseline and oracle scores give us a rough idea of how well the AI should perform.

6 Learning Algorithm

6.1 Q-Learning for Game Playing

One common way to deal with the game-playing problem is to assume a Markov Decision Process (MDP). This is appropriate for Assault because the enemy agents move randomly. An MDP is a model defined by a set of states, actions, transitions, and rewards. To train an AI to tackle game-playing tasks, reinforcement learning based on Q-learning is a popular choice. In Q-learning, the MDP recurrence is defined as

    Q(s, a) = E_{s'}[ r + γ max_{a'} Q(s', a') | s, a ]    (1)

where a is the action taken, s is the current state, r is the reward, and γ is the discount factor. Furthermore, we can use function approximation by parameterizing the Q-value; in this way, we can easily adapt linear regression and gradient descent techniques from machine learning.
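As an illustration of this kind of function approximation, here is a sketch of a linear Q estimate in the spirit of our baseline's black-pixel indicator features; the exact feature layout is an assumption, since the report only states that the extractor indicates whether a pixel is black:

```python
import numpy as np

NUM_ACTIONS = 7

def phi(frame, action):
    """Per-action copy of 'is this pixel black?' indicator features for a grayscale frame."""
    black = (frame.reshape(-1) == 0).astype(np.float64)      # 1 where the pixel is black
    features = np.zeros(NUM_ACTIONS * black.size)
    features[action * black.size:(action + 1) * black.size] = black
    return features

def q_hat(frame, action, w):
    """Linear approximation of the Q-value: Q(s, a; w) = w . phi(s, a)."""
    return float(np.dot(w, phi(frame, action)))
```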

With function approximation, we can calculate the best weights by adapting the update rule

    w ← w - η [ Q̂_opt(s, a; w) - (r + γ V̂_opt(s')) ] φ(s, a)    (2)

where w is a vector containing the weight of each feature; it is initialized randomly to avoid falling into the same local optimum in every trial.

6.2 Convolutional Neural Networks as Function Approximators

We decided to use deep Q-learning instead of ordinary Q-learning because there are too many possible game states. The size of one frame is 216 by 160 pixels, and each pixel can take many possible RGB values; furthermore, a sliding window of k frames multiplies the number of possible states again. This is far too large for ordinary Q-learning, because it would result in too many rows in our imaginary Q-table. We therefore use a neural network to learn the Q-values instead; in effect, this neural network operates as a function approximator. The network architecture is as follows:

- Preprocess frames: convert RGB pixels to grayscale and threshold to black or white
- Input layer: takes in the preprocessed frames. Size: [k, 160, 250, 1]
- Hidden convolutional layer 1: kernel size [8, 8, k, 32], strides [1, 4, 4, 1]
- Max-pooling layer 1: kernel size [1, 2, 2, 1], strides [1, 2, 2, 1]
- Hidden convolutional layer 2: kernel size [4, 4, 32, 64], strides [1, 2, 2, 1]
- Max-pooling layer 2: kernel size [1, 2, 2, 1], strides [1, 2, 2, 1]
- Hidden convolutional layer 3: kernel size [3, 3, 64, 64], strides [1, 1, 1, 1]
- Max-pooling layer 3: kernel size [1, 2, 2, 1], strides [1, 2, 2, 1]
- Reshape the max-pooling outputs into a vector of size [768] and feed it to one fully connected layer
- Feed the outputs through a rectified linear activation function
- Collect the final output as 7 Q-values, each corresponding to an action

Though we have 3 max-pooling layers in our network architecture, this is not what we had in mind at the beginning. Unlike in CNN architectures typically used for computer vision tasks such as image classification, pooling layers may not be desirable for our purposes, because we likely do not want to introduce translation invariance: the positions of the game entities are important for estimating Q-values. However, max-pooling serves as an adequate way to compress our large state space into a vector of size 768, which is why we have kept it. One alternative to these max-pooling layers is to use larger strides in each hidden convolutional layer, but given that the current strides are already substantial, we have been hesitant to increase them further. We do not dismiss it as a bad idea, though, and would like to try it given an additional month or two. Sketches of the preprocessing step and of the overall architecture are given below.
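First, the preprocessing step (a sketch; the luminance weights and the threshold value are assumptions, since the report does not specify them):

```python
import numpy as np

def preprocess(frame, threshold=0.5):
    """Convert an RGB frame to grayscale, then threshold it to black (0) or white (1)."""
    gray = 0.299 * frame[..., 0] + 0.587 * frame[..., 1] + 0.114 * frame[..., 2]
    gray = gray / 255.0                            # scale to [0, 1]
    return (gray > threshold).astype(np.float32)   # 1 = white, 0 = black
```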

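Next, a rough sketch of the network itself, written with tf.keras layers rather than the lower-level TensorFlow ops we actually used; stacking the k frames as input channels, applying ReLU inside the convolutional layers, and the width of the hidden fully connected layer are all simplifying assumptions:

```python
import tensorflow as tf

K_FRAMES = 3      # preprocessed frames stacked along the channel axis (an assumption)
NUM_ACTIONS = 7

def build_q_network():
    """Conv/pool stack, one fully connected layer, and a final layer of 7 Q-values."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, kernel_size=8, strides=4, padding="same",
                               activation="relu", input_shape=(250, 160, K_FRAMES)),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Conv2D(64, kernel_size=4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Conv2D(64, kernel_size=3, strides=1, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),  # hidden width is an assumption
        tf.keras.layers.Dense(NUM_ACTIONS),             # one Q-value per action
    ])

q_network = build_q_network()
```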
6.3 Training Process

The OpenAI Gym library provides game screen observations as pixel values, which we use to construct the state. We initialize the weights and biases of every layer of the neural network to random values drawn from a Gaussian distribution centered at zero. Then, at each time step, we take a minibatch of size 100 and feed it into the network for training. However, instead of starting training right away, our algorithm waits for 500 steps: during the first 500 steps, the next action is chosen uniformly at random. Once the program has observed enough with these randomly chosen actions, it starts to train and gains the ability to choose the next action based on the last state. Our loss function is

    L = (1/2) ( r + γ max_{a'} Q(s', a') - Q(s, a) )²    (3)

6.4 Epsilon-Greedy Search

Although the agent gains the ability to choose the next action based on the last state, it is worth noting that a tradeoff must be made between exploration and exploitation, balancing running time against how well the result can be optimized. We decided to use simple ɛ-greedy exploration, where ɛ, the probability that the player chooses a random action, decreases as time goes on. This works sufficiently well for our purposes. We use a dynamic ɛ that depends on the timestep: ɛ is decreased linearly from 0.8 to 0.05, annealed over 50,000 timesteps.

6.5 Experience Replay

In general, deep neural networks are difficult to train. In the presence of multiple local optima, gradient descent may end up at a bad local minimum, which leads to poor performance. Initializing the network weights and biases to random values helps a little but is likely not sufficient. Learning directly from the newest observations is also ineffective because of the strong correlations among the most recent observations; randomizing the samples breaks these correlations and therefore reduces the variance of the updates. During training, the current parameters determine the next data sample the parameters are trained on, so unwanted feedback loops may occur and the parameters could converge to and get stuck at a poor local optimum, or even diverge.

We incorporate a technique called experience replay to encourage the algorithm to find better optima instead of getting stuck at some underperforming local optimum. More specifically, as we run a game session during training, all experiences (s, a, r, s') are stored in a replay memory. During training, we take random samples from the replay memory instead of always grabbing the most recent transition. By breaking the similarity of subsequent training examples, this trick is likely to prevent the network from diving into some local minimum, and does so efficiently [5]. With experience replay, the behavior distribution is averaged over many previous states, smoothing out learning and avoiding divergence of the parameters. In our implementation we simply draw a uniform sample from the bank of observed transitions (the replay memory) to construct the minibatch of size 100 that we train on at each iteration. Storing all past experiences is impossible due to the enormous state space, so we retain only the most recent 20,000 transitions in the replay memory and sample from those. Sketches of the ɛ schedule and of this replay memory are given below.
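First, the annealing schedule from Section 6.4 (a sketch using the 0.8 → 0.05 over 50,000 timesteps quoted above; the helper names are illustrative):

```python
import random

EPS_START, EPS_END, ANNEAL_STEPS = 0.8, 0.05, 50000

def epsilon(t):
    """Linearly anneal epsilon from EPS_START to EPS_END over ANNEAL_STEPS timesteps."""
    frac = min(t / ANNEAL_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def choose_action(q_values, t):
    """Epsilon-greedy: a random action with probability epsilon(t), otherwise the greedy one."""
    if random.random() < epsilon(t):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```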

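Second, the replay memory from Section 6.5 (a sketch assuming the capacity of 20,000 transitions and minibatch size of 100 quoted above):

```python
import random
from collections import deque

REPLAY_CAPACITY = 20000
MINIBATCH_SIZE = 100

replay_memory = deque(maxlen=REPLAY_CAPACITY)  # old transitions fall off automatically

def remember(state, action, reward, next_state, done):
    """Store one transition (s, a, r, s') in the replay memory."""
    replay_memory.append((state, action, reward, next_state, done))

def sample_minibatch():
    """Uniformly sample past transitions, breaking the correlation between consecutive frames."""
    return random.sample(replay_memory, min(MINIBATCH_SIZE, len(replay_memory)))
```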
7 Results and Analysis

As a preliminary evaluation, we first ran our algorithm (without experience replay) for 5 trials, each with a cutoff at the end of 36 hours, using k = 3 consecutive frames. The final score of each trial is plotted in the top half of Figure 1. From the results we can see that some trials performed quite badly and are in fact no better than the baseline Q-learning algorithm, while other trials performed significantly better than the baseline. We believe these differences in performance among the trials can be explained by the gradient descent approach we used to train our deep neural network, which is characteristically vulnerable to getting stuck at an under-performing local optimum. We arrived at this explanation because each trial is initialized with random weights and biases and the trials produce wildly different final scores, so most likely each of them ended up in a different local optimum.

We then tweaked our hyperparameters, such as ɛ (the exploration-exploitation parameter), k (the number of consecutive frames in consideration), and η (the learning rate), in a manner similar to grid search. Since we do not have a dataset to divide into training, validation and test sets (our training data comes from operating a dynamic game), we simply repeated training with various values of ɛ, k and η and kept the combination that gave the best scores; this is straightforward compared to the usual hyperparameter optimization process in machine learning. We found that setting ɛ to the dynamic schedule described in Section 6.4, k to 4, and η to 0.01 gives the best results overall, but we noticed that in general varying the hyperparameters does not significantly influence the agent's performance, so we did not devote too much time to trying different combinations. A sketch of this kind of sweep appears below.

Figure 1: Comparison: performance with and without experience replay

We obtained our final results by running the agent with the model weights computed at the end of training and recording the final score, repeated for a total of 5 trials; instead of cutting off based on time, we now terminate training after 50,000 iterations. We switched from a time-based cutoff to an iteration-based cutoff because the time it takes to train a model is mainly a function of the hardware used to train it. Furthermore, reporting scores obtained over a certain number of episodes makes them more robust against the noise/stochasticity of the OpenAI Gym environment.
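A sketch of the hyperparameter sweep described above; the candidate values and the train_and_evaluate stub are hypothetical placeholders for full training runs:

```python
import itertools

def train_and_evaluate(k, eta, eps_schedule):
    """Hypothetical stand-in: run one full training trial and return its final average score."""
    return 0.0  # a real run would train the network and play evaluation episodes

# Illustrative candidate values; we swept epsilon, k and the learning rate eta.
grid = itertools.product([3, 4], [0.001, 0.01], ["fixed_0.3", "anneal_0.8_to_0.05"])
scores = {combo: train_and_evaluate(*combo) for combo in grid}
best_k, best_eta, best_eps = max(scores, key=scores.get)
```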

Figure 2: Average scores over training episodes

This time we also incorporated experience replay into the training procedure. The scores we obtained are plotted in the bottom half of Figure 1. Comparing these results with those above, we can see that experience replay produced results that are more consistent and stable. This is because the technique effectively prevented the algorithm from getting stuck at some bad local optimum, and thus helped achieve (slightly) higher and more consistent results.

To aid our analysis, we plot the average score per 20 consecutive training episodes over one complete trial in Figure 2. At the beginning there is a small number of episodes during which the score stays low and does not improve; this corresponds to the starting period when we do not apply training and only let the agent choose actions randomly to gather observations. Afterwards the score rises rapidly to around 600, because the game mode stays the same up to this point and it is quite easy to get there. The agent then appears to get stuck at around 600 for over 1500 episodes, because at this point the game jumps in difficulty: new enemy entities known as crawlers appear to the left and right of the agent, in addition to the enemy ships already hovering above. As if it had encountered a roadblock, the agent was unable to make much progress past 600 for quite some time, but it did eventually learn to overcome this obstacle. So we conclude that even with this change in difficulty, our agent simply needed some time to adjust and continue learning. From 600 onwards the agent improves more slowly than it did from the start to 600, because the game is no longer as easy as it was in the beginning. Finally, the score saturates at around 1000, which is roughly the final score the agent was able to achieve.

8 Conclusion

In this project, we implemented a game-playing agent for Atari Assault using deep Q-learning. We first implemented ordinary Q-learning to obtain a baseline score of 670.7, then implemented deep Q-learning by constructing a convolutional neural network using TensorFlow. We obtained promising results after experimenting with and without experience replay. For us, experience replay worked well in helping the agent avoid getting stuck at some bad local optimum.

Some of our improvements also came from extending the training time and tweaking hyperparameters. Our deep Q-learning agent managed to significantly outperform the baseline: the average score it obtained was 980, while the ordinary Q-learning baseline has an average score of 670.7. Although our agent does significantly better than the baseline, it still does not come close to the oracle. We believe there are still many things we can try to improve, such as revising the neural network architecture, adding customized feature extractors, and experimenting with more fully connected layers.

9 References

1. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602.
2. Guo, X., Singh, S., Lee, H., Lewis, R., & Wang, X. (2014). Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. Advances in Neural Information Processing Systems (NIPS).
3. Assault (1983, Bomb) - Atari Score. Retrieved November 16, 2016.
4. Assault-v0. OpenAI Gym environment documentation.
5. Matiisen, T. Demystifying Deep Reinforcement Learning. Retrieved November 16, 2016.
