Reinforcement Learning Agent for Scrolling Shooter Game

Peng Yuan    Yangxin Zhong    Zibo Gong

1 Introduction and Task Definition

1.1 Game Agent

Nowadays, gaming has become one of the most popular fields in both everyday life and research. Game agents are computer programs that not only play games automatically but also try to maximize the score or performance in the game. The standard input of a game agent is a scene or state of a specific game, and its output is the action currently estimated to be the best one for obtaining a high score in the future. The mapping from game state to optimal action is called the optimal policy. The task in this paper is: given a specific game and its rules, train a game agent that yields a reasonable optimal policy.

1.2 Objective and Evaluation

Playing with a reasonable optimal policy, a game agent should obtain a relatively high score. The evaluation of a game agent is therefore its average (or maximum) performance in the game. Usually, this performance is compared with that of human players of different levels; in this paper, we use amateur-level human performance as the comparison. Our objective is thus to get a high score in a scrolling shooter game with an agent learned by reinforcement learning: given the environment data in each frame as state s, find an optimal policy π such that π(s) gives the optimal action in [up, down, left, right, stay].

1.3 Challenges

Complex environment: the game contains the hero, enemies, missiles, power-ups, etc., and their interaction rules, resulting in a complex environment with a huge state space.

Delayed rewards: it takes time for a missile to fly, so an action's reward is revealed only several frames later, making it hard for the agent to associate the reward with the proper actions.

Sequence of actions: an enemy is destroyed only after several hits, so obtaining a reward requires a sequence of actions rather than a single action.

2 Infrastructure

2.1 Scrolling Shooter Game

The specific game we chose in this paper is a scrolling shooter game. As shown in Figure 1, enemy ships appear randomly from the top of the screen with different moving and attacking strategies. We need to control a hero fighter (referred to as the hero below) to shoot down the enemies and try to survive as long as possible to get a high score. In addition, some extra items can upgrade our weapon or give us a shield, which makes life easier, so we should try to collect them as well. The game itself is open source [1], written in C++ with OpenGL rendering the frames, which enabled us to modify its source code slightly to better fit our needs in training and testing.

Figure 1: Scrolling shooter game

2.2 Infrastructure Overview

We built several APIs to let our AI agent interact with the game, and we also added a non-graphic mode to speed the game up for faster training. The workflow is shown in Figure 2. The communication is done via TCP; we disabled the protocol's default 40 ms delay mechanism, achieving a transmission rate of thousands of frames per second and making fast training feasible.

Figure 2: Infrastructure overview (the game environment and the Python AI agent exchange game status data and the hero's actions over TCP)

3 Model

Generally, a game can be modeled as a Markov Decision Process (MDP). An MDP is a state-based model with states, actions, a transition distribution, and rewards: from each state we can take a legal action, and then, according to a probability distribution, we transition to another state and receive some reward for the transition. Specifically, in our game the MDP is formulated as follows.

State. Each state is one frame of the game scene. The state includes all the game information of that scene: enemy information (position, type); ammo information (position, speed, type); power-up information (position, type); and hero information (position, health points, shield points, lives, active guns, score). An end state is the game-over scene.

Action. The legal action set is [up, down, left, right, stay], meaning the hero moves up, down, left, right, or stays still. Since changing the action every frame makes little difference to the hero's position, we design the agent so that every chosen action is repeated for the next 5 frames. As a result, a successor state is the scene 5 frames after the current state, reached by taking one of the actions above.

Transition distribution. This part of the MDP cannot be modeled directly, since the game is far too complicated and full of randomness. In each state, given an action to take, we cannot predict what the next state might be until we execute that action and run the game. The number of states in this complex game is also too large, which makes modeling the transition distribution intractable. In the next section, we introduce reinforcement learning methods to tackle this issue.

Reward. The goal of this game is to get a high score, so intuitively the reward would be the score gained in each transition. But using the game score as the reward makes it difficult to solve the MDP and obtain a reasonable optimal policy in practice. In the next section, we give more details on this and design a reward function better suited to the problem.
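To make the interface concrete, the following is a minimal Python sketch of the agent-side loop that combines the TCP link of section 2.2 with the 5-frame action repetition described above. The port number, the line-based JSON message format, and the helper names are assumptions for illustration; only the disabled TCP delay and the 5-frame repetition follow the text.

```python
import json
import socket

GAME_PORT = 5555           # hypothetical port exposed by the modified game
FRAMES_PER_ACTION = 5      # every chosen action is repeated for 5 frames

def connect_to_game(host="localhost", port=GAME_PORT):
    """Open a TCP connection to the game and disable the send-delay mechanism."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no 40 ms delay
    return sock

def parse_state(raw_line):
    """Hypothetical decoder: assume the game sends one JSON line of status data per frame."""
    return json.loads(raw_line.decode())

def run_episode(sock, choose_action):
    """Receive frames, pick an action every 5 frames, and send it back to the game."""
    reader = sock.makefile("rb")
    state = parse_state(reader.readline())
    while not state.get("game_over", False):
        action = choose_action(state)           # policy supplied by the agent
        for _ in range(FRAMES_PER_ACTION):      # hold the action for 5 frames
            sock.sendall((action + "\n").encode())
            state = parse_state(reader.readline())
            if state.get("game_over", False):
                break
    return state.get("score", 0)
```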

4 Approaches

4.1 Q-learning

4.1.1 Q-learning Overview

To solve an MDP without an explicit transition distribution, we introduce the Q-learning algorithm with function approximation. Before describing Q-learning, we need two important concepts about a policy over an MDP: value and Q-value. The value of a state s with respect to a fixed policy π, denoted V_π(s), is the expected total reward received in the future by following policy π from state s. The Q-value of a state-action pair (s, a), denoted Q_π(s, a), is the expected total reward received in the future after taking action a from state s and then following policy π. If our policy π is the optimal policy π_opt, then we have:

    π_opt(s) = argmax_a Q_opt(s, a)    (1)

    V_opt(s) = 0 if s is an end state, max_a Q_opt(s, a) otherwise    (2)

    Q_opt(s, a) = Σ_{s'} T(s, a, s') [R(s, a, s') + V_opt(s')]

where T(s, a, s') is the transition probability from state s to s' under action a, and R(s, a, s') is the transition reward. To obtain the optimal policy π_opt, we can estimate V_opt(s) and Q_opt(s, a) from the MDP and take the argmax over Q-values. The challenge, as stated in the previous section, is that we cannot easily estimate the transition distribution T(s, a, s'). The Q-learning algorithm is one solution: instead of estimating T(s, a, s'), we directly estimate Q_opt(s, a) and V_opt(s). First we need training data: in each state s, we take the action a currently predicted to be optimal, run the game to transit to a successor state s', and receive some actual reward r from the game. Repeating these steps yields a large number of (s, a, s', r) tuples, which are used as the training data. For each tuple (s, a, s', r), we update Q_opt(s, a), V_opt(s), and π_opt(s) as:

    Q_opt(s, a) ← (1 - η) Q_opt(s, a) + η (r + V_opt(s'))

    V_opt(s) ← max_a Q_opt(s, a)

    π_opt(s) ← argmax_a Q_opt(s, a)

where η ∈ (0, 1) is the learning rate. The idea behind this update rule is that we use the actual reward r to correct the estimates of the value and Q-value step by step. After the updates, we can use the corrected estimate of the optimal policy to gather new training data and continue this process again and again.
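The following is a minimal sketch of this tabular update with ε-greedy exploration (explained in the next paragraph). The game interface env_step and the hashability of states are assumptions; in our actual system the table is replaced by the function approximation of section 4.1.2.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right", "stay"]
ETA = 0.1        # learning rate η
EPSILON = 0.2    # exploration probability ε

Q = defaultdict(float)   # Q[(s, a)] approximates Q_opt(s, a), 0.0 by default

def v_opt(s):
    """Estimated optimal value: max over actions of the estimated Q-value."""
    return max(Q[(s, a)] for a in ACTIONS)

def pi_opt(s):
    """Estimated optimal policy: argmax over actions of the estimated Q-value."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def q_learning_step(s, env_step):
    """One (s, a, s', r) update.  env_step(s, a) -> (s_next, r, done) is a
    hypothetical wrapper that applies the action for 5 frames in the game."""
    a = random.choice(ACTIONS) if random.random() < EPSILON else pi_opt(s)
    s_next, r, done = env_step(s, a)
    target = r if done else r + v_opt(s_next)           # end states have value 0
    Q[(s, a)] = (1 - ETA) * Q[(s, a)] + ETA * target    # update rule given above
    return s_next, done
```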

In practice, we do not always take the estimated optimal action as the next action: with probability ε ∈ (0, 1), we take a random action instead. This strategy is called the ε-greedy policy, and it is necessary because without it the algorithm can converge to a local optimum. The intuition is that we should sometimes try random actions that have never been considered before and see whether they are better than the current policy; if so, we can improve the estimated optimal policy. Using the Q-learning algorithm with the ε-greedy strategy, we are guaranteed to reach the true optimal policy eventually (although in practice this takes a very long time). But there is another issue: if an MDP has too many states, it has a huge number of (s, a) pairs, which makes it hard to store Q_opt(s, a) for each pair with limited space and impossible to converge to the true values in limited time.

4.1.2 Function Approximation

In order to deal with the large number of states in the MDP, we employ function approximation in Q-learning. For each (s, a) pair, we define a feature vector φ(s, a) and use the features to approximate the Q-value Q_opt(s, a), e.g. φ_1(s, a) = number of activated weapons; φ_2(s, a) = 1[a = up]. The idea is that the Q-value can be estimated from features of the current state and the action to take. For instance, if the hero has activated many powerful weapons, he is likely to receive a high score in the future (i.e., a high φ_1(s, a) may indicate a high Q_opt(s, a)); on the contrary, in a scrolling shooter game we should seldom move the hero to the top of the screen (i.e., φ_2(s, a) = 1 may indicate a lower Q_opt(s, a)). In this paper, we employ the most common function approximation, linear approximation:

    Q_opt(s, a) = w · φ(s, a)    (3)

where w is the vector of weights of all the features. With this function approximation, the Q-learning algorithm updates π_opt, V_opt, and Q_opt by updating the weight vector w. Using stochastic gradient descent, the Q-learning update rule becomes

    w ← w - η [Q_opt(s, a) - (r + V_opt(s'))] φ(s, a)    (4)

where η ∈ (0, 1), and the definitions of π_opt, V_opt, and Q_opt follow (1), (2), and (3). The actual update rule we use in Q-learning is exactly formula (4). In section 4.1.4, we specify the feature vector φ(s, a) used in this paper.
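A short sketch of update (4) with the linear approximation (3) is given below. It assumes the feature extractor phi(s, a) returns a NumPy array (the 965-dimensional vector of section 4.1.4); the use of NumPy and the learning-rate value are implementation assumptions.

```python
import numpy as np

ETA = 0.01  # learning rate η

def q_hat(w, phi_sa):
    """Linear approximation (3): Q_opt(s, a) ≈ w · φ(s, a)."""
    return float(np.dot(w, phi_sa))

def v_hat(w, s, actions, phi):
    """Approximate V_opt(s) = max_a Q_opt(s, a)."""
    return max(q_hat(w, phi(s, a)) for a in actions)

def sgd_update(w, s, a, r, s_next, done, actions, phi):
    """One stochastic gradient step of rule (4) on a single (s, a, s', r) tuple."""
    target = r if done else r + v_hat(w, s_next, actions, phi)
    prediction = q_hat(w, phi(s, a))
    w -= ETA * (prediction - target) * phi(s, a)   # w ← w - η [Q - (r + V)] φ(s, a)
    return w
```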

4.1.3 Reward Function

Since the goal of playing this game is to get a high score, an intuitive idea is to use the game score gained in each transition as the reward in Q-learning. But this causes some issues:

1) The hero gains score only when it defeats an enemy. However, it needs to shoot an enemy for quite a long time before defeating it, so the shooting actions are valuable but cannot be captured by the score reward.

2) When the hero takes damage, its health points and shield points go down but the score does not, so the bad actions that make the hero take damage cannot be captured by the score reward either.

3) When the hero collects a power-up, it is likely to obtain a higher total reward in the future, but the good actions that collect power-ups cannot be captured by the score reward.

In order to capture these features of good/bad actions, we design a heuristic reward function that estimates the future reward of taking an action in the current state. In other words, it gives a bonus/punishment when the hero takes actions that are likely good/bad for the future. The reward function we design in this paper is

    r = w_1 r_dmg + w_2 r_dodge + w_3 r_item + w_4 r_attack + w_5 r_general    (5)

a weighted sum of 5 components:

Damage-taken reward r_dmg is the damage the hero takes in the transition. When the hero takes damage during the 5-frame transition, this reward is negative to punish the bad actions.

Dodging reward r_dodge is the increase in the summed distance to nearby enemies/ammo. This reward encourages actions that dodge nearby enemies/ammo and punishes actions that get closer to them.

Item-collecting reward r_item is the decrease in distance to the closest power-up item. This reward encourages actions that try to collect the closest power-up. We find these items very useful for getting a high score, so we weight this reward more heavily than the others.

Attacking reward r_attack is an estimate of positive attacking by the hero. This reward estimates whether the hero is trying to attack an enemy to gain score. We need it to encourage attacking actions, since defeating enemies is the only way to gain score. The general idea of this term is to check whether the hero is attacking an enemy or approaching the closest enemy that can be shot.

General movement reward r_general is whether the hero is moving toward the bottom center when idle. A generally good position for the hero is the bottom center of the screen, since from there the hero can shoot any enemy for a long time and can conveniently move left or right to dodge enemies/ammo. So when the hero is not dodging attacks, collecting items, or trying to attack enemies, a good move is to head toward the bottom center.

Our final reward function is the weighted sum of the five rewards above; it is more complicated than the plain score reward but, according to our training results, much more useful.
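The sketch below illustrates the weighted sum (5) for one 5-frame transition. The weight values, the state fields, and the item_reward helper are hypothetical placeholders (the actual weights were tuned by hand, as discussed in section 5.2); the other components are assumed to be computed analogously.

```python
import math

# Hypothetical weights; in our experiments the item reward is weighted higher than the others.
W_DMG, W_DODGE, W_ITEM, W_ATTACK, W_GENERAL = 1.0, 0.5, 2.0, 0.5, 0.1

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def item_reward(prev_state, state):
    """r_item: decrease in distance to the closest power-up (0 if none is on screen)."""
    if not prev_state["powerups"] or not state["powerups"]:
        return 0.0
    before = min(dist(prev_state["hero"], p) for p in prev_state["powerups"])
    after = min(dist(state["hero"], p) for p in state["powerups"])
    return before - after

def heuristic_reward(prev_state, state, r_dmg, r_dodge, r_attack, r_general):
    """Weighted sum (5) over the five components of one transition."""
    return (W_DMG * r_dmg + W_DODGE * r_dodge + W_ITEM * item_reward(prev_state, state)
            + W_ATTACK * r_attack + W_GENERAL * r_general)
```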

4.1.4 Features

Since we use function approximation to estimate Q_opt(s, a), we need to design the feature vector φ(s, a) of each state-action pair in equation (3). The features we use include:

Number of ammo in the hero's nearby region. We define the nearby region as a circular area centered at the hero's position with a fixed radius. We divide this area into 8 sectors and count the number of ammo in each sector, forming a feature vector of length 8.

Speed distribution of ammo in the hero's nearby region. The nearby region is defined as above. Instead of counting ammo, we use a histogram of speed angles as the feature (as shown in Figure 3). Again we use 8 buckets for the different speed directions, so this is a feature vector of length 8 * 8 = 64.

Figure 3: Speed distribution features (left) and the hero front area (right)

Number of enemies in the hero's nearby region. Similar to the first type of feature, but we keep track of the number of enemies of each type. There are 7 kinds of enemies in all, so this is a vector of length 7 * 8 = 56.

With these three types of features, we are able to teach the game agent to dodge enemies and ammo in specific situations.

Number of enemies in the hero's front area (as shown in Figure 3). We use this feature to track the number and positions of attackable enemies. It is useful because we want the game agent to learn to approach and attack enemies to gain score.

Other features include the number of power-up items in the hero's nearby region (useful for teaching the hero to collect power-ups), the hero's shield points, health points, lives, activated-weapon indicators, score, and special position indicators (whether the hero is at the center of the x axis or near the screen boundaries). These features try to capture factors relevant to estimating future reward.

All of the features correspond in some way to components of our reward function in section 4.1.3. This is because, even with a reasonable reward function, we still need appropriate features that can memorize the bonus/punishment in order to get a good function approximation of the Q-value. In total, the state part of the feature vector has length 193 in our design. We also need features that depend on the action; otherwise we could not tell apart the Q-values of two different actions in the same state. The action itself, however, contains very little information; it is meaningful only when combined with the state features. We therefore replicate the length-193 feature vector 5 times (once per possible action in [w, a, s, d, 0]) and multiply each copy by an indicator of the corresponding action, giving a feature vector of length 965. For each state-action pair, only 193 entries of this 965-dimensional vector can be non-zero, due to the action indicators.
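The following sketch shows how the 8-sector counts and the per-action replication could be computed. The state fields, the radius value, and the key-to-direction mapping are assumptions; the sector layout and the replicate-by-action-indicator construction follow the description above.

```python
import math
import numpy as np

ACTIONS = ["w", "a", "s", "d", "0"]   # assumed mapping: up, left, down, right, stay
NEARBY_RADIUS = 150.0                 # hypothetical radius of the nearby region
STATE_DIM = 193                       # length of the state part of φ

def sector_counts(hero_xy, objects_xy, radius=NEARBY_RADIUS, n_sectors=8):
    """Count objects falling in each of the 8 sectors of the circular nearby region."""
    counts = np.zeros(n_sectors)
    hx, hy = hero_xy
    for (x, y) in objects_xy:
        dx, dy = x - hx, y - hy
        if math.hypot(dx, dy) <= radius:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            counts[int(angle / (2 * math.pi) * n_sectors)] += 1
    return counts

def phi(state_features, action):
    """Replicate the length-193 state vector once per action and keep only the copy
    selected by the action indicator, giving the 965-dimensional φ(s, a)."""
    assert len(state_features) == STATE_DIM
    full = np.zeros(STATE_DIM * len(ACTIONS))
    i = ACTIONS.index(action)
    full[i * STATE_DIM:(i + 1) * STATE_DIM] = state_features
    return full
```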

4.2 Deep Q-Learning

Deep Q-learning is based on our traditional Q-learning approach and thus shares the same action space and states. Instead of learning on the hand-designed features described above, the core of Deep Q-learning is to use a Deep Q-Network (DQN) to replace the hand-designed feature and weight part [2].

4.2.1 Features

To capture geometric relationships in the game world, we center the coordinates at the hero and mesh the area near the hero in each frame into a 52*52 grid. Without giving the network much further information, we convert each frame into 4 feature maps, as shown in Figure 4, where each object (E for an enemy, U for a power-up) is marked by a 5*5 all-ones block centered at the object's position.

Figure 4: Feature maps for one frame (world boundary, enemy positions, missile count, and power-up positions)

4.2.2 Related Works

In 2013, DeepMind Technologies presented the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning [3]. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. In addition, other research on the Deep Q-Network (DQN) [4, 5] inspired us in tuning the hyper-parameters of our DQN.

4.2.3 Deep Q-Network (DQN) Structure

To better address the delayed-reward issue, we utilize a technique known as experience replay: we store the agent's experience at each time step, e_t = (s_t, a_t, r_t, s_{t+1}), in a data set D = {e_1, ..., e_N} pooled over many episodes into a replay memory. The algorithm is shown in Algorithm 1. In the inner loop of the algorithm, we apply Q-learning (minibatch) updates to samples of experience drawn at random from the pool of stored samples. After performing experience replay, the agent selects and executes an action according to an ε-greedy policy.

Algorithm 1: Deep Q-learning with experience replay
    Initialize replay memory D to capacity N
    Initialize the action-value function Q with random weights
    for episode = 1, ..., M do
        Initialize the sequence s_1 = [x_1]
        for t = 1, ..., T do
            With probability ε select a random action a_t,
            otherwise select a_t = argmax_a Q(x_t, a)
            Execute action a_t in the emulator and observe reward r_t and new feature maps x_{t+1}
            Set s_{t+1} = (s_t, a_t, r_t, x_{t+1}) and store the transition (x_t, a_t, r_t, x_{t+1}) in D
            Sample a random minibatch of transitions (x_j, a_j, r_j, x_{j+1}) from D
            Set y_j = r_j + max_{a'} Q(x_{j+1}, a')
            Perform a gradient descent step on the loss (y_j - Q(x_j, a_j))^2
        end for
    end for
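A minimal sketch of the replay memory used in Algorithm 1 is given below. The transition layout follows the algorithm, with an added terminal flag so that end states can be given value 0 as in section 4.1.1; the capacity and batch size values are assumptions.

```python
import random

class ReplayMemory:
    """Fixed-capacity ring buffer of (x_t, a_t, r_t, x_{t+1}, done) transitions."""

    def __init__(self, capacity=100_000):   # capacity N (assumed value)
        self.capacity = capacity
        self.buffer = []
        self.position = 0

    def store(self, x_t, a_t, r_t, x_next, done):
        transition = (x_t, a_t, r_t, x_next, done)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.position] = transition   # overwrite the oldest slot
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size=64):
        """Draw a random minibatch of stored transitions, as in Algorithm 1."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```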

The structure of our DQN is shown in Figure 5. We feed the network a sequence of 5 frames as one frame set (one iteration) and perform one update every 64 frame sets (a batch size of 64), giving an input dimension of 52*52*4*5*64. The first hidden layer convolves 16 8*8 filters with stride 4 and applies a rectifier nonlinearity. The second hidden layer convolves 32 4*4 filters with stride 2, again followed by a rectifier nonlinearity. The final hidden layer is fully connected and consists of 64 rectifier units. The output layer is a fully connected linear layer with 5 outputs, one per action. The loss function of the network is L = (r + V(s') - Q(s, a))^2.

Figure 5: DQN structure

5 Experiments

5.1 Baselines & Oracle

For baselines, we implement two simple agents. We first applied a random-move strategy, in which the fighter ship takes a random action for a random period of time. We then applied a more sophisticated strategy: when an enemy ship or a missile is about to collide with the fighter ship, the agent tries to move the ship to avoid it. This strategy works much better on the game, but since it can only take a few threats into account, it cannot avoid collisions when there are many enemies and missiles.

In this paper, amateur-level human play is used as the oracle: we practiced playing the game for hours, then played it multiple times and recorded the final scores.
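A sketch of the second (rule-based) baseline described above is shown below. The state fields, the threat-distance threshold, and the simple left/right dodge are assumptions for illustration rather than the exact rules of our implementation.

```python
import math

THREAT_RADIUS = 80.0   # hypothetical distance at which an object counts as an imminent threat

def rule_based_action(state):
    """Dodge the closest incoming enemy/missile; otherwise stay still."""
    hx, hy = state["hero"]
    threats = [(math.hypot(x - hx, y - hy), x)
               for (x, y) in state["enemies"] + state["missiles"]]
    if not threats:
        return "stay"
    d, tx = min(threats)
    if d > THREAT_RADIUS:
        return "stay"
    # Move horizontally away from the nearest threat (a deliberately crude dodge rule).
    return "right" if tx <= hx else "left"
```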

5.2 Evaluation & Analysis

For each kind of agent, we run the agent on the game 100 times and report its average score and maximum score. The results are shown in Table 1, and the learning curves of the Q-learning and Deep Q-learning methods are shown in Figure 6.

Table 1: Performance of each method (columns: Method, Average Score, Max Score; rows: Random, Rule-based, Q-learning, Q-learning with more iterations, Deep Q-learning, Human)

Figure 6: Learning curve for Q-learning (left) and Deep Q-learning (right)

Compared to all the other methods, Q-learning obtains the highest performance in the end. Compared to the human oracle, the Q-learning agent gets a comparable average score, while its maximum score is much higher than the maximum of the amateur human player. This shows the effectiveness of Q-learning in estimating the optimal policy; it also shows that our reward function estimates the future reward well and that the features we use capture good factors for approximating the Q-value correctly. Another advantage of Q-learning is that each update takes much less time than in the deep Q-learning method. When we compare Q-learning and Deep Q-learning after the same number of iterations in Table 1, Deep Q-learning performs better, but since we can execute many more iterations in limited time with Q-learning than with Deep Q-learning, the final performance of Q-learning with 2 × 10^7 iterations is much higher.

Table 1 also shows that the DQN can capture some useful local features and learn effective non-handcrafted features through the network. However, the final performance of the DQN is much lower. This is because 1) primarily, we cannot run more iterations in limited time due to the slow training speed of the DQN; 2) position-only feature maps may not contain enough information for this complicated game environment; and 3) the structure of a DQN with two convolution layers might not be appropriate for this game.

Other interesting observations. During the training of the Q-learning algorithm, we once found that our agent was so aggressive that the hero usually preferred to shoot enemies first rather than dodge ammo and collect power-ups, even though the latter actions are likely to yield a higher reward in the future. We checked our implementation and found that this was because the weight of r_attack in our reward function (5) was too high, which caused our heuristic to estimate the future score incorrectly.

We then increased the weights of r_dmg, r_dodge and r_item and re-trained the model. After this modification of the hyper-parameters, we obtained a much smarter agent that can dodge ammo, collect power-ups, and attack enemies at the same time. In Figure 6, the performance of Q-learning increases rapidly at a certain iteration; that is exactly when we tuned up the weights above and also tuned down the value of ε to continue the training.

6 Conclusion

To conclude, we obtained a smart game agent for the scrolling shooter game using the Q-learning algorithm and a Deep Q-Network. Our experiments show the effectiveness of the DQN in capturing local-region and non-linear hidden features under the same number of iterations as the traditional Q-learning algorithm. However, the highest performance is still achieved by Q-learning, since it is more efficient and can also capture good features through a well-designed reward function and feature vector. The final performance of our Q-learning game agent is comparable to that of an amateur human player.

7 References

[1] A scrolling shooter game: Chromium B.S.U.
[2] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015.
[3] V. Mnih et al., "Playing Atari with Deep Reinforcement Learning," CoRR, vol. abs/1312.5602, 2013.
[4] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "What is the best multi-stage architecture for object recognition?," in 2009 IEEE 12th International Conference on Computer Vision, 2009.
[5] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
