Creating an Agent of Doom: A Visual Reinforcement Learning Approach

Michael Lowney
Department of Electrical Engineering, Stanford University
mlowney@stanford.edu

Robert Mahieu
Department of Electrical Engineering, Stanford University
rmahieu@stanford.edu

Abstract
We investigate the use of the Deep Q-Network deep learning model for playing complex video games such as first-person shooters. Owing to the availability of the ViZDoom interface framework, we chose to experiment with the 1993 PC game Doom. By modeling the game as an extremely large Markov Decision Process, we are able to employ Q-learning techniques to train the parameters of a neural-network Q-function approximation. Using ViZDoom, we trained an artificial intelligence agent capable of completing tasks in simple scenarios using only the current screen image as input to the network. Results show that, for the scenarios tested, the agent was able to learn successful strategies in relatively small amounts of training time.

I. INTRODUCTION
In 2013, the Google DeepMind team published a fascinating paper detailing a method for using reinforcement learning techniques to train an artificial intelligence agent to play Atari 2600 video games using only the screen's raw pixel data and the game score as input [1]. The project proposed a new deep learning model, dubbed the Deep Q-Network (DQN), that uses a variation of Q-learning to train a convolutional neural network (CNN) to decide which actions to take in the game. As such, it has the ability to learn and develop policies using high-dimensional input data. This project attempts to extend that study by training an AI agent to play a first-person shooter (FPS) game, specifically the 1993 classic PC game Doom. This goal is made tractable by the existence of an interface framework called ViZDoom, which was developed as part of the Visual Doom AI Competition during the 2016 Computational Intelligence & Games (CIG) conference [2]. The framework provides access to the screen buffer as well as game state information such as player health, armor, ammo, etc., making application of the DQN model possible. Therefore, to achieve our goal, we train a CNN using the DQN model and game data relayed from ViZDoom; the trained network then decides which keys to press based on the current screen image.

II. MODEL
A. Network Architecture
The CNN takes as input a 60x40 pixel single-channel grayscale version of the screen buffer, which is passed through three convolutional layers, each followed by a max-pooling subsampling layer. The first convolutional layer contains 32 8x8 filters, the second 64 4x4 filters, and the third 64 3x3 filters. Each max-pooling layer uses a 2x2 shape. After this, the data is passed through a fully-connected 512-node rectifier layer, then a final fully-connected linear output layer. At the output of the network, each node corresponds to a possible combination of button presses, and the value at that node corresponds to the Q-value of taking that action from the current state. This network architecture is illustrated in Figure 1. At each layer of our network a Rectified Linear Unit (ReLU) was used as the activation function. The weights were initialized according to the method proposed by He et al. [4], which has been shown to perform well with networks that use ReLUs as the activation function.
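To make the architecture concrete, the following is a minimal sketch in PyTorch; the framework, the default padding/stride settings, and the explicit He-initialization call are our assumptions, since the paper does not tie the model to a particular library.

```python
import torch
import torch.nn as nn

class DoomDQN(nn.Module):
    """Three conv + 2x2 max-pool stages, a 512-unit ReLU layer,
    and a linear output with one Q-value per action."""

    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=4), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Infer the flattened feature size from a dummy 60x40 grayscale input.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 1, 40, 60)).numel()
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 512), nn.ReLU(),
            nn.Linear(512, num_actions),  # Q-value for each button combination
        )
        # He (Kaiming) initialization, as in [4].
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

    def forward(self, x):
        return self.head(self.features(x))
```

As a quick shape check, DoomDQN(num_actions=8)(torch.zeros(1, 1, 40, 60)) returns a tensor of shape (1, 8), one Q-value per action.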
B. Markov Decision Process (MDP)
The process of playing the game is modeled as an extremely large, but finite, Markov Decision Process. A state s is simply the image pixel data contained in the screen buffer. The set of possible actions A = {a_1, a_2, ..., a_K} represents all possible combinations of button presses. For example, in the basic scenario that was tested, in which the game map is a single corridor with a stationary monster on one end and the AI agent on the other, a very small set of actions is used, representing all 2^3 binary combinations of [left, right, shoot]. The rewards system is where the ViZDoom framework becomes especially useful.
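As a small illustration, the action set for the basic scenario can be enumerated as every binary combination of the configured buttons; the button names and encoding below are assumptions about a ViZDoom-style setup rather than our exact configuration.

```python
from itertools import product

# Buttons available in the basic scenario: move left, move right, attack.
BUTTONS = ["MOVE_LEFT", "MOVE_RIGHT", "ATTACK"]

# Each action is a binary vector over the buttons (the format ViZDoom's
# make_action expects), giving 2**3 = 8 possible actions in total.
ACTIONS = [list(bits) for bits in product([0, 1], repeat=len(BUTTONS))]

assert len(ACTIONS) == 8
```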

Because the framework allows access to game state information, we are able to define rewards as a function of events in the game, such as a monster or the player dying. The maps/scenarios themselves are hardcoded using editing software to return rewards to the framework when certain events occur. For example, in the basic scenario, the simple corridor map used generates a +101 reward upon death of the monster, a -5 reward when shooting but missing, and a -1 reward every frame while the player is still alive. Further discussion of the specific rewards implemented is provided below in Section V.

Fig. 1: Depiction of the neural network architecture. Three convolutional layers with respective max-pooling subsampling, then a fully-connected rectification layer followed by a fully-connected output layer. The input is a 1-channel image and the output is a value associated with each possible game action.

III. ALGORITHM
To train the agent to play the game, an ε-greedy Q-learning approach is used in an attempt to determine the optimal policy for the MDP. This optimal policy represents the ideal game strategy at each state s, and is found by selecting the action a ∈ A at the current state s that maximizes the following recursion:

Q*(s, a) = E_{s'}[ r + γ max_{a'} Q*(s', a') | s, a ]

In the above formulation, r represents the reward given for taking action a at state s, and γ ∈ [0, 1) is the discount factor, which controls how much we care about future versus immediate rewards. The optimal Q-value, Q*(s, a), represents the expected score that would be achieved by taking the action a and then continually choosing actions that are expected to return the largest Q-value until the game ends. Note that this is an off-policy learning algorithm, since we are not directly evaluating Q-values based on some set policy. Because the state dimensionality and state space are so large, in order to promote adequate learning generalization, a neural network function approximation, referred to as a Q-network and parameterized by θ, is used to approximate the Q-function such that:

Q(s, a; θ) ≈ Q(s, a)

The parameters θ are optimized using stochastic gradient descent (SGD) on the cost function below, representing the squared difference between the current target and prediction values:

L(θ_i) = [ ( r + γ max_{a'} Q(s', a'; θ_i) ) - Q(s, a; θ_i) ]^2

A. Experience Replay
For the SGD update procedure, a technique called experience replay is utilized. This technique involves storing a large set of previous transitions e_t = (s_t, a_t, r_t, s'_t) in a dataset D_t = {e_1, ..., e_t}, called the replay memory, across many run-throughs of the agent playing the game. During the update step at a given iteration t, a small set of transitions, referred to as a minibatch, is sampled uniformly at random from the dataset D_t. Then, using the RMSProp algorithm, this minibatch is used to perform an update on the network parameters. This technique avoids training on consecutive transitions, which would lead to erratic updates and slower learning due to the high correlation between the states. The replay memory is large but finite; once it is full, the oldest transitions are removed and replaced as new ones are added.
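The sketch below shows how the replay memory and a single RMSProp update on the squared TD error could look, assuming the PyTorch network sketched earlier; the capacity, batch size, and learning rate are placeholders rather than our actual settings.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

memory = deque(maxlen=10_000)  # replay memory D_t; oldest transitions drop out when full

def store(s, a, r, s_next, done):
    """Store one transition e_t = (s_t, a_t, r_t, s'_t)."""
    memory.append((s, a, r, s_next, done))

def update(q_net, optimizer, batch_size=64, gamma=0.99):
    """One RMSProp step on a minibatch sampled uniformly from the replay memory."""
    if len(memory) < batch_size:
        return
    s, a, r, s_next, done = zip(*random.sample(memory, batch_size))
    s, s_next = torch.stack(s), torch.stack(s_next)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; theta)
    with torch.no_grad():
        q_next = q_net(s_next).max(dim=1).values             # max_a' Q(s', a'; theta)
    target = r + gamma * (1.0 - done) * q_next                # TD target

    loss = F.mse_loss(q_pred, target)                         # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
```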
B. ε-greedy Q-Learning
In the ε-greedy learning implementation, the parameter ε refers to the probability that the agent takes a random action a ∈ A while playing the game during training, instead of taking the action argmax_{a ∈ A} Q(s, a; θ) given by the current network parameterization θ. Starting with an ε close to one allows for exploration of the state space at the beginning of training; however, to allow the training to converge, we decrease ε linearly over the training time from 0.99 to 0.10, causing the agent to more closely follow and refine its developed strategy.

C. Action Repeat
Another technique used was action repeat, in which the agent repeats an action choice over some number of frames before evaluating the Q-function again and choosing a new action. This technique provides several advantages. Using action repeat avoids populating the replay memory with, and thus training excessively on, very similar states, which leads to faster training. It also gives the game time to react to the selected action and respond with an appropriate reward, allowing the agent to more accurately associate the action chosen with the reward returned. In our implementation we found that an action repeat of 4 frames worked well.
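A sketch of the ε-greedy selection with linear decay, plus the 4-frame action repeat, is shown below; it again assumes the pieces sketched above, and the step accounting is simplified.

```python
import random
import torch

EPS_START, EPS_END = 0.99, 0.10
ACTION_REPEAT = 4  # frames each chosen action is held for

def epsilon(step, total_steps):
    """Linearly anneal epsilon from 0.99 down to 0.10 over training."""
    frac = min(step / total_steps, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(q_net, state, eps, num_actions):
    """Random action with probability eps, otherwise the greedy action."""
    if random.random() < eps:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

# In ViZDoom the repeat is passed directly to the engine, e.g.:
# reward = game.make_action(ACTIONS[a], ACTION_REPEAT)
```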

Fig. 2: Example screen images from the three test scenarios: (a) basic, (b) health-gathering, (c) predict.

IV. BASELINE & ORACLE
Because training an agent for a full multiplayer game of Doom is such a complex undertaking, we decided to instead break the problem up into simpler scenarios. The three scenarios we examined are explained in detail in Section V, and example screen images from each can be viewed in Figure 2. The scenarios emphasize two main skills that an agent should be proficient at: killing monsters and collecting health packs. To adequately evaluate the performance of our system, we first implemented a baseline algorithm which simply took random actions (i.e., ε = 1.0) for the entirety of its playtime. As might be expected, this resulted in very poor average performance. The average scores obtained by the agent in each test scenario using this algorithm are summarized in Table I. To give an upper bound on good performance, our oracle was simply the average score achieved by a human playing each scenario. These scores are also summarized in Table I.

TABLE I: Baseline and oracle performance results.
Scenario          | Baseline Avg. Score | Oracle Avg. Score
Basic             |                     |
Health-gathering  |                     |
Prediction        |                     |

V. RESULTS
The first scenario we decided to explore, referred to as the basic scenario, has the agent at one end of a square room and a monster at the other end. The agent can take only three possible actions: move left, move right, and shoot. The monster can be killed in one shot, and the agent receives a reward of +101 if the monster is killed. The agent receives a reward of -1 for being alive at each state, and a reward of -5 for shooting and missing the monster, which incentivizes both precision and speed. The game engine returns these rewards to the agent as a result of its interactions with the world. Note that custom rewards can also be created, such as by using the engine to detect a change in the agent's amount of ammo. Within our implementation, the agent decides which action to take using only the screen buffer. The agent was trained over 20 epochs, each consisting of 5000 iterations to update the Q-function. After each epoch the agent was tested on 100 episodes. An episode ends once the monster is killed or after a timeout of 300 actions. The average score per episode for testing and training is shown in Figure 3 and Figure 4 respectively. The training scores represent the average score over each epoch of training, and the testing scores represent the average score after each epoch. The average training score steadily increases until about the 9th epoch, where it starts to level out. The testing scores also reach their maximum values in about 9 epochs. The agent was able to outperform both the baseline and oracle scores. The agent had an average score of for the basic scenario.

Fig. 3: Average testing score per episode vs. epoch for the basic scenario.
Fig. 4: Average training score per episode vs. epoch for the basic scenario.

The second scenario we tested, called the health-gathering scenario, places the agent in a square room where the floor is covered with acid. The agent can choose any combination of the following actions: move forward, turn left, and turn right. Standing in the acid slowly reduces the agent's health. Health packs are spawned at random throughout the room, and the agent must pick up these health packs in order to stay alive.
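For reference, a skeleton of how an episode in one of these scenarios can be driven through the ViZDoom Python API is shown below; the config file name, the nearest-neighbour downsampling, and the random-policy stand-in are assumptions for illustration, not our exact setup.

```python
import random
import numpy as np
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("basic.cfg")                  # scenario config (assumed path)
game.set_screen_format(vzd.ScreenFormat.GRAY8)
game.init()

def preprocess(frame):
    """Downsample the grayscale screen buffer to the 60x40 network input."""
    rows = np.linspace(0, frame.shape[0] - 1, 40).astype(int)
    cols = np.linspace(0, frame.shape[1] - 1, 60).astype(int)
    return frame[np.ix_(rows, cols)].astype(np.float32) / 255.0

game.new_episode()
while not game.is_episode_finished():
    state = preprocess(game.get_state().screen_buffer)
    a = random.randrange(len(ACTIONS))         # replace with epsilon-greedy selection
    game.make_action(ACTIONS[a], 4)            # reward is returned per action
print("episode score:", game.get_total_reward())
```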

For this scenario the agent only gets a reward of +1 for staying alive; there is no explicit reward for collecting health packs. The agent must learn that collecting health packs is necessary to maximize its score. For this scenario we used a timeout of 2100 actions, so the maximum score that can be achieved is 2100. The agent was trained in a similar fashion for the health-gathering scenario: 20 epochs of 5000 iterations, with 100 episodes of testing after each epoch. The average score per episode of testing and training is shown in Figures 5 and 6 respectively. These plots show that the average score does increase over time, but there is some noise in the learning curve. This could be a sign that the network needs more time to train to a stable solution. After training, it is interesting to view the agent's decision making during the game. In Figure 7, the agent's relative movements are plotted. This plot shows that the agent has learned to move in large circles around the room. This strategy ensures that the agent is always moving and does not get stuck against a wall or in a corner of the room. Since there is a fairly high density of health packs in the room, this circular path is likely to pick up health packs as well. The agent outperformed the baseline and came close to the performance of the oracle. The agent had an average test score of for the health-gathering scenario. On many episodes it would remain alive until time ran out. More training is necessary for the agent to learn how to obtain the maximal score on every episode.

The final scenario we decided to test was the prediction scenario. In this scenario the agent has the following actions available: turn left, turn right, and shoot. The agent is placed in a square room, and at the far end of the room is a monster. The monster travels across the room on a fixed path and is spawned at a random point on this path. This time the agent is given a rocket launcher with a single rocket. There is a delay between when the agent fires and when the rocket reaches an object, depending on the distance. In this scenario the agent must predict where the monster will move so that it can lead the shot and kill the monster. If the agent successfully hits the monster it receives a reward of +1. For each state that the agent is alive it receives a small negative reward. Each episode ends either when a timeout of 300 states has been reached, or when the rocket has been fired and struck something. We trained the network several different times while adjusting hyperparameters such as the amount of regularization and the network structure itself. The average score did not improve significantly even after many epochs of training. Since this scenario is more complex than the previous two, we anticipated that it would need a larger training time and expected that the score would increase after a larger amount of computation time. We also decided to expand the state for this scenario. Instead of feeding in a single 60x40 image, we used as input a 2x60x60 pixel image, where the first channel represents the current frame and the second channel represents the previous frame. Our theory was that, with the previous frame added to the state, the network could infer which direction the monster was moving. We finally trained our network for 220 epochs (approximately 22 hours); the training scores can be seen in Figure 8, and the testing results are shown in Figure 9.
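A minimal sketch of the expanded two-channel state, stacking the current and previous preprocessed frames, is shown below; the reset behaviour at episode start is our choice, not something the paper specifies.

```python
import numpy as np

def stack_frames(current, previous):
    """Two-channel state: channel 0 is the current frame, channel 1 the
    previous frame, so the network can infer the monster's direction of motion."""
    return np.stack([current, previous], axis=0)   # shape (2, H, W)

# At the start of an episode the previous frame can simply be a copy of the
# first frame, so the input shape stays fixed throughout.
```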
VI. DISCUSSION
The results for the basic scenario demonstrate that, for the relatively small state and action spaces present there, the Q-function is able to converge quite quickly using our implementation, and after training it can be used to achieve very high scores on almost every test run. Convergence was achieved in about 9 epochs, which took about 28 minutes to run on our hardware. This outcome indicates that the underlying algorithm and model are effective in training the agent to perform at least simple tasks such as this.

Fig. 5: Average testing score per episode vs. epoch for the health-gathering scenario.
Fig. 6: Average training score per episode vs. epoch for the health-gathering scenario.
Fig. 7: Path taken by the agent in the health-gathering scenario at test time.
Fig. 8: Average training score per episode vs. epoch for the prediction scenario.

The results for the health-gathering scenario, a somewhat more complex task, indicate that the algorithm is also able to learn successful strategies for scenarios with slightly larger state and action spaces. What is particularly interesting about the outcome of this scenario is that, although the agent does learn the successful survival method of running around in circles (see Figure 7), it does not appear to have a strong, if any, understanding that picking up health packs is truly the key to staying alive. While the agent does continually pick up health packs as it runs around, it does not seem to specifically direct itself towards health packs; rather, it simply tries to keep moving as much as possible in the hope of hitting some by chance. After 20 epochs of training on the health-gathering scenario (taking a little over an hour), it is also interesting to note that the Q-function does not entirely converge, and it is difficult to say whether the average score is still on an upward trend. We hypothesize that this is because the system has yet to associate the health packs directly with success and would need significantly more training before convergence occurs.

The prediction scenario did not perform as well as we had anticipated. At first glance it seems that the network may be over-fitting, because the average testing score starts to decay after epoch 190. If we had more time, we would experiment with increasing the amount of L2 regularization used in our loss function. We could also have added dropout layers to our network at train time to avoid over-fitting; a sketch of both changes is shown below. The learning rate and weight initialization are the other two hyperparameters we would like to experiment with, to see how they affect the training time and results. Choosing a lower learning rate and allowing the network to train longer may prevent the score from dipping at the end.
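A sketch of how the dropout layers and L2 penalty mentioned above could be wired in with PyTorch follows; the dropout probability, weight-decay coefficient, and layer sizes (matching the earlier network sketch) are placeholders that we did not actually train with.

```python
import torch
import torch.nn as nn

# Fully-connected head with dropout before each linear layer.
head_with_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 512), nn.ReLU(),   # 512 = flattened conv output in the earlier sketch
    nn.Dropout(p=0.5),
    nn.Linear(512, 8),                # 8 actions in the prediction scenario
)

# L2 regularization applied through the optimizer's weight_decay term.
optimizer = torch.optim.RMSprop(head_with_dropout.parameters(),
                                lr=2.5e-4, weight_decay=1e-4)
```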

The fact that the learning curve is fairly flat until a certain point may indicate that the initial weights were poorly chosen and that no major change to the network was made until much later in training. For this particular scenario, the ε-greedy Q-learning implementation has a difficult time learning the optimal policy. The probability of the correct sequence of actions occurring at random is very low in certain cases. For example, when the monster is spawned off screen, the agent must turn in the correct direction a certain number of times just to see the monster, and then shoot at the precise time. The probability of this random walk occurring is very small, so it does not happen often. Had we trained on data acquired from a human playing the game, we could potentially have sped up training and allowed the agent to learn how to find the monster more easily. This would also keep the neurons from becoming saturated or skewed by constantly training on examples that result in a negative reward. However, training on human play does take away from the appeal of the agent discovering how to play on its own. After training on the prediction scenario for 220 epochs, the agent did not outperform our baseline score. The average test score over 100 episodes was -0.3, which indicates that it did not hit the monster at all during the 100 trials. We have pointed out potential flaws in our methods above. In addition, longer training time may still be needed, since adding the previous frame to the state increases the number of variables in our network.

Fig. 9: Average testing score per episode vs. epoch for the prediction scenario.

VII. CONCLUSION
In this paper we implemented a Deep Q-Network to teach an agent how to play certain simple scenarios that individually represent important tasks in the game Doom. Our agent uses only the information from the image buffer as input to determine which action to take. We show that on two of the three scenarios our agent outperforms a random policy and comes very close to the level of a human player. The flaws of our method are pointed out, and potential solutions are proposed for future work.

REFERENCES
[1] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature (2015).
[2] Kempka, Michał, et al. "ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning." arXiv preprint (2016).
[3] Lample, Guillaume, and Devendra Singh Chaplot. "Playing FPS games with deep reinforcement learning." arXiv preprint (2016).
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015.
