Human Level Control in Halo Through Deep Reinforcement Learning


Samuel Colbran, Vighnesh Sachidananda

Abstract — In this report, a reinforcement learning agent and environment for the game Halo: Combat Evolved are detailed. The reinforcement learning agent approaches human-level performance when racing a vehicle on terrain with hills and obstacles.

Index Terms — Reinforcement Learning, Q-Learning, Function Approximation, Autonomous Vehicle, Halo

1 INTRODUCTION

Many recent works in the field of artificial intelligence have focused on the development of actors that can learn how to maximise expected utility, without prior knowledge of their environment, through reinforcement learning. In this project, several approaches to developing an intelligent agent are evaluated in Halo: Combat Evolved, a game shown in figure 1 that was developed by Bungie and published by Microsoft Studios in 2001.

Fig. 1. Halo - Combat Evolved.

2 TASK DEFINITION

2.1 Aim

The aim of this project is to develop a reinforcement learning agent that is able to achieve human-level vehicle control in Halo: Combat Evolved. The evaluation of the agent is strictly concerned with driving an in-game car through a timed obstacle course.

2.2 Scope

This agent must be able to (1) create a representation of its environment from high-dimensional pixel inputs and (2) generalize these inputs against past experiences in order to take actions and make decisions under future uncertainty. The specifics of this abstract process are explained in section 4.4.

Fig. 2. Reinforcement Learning Agent.

2.3 Evaluation

In order to prove the agent's ability to evaluate and understand the 3-dimensional game environment both visually and physically, the autonomous agent will engage in both a solo race and a head-to-head race against a human opponent. The race consists of markers to which the autonomous agent must navigate, culminating in an endpoint. Only the location of the very next marker will be known to the agent. The game state will include RGB pixels, a depth map, the location of the vehicle and more.

The map layout and game physics are unknown to the autonomous agent.

2.4 Dataset

The Halo game was not designed in a way that allows the integration of a reinforcement learning agent. The dataset we provide is the result of reverse engineering the proprietary, compiled Halo assembly code.

3 INFRASTRUCTURE

A large amount of infrastructure needed to be created prior to the development of the artificial intelligence component. Halo is proprietary, closed source and does not provide an existing API for controlling the game. The following section describes the interface that was created in order to mesh a server-side script containing the artificial intelligence with the game.

3.1 Halo Plugin

The original versions of Halo for Mac released in 2001, including a full version and a demo version with limited features, only supported PowerPC architectures. When Apple transitioned the Mac lineup to Intel processors, an updated version of Halo was released with support for the Intel x86 architecture. The demo version of Halo was not updated, so members of the community took the new full version and stripped out any content that was not previously provided by the demo. With this, Halo Mini Demo was born. A plugin system was created during its development that enabled members of the community to inject their own code into the executable. This system was utilised to build the API that the artificial intelligence components could use to control the game. The system hooks into game functions (stored at a certain point in executable memory) with the following procedure:

1) Disable executable memory protection (to make it writable).
2) Move a few of the instructions stored at a particular function offset into a code cave (i.e. move them somewhere else).
3) At the end of the code cave, add an instruction which jumps back into the original function. At this point, executing the instructions starting at the top of the code cave should be equivalent to calling the old function.
4) Write a new instruction at the function offset which jumps to a new set of instructions (i.e. the code we want to run instead).
5) Enable executable memory protection (to make it executable again).

In our new set of instructions, we can then jump to the code cave if we want to execute the old function at any time. The following functions and their corresponding locations in the executable were found using a debugger, x86 assembly editors and other reverse engineering tools.

Python Bindings

To speed up rapid prototyping, it was decided that the majority of the artificial intelligence code would be developed in Python rather than directly in the plugin with Objective-C++. All of the logic was moved to the server script (described in section 3.2) and executed by the plugin when Halo was launched. To facilitate this, a Python interpreter was added to the plugin and several bridge objects were developed to pass data between the API (C) and Python. This decision also made it easier to connect with other libraries such as TensorFlow and Keras. Additional features were added to support rapid prototyping. The plugin was designed to listen for any changes to the server script and then automatically reload the module without needing to reset the game. This meant that one could debug very easily: if something wasn't working, it was possible to insert a print statement or modify the debugging function and then immediately see the changes in action without needing to relaunch the game, as sketched below.
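The reload behaviour can be pictured with a short sketch. This is illustrative only: the actual reload is performed natively by the plugin in Objective-C++, and the module and class names here (server_script, Agent) are hypothetical stand-ins for the real script. It assumes the script exposes the reload(self, prior) callback described in section 3.2.

```python
import importlib

import server_script  # hypothetical name for the AI server script module


def hot_reload(current_agent):
    """Re-import the edited script and hand the previous instance to the
    new one so learned weights (or any other state) can be copied over."""
    importlib.reload(server_script)           # pick up the edited source
    new_agent = server_script.Agent()         # hypothetical class name
    new_agent.reload(prior=current_agent)     # callback described in section 3.2
    return new_agent
```
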

Render Objects (0x2305E0)

The render objects function was modified (i.e. replaced, with the existing function then called from the replacement) to facilitate the extraction of the diffuse and depth maps into image files. The diffuse map is what one normally sees when playing Halo. The depth map contains information relating to the distance of the surfaces displayed on screen: a high value (white) means that the rendered surface is far away, whereas a low value (black) means that it is near to the viewpoint. An example of these two maps is shown in figure 3.

Fig. 3. Diffuse map (left) and Depth map (right).
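As a rough illustration of how an exported depth map can be turned into a small feature vector for the agent, a sketch follows. The file name, image mode and sampling grid here are hypothetical, not the project's actual values.

```python
import numpy as np
from PIL import Image


def depth_features(depth_png_path, grid=(4, 6)):
    """Load an exported depth map and sample it on a coarse grid.

    Returns values in [0, 1], where 1.0 means a far-away surface and
    0.0 means a surface right in front of the viewpoint.
    """
    depth = np.asarray(Image.open(depth_png_path).convert("L"), dtype=np.float32) / 255.0
    h, w = depth.shape
    rows = np.linspace(0, h - 1, grid[0], dtype=int)
    cols = np.linspace(0, w - 1, grid[1], dtype=int)
    return depth[np.ix_(rows, cols)].ravel()   # grid[0] * grid[1] depth features
```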

To facilitate debugging, an additional step was injected into the rendering pipeline that cleared the existing rendering and replaced it with a flat plane, which was then rendered with the diffuse texture. This step, with an unmodified diffuse texture, does not have any impact on the game: imagine taking a snapshot of the currently displayed screen, deleting everything currently on the screen, and then rendering that snapshot back to the screen. For debugging, the plugin provides a mutable copy of the diffuse texture to the server script. With this, the server script is able to modify any of the pixels within the game screen, which greatly assisted in debugging. Figure 4 shows an example of the Q-value weight debug screen (i.e. the diffuse texture was modified by the server script to draw horizontal lines depicting the weights for each action at sampled pixels on the screen) that was made possible by the API.

Fig. 4. Q-value weight display utilising the debug API.

Render Object (0x1a7afc)

During early stages of the project development cycle, the presence of the car in the depth map was causing problems with user-provided features. The render object function was modified to remove the player (and the vehicle, if they were in one) from the render pipeline.

Read Controls (0x13e726)

The read controls function was fully replaced. In Halo, the controls are stored in a structure containing the following values:

1) jumping (bool)
2) switch grenade (bool)
3) interacting (bool) - used to enter vehicles etc.
4) switch weapon (bool)
5) meleeing (bool)
6) flashlight (bool)
7) throw grenade (bool)
8) fire weapon (bool)
9) crouching (bool)
10) zooming (bool)
11) scores (bool) - bring up the leaderboard
12) reload weapon (bool)
13) talk (bool) - bring up the chat dialog
14) movex (float) - move left or right
15) movey (float) - move backwards or forwards
16) lookx (float) - look left or right
17) looky (float) - look upwards or downwards

Rather than calling the old function, which would populate these values depending on which keys were currently pressed, the plugin instead asks the server script for controls, as sketched below. As the focus was on movement only, the server script was only able to set interacting (so that the actor could get into a vehicle), movex (so that the actor could move) and lookx (so that the actor could turn). An extended version of the API could be created that allowed the actor to set all of these controls and fully mimic a human player, but this was not necessary for the project.
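A minimal sketch of the control values the server script might hand back when the plugin asks for input. Only the three fields the project actually drives are filled in; the helper name and default values are illustrative, not part of the real API.

```python
def make_controls(steer, accelerate=1.0, enter_vehicle=False):
    """Build the subset of the Halo controls structure used by the agent.

    Only the three fields the project drives are set: interacting (to get
    into a vehicle), movex (to move) and lookx (to turn). Everything else
    in the controls structure is left at its default.
    """
    return {
        "interacting": bool(enter_vehicle),
        "movex": float(accelerate),
        "lookx": float(steer),
    }
```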

Run Command (0x11e3de)

Halo provides an in-game console where players can type various debugging commands (such as resetting the map, changing the map, listing players etc.). To assist with debugging the learning, the run command function was augmented with the following additional commands:

1) ai (on/off) - Turns the server script on or off. This was useful for starting at a certain test point, getting unstuck etc.
2) debug (on/off) - Turns the debug display on or off. As Python is modifying a huge array during debugging, it tends to slow the game down, so turning the display off is necessary when one is only interested in running the learning at maximum speed.
3) super (on/off) - Turns on supervised learning mode. As described in section 4.4, supervised learning was attempted, and this mode was used to record the diffuse map along with the user's current controls a few times per second.
4) freedrive (on/off) - Gives control back to the user while still running the server script. This is useful if one wants to debug learned weights.
5) learn (on/off) - Passes a flag to the AI to control whether it should learn (i.e. have some exploration factor) or simply exploit as much as possible.

Other

A few other game functions were modified to remove things such as antenna rendering, modify the game resolution etc.

3.2 Server Script

The server script contains the intelligence component and is written in Python. The plugin exposes a module called halo that the script can import to run the following functions:

1) console(string) - prints a string to the in-game console.
2) restart() - restarts the current match.
3) speed(float) - sets the game speed.
4) map(name, gametype) - sets the current game map.
5) ai(boolean) - sets whether the AI is in control.

The script also implements a class with the following functions, which are executed by the plugin at the appropriate time:

1) def configure() - called when Halo launches. Used to set the game speed, map file and other configuration settings using the halo module.
2) def setlearn(self, learn) - called when the user enters the learn on/off command in the in-game console.
3) def reload(self, prior) - called when the server script changes. The prior object contains the previous instance of the script (so one can copy over old weights etc.).
4) def onframe(self, input) - called when a new frame is rendered by the game engine.
5) def oncontrol(self, state) - called periodically when the game is requesting player controls.

3.3 Client Script

The server script is constrained to a 32-bit architecture, as it runs within the 32-bit Halo process. This presented a problem when connecting to libraries such as TensorFlow, which require a 64-bit instance of Python. Rather than going through the tedious process of compiling each library as a 32-bit version, it was decided that a new client script would be created. It communicates with the server script using a simple networking library such as Pyro4 to speed up development time. The client script is then able to run TensorFlow and other libraries in 64-bit, possibly on a different machine connected to a very powerful GPU to speed up learning. Each state is sent over the network from the server to the client, and the client responds with a corresponding action. The client script is completely optional and requires a server script that supports the networking library. If one were to develop all of the logic within the server script and not require additional 64-bit libraries such as TensorFlow, the client script would not be required.

3.4 Architecture

Overall, the system we developed connects processes in C++ and Python. The C++ process executes Halo (the game environment) as well as the Halo plugin. The plugin exposes control to a Python server script (running as a 32-bit process). The Python server script communicates with the client script through remote procedure calls. The full diagram of this communication, along with the data being sent, is shown in figure 5. A minimal server-script skeleton that fits this architecture is sketched below.

Fig. 5. Architecture Diagram.
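The following is a sketch of what a server script with this shape could look like. It is illustrative only: the halo module calls and callback names follow section 3.2, but the class name, map/gametype strings, the Pyro4 object URI and the remote agent interface are hypothetical.

```python
import halo   # module exposed by the plugin (section 3.2)
import Pyro4  # optional RPC to a 64-bit client script (section 3.3)


class Script:
    def __init__(self):
        # Hypothetical URI; the client registers a remote agent object here.
        self.remote_agent = Pyro4.Proxy("PYRONAME:halo.agent")
        self.learning = True

    def configure(self):
        halo.speed(2.0)                      # run the game faster while training
        halo.map("bloodgulch", "race")       # hypothetical map/gametype names
        halo.ai(True)

    def setlearn(self, learn):
        self.learning = learn

    def reload(self, prior):
        if prior is not None:                # keep state across hot reloads
            self.remote_agent = prior.remote_agent
            self.learning = prior.learning

    def onframe(self, input):
        pass                                 # e.g. update the debug overlay here

    def oncontrol(self, state):
        # Ship the state to the 64-bit client and return its chosen controls.
        return self.remote_agent.act(state, learn=self.learning)
```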

4 APPROACH

4.1 Model

In describing the model for our reinforcement learning algorithm, we begin by outlining the inputs and outputs to the algorithm, which dictate how the states, policy and objective are built. This treatment allows for an understanding of the Markov Decision Process that governs the autonomous agent's gameplay.

Sensor Input

At each state of the algorithm, the values shown in table 1 are received by the algorithm.

TABLE 1
Sensor Inputs at Each State

Variable                               | Range        | Description
(position_x, position_y, position_z)  | R^3          | Vehicle x, y, z coordinates
(target_x, target_y, target_z)        | R^3          | Next goal x, y, z coordinates
(heading_x, heading_y, heading_z)     | R^3          | Vehicle rotation x, y, z unit vector
vehicle                               | {0, 1}       | Whether the player is in a vehicle
tid                                   | I (integers) | Current race index. Starts at 0, then increases to 1 when the player hits the first checkpoint, etc.

Actions

Using the sensor inputs and the developed model, the autonomous agent is able to drive the car by specifying a policy over the variables shown in table 2.

TABLE 2
Action Outputs at Each State

Variable   | Range      | Description
accelerate | {-1, 0, 1} | Acceleration constant. The values correspond to reverse, stop and accelerate respectively.
steer      | [-2, 2]    | The direction and amount in which to steer. A negative number indicates steering to the left, 0 indicates continuing on a straight path, and a positive number indicates steering to the right.

States

Through the previously defined sensor inputs and actions, an understanding of the states and state space can be examined.

Unique State: Each state can be uniquely defined by the position and heading of the vehicle, the target, and whether or not the player is in a vehicle. These variables are shown in the first four rows of table 1.

Time Variant Aspects: The Markov Decision Process was trained under the assumption that there are no time-variant aspects included in the state space. This means that following a deterministic policy always leads to the same expected reward, regardless of the time at which actions are taken.

Continuous State Space: Since the variables that dictate the possible actions are continuous intervals (that are closed and bounded), we have a state space that requires continuous control. How we managed this is detailed in a later section.

Action Space Complexity: As the action space is continuous, it contains infinitely many actions and is difficult to convert into a discrete set. With the given interval from 0 to 1, if one was to create 100 discrete choices from 0.00 to 1.00 and 400 discrete choices for our steering direction, there would be 100 x 400 = 40,000 choices to choose between at each stage.

Time between next State: Any value iterations need to be very fast due to the frame rate of the Halo game. Currently, the frame rate is about 30 FPS, and the autonomous agent is expected to act on a policy and make decisions within 20 milliseconds.

4.2 Policy

Stochastic Policy: The policy network implemented takes a state, as described previously, as input and outputs an action from the action space. A stochastic policy was used to encourage exploration of the game surroundings. In addition to the game state, the policy relied on a specific feature set θ. These features were learned through the sensory input and are described in detail in the Deep Q-Learning portion of this paper.

π_θ(a | s) = P[a | s, θ]
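As a toy illustration, one concrete way such a stochastic policy over a discretised steering set could be parameterised is a softmax over linear scores, sketched below. This is not the exact parameterisation used in the project (most experiments below use ε-greedy Q-learning); the bin values, feature vector and temperature are placeholders.

```python
import numpy as np

# Five illustrative steering bins spanning the [-2, 2] range of table 2;
# the project binned steering into five small values around 0.00.
STEER_BINS = np.linspace(-2.0, 2.0, 5)


def stochastic_policy(features, theta, temperature=1.0):
    """Sample a steering bin with probability P[a | s, theta] (softmax scores)."""
    scores = theta @ features                 # one score per steering bin
    probs = np.exp(scores / temperature)
    probs /= probs.sum()
    action = np.random.choice(len(STEER_BINS), p=probs)
    return STEER_BINS[action], probs


# theta has one weight row per action; features come from the sensor inputs
# (table 1) and sampled depth values, as described in the next section.
```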

Learning Policy: This policy is learned through the notion of discounted reward. The following is a representation of the objective, whose details are mentioned in the next section:

L(θ) = E[r_1 + γ r_2 + γ² r_3 + ... | π_θ(s, a)]

4.3 Objective

Reward: In implementing the learning policy, rewards have been defined as progression towards the next marker in our race. A discount factor γ of 0.9 was used, and the reward is proportional to the distance progressed towards the next marker, where m is the next marker's position, l_0 is the location of the vehicle at the previous state and l_1 its location at the current state:

r_1 = ||m - l_0||² - ||m - l_1||²

4.4 Algorithms

The proposed approach for our task is to use recent results from Google DeepMind, including Deep Q-Learning, continuous policy gradients, and the Actor-Critic algorithm in conjunction with Deep Q-Learning.

Q-Learning with Function Approximation and Weight Regularization

The first algorithm utilised was Q-Learning with function approximation. As the state space of this problem is enormous, function approximation is necessary to estimate the Q-value of states using a reduced number of parameters. If one was to try to apply ordinary Q-Learning, the algorithm would need training examples from every single possible location, target, heading etc., which is an inefficient approach. Instead, the following two features were used to approximate the value of a state. The intuition behind choosing these features was to maintain a course that minimises race time whilst being able to avoid any immediate obstacles. To simplify the problem, the acceleration component was removed from the action (it was simply set to 1 at all times) and the steering direction was binned into 5 discrete choices centred on 0.00 (straight ahead), with the remaining values steering left and right by increasing amounts.

Best Angle: the best angle feature used the position, target and heading state variables to compute the steering direction that would optimise reaching the target. Using this feature alone to choose the action would be equivalent to running the baseline, where the actor drives directly towards the next goal point. This is prone to crashing into obstacles and getting stuck, so the additional depth feature was also added.

Depth: the depth at a subset of sampled pixels was added as features. It was hoped that the algorithm would assign reasonable weights to the depth features so that it could avoid obstacles which might cause it to crash or get stuck.

Exploration Policy

An epsilon-greedy approach was used to balance exploration with exploitation. The epsilon term was slowly decreased as more training samples were gathered, to ensure that the algorithm would converge over time to full exploitation and have a reasonable chance of reaching race markers past the initial one. The process used to anneal the exploration rate is a stationary Gauss-Markov process, also known as the Ornstein-Uhlenbeck process:

dε_t = θ(µ - ε_t) dt + σ dW_t

Weight Regularization

A novel approach was used to normalize the weights after each iteration relative to each action. This prevented any one action from acquiring such an extreme (very high) weight that it would always be chosen. Instead, the weights for each action summed to 1, so that the difference between the overall Q-values for each action was determined solely by the state rather than by relatively extreme weights. After a closer literature review, we found that this idea has recently been presented for faster convergence in deep neural networks by Tim Salimans and Diederik P. Kingma [7].
The proposal is to reparametrize the weights in the following manner, where v is a vector, g is a scalar, and ||v|| denotes the Euclidean norm of v:

w = (g / ||v||) v,    ||w|| = g

As proposed in this recent work, we also find that decoupling the weight vector v from its norm makes the stochastic gradient descent we employ converge a lot faster. Furthermore, the gradient of the loss function that we optimize changes under this weight reparametrization:

∇_g L = (∇_w L · v) / ||v||

∇_v L = (g / ||v||) ∇_w L - (g ∇_g L / ||v||²) v
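A small numpy sketch of this reparametrization and its gradients, assuming a linear Q-value layer. Variable names are illustrative; the code mirrors the formulas above rather than the project's exact implementation.

```python
import numpy as np


def weightnorm_forward(v, g, features):
    """Return the score w·phi with w = (g / ||v||) v."""
    w = g * v / np.linalg.norm(v)
    return w @ features, w


def weightnorm_grads(v, g, grad_w):
    """Translate a gradient w.r.t. w into gradients w.r.t. g and v."""
    norm_v = np.linalg.norm(v)
    grad_g = grad_w @ v / norm_v
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v**2) * v
    return grad_g, grad_v
```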

Utilizing this modified weight vector, and as a consequence the modified gradient, we find that our model converges much faster than it otherwise would, which we show in Section 6 (Error Analysis).

Supervised Learning

Although not reinforcement learning, supervised learning was attempted as an experiment to see how its performance might compare with reinforcement learning. To collect the necessary data, a human driver drove around the race course numerous times whilst the plugin recorded a frame and the associated steering action a few times per second. The actions were simplified as in the previous section to convert the problem into a simple 5-class classification problem. The model used is a convolutional neural network, shown in figure 6, trained to minimize the categorical cross-entropy; the predictor is ŷ = g(w · x_n), where g is the logistic function, and the loss is shown more explicitly below. Unfortunately, due to limited computer performance (no GPU acceleration), we found the process to yield poor results. Our classification error, cross-validated with 20 percent of the data as a holdout, was about 36 percent.

L(w) = (1/N) Σ_{n=1}^{N} H(p_n, q_n) = -(1/N) Σ_{n=1}^{N} [ y_n log(ŷ_n) + (1 - y_n) log(1 - ŷ_n) ]

Fig. 6. Convolutional Neural Network Model.

Deep Q-Learning

In addition to using function approximation with linear regression, we implemented the Deep Q-Learning algorithm as detailed by Google DeepMind. This combines the expressiveness of the neural networks we used in the supervised learning with the Q-Learning model, which is better suited to the MDP-style structure of gameplay. At the core of the Deep Q-Learning algorithm that we use is the Q-Learning with function approximation algorithm. At each step we perform the following update to our learned policy:

Q̂_opt(s, a; w) = w · φ(s, a)

w ← w - η [ Q̂_opt(s, a; w) - (r + γ V̂_opt(s')) ] φ(s, a)

We tested both nonlinear function approximation, through deep neural networks, and linear function approximation. In both implementations, we derive weights from the pixel (r, g, b) and depth (α) maps from gameplay. We compute hidden-layer activations and then use these activations to find a score for the current state. After experimentation, we used solely the depth (α) map, as it allows for a better encoding of the environment the agent needs to interact with: the depth of objects was found to be far easier to interpret than the RGB values. However, with GPU acceleration we would have used both the pixel and depth maps.

The learning function with deep neural networks is shown below. Each neuron learns an activation (depicted by h_j), and an overall score is computed for the input:

σ(z) = (1 + exp(-z))^(-1)

h_j = σ(v_j · φ(x))

score = w · h

This gives us a model-free estimation of the game state:

Q^π(s, a) ≈ Q(s, a, w)
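A compact sketch of the per-step update above for the linear case, with ε-greedy action selection as described in the Exploration Policy section. Feature construction and hyper-parameter values are placeholders, not the project's exact ones.

```python
import numpy as np

ACTIONS = range(5)          # the 5 discretised steering choices


def q_value(w, features, a):
    return w[a] @ features                      # Q̂_opt(s, a; w) = w · φ(s, a)


def select_action(w, features, epsilon):
    """ε-greedy: explore with probability ε, otherwise exploit."""
    if np.random.rand() < epsilon:
        return np.random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: q_value(w, features, a))


def td_update(w, features, a, reward, next_features, eta=0.01, gamma=0.9):
    """w <- w - η [Q̂(s,a;w) - (r + γ V̂(s'))] φ(s,a), applied to action a's weights."""
    v_next = max(q_value(w, next_features, b) for b in ACTIONS)
    delta = q_value(w, features, a) - (reward + gamma * v_next)
    w[a] -= eta * delta * features
    return w
```

Here w is a (5, d) array holding one weight vector per steering choice and features is the d-dimensional vector built from the best-angle and sampled-depth features.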

Policy Gradients

In order to use continuous control (i.e. a continuous action space) with this formulation, we set up a new loss function derived from the policy gradient:

dL(θ)/dθ = E_{x~p(x|θ)} [ dQ/dθ ] = E_{x~p(x|θ)} [ (dQ(s, a, w)/da) (da/dθ) ]

Our new loss function now looks like the following:

Loss = [ r + γ Q(s', a') - Q(s, a) ]²

We implement our Deep Q-network with and without policy gradients.

5 LITERATURE REVIEW

In implementing and testing the various models, literature regarding Deep Q-Learning, convolutional neural networks and stochastic optimization was reviewed.

5.1 Deep Q Learning

Most of our insights on Deep Q-Learning are derived from in-class presentations on Deep Q-Learning and Google DeepMind's recent papers. Specifically, the following works were very useful towards building our reinforcement learning agent.

Human Level Control through Deep Reinforcement Learning

This paper outlines the formulation for the Deep Q-Network that was used in this project.

Continuous Control with Deep Reinforcement Learning

In using policy gradients for our work, we used this paper to dictate how the loss functions for the Deep Q-Network change when the action space is continuous.

5.2 Convolutional Neural Networks

In testing a convolutional neural network for supervised learning, we used the literature to dictate how to structure the task and the learning for image classification.

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet has evolved into a canonical benchmark, and this paper outlines some of the core architectural components of modern convolutional networks. The importance of convolutional, activation and fully connected layers is discussed in detail.

5.3 Stochastic Optimization

The crux of most modern machine learning algorithms is a performant algorithm for stochastic optimization. We reviewed literature in this field to better understand how to speed up convergence of learned policies.

Adam: A Method for Stochastic Optimization

In this work, the authors present a moment-based approach towards updating weights through stochastic optimization.

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

This work became part of how we trained the function approximation in the Q-network for our reinforcement learning agent. We found this addition to speed up convergence of the network substantially, and it allowed our model to be run on a single quad-core CPU.

6 ERROR ANALYSIS

After implementing the various models, we ran the algorithms on our defined task and noted their performance. A summary of performance across models, along with a detailed analysis of the best model, is included next. We found the best model, given our data and machine constraints, to be Q-Learning with linear function approximation.

Model Benchmarks

We prepared two different solutions to the task of learning to drive a vehicle in Halo. The first solution used convolutional neural networks and the second used reinforcement learning. Our results using convolutional neural networks for classification were poor, because we could only capture a few episodes worth of data and could only use one layer of the RGB pixel tensor due to computational limits.

Reinforcement Learning: When using Q-Learning with function approximation and weight normalization, we found that we are able to converge to a non-zero reward quite quickly (figure 7), although there is a lot of variance across episodes. To combat this, further work could be done using multiple models with voting, such as the Asynchronous Advantage Actor-Critic model. We find that the agent is able to complete the course in about 15% of the episodes.

Fig. 7. Q Learning Reward Over Time.

Best Model Results

When competing in the race format, we found that our agent performs better than the baseline and is comparable to human-level performance. The reinforcement learning agent crashes less and, more importantly, does not get stuck on obstacles like the baseline does. To evaluate the ability of the agent to autonomously drive a vehicle, we simulate a race in which it must drive through areas of varying terrain with obstacles. To quantify its performance, we had it race through such a course and timed the completion at multiple checkpoints throughout the race. We scored the contestants using the rubric shown in figure 8. The agent's goal is to maximize its score, and it will thus aim to finish as many checkpoints as possible with the lowest times achievable. The performance on a short course with moderate terrain and a longer course with heavier terrain is shown in figures 9 and 10.

Fig. 8. Score Per Checkpoint.

Fig. 9. Short Course Scores (3 Markers).

Fig. 10. Long Course Scores (6 Markers).

7 CONCLUSION

We develop and detail an environment and agent for human-level control in Halo, and achieve near human-level results in moderately challenging control conditions. Further work will focus on augmenting the learning algorithms with more expressive function approximation, though this also requires better hardware. Another avenue for future work is training the agent on different tasks and investigating ways to transfer learning across these tasks; doing so would be a step towards more generalized artificial intelligence. Lastly, we would like to thank the CS221 team for the class and Bryan Anenberg, our TA, for providing us with feedback throughout the course of the project.

REFERENCES

[1] Human Level Control through Deep Reinforcement Learning, /v518/n7540/full/nature14236.html
[2] Continuous Control with Deep Reinforcement Learning.
[3] Recurrent Policy Gradients.
[4] DDPG Keras TORCS.
[5] ImageNet Classification with Deep Convolutional Neural Networks.
[6] Adam: A Method for Stochastic Optimization.
[7] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.
