ROBOCODE PROJECT AIBOT - MARKOV MODEL DRIVEN AIMING COMBINED WITH Q LEARNING FOR MOVEMENT


PATRICK HALUPTZOK, XU MIAO

Abstract. In this paper the development of a robot controller for Robocode is discussed. The Java robot used an aiming strategy based on modeling the enemy robot's movements with a Markov model: the gun was aimed at the location where the enemy robot was expected to be by the time the bullet had traveled far enough to hit it. The robot's movement was based on maintaining an optimal distance from the enemy, far enough to dodge the enemy's fire yet close enough to hit the enemy reliably. Q learning was used to learn the optimal movement strategy. The robot won the in-class robot war competition.

Key words and phrases: Robocode, Reinforcement Learning.

1. AIBot movement strategy

A general introduction to Robocode is presented in Appendix 1; the following description assumes a good understanding of general Robocode functionality and strategy. In a 1v1 contest both robots often implement strong aiming and tracking systems, so the key to victory usually lies in movement strategy. Dodging the bullets fired by the enemy and avoiding the walls preserves energy, so you can outlast the enemy. Maintaining a good field position, where you are not cornered, is also important so that you can dodge bullets without bumping into the wall.

We started by studying the tactics used by many of the most successful robots in 1v1 tournaments. To avoid being targeted accurately, many tanks continuously change their heading and velocity at random. Another strategy is monitoring the enemy's energy level. When the enemy fires a bullet, its energy drops by between 0.1 and 3.0 depending on the energy given to the bullet. If my tank only fires bullets with energy 0.8 or higher, then when the enemy is hit by one of my bullets its energy drops by 3.2 or more, so it is easy to tell whether the enemy's energy drop came from firing a bullet or from being hit by one. Some tanks move fairly predictably until they detect the firing of a bullet, and then change their movement at exactly that point, so that an enemy which has modeled their movement will likely miss.
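The energy-drop bookkeeping described above is easy to capture in a small helper. The sketch below is illustrative rather than the actual AIBot source; the class and method names are hypothetical, while the 0.1 to 3.0 firing window and the choice to fire only bullets of power 0.8 or more come from the text.

// Minimal sketch (not the actual AIBot code) of detecting an enemy shot
// from its energy drop, as described above. All names are hypothetical.
public class EnergyTracker {
    private double lastEnemyEnergy = 100.0;  // Robocode robots start with 100 energy

    /**
     * Call once per scan with the enemy's current energy.
     * Returns true if the drop since the last scan looks like a fired bullet
     * (between 0.1 and 3.0) rather than damage from one of our own bullets
     * (3.2 or more, since we only fire power >= 0.8).
     */
    public boolean enemyJustFired(double enemyEnergy) {
        double drop = lastEnemyEnergy - enemyEnergy;
        lastEnemyEnergy = enemyEnergy;
        return drop >= 0.1 && drop <= 3.0;
    }
}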

Q learning was used to control the robot, with the goal of learning to move to the most advantageous field position, one that minimized the likelihood of being hit by the enemy or bumping into the wall. One feature to aid in evasion was generated by monitoring the enemy's energy level and providing, as a feature of the MDP state space, whether the enemy had fired in the previous time step. When the enemy fires a bullet it takes time to reach our robot, and the hope was that Q learning would learn to take evasive action. There is a range of area we can reach by the time the bullet has traveled far enough to hit us; this range can be delimited by the minimum and maximum angle of the enemy's gun at the moment it fires. It was hoped the Q-learning evasion strategy would make our tank's position uniformly distributed between the enemy's minimum and maximum gun angles by the time the bullet arrives, so that the best the enemy can do is aim its gun at a random angle within that range and fire. This is the strategy that minimizes the probability of our tank being hit by enemy fire. Figure 1 illustrates this ideal evasion strategy, showing the angle across which we want to distribute our position uniformly.

Figure 1. The evasion technique used to reduce the probability of being hit. The optimal strategy is to be positioned uniformly at random across the range your robot can reach by the time the bullet arrives.

1.1. Q-Learning. Q learning was the approach implemented for controlling movement. The approach was to describe the current state of the world with a number of features and, from that state, learn the best movement to make. The utility estimate for each state-action pair was stored in a table. The reward feedback from the environment was -1 for hitting a wall and -10 for being hit by an enemy bullet; Robocode fires events for both, so the reward feedback was easy to track. The Q-learning feedback was kept independent of firing accuracy to simplify the approach. This has the obvious problem that as my robot stays further from the enemy it will likely be hit less often, but will also miss the enemy more often. To start training, Q(s,a) was initialized with very optimistic utility values to encourage it to try all the state-action pairs.

In Q learning my MDP state space was defined by the horizontal and vertical location of the enemy tank, the angle of my tank from the enemy tank, the distance from the enemy tank, and whether the enemy tank had just fired. The state space would be too large using raw measurements, so I bucketed them. There were 7 horizontal, 7 vertical, 8 angle, 6 distance and 2 enemy-fired buckets, giving 7x7x8x6x2 = 4704 unique states.
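The bucketing described above amounts to a simple index computation. The sketch below uses the bucket counts from the text (7 horizontal, 7 vertical, 8 angle, 6 distance, 2 fired); the battlefield size, the distance cap and all names are illustrative assumptions rather than the original code.

// Sketch of discretizing the MDP state into the 7x7x8x6x2 = 4704 buckets
// described above. Field dimensions and the distance cap are assumptions.
public class StateBuckets {
    static final int HORZ = 7, VERT = 7, ANGLE = 8, DIST = 6, FIRED = 2;

    static int bucket(double value, double max, int buckets) {
        int b = (int) (value / max * buckets);
        return Math.min(Math.max(b, 0), buckets - 1);  // clamp to a valid bucket
    }

    /** Flatten the five bucket indexes into one state id in [0, 4704). */
    static int stateId(double enemyX, double enemyY, double angleDeg,
                       double distance, boolean enemyFired) {
        int h = bucket(enemyX, 800.0, HORZ);                      // assumed 800x600 battlefield
        int v = bucket(enemyY, 600.0, VERT);
        int a = bucket((angleDeg % 360 + 360) % 360, 360.0, ANGLE);
        int d = bucket(Math.min(distance, 900.0), 900.0, DIST);   // assumed distance cap
        int f = enemyFired ? 1 : 0;
        return (((h * VERT + v) * ANGLE + a) * DIST + d) * FIRED + f;
    }
}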

The utility table was stored as QLStates[HORZ_BUCKETS][VERT_BUCKETS][DIST_BUCKETS][ANGLE_BUCKETS][FIRED_BUCKETS]. For each state I allowed 7x7 = 49 total actions: 7 of the actions related to moving closer, further, or staying at the same distance relative to the enemy, and 7 related to moving clockwise, counter-clockwise or staying at the same angle relative to the enemy. Each movement in distance or angle was, for each direction, a small amount (1), a large amount (15), or a random amount (15 * random). The update formula was:

(1.1)  Q_prev <- Q_prev + alpha * (reward + gamma * eBestAction - Q_prev)

where Q_prev stands for aeStateCosts[iHorzPrev][iVertPrev][iDistPrev][iAnglePrev][iFiredPrev][iAngleActPrev][iDistActPrev], alpha was 0.01, gamma was the discount factor, and eBestAction was the best aeStateCosts value for the current state across all currently possible actions (the actions being the last two array indexes).

Q learning was fun to watch converge. I saved the updated weights out after each battle to load in the next fight. Initially the robot would just sit or make very random movements. After a number of battles it converged to somewhat reasonable behavior, circling the opponent and weaving back and forth. Eventually it converged to a stable response, but there was enough noise and variation in the Q(s,a) values over time, from the random luck of how well the opponent happened to target the robot, to change the favored action for a given state from time to time.

In implementing the Q-learning approach to control movement, multiple action sets and different combinations of features to define the input space were tried. One potential problem was the coarseness of the state space: taking an action in a state often resulted in still being in the same state. For example, being really close to the enemy means being hit by the enemy's bullets with much higher probability, so moving away should have higher utility than moving closer or staying put; but before you move far enough away to matter statistically and reach the further-away bucket, you may get hit multiple times, making the Q(s,a) for moving away look less attractive. Possibly I could do better here by making the Q(s,a) update use a model: the robot knows what state it is heading towards and could use a weighted combination of the current state and that next state. A finer-grained state space, or function approximation for the utility function with a decaying alpha, might also stabilize the optimal movement strategy better.

In addition to our learning strategies for moving and shooting, Patrick built another hard-coded tank for CSE573 assignment 1. This tank's hard-coded movement and aiming strategies could be replaced independently with the learning-based strategies. A comparison of combinations of the AI and non-AI implementations against other tanks showed a serious degradation in performance when the AI approaches were combined, due to the numerous missed time slices.
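A compact version of the tabular update (1.1) is sketched below. It is standard Q-learning with alpha = 0.01, the optimistic initialization, and the reward scheme described above; the flattened state id, the action indexing and the discount value are assumptions, since the report does not show them.

// Sketch of the tabular Q update described above. The original stored Q in a
// 7-dimensional array; here the state is a flattened id (see StateBuckets)
// and the action an index in [0, 49).
public class QTable {
    private final double[][] q;
    private final double alpha = 0.01;   // learning rate from the report
    private final double gamma;          // discount factor; value not given in the report

    public QTable(int numStates, int numActions, double gamma, double optimisticInit) {
        this.gamma = gamma;
        q = new double[numStates][numActions];
        for (double[] row : q)
            java.util.Arrays.fill(row, optimisticInit);  // optimistic start to force exploration
    }

    /** Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(int prevState, int prevAction, double reward, int curState) {
        double best = Double.NEGATIVE_INFINITY;
        for (double v : q[curState]) best = Math.max(best, v);   // eBestAction
        q[prevState][prevAction] += alpha * (reward + gamma * best - q[prevState][prevAction]);
    }

    /** Greedy action for the current state. */
    public int bestAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++)
            if (q[state][a] > q[state][best]) best = a;
        return best;
    }
}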

1.2. Lessons Learned. First, the most effective tank needs to be able to react quickly at run time. Many of our learning techniques used a great deal of computing time, causing missed time slices that resulted in sluggish and less optimal performance. If optimizing tank effectiveness were the only priority, we would simplify our learning techniques to be less ambitious but guaranteed to run within the time slices allotted, and hard-code more of the strategy we were learning directly into the tank. Another option would be to modify Robocode to allow more time per time step.

2. AIBot aiming strategy

In our agent, aiming is really an instance of a motion-tracking problem, which can be summarized as:

Observe the opponent's movement
Update inner states
Predict the opponent's next movement
Fire

Following this scheme we applied several different models and algorithms, all of them based on the basic physics of the motion.

2.1. Physics of motion. In every tick the tank has a specific speed (v) and a heading, and it moves along its heading a distance equal to the magnitude of its speed. After that, the tank turns by a specific turning angle (θ). The speed is bounded, and the maximum turning angle depends on the speed:

-8 <= v <= 8,    -θ_max(v) <= θ <= θ_max(v)

The prediction scheme is to sum up all the predicted per-tick movement vectors, as shown in Figure 2. This technique is also called virtual bullets. One important point is that in every tick the possible acceleration is fixed at 1, 0 or -2, so if the opponent has speed v at time slice t, there are only 3 possible values of its speed at the next time slice.

Figure 2. Motion prediction.
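The virtual-bullets loop above can be sketched as follows: starting from the last observed position, heading and speed, the predicted per-tick displacement vectors are summed until a bullet fired now would have covered the distance to the predicted point. For brevity this sketch assumes a constant predicted velocity and turn per tick, where AIBot would instead plug in its Markov or RL prediction; the bullet speed formula 20 - 3 * power is standard Robocode physics.

// Sketch of the "virtual bullets" prediction loop described above.
// A constant-velocity, constant-turn model is used here for brevity.
public class Predictor {
    /** Predict where the enemy will be when a bullet of the given power arrives. */
    static double[] predictImpact(double ex, double ey, double headingDeg,
                                  double speed, double turnDegPerTick,
                                  double gunX, double gunY, double bulletPower) {
        double bulletSpeed = 20.0 - 3.0 * bulletPower;    // Robocode bullet speed
        double x = ex, y = ey, heading = headingDeg;
        for (int tick = 1; tick < 200; tick++) {
            speed = Math.max(-8.0, Math.min(8.0, speed)); // speed is always in [-8, 8]
            x += speed * Math.sin(Math.toRadians(heading)); // Robocode: heading 0 = north
            y += speed * Math.cos(Math.toRadians(heading));
            heading += turnDegPerTick;
            double dist = Math.hypot(x - gunX, y - gunY);
            if (bulletSpeed * tick >= dist)               // the bullet has caught up
                return new double[] {x, y};
        }
        return new double[] {x, y};                       // give up after 200 ticks
    }
}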

2.2. Simple Markov Model. It is natural to think of this problem as a Markov process. Here we describe the simple Markov model that was the first model we implemented.

States: <v, θ>

We update the transition probabilities according to the observations. These two variables are fully observable, so updating them is purely a matter of statistics. The prediction is the next state with the maximum likelihood:

S_{t+1} = argmax_{S_{t+1}} T(S_t, S_{t+1})

This stationary Markov model provides an average motion approximation. It works well against the sample robots, because each of them uses a single movement strategy. However, smarter robots can switch among strategies, which cannot be described by a simple stationary model even when there are only two strategies, so we need more sophisticated approaches. A second idea was an n-th order Markov process; we implemented a 2nd-order Markov process. It turned out that the accuracy improved a little, but it slowed the robot considerably as time went on (a huge transition matrix). We tried to prune some states by examining the probabilities, using the conditional probability distribution and independence conditions, but that made it even worse.

2.3. Reinforcement Learning. In the Robocode project we have no prior knowledge of the transition model for the enemy's motion, so we turned to reinforcement learning, since this technique can be model-free. We reduce the tracking problem to:

States: <v, θ>
Action: predict the next state the opponent will be in
Reward:
  The next observed state gets immediate reward 1
  If a fired bullet hits the opponent, all the predicted states that generated that bullet get delayed reward 4
  If a fired bullet misses the opponent, all the predicted states that generated that bullet get delayed reward -2
  If a fired bullet hits another bullet, all the predicted states that generated that bullet get delayed reward 1

There is only one action, which is to find the next state with the maximal expected utility:

S_{t+1} = argmax_{S_{t+1}} T(S_t, a, S_{t+1}) * U(S_{t+1})

The action is fixed, so the policy is fixed too. We implemented both TD-learning and ADP-learning algorithms. ADP-learning has better accuracy, but it is so time-consuming that it skips a lot of turns and does nothing, so its overall performance is worse.
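The delayed-reward scheme above implies some bookkeeping: every predicted state that contributed to the aim of a bullet must be remembered with that bullet and rewarded when the bullet resolves. The sketch below illustrates that bookkeeping only; the simple running-average utility update and all names are assumptions, not the TD/ADP code used in the robot.

// Sketch of the delayed-reward bookkeeping described above: the predicted
// states that generated a bullet are stored with it, and all of them are
// rewarded when the bullet hits (+4), misses (-2), or hits a bullet (+1).
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PredictedStateRewards {
    private final Map<Integer, Double> utility = new HashMap<>();            // U(s)
    private final Map<Integer, List<Integer>> statesByBullet = new HashMap<>(); // per virtual bullet
    private final double alpha = 0.1;   // step size; illustrative, not from the report

    /** Record that this predicted state contributed to the aim of bullet bulletId. */
    public void recordPrediction(int bulletId, int predictedState) {
        statesByBullet.computeIfAbsent(bulletId, k -> new ArrayList<>()).add(predictedState);
    }

    /** The next observed state gets immediate reward 1. */
    public void observe(int state) {
        reward(state, 1.0);
    }

    /** Apply the delayed reward (+4 hit, -2 miss, +1 hit-bullet) to every state behind the bullet. */
    public void resolveBullet(int bulletId, double delayedReward) {
        List<Integer> states = statesByBullet.remove(bulletId);
        if (states == null) return;
        for (int s : states) reward(s, delayedReward);
    }

    private void reward(int state, double r) {
        double u = utility.getOrDefault(state, 0.0);
        utility.put(state, u + alpha * (r - u));   // simple running-average update
    }
}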

2.4. Modified Reinforcement Learning. Using an immediate reward attached to a single state did not seem quite right, because it is the transition from one state to another that causes the reward. So after the first RL model we made a slight change: associate the immediate reward with the path instead of the state. The modified model is:

States: <S_{t-1}, S_t>
Action: predict the next path the opponent will take
Reward:
  The next observed path gets immediate reward 1
  If a fired bullet hits the opponent, all the predicted paths that generated that bullet get delayed reward 4
  If a fired bullet misses the opponent, all the predicted paths that generated that bullet get delayed reward -2
  If a fired bullet hits another bullet, all the predicted paths that generated that bullet get delayed reward 1

The action prediction is:

P_{t+1} = argmax_{P_{t+1}} T(P_t, a, P_{t+1}) * U(P_{t+1}) = argmax_{P_{t+1}} T(S_t, a, S_{t+1}) * U(P_{t+1})

We then implemented a TD-learning algorithm for this model, and this is the model we used in the final tournament.

2.5. Experiment and Analysis.

2.5.1. Performance measure of SMM, TD-RL, ADP-RL and TD-MRL. We measured three values:

prediction accuracy (PA): the accuracy of the next state predicted from the current state.
hit-enemy accuracy (HA): the accuracy of our robot hitting the enemy, HA = iHit / iMissed.
missed-turn rate (MR): the rate of missed turns, which represents the computational intensiveness, MR = iMissedTurn / (iHit + iMissed + iHitBullet + iMissedTurn).

We played games against SpinBot, Marvin I and Fractal MC. Marvin I is the first version of the Marvin robot from our colleague; Fractal MC is a pure dodging robot with excellent movement behavior. The results are shown in Tables 1-3.
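The two ratio measures above reduce to a handful of counters; a minimal sketch follows, with counter names taken from the formulas and everything else illustrative.

// Sketch of the HA and MR measures defined above, kept as running counters.
public class Metrics {
    int iHit, iMissed, iHitBullet, iMissedTurn;

    /** HA = iHit / iMissed, the hit-to-miss ratio of our bullets as defined above. */
    double hitEnemyAccuracy() {
        return iMissed == 0 ? 0.0 : (double) iHit / iMissed;
    }

    /** MR = iMissedTurn / (iHit + iMissed + iHitBullet + iMissedTurn). */
    double missedTurnRate() {
        int total = iHit + iMissed + iHitBullet + iMissedTurn;
        return total == 0 ? 0.0 : (double) iMissedTurn / total;
    }
}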

Table 1. a. Against SpinBot
          PA     HA     MR     WIN/LOSE
SMM       86%            %     20/0
TD-RL     2%             %     18/2
ADP-RL    23%            %     18/2
TD-MRL    89%            %     20/0

Table 2. b. Against Marvin I
          PA     HA     MR     WIN/LOSE
SMM       35%
TD-RL     18%            %     16/4
ADP-RL    23%            %     10/10
TD-MRL    32%    1.4     0%    19/1

Table 3. c. Against Fractal MC
          PA     HA     MR     WIN/LOSE
SMM       8%             %     3/17
TD-RL     12%            %     15/5
ADP-RL
TD-MRL    23%            %     5/15

Figure 3. Motion prediction.

From Table 2 (against Marvin I) we can see that TD-MRL did better than the others even though its PA is not as good as SMM's. Regarding this phenomenon, we think the reinforcement learning is helping to achieve the goal of making the summed prediction vector match the actual movement vector more accurately (shown in Figure 2), rather than the goal of predicting the next time-slice state more accurately. Another interesting observation is that TD-RL did much better against Fractal MC, while TD-MRL did not do as well. At the same time, SMM did better against simple-pattern robots. It seems SMM can predict simple patterns more accurately, while the RL bots can predict more complicated patterns.

2.5.2. Adaptation measure. Unlike a general reinforcement learning problem, the Robocode project gives the reward dynamically; in other words, the reward does not depend directly on the state we formulated, and the reward function changes over time. Therefore the utility of a state normally does not converge at all; instead it oscillates (as shown in Figure 4). Intuitively we think this oscillation is the response to the opponent switching strategies.

Figure 4. Utility of state <0,0> plotted over time.

2.5.3. Open problems. In our RL models we only consider states built from velocity and turning angle. However, there are many events affecting the opponent's movement that we ignored, for example whether the opponent is approaching the wall, whether the

opponent is approaching our robot, whether the opponent has sensed a bullet approaching, whether enough time has passed for the opponent to switch strategies, and so on. All of these could be inner states too. However, characterizing all these states and computing over the full state space would slow the robot down (higher MR) and eventually disable it due to too many skipped turns. Developing a fast algorithm that deals with a big state space is an open problem. Another open problem is learning the firing power: we could treat firing with different powers as different actions and then use the full reinforcement learning method to generate the optimal policy. Again, this would also make the computation heavy.

2.6. BN with latent variables model. The usual models for motion are Bayesian networks or hidden Markov models, so after exploring the application of reinforcement learning we explored DBN approaches and finally arrived at a BN with latent variables. It is similar to an HMM, but an HMM uses only one factored variable to describe the state. We could vectorize the state space and serialize it into one variable, but some features (for example the velocity and turning angle) can be updated directly from the evidence/observations instead of through the whole transition matrix. So we chose a BN with a latent variable to represent the model.

To keep it simple, we start with one latent variable, HittingWall?, so the problem is formulated as:

States: <v, θ, HittingWall?>
Evidences: <v, θ, d_wall, d_center>, where d_wall is the minimal distance from the opponent to one of the walls and d_center is the distance from the opponent to the center of the game field
Transition model: updated online
Sensor model: the linear Gaussian model

P(<d_wall, d_center> | HittingWall?) = 1/(sqrt(2π)·λ) * exp(-(HittingWall?·d_wall + (1 - HittingWall?)·d_center)^2 / (2λ^2))

P(<v, θ> | <v, θ>) = 1

P(E_t | X_t) = P(<d_wall, d_center> | HittingWall?) * P(<v, θ> | <v, θ>) = P(<d_wall, d_center> | HittingWall?)

The Forward-Backward algorithm is then:

Forward();
Normalize();
Backward();
UpdateTransition(); (this updates the transition matrix from the estimated states)
EM(λ); (this updates the sensor model to maximize the likelihood of the data classification according to HittingWall?)

In addition, we assume:

(2.1)  P(HittingWall?) = 1 - (w - λ)(h - λ) / (w·h)

where w is the width of the battlefield and h is its height. For prediction we use the Viterbi algorithm. All of these algorithms require the history from the starting time point, which makes the computation really slow, so we keep a memory buffer of a fixed length: each time we learn, we learn only from that window, and each time we predict, we predict only within that window.

We started this approach after the final tournament but could not finish it due to the tight schedule. We think it would predict better but might be slower, so the total performance might not improve.
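The sensor model and the prior (2.1) translate into two short functions; the sketch below assumes, as the text suggests, that HittingWall? is treated as a 0/1 value and that λ plays both roles (noise scale in the Gaussian and margin in the prior). All names are illustrative.

// Sketch of the sensor model and the HittingWall? prior from equation (2.1).
public class WallSensorModel {
    final double lambda;          // noise scale / wall margin from the report
    final double width, height;   // battlefield dimensions w and h

    WallSensorModel(double lambda, double width, double height) {
        this.lambda = lambda;
        this.width = width;
        this.height = height;
    }

    /** P(<d_wall, d_center> | HittingWall?) as a Gaussian of the selected distance. */
    double sensor(double dWall, double dCenter, int hittingWall) {
        double x = hittingWall * dWall + (1 - hittingWall) * dCenter;
        return Math.exp(-x * x / (2.0 * lambda * lambda))
               / (Math.sqrt(2.0 * Math.PI) * lambda);
    }

    /** Prior P(HittingWall? = 1) = 1 - (w - lambda)(h - lambda) / (w * h), eq. (2.1). */
    double prior() {
        return 1.0 - (width - lambda) * (height - lambda) / (width * height);
    }
}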

3. Conclusions

AI techniques can be used to control the movement of a robot and to predict the movement of an enemy robot for aiming. However, great robot performance requires structuring the robot with a lot of domain knowledge. Providing the learning algorithm with the important, relevant features for modeling the enemy's movement, and designing effective performance measures for optimizing the movement and the distance from the enemy, are just as important as the specific learning algorithms used. Robocode robots are a great example of how important domain knowledge is when combined with AI to get great results. Approaches in class that did not incorporate much domain knowledge, such as the GA approach that just tried to converge by optimizing rough performance measures, did not converge to optimal solutions; the space of solutions was too large. Constraining the solution space by hard-coding in known important features and basic framework functionality allowed much more effective solutions to be found.

4. Division of Labor

Xu Miao wrote the aiming code and that section of the report; Patrick Haluptzok wrote the movement code and the rest of the report.

Microsoft; Department of Computer Science and Engineering, University of Washington
E-mail addresses: xm@u.washington.edu, patrickh@windows.microsoft.com
