Learning Agents in Quake III


Remco Bonse, Ward Kockelkorn, Ruben Smelik, Pim Veelders and Wilco Moerman
Department of Computer Science, University of Utrecht, The Netherlands

Abstract

This paper shows the results of applying Reinforcement Learning (RL) to train combat movement behaviour for a QUAKE III ARENA [3] bot. Combat movement is the part of a QUAKE III bot that steers the bot while in battle with an enemy bot or human player. The aim of this research is to train a bot to perform better than the standard QUAKE III bot it is based on, by changing only its combat movement behaviour. We extended a standard bot with a Q-learning algorithm that trains a Neural Network to map a game state vector to Q-values, one for each possible action. This RL bot (to which we refer as NeurioN) is trained in a reduced QUAKE III environment, to decrease noise and to make the training phase more effective. The training consists of one-to-one combat with its non-learning counterpart, in runs up to a very high kill (frag) limit. Reward is given for avoiding damage, thereby letting the bot learn to avoid getting hit. We found that it is possible to improve a QUAKE III bot by using RL to train combat movement. The trained bot outperforms its non-learning counterparts, but does not appear smarter in combat with human opponents.

1 Introduction

Games are more and more considered a suitable testing ground for artificial intelligence research [5]. Reasons for this include their ever increasing realism, their limited and simulated environment (in which agents do not require sensors and imaging techniques to perceive their surroundings), their accessibility and inexpensiveness, and the fact that the game industry is big business. Artificial Intelligence has been successfully applied to classic games like Othello and Chess, but contemporary computer games, like First Person Shooters (FPS), which can be regarded as the most popular games nowadays, typically have limited artificial intelligence (rule-based agents or agents based on finite state machines). As a result, FPS agents (called bots) are by far not able to compete with human players on their tactical and learning abilities. With FPS rendering computations being moved more and more to the graphics card's GPU, CPU cycles become available for AI. More elaborate and computationally intensive AI techniques might now be used in these games, adding to the game's realism and creating more human-like artificial opponents.

This research explores the application of Machine Learning (ML) methods to FPS games, focusing on one of the most popular FPS games: QUAKE III ARENA [3]. Bots in QUAKE III have limited intelligence and are no match for an expert human player. We will extend such a bot with ML capabilities and evaluate its performance against non-learning bots and human players. For more information on QUAKE III and its bots, we refer to [10, 11].

Previous research on applying AI techniques to FPS bots includes [4], in which Laird et al. present a bot for QUAKE II, based on (800+) rules, that can infer how to set up an ambush, in a way quite resembling a human player. Related to our research is the work of Zanetti et al. [12]. They have implemented a QUAKE III bot that uses three NNs for: movement during combat, aim and shoot (including selecting which weapon to use), and path planning in non-combat situations.

Their networks are trained using Genetic Algorithms (GA) and a training set of recordings of matches of expert human players. The goal is for the networks to imitate the behaviour of these experts. The resulting bot turns out to be far from competitive, but it has still learned several expert behaviours.

A bot has several independent decision nodes (e.g. aiming and shooting, goal selection, chatting). Our focus is on the movement behaviour of a bot in combat mode (i.e. upon encountering a nearby enemy). This behaviour typically includes dodging rockets and moving in complex patterns so as to increase the aim-and-shoot difficulty for the enemy. Human expert players excel at combat movement with techniques such as circle strafing and, of course, a lot of jumping around, thereby making it very difficult for their opponents to get a clear shot. Combat movement is one of the parts of FPS AI that is not often used in this type of research (whereas goal and weapon selection are). This made it interesting for us to examine.

As mentioned, Evolutionary Algorithms have already been applied successfully to similar problems in FPS games, albeit limited to research environments, yet little is known of the suitability of Reinforcement Learning (RL, see e.g. [1, 9]) in such games. Therefore, we have chosen to apply RL to improve the combat performance of a QUAKE III bot. As we did not have the time and resources (i.e. expert players) to implement supervised learning, and because supervised learning has already been applied (somewhat) successfully [12], we use an unsupervised learning algorithm.

2 Reinforcement Learning

We have implemented the Q-learning algorithm (see [9]) as described below.

    1  Initialize Q(s, a) arbitrarily
    2  Repeat (for each combat sequence):
    3      Initialize state s
    4      Repeat (for each step of the combat sequence):
    5          Select action a based on s using softmax
    6          Execute a, receive reward r and next state s'
    7          Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]
    8          s ← s'
    9      Until s is non-combat or the bot is killed

Listing 1: Q-learning algorithm

As follows from Listing 1, we consider each combat sequence an RL episode. A combat sequence starts when the combat movement decision node of the QUAKE III AI is first addressed, and ends when more than 10 in-game seconds have passed since the last call, in order to keep the Q-value function smooth. Since the QUAKE III engine decides when to move into combat mode, it might be 100 milliseconds or just as well 100 seconds since the last combat movement call. In this (fast) game that would mean a completely different state, which has minimal correlation with the action taken in the last known state. If the time limit has expired, the last state in the sequence cannot be rewarded properly and is discarded. However, if the bot is killed not too long after the last combat movement, it still receives a (negative) reward.

We have also experimented with an adaptation of Q-learning known as Advantage Learning. Advantage Learning is implemented by replacing line 7 in Listing 1 with the following update:

    A(s, a) ← A(s, a) + α [r + γ max_{a'} A(s', a') − A(s, a)] / (ΔT · k)    (1)

Here the A-value of an (s, a) pair is the advantage the agent gets by selecting a in state s. ΔT stands for the time passed since the last time the action was selected, and k is a scaling factor. Both α (in Q-learning) and k (in Advantage Learning) lie in (0, 1]. Since we work with discrete time, the ΔT term is always 1. Advantage Learning uses the advantage that a certain state-action pair (Q-value) has over the current Q-value. This advantage is then scaled using the scaling factor k.
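To make the two update rules concrete, the sketch below shows line 7 of Listing 1 and the Advantage Learning variant of Equation (1) applied to a generic value-function approximator. The predict/update interface of `q_net`/`a_net` is a placeholder of ours (in the actual bot this role is played by the MLP described further on), and the default parameter values merely mirror the ranges explored later; this is an illustrative sketch, not the paper's actual code.

```python
import numpy as np

def q_learning_step(q_net, s, a, r, s_next, alpha=0.01, gamma=0.95):
    """One application of line 7 of Listing 1 against an approximator.

    q_net is a hypothetical approximator with predict(state) -> array of
    Q-values (one entry per action) and update(state, targets) to train
    towards new target values.
    """
    q_values = q_net.predict(s)                        # Q(s, .) for all actions
    td_target = r + gamma * np.max(q_net.predict(s_next))
    q_values[a] += alpha * (td_target - q_values[a])   # nudge Q(s, a) towards the target
    q_net.update(s, q_values)

def advantage_learning_step(a_net, s, a, r, s_next, alpha=0.01, gamma=0.95, k=0.5, dt=1.0):
    """Advantage Learning variant (Equation 1): the TD error is divided by dt*k,
    which enlarges the small but meaningful differences between actions;
    with k = 1 and dt = 1 this reduces to plain Q-learning."""
    a_values = a_net.predict(s)
    td_error = r + gamma * np.max(a_net.predict(s_next)) - a_values[a]
    a_values[a] += alpha * td_error / (dt * k)
    a_net.update(s, a_values)
```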

Advantage Learning is useful when the Q-values do not differ very much. Normal Q-learning would have to become increasingly accurate to be able to represent the very small but meaningful differences between adjacent Q-values, since the policy has to be able to accurately determine the maximum over all Q-values in a given state. Since the Q-values are approximated, this poses a severe problem for the function approximator. It is easy for the approximator to decrease the overall error by roughly approximating the Q-values; however, it is hard, requires lots of training time, and might even be impossible (given the structure of the approximator, for instance not enough hidden neurons) to accurately approximate the small but important differences in Q-values. Advantage Learning learns the scaled differences (advantages) between Q-values, which are larger, and therefore has a better chance of approximating these advantages than Q-learning has of approximating the Q-values. Since the advantages correlate with the Q-values, the policy can simply take the maximum advantage when the best action needs to be selected [2].

For action selection we use a softmax policy with Boltzmann selection. The chance P that action a is chosen in state s is defined as in Equation (2):

    P(a | s) = e^{Q(s, a)/τ} / Σ_{b=1..n} e^{Q(s, b)/τ}    (2)

As is common with softmax, action a_1 is chosen more often in state s if Q(s, a_1) ≥ Q(s, a_i) for all i ∈ A. But since the policy is stochastic, there will always be some exploration. In the beginning, much exploration is performed, because all Q-values are initialized with random values. However, as the learning process progresses, actions that have been rewarded highly will have higher Q-values than others, thereby exponentially increasing their chance of being selected.

Because the state space is continuous, we use a Multilayer Perceptron (MLP) to approximate the Q-value function. The input layer consists of 10 neurons for the state vector; the output layer consists of 18 neurons that contain the Q-values for each action. We use one hidden layer, and vary the number of hidden neurons during our experiments, ranging from 0 to 30, where 0 hidden neurons means there is no hidden layer. We use sigmoid activation functions for the hidden neurons and linear functions for the output neurons. This is because we expect a continuous output, but we also want to reduce the effects of noise: the sigmoid functions in the hidden layer filter out small noise in the input. In most experiments, we initialized the weights with random values drawn from a uniform distribution over [−0.01, 0.01]. We also conducted some experiments with a higher margin. More about the parameters of the MLP can be found in Section 4.

We considered several reward rules that might lead to good tactical movement. Possible reward rules are:

    R(s_t, a) = −(health_{t−1} − health_t) / health_{t−1}    (3)

    R(s_t, a) = numfrags − (health_{t−1} − health_t) / health_{t−1}    (4)

    R(s_t, a) = (enemyhealth_{t−1} − enemyhealth_t) / enemyhealth_{t−1} − (health_{t−1} − health_t) / health_{t−1}    (5)

    R(s_t, a) = accuratehits − (health_{t−1} − health_t) / health_{t−1}    (6)

Rule (3) mainly lets the bot minimize damage to itself and thus leads to self-preserving bots, which favour evasive actions over aggressive actions. Rule (4) should eventually lead to good game play, because it is our intuition that this is what human players do: keep good health and, whenever possible, frag¹ someone. Frags do not happen very often, however, and a good hit does not always mean a frag. Therefore, we came up with the next rule. In rule (5), every hit is taken into account, and it becomes more attractive to attack a player with low health or to run away if your own health is low.

¹ Fragging is slang for killing someone in a computer game.
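As a concrete illustration of the Boltzmann selection of Equation (2) and of the damage-avoidance reward of rule (3), which is the rule we eventually adopt (see below), a minimal sketch follows. The function names are ours, and the default temperature simply echoes the τ = 50 value later used as the base setting.

```python
import numpy as np

def boltzmann_select(q_values, tau=50.0, rng=None):
    """Softmax (Boltzmann) action selection as in Equation (2): higher Q-values
    get an exponentially larger selection probability, controlled by the
    temperature tau."""
    if rng is None:
        rng = np.random.default_rng()
    prefs = np.exp((np.asarray(q_values) - np.max(q_values)) / tau)  # max subtracted for numerical stability
    probs = prefs / prefs.sum()
    return rng.choice(len(q_values), p=probs)

def damage_avoidance_reward(prev_health, health):
    """Reward rule (3): the (negative) fraction of own health lost since the previous step."""
    return -(prev_health - health) / prev_health
```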

But we argued that the health loss of the enemy (rule (5)) is not directly related to tactical movement. For example, if the enemy happens to pick up a health pack, the last chosen action will be considered bad, while it could in fact have been a good evasive maneuver. Rule (6) does not have this disadvantage, but depends too much on the aiming skills of the bot. We have chosen to use rule (3), which rewards evasive actions, as this is the most important goal of combat movement: evade enemy fire. Rule (3) also takes into consideration the minimal amount of information needed to evaluate the action.

3 Environment

To eliminate part of the randomness commonly present in FPS games, we use a reduced training environment in the form of a custom level, which is a simple round arena in which the bots fight (Figure 1(b)). In this map, bots enter combat mode almost immediately when they spawn. To eliminate the influence of different types of weapons, we have chosen to allow the use of only one type of weapon. To speed up the learning process, the custom map does not feature any cover, health packs or other items, other than this weapon and its corresponding ammunition.

Figure 1: Reduced training environments — (a) Cube map, (b) Round Rocket map.

The bot we created, which we call NeurioN, is based on one of the standard QUAKE III bots, named Sarge. Except for their looks, sounds, and, most importantly, their combat movement behaviour, these bots are identical. For NeurioN we train this part of the AI with RL, while Sarge remained fixed during training. The combat movement node is called irregularly by the bot AI decision loop. It is not necessary for a bot to be in combat mode to shoot at an opponent, but when a bot is close to an opponent and has the opponent in sight, the combat mode will be called. By making a small custom map in which the bots cannot hide from each other, we forced the bots to enter their combat mode more often.

The normal behaviour of a bot in combat mode is based on its preference to move and/or jump in unpredictable directions. In determining these directions, the environment is not taken into account. The same goes for incoming rockets, to which a bot is blind. The only exceptions to this are collisions with walls and falling off edges. Because it is not easy to extract information about imminent collisions or drops off edges, we have chosen not to include this information in the state vector of the bot we train.

3.1 Problems with the environment

The environment used for the experiment, QUAKE III ARENA [3], seems very suitable for these types of experiments, since its source code is freely available and the game is very stable and well known. However, it is not without its disadvantages. The first problem we ran into is the fact that, despite the release of the QUAKE III source, it is still obvious that the game is a commercial product that was not created to be an experimental environment. The source code is hardly documented and, despite some webpages on the internet [7, 8, 6], information is scarce.

The engine is created to work in real-time. While speeding up the simulation is possible, the training is still slow.

During the creation of our reduced training environment, we encountered several other problems. One problem was the cube-shaped map we started out with (Figure 1(a)). When in combat mode, the learning bot does not take its static surroundings into account. Therefore, more often than desirable, NeurioN would end up in a corner, with a very large chance of getting shot. To counter this problem a round map was created, so that there are no corners where the bot can get stuck. Another problem we discovered while training in our map is that, when the map does not have any items for the bots to pick up, they remain standing still, having no goals at all. If both bots are standing still with their backs facing each other, nothing will happen. This situation is quite rare, and it therefore took a while to discover what was ruining our experiments. The problem was tackled by adding excessive amounts of weapons and ammunition crates.

We also discovered a problem caused by the order in which the different bots in a game are given time to choose their actions. This is done sequentially, which results in a small but significant advantage for the bot that is added first to the arena. When two identical (non-learning, standard QUAKE III) bots are added to a game, the bot first in the list wins about 52% of the time, as can be seen in Figure 2. This is a significant difference.

Figure 2: Advantage for the first spawned bot — win percentage of the first Sarge in a Sarge vs. Sarge control run (moving average over 5000 frags, shown every 500), plotted against total frags.

Finally, we found that the shooting/aiming precision of Sarge at the highest difficulty level is so good that there is not much room to evade its shots. This is especially the case with weapons that fire their ammunition at a high velocity: the projectiles of these weapons travel almost instantaneously, which means that they impact directly when fired. Because we wanted the bot to learn combat movement, we decided to use the rocket launcher as the only weapon. The rocket launcher is the weapon with the slowest ammunition in QUAKE III, so it is the easiest weapon to evade and thus the best weapon for learning tactical movement in combat.
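As an aside on the spawn-order advantage mentioned above, a quick back-of-the-envelope check shows why a 52% win rate over a long control run is indeed statistically significant. The sketch below uses a normal approximation to the binomial distribution; the frag count is a made-up example, not the exact length of our control run.

```python
import math

def first_bot_advantage_zscore(wins, total_frags):
    """Two-sided z-test of the null hypothesis that both bots win 50% of frags."""
    p_hat = wins / total_frags
    std_err = math.sqrt(0.25 / total_frags)   # standard deviation of p_hat under the null
    return (p_hat - 0.5) / std_err

# Hypothetical example: a 52% win rate over 50,000 frags.
z = first_bot_advantage_zscore(wins=26_000, total_frags=50_000)
print(f"z = {z:.1f}")   # roughly 8.9 -- far beyond any conventional significance threshold
```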

3.2 States

After trying various sets of state information, we have finally chosen to use the following state vector:

- distance to opponent: a number between 0 and 1, where 1 is very close and 0 is far away. This means that when the opponent is out of sight the distance will also be 0;
- relative angle of placement of opponent (relative angle of bot to opponent): a number between 0 and 1, with 1 if the opponent is standing in front of the bot and 0 if the opponent is standing behind the bot. See Figure 3(a);
- relative side of placement of opponent: 1 if the opponent is standing to the left, -1 if the opponent is standing to the right;
- relative angle of opponent to bot: a number between 0 and 1, with 1 if the bot is standing in front of the opponent and 0 if the bot is standing behind the opponent. See Figure 3(b);
- relative side of placement of the bot: -1 or 1, depending on the sign of the previous input;
- distance to the nearest projectile: a number between 0 and 1, where 1 is very close and 0 is far away. If there is no projectile heading towards NeurioN, 0 is also given;
- relative angle of placement of the nearest projectile: a number between 0 and 1, with 1 if the projectile is in front of the bot and 0 if the projectile is behind the bot. See Figure 3(c);
- relative side of placement of the nearest projectile: 1 if the projectile is to the left, -1 if the projectile is to the right;
- relative angle of the nearest projectile to the bot: a number between 0 and 1; if the projectile is heading straight for the bot this will be 1, and if the angle between the projectile's heading and the bot is equal to or larger than a certain threshold this will be 0. Any angle beyond the threshold indicates the projectile is not a threat to the bot. See Figure 3(d);
- relative side of the bot with regard to the nearest projectile: 1 if NeurioN is to the left, -1 if NeurioN is to the right of the projectile.

Figure 3: The state information — (a) Opponent placement, (b) Opponent view angle, (c) Projectile placement, (d) Projectile direction.

Information not used for states

Some information that seems important for describing the game state is left out, for the following reasons:

- Health of NeurioN: the health of a player has no influence on the movement of that player when in direct combat;
- Health of the opponent: as above, the health of the opponent has no effect on the combat movement;
- Current weapon of the opponent: in our training environment two ballistic weapons are available, the machine gun, for which there is no extra ammunition available, and the rocket launcher. The rocket launcher is by far the most popular of these two weapons for Sarge, and thus also for NeurioN: as soon as they get one, they will switch to the rocket launcher. This results in effectively only one weapon being used, so the weapon that the opponent uses is known and it is not necessary to include this information in the state vector.
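To make the layout of this input concrete, the sketch below assembles the ten values in the order listed above. The `opp` and `proj` records, their field names, and the way their values would be read from the engine are placeholders of ours; only the ordering, the [0, 1] / {−1, +1} ranges, and the convention that missing information maps to 0 follow the description above.

```python
import numpy as np

def build_state_vector(opp, proj):
    """Assemble the 10-element state vector of Section 3.2.

    opp and proj are hypothetical records holding already-normalised values;
    distances and angles lie in [0, 1], sides are +1 (left) or -1 (right), and
    missing information (opponent out of sight, no incoming projectile) becomes 0.
    """
    if proj is None:  # no projectile heading towards NeurioN
        projectile_part = [0.0, 0.0, 0.0, 0.0, 0.0]
    else:
        projectile_part = [
            proj.closeness,           # 1 = very close, 0 = far away
            proj.angle_of_placement,  # 1 = in front of the bot, 0 = behind it
            proj.side_of_placement,   # +1 = to the bot's left, -1 = to its right
            proj.heading_angle,       # 1 = heading straight for the bot, 0 = no threat
            proj.heading_side,        # +1 = bot left of the projectile, -1 = right
        ]
    return np.array([
        opp.closeness,            # 1 = very close, 0 = far away or out of sight
        opp.angle_of_placement,   # 1 = opponent in front of the bot, 0 = behind it
        opp.side_of_placement,    # +1 = opponent to the left, -1 = to the right
        opp.view_angle,           # 1 = bot in front of the opponent, 0 = behind it
        opp.view_side,            # sign of the previous input
        *projectile_part,
    ], dtype=np.float32)
```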

3.3 Actions

We consider 18 (legal) combinations of five elementary actions:

- Move forward (W);
- Move backward (S);
- Strafe left (A);
- Strafe right (D);
- Jump (J).

So the possible actions are {∅, W, WA, WJ, ..., SDJ, D, DJ} (Figure 4). In this scheme other moves would be theoretically possible, such as moving forward and backward (WS) simultaneously, but these are of course impossible, so such options are left out. This results in the 18 legal moves: 3 options for forward movement (forward, nothing, backward), 3 options sideways in the same manner, and 2 options for jumping (to jump or not to jump). The orientation is not considered in our combat movement, because it is determined by the aiming function of the bot, which would overwrite any change made to the rotation of the bot with its own values. The movement of a bot is relative to its orientation vector, given by the aiming function.

Figure 4: The actions visualized.

4 Setup of Experiments

When using RL and NNs, a number of parameters have to be tweaked, as they have a large influence on the resulting training performance. The training phase consists of a game with a very high fraglimit (200,000 frags). Each setting is run several times, to validate the results (as the game is stochastic and the network initialization may be of influence). Because RL starts each training run with a randomly initialized NN and contains several variables for which the optimal setting is unknown, we did a broad sweep of the parameter space. The settings we tried consist of the following variables:

- Number of hidden neurons n ∈ {0, 5, 15, 30}
- Discount γ ∈ {0.95, 0.80}
- Temperature τ ∈ {10, 50}
- Learning rate α ∈ {0.001, 0.003, 0.01}
- Timescale factor k ∈ {1, 0.5, 0.25}²

² k = 1 is equal to Q-learning.
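For reference, a short sketch that enumerates the 18 legal moves of Section 3.3 and the parameter grid of the broad sweep; the variable names are ours, but the values are the ones listed above.

```python
from itertools import product

# The 18 legal moves of Section 3.3: 3 forward options x 3 sideways options x 2 jump options.
FORWARD = ("W", "", "S")      # forward, nothing, backward
SIDEWAYS = ("A", "", "D")     # strafe left, nothing, strafe right
JUMP = ("J", "")              # jump or not
ACTIONS = ["".join(combo) for combo in product(FORWARD, SIDEWAYS, JUMP)]
assert len(ACTIONS) == 18     # includes the empty move ""

# The broad parameter sweep of Section 4 (values taken from the list above).
SWEEP = list(product(
    (0, 5, 15, 30),           # hidden neurons n
    (0.95, 0.80),             # discount gamma
    (10, 50),                 # temperature tau
    (0.001, 0.003, 0.01),     # learning rate alpha
    (1, 0.5, 0.25),           # timescale factor k (k = 1 is plain Q-learning)
))
```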

Because we had some trouble determining which information had to be part of the state vector, and made some incorrect implementations of it, many training runs performed earlier unfortunately had to be discarded.

5 Results

During the broad sweep, several combinations of parameters were found that resulted in successful learning after some 100,000 or more frags. It was found that 15 hidden neurons, combined with a learning rate of 0.01, a discount of 0.95 and a temperature of 50 worked well, see Figure 5(a). This was achieved with the Q-learning algorithm (and therefore the Advantage Learning factor k equals 1). This combination of parameters served as the base for further investigation; if not otherwise mentioned, these values are used.

Increasing the learning rate above 0.01 resulted in unstable neural networks which sometimes learned, but most of the time remained at the initial level of success, see Figure 5(b). A learning rate of 0.01 was therefore used in further experiments. Increasing the range over which the initial weights were randomly distributed, to 0.1 or even 0.5, did not result in any stable learning (at least when using base values for the other parameters). Sometimes a decent learning curve was seen, but most of the time learning did not occur; see Figure 6(a). A margin of 0.01 for the initialisation of the random weights was therefore used.

Varying the timescale factor k never resulted in a bad-learning or non-learning situation, see Figure 6(b). Most of the values for k resulted in roughly the same learning curves, although k = 0.1 resulted in a quicker learning phase but a less stable, more jagged curve, as the large standard deviation indicates. Decreasing or increasing the number of hidden neurons (for instance to 10) did not give good results (Figure 7(a)), or at least not within the timeframe that the base setting with 15 hidden neurons needed to achieve a good success rate. And since the base setting already needed 100,000 to 200,000 frags, taking some 10 hours of computing time, other numbers of hidden neurons were not extensively investigated.

A lower temperature of 10 did not result in learning; however, a higher temperature of 80 did (Figure 7(b)). The higher temperature resulted in much quicker learning, but also a more jagged learning curve. As a comparison, the temperature was also varied when 30 hidden neurons were used (other settings being the same as the base). This experiment revealed that the temperature did not have any significant effect on the average that was reached, but the variation increased with the higher temperature, resulting in some good runs and some runs where nothing was learned, see Figure 8(a).

6 Conclusion

Teaching a bot how to move in combat situations is a difficult task. The experiment depends both on choosing the right input vector and settings, and on the environment used, QUAKE III in our case. However, using an NN with 15 hidden neurons, the learning bot can be improved to a 65% chance of winning a 1-on-1 combat against its non-learning equal. This means that a bot with a trained NN wins almost twice as often as a preprogrammed bot. The learning phase takes quite some time, often more than 100,000 to 200,000 frags; on a normal PC this takes about 10 hours. Some settings show a capricious learning process, but the speed of learning is higher in those cases. Despite the fact that the numerical results show an improvement, human players will hardly notice any difference in behaviour between the standard bot and the learning bot. This is largely due to the fact that the bot is not always in combat mode when shot at (as one might expect), and hence is not using its NN.

6.1 Future Work

As an extension of the research described in this paper, it would be possible to look into the influence of a larger state vector, in which information concerning the surroundings of the bot is taken into account. Moreover, it would be interesting to let the bot learn its combat movement in a more complex environment: one can think of more opponents, weapons and items, or simply more complex maps, so as to let the bot learn in a normal QUAKE III game. Lastly, the use of memory could be a good research subject, in which the bot uses its last action as an input. This could be done by using recurrent NNs.

References

[1] M. Harmon and S. Harmon. Reinforcement learning: a tutorial. umass.edu./mharmon/rltutorial/frames.html.

[2] M. E. Harmon and L. C. Baird. Multiplayer residual advantage learning with general function approximation. Technical Report WL-TR-1065, Wright Laboratory, WL/AACF, 2241 Avionics Circle, Wright-Patterson Air Force Base, OH.

[3] id Software. Quake III Arena. quake3-arena/.

[4] J. E. Laird. It knows what you're going to do: adding anticipation to a Quakebot. In AGENTS '01: Proceedings of the Fifth International Conference on Autonomous Agents, New York, NY, USA. ACM Press.

[5] J. E. Laird and M. van Lent. Human-Level AI's Killer Application: Interactive Computer Games. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence. AAAI Press / The MIT Press.

[6] PhaethonH. Quake III: Arena, baseq3 mod commentary. phaethon/q3mc/q3mc.html.

[7] PlanetQuake.com. Code3Arena.

[8] spirolat@gmx.net. Quake 3 game-module documentation. index.htm.

[9] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

[10] J. van Waveren. The Quake III Arena Bot. Master's thesis, Delft University of Technology.

[11] J. van Waveren and L. Rothkrantz. Artificial player for Quake III Arena. International Journal of Intelligent Games & Simulation (IJIGS), 1(1):25–32, March.

[12] S. Zanetti and A. E. Rhalibi. Machine learning techniques for FPS in Q3. In ACE '04: Proceedings of the 2004 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, New York, NY, USA. ACM Press.

Figure 5: Results (a), (b).

Figure 6: Results (a), (b).

Figure 7: Results (a), (b).

Figure 8: Results (a).
