Linköping University

Marine Rush: Teaching an agent StarCraft 2 through reinforcement learning

Erik Kindberg

TABLE OF CONTENTS

1. Introduction
   StarCraft
      What is StarCraft 2
   PySC2
   Reinforcement Learning
      Q-learning
   The Game
      The Bot
      The Map
2. Implementation
   PySC2 Package
   QLearningTable
      choose_action
      learn
      check_state_exist
   SparseAgent
      transformdistance
      transformlocation
      splitaction
      step
3. Results
4. Reflections
5. Reference List
6. Appendices
   Appendix 1: Python Code

ABSTRACT

This report concerns the implementation of a reinforcement-learning-trained bot in the game StarCraft II. The bot has access to only a small subset of all possible actions, to limit the action space, and it improves through Q-learning. It faced the easiest built-in game AI and was fairly successful; eventually it was able to win more than it lost.

1. INTRODUCTION

The purpose of this project is to implement a bot in the strategy game StarCraft II. The term bot will be used interchangeably with intelligent agent in this report; I prefer the term bot because it is the term commonly used within the game. The bot will be tested against another, scripted bot and will improve through reinforcement learning techniques. The bot is programmed in Python 3 and implemented through the PySC2 environment.

STARCRAFT

WHAT IS STARCRAFT 2

StarCraft 2 is a PC strategy game developed by Blizzard Entertainment (Blizzard Entertainment, 2018). The game is played between two or more agents, human or computer controlled. The purpose is to destroy your opponents by building a base, recruiting soldiers, and deploying them more effectively than your opponents do. A common distinction is made between the micro game and the macro game. The micro game can be likened to tactics and concerns the placement and manoeuvring of your soldiers. The macro game can be likened to strategy and concerns which buildings you build in your base and which types of soldiers you recruit. StarCraft II provides a difficult challenge for intelligent agents, as described in the original DeepMind blog post:

"Even StarCraft's action space presents a challenge with a choice of more than 300 basic actions that can be taken. Contrast this with Atari games, which only have about 10 (e.g. up, down, left, right etc). On top of this, actions in StarCraft are hierarchical, can be modified and augmented, with many of them requiring a point on the screen. Even assuming a small screen size of 84x84 there are roughly 100 million possible actions available." (DeepMind, 2018)

PYSC2

PySC2 is an environment for AI research within the game StarCraft 2. It is a collaboration between Blizzard Entertainment and DeepMind, a Google-owned company focusing on AI research (Vinyals, et al., 2018). StarCraft II was chosen for AI research for a few reasons.

Firstly, there is the above-mentioned action-space complexity. Another challenge is that the length of a game varies a lot, meaning that actions may not pay off for a long time, or at all. A further challenge is that the map is only partly observable, which requires a combination of memory and planning. A benefit of using StarCraft 2 is the large player base, ensuring that a large quantity of replay data is available, as well as many talented human opponents for AI (DeepMind, 2018).

REINFORCEMENT LEARNING

Reinforcement learning is an area of machine learning inspired by behaviourism (Reinforcement Learning, 2018). The general idea is to define a goal, encourage behaviour that leads the bot closer to the goal (good behaviour) through some kind of reward, and discourage bad behaviour on the opposite end. In complex domains supervised learning might not be possible, since it requires accurate and consistent evaluations, which might not be available, especially if the states are sequential and dependent on each other. If actions cannot be evaluated individually, it is harder to label them accurately, making supervised training tricky (Russell & Norvig, 2010). Reinforcement learning instead places an agent in an environment where it does not know what the possible actions will result in or how the environment works. The agent figures out optimal policies through positive and negative feedback (Russell & Norvig, 2010).

Q-LEARNING

The variant of reinforcement learning used to implement the bot is called Q-learning. Q-learning is an active learning method. In active learning, an agent needs to decide on an action to take, compared to passive learning where the policy is fixed: in state S it always performs action A. Q-learning stores action utilities rather than state utilities, representing the expected utility of each action instead of storing a utility function over outcomes for each state. Q-learning is also a model-free method, which means that it does not need an explicit model of its environment; more specifically, the agent does not need a transition model. The agent searches for optimal policies by predicting the value function of a policy, without a model of the environment (Russell & Norvig, 2010). In simple terms, this is the Q-learning algorithm that is implemented in the bot (Zhou, 2018):

- Initialize the state s.
- Choose an action a in state s.
- Take action a, observe the reward r and the next state s'.
- Update the learning table entry for action a in state s by adding the reward and the TD error.
- Repeat until the last state has been reached; in the case of StarCraft II, until a match has ended.
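Written out, the update in the fourth step takes the standard tabular Q-learning form (a textbook formulation, consistent with the learn method in the appendix; here alpha is the learning rate and gamma the reward decay discussed in the implementation section):

    Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big]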

THE GAME

THE BOT

Because of the complexity of StarCraft II, the bot has certain hard-coded limitations. The bot will only play as one of the three available races, the Terrans. The Terrans have fourteen different buildings and sixteen different types of soldiers available. The bot is limited to two types of buildings, barracks and supply depots, and to one type of soldier, the marine. Barracks are used to build marines; supply depots raise the number of marines that can be recruited, and at least one supply depot must have been built before barracks can be built. The marine is the most basic kind of soldier available to the Terrans. Both players start with a command center and a certain number of SCV units, which gather resources and construct buildings. The command center has functions that are not used by the bot. The opponent is set to the easiest difficulty.

THE MAP

The bot plays on Simple64, the simplest map available in PySC2. It is a 64x64 map consisting of two symmetrical plateaus, where the starting bases are located, and a valley in between.

All units and buildings have a vision range, and the bot sees only what its units and buildings see; fog covers the rest of the map. The bot's units and buildings are represented by green squares on the map, and the opponent's buildings and units are represented as blue squares.

Figure 2. The map.

2. IMPLEMENTATION

The implementation consists of the PySC2 environment and the bot. The agent class is constructed from a tutorial by Steven Brown (2018) and the Q-learning algorithm is taken from a tutorial by Morvan Zhou (2018). This is enough code to produce a functioning bot, and it is contained within one file, sparse_agent.py. The bot contains two classes, QLearningTable and SparseAgent. First, a number of variables are declared that correspond to the game. For example,

    _NO_OP = actions.FUNCTIONS.no_op.id

tells the bot to do nothing, which can seem strange, but there are cases where doing nothing is beneficial, such as when waiting for a marine to be recruited. Outside of the classes is a for-loop. This loop reduces the bot's model of the minimap from a 64x64 grid to a coarse 2x2 grid of four attack squares. The bot uses the game's minimap to issue attack orders, and such a coarse grid greatly reduces the number of possible actions. The marines' attack range is large enough that the entire map can still be covered from these four squares; the loop is shown below.
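The loop is short; this is a condensed version, taken from the appendix with the PySC2 identifier casing restored:

    # Keep one minimap coordinate per 32x32 block and register it as an attack action,
    # giving four possible attack targets in total.
    for mm_x in range(0, 64):
        for mm_y in range(0, 64):
            if (mm_x + 1) % 32 == 0 and (mm_y + 1) % 32 == 0:
                smart_actions.append(ACTION_ATTACK + '_' + str(mm_x - 16) + '_' + str(mm_y - 16))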

Figure 3. Mini-map (Brown, 2018).

PYSC2 PACKAGE

The environment connects the bot to the game through the Python package PySC2. A complete description of this package would require a report of its own; I encourage anyone who wants to know more to visit the PySC2 GitHub page.

QLEARNINGTABLE

In the __init__ method, certain parameters for the Q-learning algorithm are set.

learning_rate (0.01): This parameter states how strongly the bot lets new information override what it has already learned (Wikimedia Foundation, Inc, 2018).

e_greedy (0.9): This parameter decides how greedy the algorithm will be. Set to 1, the algorithm always chooses the action it currently thinks is optimal, but you want it to explore other options sometimes in order for it to discover the globally optimal policy. With a greedy factor of 0.9, the bot performs a random action one tenth of the time (Wikimedia Foundation, Inc, 2018).

reward_decay (0.9): Decides the importance of future rewards. A reward decay of 0 means that the bot focuses only on short-term gains, and a reward decay of 1 means that it is very focused on long-term gains. The bot's reward decay is fairly high at 0.9, which is appropriate because many of the actions taken in StarCraft II do not pay off until many steps later (Wikimedia Foundation, Inc, 2018).

CHOOSE_ACTION

This method tells the bot which action to choose. First it checks that the current state exists in the table; it then randomly generates a number between 0 and 1. If the number is below the greedy factor, the action with the highest score in the learning table is chosen. If the number is equal to or above the greedy factor, a random action is chosen (a condensed version of the method is shown after the description of the class).

LEARN

The core of the QLearningTable class is the learn method. This is where the bot learns and the learning table is updated. The method takes four parameters: the current state, the action taken, the reward, and the next state. It first checks that the current and next states exist, using the method check_state_exist. It then checks whether the next state is the terminal state. If not, the target is the reward plus the reward decay times the highest value among all actions in the next state:

    if s_ != 'terminal':
        q_target = r + self.gamma * self.q_table.ix[s_, :].max()

If we are in the last state, only the reward matters for the target:

    else:
        q_target = r  # next state is terminal

Then the state-action entry in the learning table is updated by adding the learning rate times the TD error.

CHECK_STATE_EXIST

This method checks whether a state exists in the learning table. If not, the state is added to the table.
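Condensed from the appendix, with the extraction-mangled identifiers restored, the epsilon-greedy selection looks like this:

    def choose_action(self, observation):
        self.check_state_exist(observation)

        if np.random.uniform() < self.epsilon:
            # Exploit: pick the highest-valued action for this state,
            # shuffling first so that ties are broken at random.
            state_action = self.q_table.ix[observation, :]
            state_action = state_action.reindex(np.random.permutation(state_action.index))
            action = state_action.idxmax()
        else:
            # Explore: pick a random action.
            action = np.random.choice(self.actions)

        return action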

SPARSE_AGENT

TRANSFORMDISTANCE

This method takes the coordinates of the bot's base and computes other positions on the map relative to that base. This method, together with transformlocation, converts map coordinates so that the bot can act as if its base were always in the same corner, regardless of the random start location, which lowers the amount of computation needed: the bot does not actually need to take the start location into account when making decisions.

TRANSFORMLOCATION

This method converts absolute x and y coordinates on the map, rather than distances relative to the bot's base, mirroring them when the base is not in the reference corner.

SPLITACTION

This is a support method; it extracts the required information from smart actions that encode several pieces of information, such as the attack coordinates. The two transform helpers are sketched below.
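A minimal sketch of the two helpers, restored from the appendix (the constant 64 reflects the 64x64 map size):

    def transformdistance(self, x, x_distance, y, y_distance):
        # Offsets are added when the base is top-left and subtracted otherwise,
        # so "build 15 to the right of the command centre" means the same thing
        # from either start location.
        if not self.base_top_left:
            return [x - x_distance, y - y_distance]

        return [x + x_distance, y + y_distance]

    def transformlocation(self, x, y):
        # Mirror absolute minimap coordinates through the map centre
        # when the base is in the bottom-right corner.
        if not self.base_top_left:
            return [64 - x, 64 - y]

        return [x, y]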

STEP

This is the major part of the agent class. Every game is called an episode. The agent acts in steps, and this method is run once for every step. First, the method checks whether this is the last step of an episode. If so, the class updates the Q-learning table and saves it to an external file in the pickle format. Saving the table to a file comes in handy because playing many games is time-consuming, and it makes it possible to stop and later continue from where the bot left off. In the last step of an episode the bot also applies the reward. The learning table is updated with the reward; the rewards are sparse: plus one for a win, zero for a draw and minus one for a loss.

The bot then checks whether it is the first step of an episode. In the first step of a game, the position of the bot's base is established, and the position of the opponent's base is inferred from it, since there are only two possible start locations. The method then keeps track of the number of friendly buildings on the map.

The smart actions are multiple actions condensed into one. These are actions that are closely connected; for example, the attack action consists of selecting units, sending them towards the coordinates to be attacked, and doing nothing. The actions are condensed to make the action space less complex: we give the bot hints about which actions go together, and every smart action contains three actions in total. The main actions the bot is capable of are:

- Do nothing: do nothing for three steps.
- Build supply depot: select an SCV, build a supply depot, and send the SCV back to gathering resources.
- Build barracks: like the command above, but the SCV builds barracks rather than a supply depot.
- Build marine: select all built barracks and fill the recruitment queue with marines.
- Attack (x, y): attack a given coordinate on the map.

We then check which step of a smart action we are currently on. If it is the first step, we set up the state so that it includes the counts of the bot's buildings and the number of marines the bot has. Then we divide the map into four quadrants and mark a quadrant as hot if enemy units can be observed in it. Again, if the base is at the bottom-right of the map, we invert the quadrants so that the perspective stays constant (this fragment is shown below). If this is not the first decision of the episode, we call the learn method of the QLearningTable class with a reward of zero, since the bot has not yet won or lost. Because this is only done on the first step of each smart action, the table is only updated every third game step during an episode. We then choose a smart action with the choose_action method of QLearningTable. The method then extracts the needed information for the first step of the smart action using splitaction and executes it. On the second and third steps of each smart action, the bot simply executes the relevant action. The bot remembers which step it is on with a counter and resets the counter on the third step of each action.
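The quadrant marking is compact enough to show directly; this is the corresponding fragment of the step method, restored from the appendix:

    current_state = np.zeros(8)
    current_state[0] = cc_count
    current_state[1] = supply_depot_count
    current_state[2] = barracks_count
    current_state[3] = obs.observation['player'][_ARMY_SUPPLY]

    # Mark each of the four minimap quadrants as "hot" if enemy units are visible in it.
    hot_squares = np.zeros(4)
    enemy_y, enemy_x = (obs.observation['minimap'][_PLAYER_RELATIVE] == _PLAYER_HOSTILE).nonzero()
    for i in range(0, len(enemy_y)):
        y = int(math.ceil((enemy_y[i] + 1) / 32))
        x = int(math.ceil((enemy_x[i] + 1) / 32))
        hot_squares[((y - 1) * 2) + (x - 1)] = 1

    # Mirror the quadrants when the base is in the bottom-right corner,
    # so the state is always expressed from the same perspective.
    if not self.base_top_left:
        hot_squares = hot_squares[::-1]

    for i in range(0, 4):
        current_state[i + 4] = hot_squares[i]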

3. RESULTS

After 2477 matches against the scripted opponent, the bot's win rate surpassed its loss rate. The bot had the following performance curve:

Figure 4. Win/draw/loss ratio (win %, loss % and draw % over the course of training).

4. REFLECTIONS

StarCraft II provides a very interesting challenge for AI because of its complexity. With only two available buildings and one unit, the bot had a small scope, and this was enough to perform decently against the easiest opponent. I suspect that it will need access to more buildings and units to be successful against the more difficult opponents, but this will increase action-space complexity in ways that are hard to predict. Corners can be cut, for example with smart actions such as those implemented in the bot described in this report.

One improvement that would drastically improve performance would be the capability for the bot to accept surrenders; currently it is unable to do so. From my own observations of the bot playing, I noticed that the opponent would often offer to surrender, which it does once certain scripted conditions are met, when it believes it can no longer win.

Because of a PySC2 limitation, the bot is unable to accept the surrender, and numerous times the game's conditions for a draw were met, or the opponent came back, resulting in either a draw or a loss for the bot where a human player could have won by pressing one button.

Another issue that was considered was the reward structure: with sparse rewards the bot generally takes more matches to learn. The author of the tutorial used to create the bot did consider alternative structures; the game has a built-in score which depends on a few things, such as enemy units destroyed (Brown, 2018). Using the score as a performance measure led to some strange behaviours, such as the bot placing its soldiers outside the enemy base and waiting for enemy soldiers to appear so that it could destroy them, further increasing the score. Winning really is the purest performance measure.

I also considered the choice of algorithm, and specifically looked at SARSA, which is similar to Q-learning with a few differences that become important when choosing between them. Russell & Norvig (2010) note that the distinction matters when the overall policy is affected by another agent, which is the case here, since the bot is playing against an opponent: the Q-learning algorithm backs up the best Q-value of the state that is reached, whereas the SARSA algorithm backs up the Q-value of the action that was actually taken (the two update targets are written out at the end of this section). The algorithm was not my own choice, since I followed a tutorial, but the choice of Q-learning is supported by the literature.

I dived head first into the StarCraft II AI world because it looked interesting, and I very soon realized that I had only been scratching the surface. There is a whole lot more to discover, and I believe that the bot's performance can be improved both by a deeper understanding of reinforcement learning and by a deeper understanding of the game and all of its intricacies.
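For reference, the two update targets can be written out in standard textbook notation (my summary, not taken from the report's code); they differ only in how the next state is evaluated:

    Q-learning target:  r + \gamma \max_{a'} Q(s', a')
    SARSA target:       r + \gamma \, Q(s', a')

Q-learning bootstraps from the greedy action in s', while SARSA bootstraps from the action a' that the policy actually takes next.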

5. REFERENCE LIST

Blizzard Entertainment. (9 January 2018). StarCraft 2. Retrieved from StarCraft 2:

Brown, S. (10 January 2018). Build a Sparse Reward PySC2 Agent. Retrieved from Medium:

DeepMind. (9 January 2018). DeepMind and Blizzard open StarCraft II as an AI research environment. Retrieved from DeepMind:

Russell, S., & Norvig, P. (2010). Artificial Intelligence: A Modern Approach. Boston: Pearson Education.

Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A. S., Yeo, M., ... Tsing, R. (9 January 2018). PySC2 - StarCraft II Learning Environment. Retrieved from GitHub:

Wikimedia Foundation, Inc. (10 January 2018). Q-learning. Retrieved from Wikipedia:

Wikimedia Foundation, Inc. (9 January 2018). Reinforcement Learning. Retrieved from Wikipedia:

Zhou, M. (9 January 2018). Reinforcement Learning Methods and Tutorials. Retrieved from GitHub:

6. APPENDICES

APPENDIX 1: PYTHON CODE

# Importing packages
import random
import math
import os.path

import numpy as np
import pandas as pd

from pysc2.agents import base_agent
from pysc2.lib import actions
from pysc2.lib import features

# Declaring variables for actions and unit IDs
_NO_OP = actions.FUNCTIONS.no_op.id
_SELECT_POINT = actions.FUNCTIONS.select_point.id
_BUILD_SUPPLY_DEPOT = actions.FUNCTIONS.Build_SupplyDepot_screen.id
_BUILD_BARRACKS = actions.FUNCTIONS.Build_Barracks_screen.id
_TRAIN_MARINE = actions.FUNCTIONS.Train_Marine_quick.id
_SELECT_ARMY = actions.FUNCTIONS.select_army.id
_ATTACK_MINIMAP = actions.FUNCTIONS.Attack_minimap.id
_HARVEST_GATHER = actions.FUNCTIONS.Harvest_Gather_screen.id

_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index
_UNIT_TYPE = features.SCREEN_FEATURES.unit_type.index
_PLAYER_ID = features.SCREEN_FEATURES.player_id.index

_PLAYER_SELF = 1
_PLAYER_HOSTILE = 4
_ARMY_SUPPLY = 5

_TERRAN_COMMANDCENTER = 18
_TERRAN_SCV = 45
_TERRAN_SUPPLY_DEPOT = 19
_TERRAN_BARRACKS = 21
_NEUTRAL_MINERAL_FIELD = 341

_NOT_QUEUED = [0]
_QUEUED = [1]
_SELECT_ALL = [2]

DATA_FILE = 'sparse_agent_data'

ACTION_DO_NOTHING = 'donothing'
ACTION_BUILD_SUPPLY_DEPOT = 'buildsupplydepot'
ACTION_BUILD_BARRACKS = 'buildbarracks'
ACTION_BUILD_MARINE = 'buildmarine'
ACTION_ATTACK = 'attack'

# Smart actions are multiple actions in one command
smart_actions = [
    ACTION_DO_NOTHING,
    ACTION_BUILD_SUPPLY_DEPOT,
    ACTION_BUILD_BARRACKS,
    ACTION_BUILD_MARINE,
]

# This loop reduces the 64x64 mini-map to four attack squares (a 2x2 grid)
for mm_x in range(0, 64):
    for mm_y in range(0, 64):
        if (mm_x + 1) % 32 == 0 and (mm_y + 1) % 32 == 0:
            smart_actions.append(ACTION_ATTACK + '_' + str(mm_x - 16) + '_' + str(mm_y - 16))


# Stolen from
class QLearningTable:
    def __init__(self, actions, learning_rate=0.01, reward_decay=0.9, e_greedy=0.9):
        self.actions = actions  # a list
        self.lr = learning_rate
        self.gamma = reward_decay
        self.epsilon = e_greedy
        self.q_table = pd.DataFrame(columns=self.actions, dtype=np.float64)

    def choose_action(self, observation):
        self.check_state_exist(observation)

        if np.random.uniform() < self.epsilon:
            # choose best action
            state_action = self.q_table.ix[observation, :]

            # some actions have the same value
            state_action = state_action.reindex(np.random.permutation(state_action.index))

            action = state_action.idxmax()
        else:
            # choose random action
            action = np.random.choice(self.actions)

        return action

    def learn(self, s, a, r, s_):
        self.check_state_exist(s_)
        self.check_state_exist(s)

        q_predict = self.q_table.ix[s, a]

        if s_ != 'terminal':
            q_target = r + self.gamma * self.q_table.ix[s_, :].max()
        else:
            q_target = r  # next state is terminal

        # update
        self.q_table.ix[s, a] += self.lr * (q_target - q_predict)

    def check_state_exist(self, state):
        if state not in self.q_table.index:
            # append new state to q table
            self.q_table = self.q_table.append(
                pd.Series([0] * len(self.actions), index=self.q_table.columns, name=state))


class SparseAgent(base_agent.BaseAgent):
    def __init__(self):
        super(SparseAgent, self).__init__()

        self.qlearn = QLearningTable(actions=list(range(len(smart_actions))))

        self.previous_action = None
        self.previous_state = None

        self.cc_y = None
        self.cc_x = None

        self.move_number = 0

        if os.path.isfile(DATA_FILE + '.gz'):
            self.qlearn.q_table = pd.read_pickle(DATA_FILE + '.gz', compression='gzip')

    def transformdistance(self, x, x_distance, y, y_distance):
        if not self.base_top_left:
            return [x - x_distance, y - y_distance]

        return [x + x_distance, y + y_distance]

    def transformlocation(self, x, y):
        if not self.base_top_left:
            return [64 - x, 64 - y]

        return [x, y]

    def splitaction(self, action_id):
        smart_action = smart_actions[action_id]

        x = 0
        y = 0
        if '_' in smart_action:
            smart_action, x, y = smart_action.split('_')

        return (smart_action, x, y)

    def step(self, obs):
        super(SparseAgent, self).step(obs)

        if obs.last():
            reward = obs.reward

            with open(DATA_FILE + '_rewards.txt', 'a') as myfile:
                myfile.write(str(reward) + "\n")

            self.qlearn.learn(str(self.previous_state), self.previous_action, reward, 'terminal')

            self.qlearn.q_table.to_csv(DATA_FILE + '.csv')
            self.qlearn.q_table.to_pickle(DATA_FILE + '.gz', 'gzip')

            self.previous_action = None
            self.previous_state = None

            self.move_number = 0

            return actions.FunctionCall(_NO_OP, [])

        unit_type = obs.observation['screen'][_UNIT_TYPE]

        if obs.first():
            player_y, player_x = (obs.observation['minimap'][_PLAYER_RELATIVE] == _PLAYER_SELF).nonzero()
            self.base_top_left = 1 if player_y.any() and player_y.mean() <= 31 else 0

            self.cc_y, self.cc_x = (unit_type == _TERRAN_COMMANDCENTER).nonzero()

        cc_y, cc_x = (unit_type == _TERRAN_COMMANDCENTER).nonzero()
        cc_count = 1 if cc_y.any() else 0

        depot_y, depot_x = (unit_type == _TERRAN_SUPPLY_DEPOT).nonzero()
        supply_depot_count = int(round(len(depot_y) / 69))

        barracks_y, barracks_x = (unit_type == _TERRAN_BARRACKS).nonzero()
        barracks_count = int(round(len(barracks_y) / 137))

        if self.move_number == 0:
            self.move_number += 1

            current_state = np.zeros(8)
            current_state[0] = cc_count
            current_state[1] = supply_depot_count
            current_state[2] = barracks_count
            current_state[3] = obs.observation['player'][_ARMY_SUPPLY]

            hot_squares = np.zeros(4)
            enemy_y, enemy_x = (obs.observation['minimap'][_PLAYER_RELATIVE] == _PLAYER_HOSTILE).nonzero()
            for i in range(0, len(enemy_y)):
                y = int(math.ceil((enemy_y[i] + 1) / 32))
                x = int(math.ceil((enemy_x[i] + 1) / 32))

                hot_squares[((y - 1) * 2) + (x - 1)] = 1

            if not self.base_top_left:
                hot_squares = hot_squares[::-1]

            for i in range(0, 4):
                current_state[i + 4] = hot_squares[i]

            if self.previous_action is not None:
                self.qlearn.learn(str(self.previous_state), self.previous_action, 0, str(current_state))

            rl_action = self.qlearn.choose_action(str(current_state))

            self.previous_state = current_state
            self.previous_action = rl_action

            smart_action, x, y = self.splitaction(self.previous_action)

            if smart_action == ACTION_BUILD_BARRACKS or smart_action == ACTION_BUILD_SUPPLY_DEPOT:
                unit_y, unit_x = (unit_type == _TERRAN_SCV).nonzero()

                if unit_y.any():
                    i = random.randint(0, len(unit_y) - 1)
                    target = [unit_x[i], unit_y[i]]

                    return actions.FunctionCall(_SELECT_POINT, [_NOT_QUEUED, target])

            elif smart_action == ACTION_BUILD_MARINE:
                if barracks_y.any():
                    i = random.randint(0, len(barracks_y) - 1)
                    target = [barracks_x[i], barracks_y[i]]

                    return actions.FunctionCall(_SELECT_POINT, [_SELECT_ALL, target])

            elif smart_action == ACTION_ATTACK:
                if _SELECT_ARMY in obs.observation['available_actions']:
                    return actions.FunctionCall(_SELECT_ARMY, [_NOT_QUEUED])

        elif self.move_number == 1:
            self.move_number += 1

            smart_action, x, y = self.splitaction(self.previous_action)

            if smart_action == ACTION_BUILD_SUPPLY_DEPOT:
                if supply_depot_count < 2 and _BUILD_SUPPLY_DEPOT in obs.observation['available_actions']:
                    if self.cc_y.any():
                        if supply_depot_count == 0:
                            target = self.transformdistance(round(self.cc_x.mean()), -35, round(self.cc_y.mean()), 0)
                        elif supply_depot_count == 1:
                            target = self.transformdistance(round(self.cc_x.mean()), -25, round(self.cc_y.mean()), -25)

                        return actions.FunctionCall(_BUILD_SUPPLY_DEPOT, [_NOT_QUEUED, target])

            elif smart_action == ACTION_BUILD_BARRACKS:
                if barracks_count < 2 and _BUILD_BARRACKS in obs.observation['available_actions']:
                    if self.cc_y.any():
                        if barracks_count == 0:
                            target = self.transformdistance(round(self.cc_x.mean()), 15, round(self.cc_y.mean()), -9)
                        elif barracks_count == 1:
                            target = self.transformdistance(round(self.cc_x.mean()), 15, round(self.cc_y.mean()), 12)

                        return actions.FunctionCall(_BUILD_BARRACKS, [_NOT_QUEUED, target])

            elif smart_action == ACTION_BUILD_MARINE:
                if _TRAIN_MARINE in obs.observation['available_actions']:
                    return actions.FunctionCall(_TRAIN_MARINE, [_QUEUED])

            elif smart_action == ACTION_ATTACK:
                do_it = True

                if len(obs.observation['single_select']) > 0 and obs.observation['single_select'][0][0] == _TERRAN_SCV:
                    do_it = False

                if len(obs.observation['multi_select']) > 0 and obs.observation['multi_select'][0][0] == _TERRAN_SCV:
                    do_it = False

                if do_it and _ATTACK_MINIMAP in obs.observation["available_actions"]:
                    x_offset = random.randint(-1, 1)
                    y_offset = random.randint(-1, 1)

                    return actions.FunctionCall(_ATTACK_MINIMAP, [_NOT_QUEUED, self.transformlocation(int(x) + (x_offset * 8), int(y) + (y_offset * 8))])

        elif self.move_number == 2:
            self.move_number = 0

            smart_action, x, y = self.splitaction(self.previous_action)

            if smart_action == ACTION_BUILD_BARRACKS or smart_action == ACTION_BUILD_SUPPLY_DEPOT:
                if _HARVEST_GATHER in obs.observation['available_actions']:
                    unit_y, unit_x = (unit_type == _NEUTRAL_MINERAL_FIELD).nonzero()

                    if unit_y.any():
                        i = random.randint(0, len(unit_y) - 1)

                        m_x = unit_x[i]
                        m_y = unit_y[i]

                        target = [int(m_x), int(m_y)]

                        return actions.FunctionCall(_HARVEST_GATHER, [_QUEUED, target])

        return actions.FunctionCall(_NO_OP, [])
