AI Agent for Ants vs. SomeBees: Final Report


Wanyi Qian, Yundong Zhang, Xiaotong Duan

Abstract. This project aims to build a real-time game-playing AI agent for Ants vs. SomeBees that can beat a human player. Specifically, we implemented an A* search agent and reinforcement-learning agents (expectimax search and TD learning) for this tower defense game. The win rate increases significantly when using these AI algorithms compared with our baseline model. A* search gives a decent result but suffers from computational inefficiency and a lack of look-ahead; reinforcement learning is a good alternative for addressing these issues.

Keywords: Tower Defense Game, Search, Reinforcement Learning

I. INTRODUCTION

Ants vs. SomeBees [1] is a tower defense game [2] and a simpler version of the classic Plants vs. Zombies [3]. The goal of a tower defense game is to keep the invaders from reaching a certain location on the board, commonly the innermost point of the defender's region. The game typically consists of three main elements: the invaders, the defenders, and the target. The result is determined by whether the defenders have eliminated all the invaders or the invaders have reached the target. In Ants vs. SomeBees, these are the bees, the ants, and the ant queen, respectively. The appeal of a tower defense game is that it demands real-time strategic planning from the player. In Ants vs. SomeBees, the ultimate goal is to keep the ant queen alive by strategically placing ant soldiers to defend against the invading bees, which enter the game board with a predefined probability. Since each type of ant has a food cost and the player has constrained resources, a strategy must be developed to use those resources well.

In previous research [4], the game was modeled as a search problem. In that model, the state contained the existence and location of ants and bees on the board, the amount of food, and the number of turns since the beginning of the game. The authors concluded that reinforcement learning should not be considered because the state, as defined, lacks the Markov property, and A* was analyzed further for agent development. The relaxed problem assumed that no additional bees enter the board at each turn. The time-series nature of the game was handled by setting a sub-goal for each turn: eliminate all bees currently on the board. By setting the recalculation and execution rate according to the evasion time of the bees, each sub-problem became deterministic. However, as mentioned in [4], this relaxation leaves a gap between solving the sub-problem and solving the entire game, so heuristics were introduced to narrow the gap. The cost of each action was defined to be 1 per turn, and five heuristics (Strongest Ants Needed, Food, Fire Power, Closest Bee, Bee Armor) were introduced to address the issue of reaching only a local optimum for each sub-problem instead of a global optimum for the entire game. In [4], the Bee Armor heuristic gave the highest win rate (98%) among the five. Moreover, the bee-based heuristics performed better than the ant-based heuristics, and non-admissible heuristics also gave promising results. However, the previous research did not describe its board conditions (e.g., board size) in detail, so the promising results reported there were not fully justified.
II. MODEL (SCOPE DEFINITION)

In this section, we define the setup of the game.

Game Board: The game board consists of 3 lanes with 8 tiles in each lane.

Ants: There are 2 types of ants that can be placed on the board, each with a specific food cost, armor value, damage, and ability. The ant queen is automatically placed on the left side, outside the board, at the start of the game. See Table I for details of the insect types. Once an ant is placed, it cannot be moved to another position or removed from the board by the player; it can only be eliminated by the bees.

TABLE I
INSECT TYPES

Insect Type   Food Cost   Armor   Damage   Skill
Harvester     1                            generates 1 food per turn
Thrower       3                            attacks one position ahead
Bee           N/A                          attacks the ant in the same position

Game Rules: A single game consists of a series of turns. At each turn, a bee may enter a row from the right side with a certain probability, and all rows lead to the ant queen at the left end. During one turn, each thrower ant on the board throws one leaf at the leading bee in its row, each harvester ant generates one food unit, and all bees move one tile to the left. The player decides whether to place an ant on the board and, if so, which type, given the current board condition. Only one ant can be placed per turn. The result of the game depends on whether the ants survive the attack or the bees reach the ant queen.
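For concreteness, the game configuration above can be pictured with a small state container like the following. This is only an illustrative sketch; the field names (food, ants, bees, armor, and so on) are our own choices and not the actual interfaces of the modified game code.

from dataclasses import dataclass, field
from typing import List

NUM_ROWS, NUM_COLS = 3, 8          # 3 lanes, 8 tiles per lane

@dataclass
class Insect:
    kind: str                      # "Harvester", "Thrower", or "Bee"
    armor: int
    row: int
    col: int

@dataclass
class GameState:
    food: int = 4                  # every mode starts with 4 food units
    turn: int = 0
    ants: List[Insect] = field(default_factory=list)
    bees: List[Insect] = field(default_factory=list)

    def bees_in_row(self, r: int) -> List[Insect]:
        return [b for b in self.bees if b.row == r]

    def ants_in_row(self, r: int) -> List[Insect]:
        return [a for a in self.ants if a.row == r]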

There is one special effect when a bee enters a tile that holds a thrower ant: the bee stings the ant, killing it and removing it from the board, but the ant also slows the bee down by one time unit, so the bee remains in its tile for the next turn. The following figures show the start of the game and a sample display after a few turns.

Game Conditions: We tested all the models under three game conditions: easy, medium, and hard. In all modes, the game starts with 4 food units. The easy mode consists of one wave of 2 bees at time 2, three waves of 1 bee at times 3, 8, and 13, and one wave of 2 bees at time 15. The medium mode consists of five waves of 2 bees at the same time steps as the easy mode. The hard mode consists of one wave of 2 bees, three waves of 5 bees, and one wave of 3 bees at the same time steps.

III. GOAL

The goal of this project is to implement AI agents using search and reinforcement learning to play this game, and to compare the performance of these agents with the baseline model and the oracle.

IV. ORACLE

As no existing agent is available for this game, a top human player serves as the oracle. Since data about this game is insufficient, after becoming familiar with the game we played it multiple times and counted the win rate ourselves.

TABLE II
WIN RATE FOR ORACLE

V. BASELINE AND RESULT

To simplify the problem for the baseline, we broke down the goal of solving the whole game into finding the optimal strategy for each sub-game (each turn). For each turn, we assumed that no other bees would enter the game board. Moreover, we assumed all tiles on the game board are legal for deployment of ants. For the baseline strategy, we implemented a greedy algorithm that always deploys ants in the row where the difference between the number of bees and the number of ants is largest. Moreover, to ensure there is enough food for the greedy algorithm, we set up a ratio of the number of harvesters to the number of throwers that must be satisfied before the greedy deployment rule is followed.

The original Ants vs. SomeBees uses the locations of ants, the locations of bees, the amount of food, the ant types, and time as the state of the game. To implement our greedy algorithm, we added the number of harvesters, the number of throwers, and the number of ants and bees in each row of the board to the state. For each ant turn, the ratio of harvesters to throwers is calculated first. If the ratio is below the threshold, the agent is forced to choose a harvester, to ensure there is enough food for future attacks; otherwise a thrower is chosen for the current turn. Because the game board is a 3 * 8 grid, the deployment location of the ant must be chosen carefully. In our greedy algorithm, the highest-weight place is the one whose column is as close as possible to the ant queen (the leftmost tile of the board) and whose row has the largest difference between the number of bees and the number of ants among all rows. The win rate of our baseline is shown below.

TABLE III
WIN RATE FOR BASELINE

Note that we found the ratio between the number of harvesters and the number of throwers to be an important factor affecting the win rate. Specifically, the ratio that is optimal under easy mode leads to a significant lack of food in hard mode. A possible reason is that under easy mode the number of bees in each wave is small, so the preferred ratio is about 1. However, under hard mode the number of bees in some waves is significantly higher than in easy mode, which requires the agent to place more throwers relative to harvesters than it does under easy mode.
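As a rough illustration of the baseline just described, the per-turn decision can be sketched as follows. The argument names, the ratio_threshold default, and the food costs (taken from Table I) are assumptions for the sketch, not the exact values or interfaces used in our experiments.

def greedy_action(food, harvesters, throwers, bees_per_row, ants_per_row,
                  free_cols_per_row, ratio_threshold=1.0):
    """Greedy baseline: return (ant_type, row, col) for this turn, or None to pass."""
    # Keep enough harvesters so that food income can support future throwers.
    ratio = harvesters / max(throwers, 1)
    ant_type = "Harvester" if ratio < ratio_threshold else "Thrower"

    # Row: largest (#bees - #ants).  Column: as close to the queen (leftmost) as possible.
    row = max(range(len(bees_per_row)), key=lambda r: bees_per_row[r] - ants_per_row[r])
    if not free_cols_per_row[row]:
        return None                                       # no free tile in the chosen row
    col = min(free_cols_per_row[row])

    cost = {"Harvester": 1, "Thrower": 3}[ant_type]       # food costs assumed from Table I
    return (ant_type, row, col) if food >= cost else None

# Example: 4 food, 1 harvester, 1 thrower, bees pressing hardest in row 2
print(greedy_action(4, 1, 1, bees_per_row=[0, 1, 3], ants_per_row=[1, 1, 1],
                    free_cols_per_row=[{0, 1}, {0, 1}, {0, 1, 2}]))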
VI. ALGORITHM AND RESULT

A. Search Game State

As a natural extension of the conditions we used for the baseline algorithm, we can also optimize the game with a search method using a similar heuristic. The challenge, though, is that the game board changes when new bees enter the colony, and thus the best strategy for one board may not be as good for the next one.

The assumption we made to address this issue is the same as in the baseline: no bees will enter the board in future turns, which makes the game deterministic and thus solvable with a search algorithm. The parameters we used are defined as:

state(s) = the number of bees on the board and their positions, the number of ants on the board with their types and positions, and the amount of food
s_start = the initial state: the current game board
IsEnd(s) = the end state: the number of bees on the board == 0
Actions(s) = { place an ant (a type of ant and a location), or do not place an ant }    (1)
Cost(s, a) = 0 if at the end, the heuristic cost if not at the end

The heuristic looks at the difference between the armor of the bees and the damage power of the ants, and computes the total number of turns needed to eliminate all bees on the game board:

for each row:
    bee_armor = sum of the armor of all bees in the row
    ant_firepower = number of ants of each type in the row * their firepower
    turns_needed[row] = bee_armor / ant_firepower
return sum(turns_needed)

To confirm the consistency of this heuristic, we need to make sure that our assumptions relax the original problem. The heuristic does not consider how close the bees are to the ant queen or how this affects the game state, so we remove the constraint that the bees cannot pass the left edge; that is, we do not constrain how long a row has to be, which makes the problem a relaxation of the original. Therefore, the heuristic we used is consistent.

Given these parameters, the algorithm generates the optimal solution for the current state, and when new bees enter the board, the algorithm is rerun to generate a new solution. The win rate under this condition is given below:

TABLE IV
WIN RATE FOR A*

The win rate using the A* algorithm does not show much improvement over the baseline model (the performance is poorer than the baseline's for hard mode). We noticed that the poor performance could come from two limitations of this algorithm. The first limitation is the lack of look-ahead. Unlike the logic a human adopts when playing such a defense game, which is to plan not only for the current state but also for the future, the agent does not recognize that an action made at the current turn influences future turns. For example, in the first turn, when no bees have entered the board, instead of planting a harvester (as a human player usually does), the agent simply does nothing, since there are no bees on the board yet. And when all the bees have been eliminated from the board, instead of planting a harvester or a thrower to prepare for future attacks, the agent again does nothing. This lack of planning ahead limits the success rate of the agent in a more intense game. The second limitation is computational inefficiency: when searching through the paths to the end goal, the state space expands exponentially as the number of bees on the board increases. Since we did not impose a time constraint on each turn, the average time needed to make one move in hard mode can be up to several minutes. Although A* is a powerful tool for such a search game, its performance is severely limited as the search space expands.
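For reference, the turns-needed heuristic defined above can be written out as a small function. The row representation (a list of bee armor values plus a thrower count per lane) and the per-thrower damage of 1 are assumptions of this sketch, as is the choice to return infinity for a lane that has bees but no attackers, a case the pseudocode above leaves unspecified.

def turns_needed_heuristic(rows):
    """rows: one (bee_armor_values, num_throwers) pair per lane."""
    total = 0.0
    for bee_armors, num_throwers in rows:
        bee_armor = sum(bee_armors)          # total armor left to remove in this lane
        firepower = num_throwers * 1         # assumed: each thrower deals 1 damage per turn
        if bee_armor == 0:
            continue                         # nothing to clear in this lane
        if firepower == 0:
            return float("inf")              # bees present but no ant can hit them
        total += bee_armor / firepower
    return total

# Two bees of armor 3 faced by one thrower, plus one bee faced by two throwers
print(turns_needed_heuristic([([3, 3], 1), ([], 0), ([3], 2)]))   # 6.0 + 1.5 = 7.5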
To address these issues, we added constraints that must be satisfied before new actions are generated with A*. The first constraint is that if no harvester has been placed yet, a harvester will be placed on the board; this means a harvester is placed at time step one. (Note that in all of the game conditions, the first wave of bees enters at time step two.) The second constraint is similar to the one we used in the baseline: if the ratio of the number of throwers to the number of harvesters is greater than 3, a harvester should be placed. This makes sure there will be enough food for future attacks. Only if the game condition satisfies the above constraints is the A* algorithm used to determine the next action. The win rate under this condition is shown below:

TABLE V
WIN RATE FOR A* WITH CONSTRAINTS ADDED

We see that, together with the constraints, the performance of A* is dramatically improved, and the computation is much more efficient as well. One thing to note, though, is that adding constraints is a way of using domain knowledge. The constraints we added mainly serve the purpose of securing enough food for future turns, because we are aware of future attacks. Since there are only 4 food units at the start of the game, food is a very stringent factor in determining which actions the agent can take. If we do not make sure there is enough food each turn, we get the first result (poor performance), since the A* agent does not see the complete picture, only considers the current state, and cannot choose the optimal action if there is not enough food to support it. Once we add the constraints, so that food is no longer as stringent as before, the performance of the A* agent increases significantly, since the agent has more resources to utilize and more feasible actions.
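The resulting turn-level decision procedure can be sketched as below. The a_star_plan callable stands in for the search described above, and the state dictionary and best_harvester_tile helper are placeholders of ours rather than the project's actual interfaces.

def choose_action(state, a_star_plan):
    """One ant-turn decision for the constrained A* agent."""
    harvesters, throwers = state["harvesters"], state["throwers"]

    # Constraint 1: make sure at least one harvester exists
    # (so one is placed at turn 1, before the first bee wave at turn 2).
    if harvesters == 0:
        return ("Harvester", best_harvester_tile(state))

    # Constraint 2: keep throwers / harvesters <= 3 so food income keeps up.
    if throwers / harvesters > 3:
        return ("Harvester", best_harvester_tile(state))

    # Otherwise fall back to the A* plan computed for the current (relaxed) board.
    return a_star_plan(state)

def best_harvester_tile(state):
    # Illustrative placement rule: as far left (close to the queen) as possible.
    return min(state["free_tiles"], key=lambda tile: tile[1])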

B. Adversarial Game State Search

1) Expectimax with a Manually Defined Evaluation Function: Next, we wanted to test the performance of reinforcement learning. If we define the state as (board layout, food, number of bees left) and the action as (ant type, location), then the problem becomes a Markov decision process (MDP). Nevertheless, this game does not have a specific reward for taking an action; it only produces two results, win or lose, which is all that matters. Thus, typical ways of solving an MDP, such as linear or dynamic programming, are not applicable. As stated in the project proposal, we mainly considered two candidate approaches for this problem: a search tree or reinforcement learning. At this stage, we have implemented the search algorithms and plan to extend to Q-learning. In the rest of this section, we describe the search methods in detail.

1. Expectimax with an evaluation function. In this game, at every turn the agent can either choose a type of ant to place or simply do nothing, while the adversary bees enter the board randomly. Once a bee enters the board, it follows a fixed policy. Intuitively, an expectimax search can be used to build the search tree. Since the policy of the bees is fixed, we can write the search strategy as:

V_max,bees(s, d) =
    Evaluation(s),                                           if d = 0
    max_a V_max,bees(Succ(s, a), d),                         if it is the agent's move
    sum_a pi_bees(s, a) * V_max,bees(Succ(s, a), d - 1),     if it is the bees' move      (2)

where a ranges over LegalActions(s). Here d is the search depth, and the output of the evaluation function is treated as the reward given the current state, which contains the board layout, the food, and the number of bees left.

To implement this search method, there were two main tasks for the team: the first was to change the original game structure, and the second was to determine a good search strategy. The original game is designed to consider only input from a human player, so it exposes minimal information about the board. To let us experiment with the algorithm, we spent a decent amount of time modifying the original code (including the GUI) so that the game can be controlled by the agent. Specifically, we built two separate classes (ant and bee) that take an action and a place as input and act each turn. We also re-coded the game board so that the board information is easier to obtain.

For the search, the logic is exactly as described in (2). However, a plain search easily runs out of time (we set the agent's decision time to 1 second per turn), as there are 3 * 8 locations on the game board. The time complexity is O((3 * 8 * 3N)^d) per turn, where N is the number of ant types. Hence, a depth of 2 or 3 is about the limit of the plain search. Nevertheless, there is some pruning we can do, most obviously on the deployment location. Ants should be deployed as far from the bees' hive as possible, to increase their life span. The ideal game board looks like a group of harvesters working at the leftmost side of the board, while the attacker ants directly face the bees. Hence, the deployment strategy for harvester ants is: as far left as possible; and the strategy for attacker ants is: the farthest position from which they can still attack the nearest bees (if any exist). With this pruning, we are able to run a depth-2 expectimax search.
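A compact sketch of the recursion in (2) is shown below. The callables evaluate, agent_actions, bee_outcomes, and successor stand in for the game-specific pieces and are assumed names, not the project's actual functions; bee_outcomes(state) is assumed to return (probability, outcome) pairs describing the random bee entries, and agent_actions(state) is assumed to always include the do-nothing action.

def expectimax(state, depth, agent_to_move,
               evaluate, agent_actions, bee_outcomes, successor):
    """Depth-limited expectimax with a fixed (random-entry) bee policy, as in (2)."""
    if depth == 0:
        return evaluate(state)

    if agent_to_move:
        # Agent move: try every legal placement (or doing nothing).
        # The depth is not decremented until the bees have also moved, matching (2).
        return max(expectimax(successor(state, a), depth, False,
                              evaluate, agent_actions, bee_outcomes, successor)
                   for a in agent_actions(state))

    # Bees' move: expectation over the random bee entries.
    return sum(p * expectimax(successor(state, o), depth - 1, True,
                              evaluate, agent_actions, bee_outcomes, successor)
               for p, o in bee_outcomes(state))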
For the evaluation function, an ideal approach would be to use Q-learning to learn a good combination of the features of the game board. However, due to the complexity, the evaluation function was mainly designed by hand through empirical experiments and human intuition. After many experiments, our evaluation function uses the following criteria (a small sketch of such an evaluation function is given at the end of this subsection):

Criterion 0: IsEnd(state) or IsWin(state).
Criterion 1: there has to be at least one harvester on the game board.
Criterion 2: in easy and medium mode, maintain a 1:1 ratio between attackers and harvesters.
Criterion 3: every row should contain at least one ant; penalize rows where the number of bees is larger than the number of ants.
Criterion 4: armor already lost by the bees on the game board receives a positive reward.
Criterion 5: the less total armor the bees have, the better.
Criterion 6: for all the ants, the closer to the left side, the better.

TABLE VI
WIN RATE FOR EXPECTIMAX

All of these criteria are reasonably intuitive. We found that with this evaluation function and a depth-2 expectimax search, our algorithm is able to beat the greedy baseline (which is actually a solid baseline). This is mainly because the agent can foresee upcoming situations. For example, when the game board contains only one harvester and one food, the greedy algorithm tells the agent to do nothing until the food reaches 4 and then place a thrower, while the expectimax agent chooses to place another harvester and, at the end of that turn, immediately has 2 food. By doing this, the expectimax agent reaches 4 food at about the same time as the greedy agent, but with one more harvester. During the expectimax search, we assumed that the agent knows exactly when and how many bees will enter the board. With this expectimax agent, we reached win rates of 42/50, 15/50, and 5/50 in easy, medium, and hard mode, respectively. We only used two types of ant here; increasing the number of ant types or the search depth would likely increase the win rate further, but the time complexity also increases exponentially.
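The sketch below shows one way the criteria above could be combined into a weighted evaluation function. The state fields, feature definitions, and weights are illustrative placeholders, not the hand-tuned values used in our experiments.

WIN_SCORE, LOSS_SCORE = 1e6, -1e6

def evaluate(state, weights):
    """Hand-crafted style evaluation: a weighted sum of the criteria listed above."""
    if state["is_loss"]:                                              # Criterion 0: terminal states dominate
        return LOSS_SCORE
    if state["is_win"]:
        return WIN_SCORE

    features = {
        "has_harvester": 1.0 if state["harvesters"] > 0 else 0.0,            # Criterion 1
        "ratio_balance": -abs(state["throwers"] - state["harvesters"]),      # Criterion 2
        "outnumbered_rows": -sum(1 for row in state["rows"]
                                 if row["bees"] > row["ants"]),              # Criterion 3
        "bee_armor_lost": state["bee_armor_lost"],                           # Criterion 4
        "bee_armor_left": -state["total_bee_armor"],                         # Criterion 5
        "ant_leftness": -sum(col for _, col in state["ant_positions"]),      # Criterion 6
    }
    return sum(weights[name] * value for name, value in features.items())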

2) TD Learning: Reinforcement learning (RL) is another classic way to solve an MDP, as it can learn the value of each action in each state. RL can also respond much faster at play time after sufficient training. Possible candidates include TD learning. However, to fully utilize RL, we need to define a good feature extractor so that the learning can generalize; more importantly, the loss function and the update rule need to be carefully specified and evaluated. The general form of the update used by these methods is:

w ← w − η (prediction(w) − target) ∇_w prediction(w)    (3)

where w is the learned weight vector, weighting each extracted feature, and η is the learning rate.

To implement TD learning, the features were first vectorized. The complete feature set was:

1. number of throwers on the board
2. ratio of harvesters to throwers > 2
3. ratio of harvesters to throwers < 1
4. IsEnd(state) or IsWin(state)
5. total armor of the bees on the board
6. sum of the deployment locations (columns) of the harvesters
7. sum of the deployment locations (columns) of the throwers
8. current state score
9. number of bees minus number of ants for each row
10. sum of the deployment locations (columns) of the harvesters / number of harvesters on the board
11. sum of the deployment locations (columns) of the throwers / number of throwers on the board
12. danger bees in each row

For the danger-bees feature, a row contains danger bees when the total armor of its bees is larger than that of its ants; the feature was calculated, as a heuristic, as the difference between the total armor of the bees and the number of throwers in the row, multiplied by the bees' location.

To increase the convergence speed, we first manually tuned an expectimax agent with function approximation (features 1, 2, 4, 5, 6, 7, 8, 12), using weights based on our intuition, which achieved a 100% win rate on the medium level of the game. These weights were used as prior knowledge influencing the choice of the optimal action during the learning process: with 30% probability, the agent chooses the optimal action produced by the evaluation function with the manually tuned weights. Then the TD learning agent, using features 1, 2, 3, 4, 5, 8, 10, 11, 12 with this warm start, was implemented. As with the expectimax agent, the legal actions were enumerated each turn and the resulting value of each action was calculated; with probability 0.3 the agent chooses the optimal action generated by the inner product of the manually tuned weights and the corresponding features, to speed up convergence.

During tuning we noticed that some features, such as the number of throwers on the board, grew without bound, while other features, such as the score of the current state, remained constant. Therefore, we manually removed those features, and the remaining features (features 2, 3, 4, 5, 8, 12, referred to as the full features below) were used to generate the results. The learning rate was defined as 1 / step size. The step size increases as the game time increases, and it was initialized within the range of 100 to 1000 to find the optimal initial value. The discount was set to 1 and the reward to 0, and both were held constant during learning. Besides this combination of features, we also implemented a location-only feature set containing the locations of each harvester, thrower, and bee, giving a 72 * 1 vector, to analyze how informative the insect locations are. The same 7:3 strategy (70% learned weights, 30% manually tuned weights) was also used in this TD learning agent to speed up the convergence of the weights.

The survival time in losing iterations, the weight difference in terms of the 2-norm, and the win/loss result of each iteration for the TD learning agent using the full features and the location feature are shown below. The survival time for losing iterations was smoothed with a median filter.
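A minimal sketch of the linear TD update in (3), together with the 70/30 action-selection mix described above. The helper names and the simple list-based weight representation are our own; value_learned and value_prior are assumed to be callables that score a candidate action under the learned and hand-tuned weights, respectively.

import random

def td_update(w, phi_s, phi_s_next, step_size, reward=0.0, discount=1.0):
    """One TD(0) step for a linear value function V(s) = w . phi(s), as in (3)."""
    prediction = sum(wi * xi for wi, xi in zip(w, phi_s))
    target = reward + discount * sum(wi * xi for wi, xi in zip(w, phi_s_next))
    eta = 1.0 / step_size                      # learning rate = 1 / step size
    return [wi - eta * (prediction - target) * xi for wi, xi in zip(w, phi_s)]

def pick_action(actions, value_learned, value_prior, prior_prob=0.3):
    """With 30% probability follow the hand-tuned prior weights (warm start)."""
    value = value_prior if random.random() < prior_prob else value_learned
    return max(actions, key=value)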

TABLE VII
WIN RATE WITH TWO TYPES OF FEATURES

Feature Type                                 Easy    Medium    Hard
Location Feature (learning rate = 0.001)
Full Features (learning rate = 0.001)

According to the results, the location information is informative enough to reach a reasonable win rate in easy and medium mode. However, according to the win/loss record of each iteration and the survival time of losing iterations, TD learning did not work as well as expected. We expected the density of winning iterations to increase over the course of training, eventually approaching a horizontal line at y = 1 in the plot. In our results, only a weak learning trend appears, at the early stage of the location-feature method in easy and medium mode and of the full-features method in medium mode. A possible reason is that the features we created were not informative enough, so the value produced by the evaluation function could not reflect the true value of a state. Moreover, the shape of the objective induced by our selected features can also influence the convergence of the gradient-descent update: the objective may not be convex, so gradient descent can fail to some extent, or the update may become trapped at a local minimum, which would explain the unstable win status at the later stage of training. Even though adjusting the learning rate helped improve the win rate from 0.56 to 0.71 for TD learning with the full features in medium mode, it still could not boost the performance much further. Additionally, since many sequential states lead to the end state, if the evaluation of one state is off (i.e., the value of the state should be positive but the evaluation function produces a negative value), the search may fail and result in a low win rate. The large number of possible state combinations also makes winning harder than losing.

During the learning process, our TD learning suffered from the difficulty of tuning the learning rate across modes. For example, the optimal learning rate in medium mode might not be the optimal learning rate in hard mode. In our approach, the initial learning rate was tuned manually over a range starting at 0.01, and we observed that the win rate first increased and then decreased as the learning rate was decreased from 0.01.

VII. CONCLUSION AND FUTURE WORK

In our project, we modified the previously implemented A* algorithm and implemented an expectimax agent and a TD learning agent. In conclusion, the A* agent alone does not perform well; it faces the challenges of lacking look-ahead and of computational inefficiency. We addressed these issues by adding constraints based on domain knowledge of the game; in this setting the agent gives very high performance for easy and medium mode, and a decent result for hard mode. Compared to the A* algorithm, the expectimax agent and the TD learning agent make decisions much faster, and the TD learning agent performed slightly better than the expectimax agent, with a higher win rate in medium mode. Moreover, we confirmed that the initial learning rate influences the performance of the agent in terms of win rate. In order to find an optimal combination of parameters for TD learning (i.e., learning rate and discount), we could wrap our script to try various combinations of parameters and return the combination with the highest win rate.
Also, we plan to add the location feature and other informative features to the full feature set to better describe the state value.

VIII. ACKNOWLEDGMENT

The game base code comes from the project developed for UC Berkeley CS 61A. We would also like to extend our gratitude to Moses and Clara for giving us access to their source code for this project; most of the backbone A* code we implemented is based on their work, with our modifications to the heuristics and constraints and some implementation changes.

IX. REFERENCES

1. Ants vs. SomeBees project reference page: inst.eecs.berkeley.edu/~cs61a/su13/projects/ants/ants
2. Tower Defense Game:
3. Plants vs. Zombies:
4. Leshem, Yotam, et al. "Plants vs. Zombies: Introduction to AI Final Project." Hebrew University of Jerusalem. Manuscript.
