
CS221 Project: Final Report Raiden AI Agent

Lu Bian lbian@stanford.edu
Yiran Deng yrdeng@stanford.edu
Xuandong Lei xuandong@stanford.edu

1 Introduction

Raiden is a classic shooting game in which the player controls a flight and must avoid collisions with enemy flights and enemy projectiles. Meanwhile, the player's flight can shoot missiles at enemy flights to gain points. In the normal game configuration, the player dies as soon as it is hit by any enemy object. Therefore, the player needs to stay alive first and then shoot down as many enemies as possible. The difficult part of this game is that the game speed can become so fast that the player cannot make the best movement in time. In this project, we implemented an AI agent that plays the game automatically and achieves a higher score than a human player. We used AlphaBeta and Expectimax agents to control the player's movement and fed the agents game state information. We elaborate on our game state features in the following sections. We hard-coded the weights for the feature vector and also trained them using TD-learning.

2 Task Definition

We built an AI agent that automatically makes the best decision based on the current state of the game. We also increased the difficulty of the game by introducing intelligent enemy flights that play against the player. We implemented 4 different game modes (a small configuration sketch follows at the end of this section):

- Human player vs. normal enemy.
- AI agent vs. normal enemy.
- AI agent vs. AI enemy.
- Human player vs. AI enemy.

Our job is to implement an AI-controlled player agent and an AI enemy. We introduce the AI enemy in the game description section.
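For concreteness, the four modes can be represented as a tiny configuration object. The names below (GameMode, player_is_ai, enemy_is_ai) are our illustrative choices for this sketch, not identifiers from the actual code base.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameMode:
    # Hypothetical encoding: who controls the player flight,
    # and whether enemies use the tracking AI.
    player_is_ai: bool
    enemy_is_ai: bool

# The four modes listed above (dictionary keys are illustrative).
GAME_MODES = {
    "human_vs_normal": GameMode(player_is_ai=False, enemy_is_ai=False),
    "ai_vs_normal":    GameMode(player_is_ai=True,  enemy_is_ai=False),
    "ai_vs_ai":        GameMode(player_is_ai=True,  enemy_is_ai=True),
    "human_vs_ai":     GameMode(player_is_ai=False, enemy_is_ai=True),
}
```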

3 Game Description

3.1 Player Actions

The player can move up, down, left, and right to avoid collisions with enemy flights and enemy projectiles. The player can also shoot a missile to destroy an enemy flight (enemy projectiles are not destructible). A sketch of how a movement action updates the player's position follows the figures.

[Figure 1: Game Layout]
[Figure 2: Player Shoot]
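As an illustration only (the action names and step size below are our assumptions, not constants from the project code), applying a movement action simply shifts the flight's (x, y) position by a fixed amount per frame:

```python
# Hypothetical action names and step size.
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "SHOOT", "STAY"]
STEP = 5  # assumed movement in pixels per frame

def apply_action(x, y, action):
    """Return the new (x, y) position after one frame of `action`."""
    dx = {"LEFT": -STEP, "RIGHT": STEP}.get(action, 0)
    dy = {"UP": -STEP, "DOWN": STEP}.get(action, 0)  # screen y grows downward
    return x + dx, y + dy
```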

3.2 Enemy Actions

Enemy flights are always generated at the top of the screen with a random horizontal and vertical speed. Each enemy fires 3 spread projectiles every 0.5 seconds at the player. The vertical speed of a projectile is fixed, while its horizontal speed depends on the horizontal speed of the enemy flight; the horizontal position of the middle projectile is always the same as the enemy's. The speed of a classic enemy flight is fixed, whereas the AI enemy tracks the player's position and flies directly toward the player. However, enemy flights can only move downward, never upward, so once an enemy flight is below the player it cannot climb back up to hit the player.

3.3 Scoring

The score is shown in the upper-left corner of the game board. The game never ends, so the goal is to accumulate as many points as possible. The player dies as soon as it hits an enemy flight or an enemy projectile. The detailed scoring is as follows:

- +1 point: for staying alive for 1/60 of a second (60 points per second of survival).
- +1000 points: for hitting an enemy flight with a missile.
- -500 points: for firing a missile.

We added a penalty for firing a missile: with this penalty, firing missiles lowers the total score whenever the hit accuracy is under 50%. The reason for deducting points per missile is to prevent abuse of missiles, since otherwise a human player could get a very high score simply by firing continuously. A minimal sketch of these scoring rules appears below.
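A minimal sketch of the scoring bookkeeping under these rules (the function and variable names are ours, not the project's):

```python
SURVIVAL_REWARD = 1      # per frame, at 60 frames per second
HIT_REWARD = 1000        # per enemy destroyed by a missile
MISSILE_COST = 500       # per missile fired

def update_score(score, frames_survived, enemies_hit, missiles_fired):
    """Apply the Section 3.3 scoring rules for the events of the last interval."""
    score += SURVIVAL_REWARD * frames_survived
    score += HIT_REWARD * enemies_hit
    score -= MISSILE_COST * missiles_fired
    return score
```

With these constants, missiles only pay off when at least half of them hit, which is exactly the 50% break-even accuracy mentioned above.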

4 Approach

4.1 Game State Implementation

4.1.1 State members

The contents of GameState include:

- enemy list: the list of enemy flights.
- missile list: the list of missiles the player has shot.
- projectile list: the list of projectiles the enemies have shot.
- currentAgent: the index of the agent that is taking an action now.
- score: the current game state score.

Each enemy, missile, and projectile in these lists has the following fields:

- speed: the current speed of the object along the x-axis and y-axis.
- position: recorded as an (x, y) pair.
- dimension: the height and width of the object.

Each object also provides the following functionality:

- checkCollision: takes another flight as input and returns whether the two objects collide, based on their positions and dimensions.
- updateFlight: updates the position of the flight for the successor GameState.

4.1.2 State functions

An agent needs the successor state given an agent index and an action, and it also needs the legal actions that an agent can take. These are the functions we implemented in our GameState (a minimal sketch of this interface follows the list):

- getLegalActions: takes an agent index as input and returns the legal actions of that agent based solely on its position. More specifically, the legal actions are those that do not take the flight off the board (except for the enemy).
- generateSuccessor: takes an agent index and the action that agent is about to take, and returns the successor state. When the index is 0, the player moves using the updateFlight function and all projectiles also advance; when the index belongs to an enemy, only that enemy updates its position.
- isLose: loops over the lists of enemies and projectiles and uses the checkCollision function in Flight to determine whether any of them collides with the player agent. If there is a collision, it returns True.
- isWin: always returns False, because the game has no end point; the player simply tries to accumulate more score.

The game state cannot be copied easily with Python's deepcopy() because of pygame mask issues, so we implemented our own copy of all the objects (including speed, position, etc.) and added the functionality above.
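The following is a minimal sketch of this interface, assuming simplified Flight objects with position, dimension, and speed; it mirrors the Pacman-style API described above but is not the project's actual code.

```python
class Flight:
    """Simplified game object: position, dimension, and speed."""
    def __init__(self, x, y, w, h, vx=0, vy=0):
        self.x, self.y = x, y        # position
        self.w, self.h = w, h        # dimension (width, height)
        self.vx, self.vy = vx, vy    # speed along x and y

    def checkCollision(self, other):
        # Bounding-box test based on positions and dimensions.
        return (abs(self.x - other.x) < (self.w + other.w) / 2 and
                abs(self.y - other.y) < (self.h + other.h) / 2)

    def updateFlight(self):
        # Advance one frame along the current speed.
        self.x += self.vx
        self.y += self.vy


class GameState:
    BOARD_W, BOARD_H = 640, 640  # board size (see Section 4.2.1)

    def __init__(self, player, enemies, missiles, projectiles, score=0):
        self.player = player
        self.enemies = enemies
        self.missiles = missiles
        self.projectiles = projectiles
        self.score = score
        # generateSuccessor (copying the objects and advancing one agent,
        # as described above) is omitted from this sketch.

    def getLegalActions(self, agentIndex):
        # Sketch for the player only (index 0); in the real game, enemies
        # are not clipped to the board.
        f = self.player
        actions = ["STAY", "SHOOT"]
        if f.x - f.w / 2 > 0:
            actions.append("LEFT")
        if f.x + f.w / 2 < self.BOARD_W:
            actions.append("RIGHT")
        if f.y - f.h / 2 > 0:
            actions.append("UP")
        if f.y + f.h / 2 < self.BOARD_H:
            actions.append("DOWN")
        return actions

    def isLose(self):
        return any(self.player.checkCollision(f)
                   for f in self.enemies + self.projectiles)

    def isWin(self):
        return False  # the game never ends
```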

4.2 Agent

4.2.1 Search

At first, we used a Reflex agent to tune game parameters and to serve as a baseline for comparison with the more advanced agents. This agent simply loops over all legal actions for the current game state and chooses the action that leads to the highest score in the successor state.

We then used a simple Minimax agent to control the player's behavior. The Minimax agent is essentially the one from our Pacman assignment, transplanted to fit our game interface. However, when the search depth increases to 3, the game becomes very slow, because our game board is very large (640 x 640) compared to Pacman's layout and we also have a lot of computation to do in a single frame. This motivated us to use an AlphaBeta agent to speed up the computation, again a variant of the Pacman AlphaBeta pruning algorithm.

Since Minimax establishes a performance lower bound against all adversaries, we expected Expectimax to outstrip Minimax on the easy level of our game, where the enemies move randomly, but not necessarily to get a decent score against our AI enemies.

Observing the behavior of our agent, we found that it would sometimes fire a missile even after detecting its certain death (the game would end no matter which action it took). This prevents the agent from achieving a higher score, since the game ends soon after the missile is fired and the 500-point cost is wasted most of the time. Furthermore, the agent must stay still to shoot a missile, which sometimes leads to its doom when the death was avoidable by moving. In our later implementation, we modified our agent to account for these cases. In the results section, we compare the performance of the Reflex, AlphaBeta, and Expectimax agents in all game modes.

4.2.2 Search depth

After getting the AlphaBeta and Expectimax agents working, we tried different search depths. Knowing that there is a trade-off between search depth and search speed, we observed drastically deteriorated game speed as we increased the depth: evaluating a game state requires a lot of computation, such as calculating the distances to enemies and projectiles. Therefore, we chose a search depth that does not hurt the game speed while still guaranteeing decent accuracy. A minimal sketch of the depth-limited search loop follows.
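As an illustration of the kind of search involved, here is a minimal depth-limited Expectimax sketch over the GameState interface described above; the agent ordering, depth accounting, and evaluation hook are simplified assumptions rather than our exact implementation.

```python
def expectimax(state, depth, agentIndex, numAgents, evaluate):
    """Depth-limited expectimax: agent 0 (the player) maximizes, enemy
    agents are modeled as uniformly random chance nodes."""
    if depth == 0 or state.isLose():
        return evaluate(state)

    actions = state.getLegalActions(agentIndex)
    if not actions:
        return evaluate(state)

    nextAgent = (agentIndex + 1) % numAgents
    nextDepth = depth - 1 if nextAgent == 0 else depth
    values = [expectimax(state.generateSuccessor(agentIndex, a),
                         nextDepth, nextAgent, numAgents, evaluate)
              for a in actions]

    if agentIndex == 0:
        return max(values)                # player: max node
    return sum(values) / len(values)      # enemy: expectation over random moves
```

At the root, the agent picks the player action whose successor has the highest value; the AlphaBeta agent replaces the expectation with a min over enemy moves and adds pruning bounds.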

4.2.3 Evaluation function

The evaluation function is largely based on how a human player would behave under the same circumstances. The first thing we do is encourage our agent to stay near the center of the screen: staying at the border eliminates at least one legal action (for example, the agent cannot move left or down while sitting in the bottom-left corner), and lingering in the corners increases the probability of being trapped by enemies and their projectiles.

Next, we evaluate the number of threats within a certain range, since our agent needs to respond to all enemies and projectiles in time. The strategy is similar to the Pacman one: we compute the Euclidean distances between the agent and the enemies and projectiles, and penalize the score when threats get close.

We also let our agent react to enemies as early as possible. This is realized by calculating the horizontal distance to each enemy and rewarding a large horizontal separation. Moreover, the screen is divided into four zones; we count the total number of enemies and projectiles in each zone and push the agent toward the safest zone, i.e., the one with the fewest threats. Adding this feature makes the agent behave more like a human player instead of just sitting at the bottom dodging enemies and projectiles.

The last problem for the agent is deciding when to shoot missiles. Our strategy is to consider the state at the current time step and calculate whether firing a missile could hit an enemy. Denote the agent's position by (x_0, y_0), an enemy's position and speed by (x_1, y_1) and (v_{x1}, v_{y1}), and the missile's speed by (v_x, v_y). If the agent fires a missile, the missile moves to (x_0 + v_x t, y_0 + v_y t) after some time t, and the enemy (possibly) moves to (x_1 + v_{x1} t, y_1 + v_{y1} t) accordingly. The missile and the enemy collide if, for some t,

\left| x_0 + v_x t - (x_1 + v_{x1} t) \right| < \frac{W_m + W_e}{2}
\left| y_0 + v_y t - (y_1 + v_{y1} t) \right| < \frac{H_m + H_e}{2}

where W_m and H_m are the missile's width and height, and W_e and H_e are the enemy's width and height. If these two conditions can be satisfied, the agent shoots; otherwise it never shoots, due to the cost of firing missiles. This is a very strong criterion, and we observed that the agent's shooting accuracy is above 85% on average.

4.3 TD-Learning

We tried to use TD-learning to train our weight vector automatically. With the plain TD-learning update, the weight vector w blows up very quickly, so we added a regularization term and normalized w to avoid this problem. The update is

w \leftarrow w - \eta \left\{ \left[ \hat{V}_\pi(s; w) - \left( r + \gamma \hat{V}_\pi(s'; w) \right) \right] \nabla_w \hat{V}_\pi(s; w) + \lambda w \right\}

with

\hat{V}_\pi(s; w) = w \cdot \phi(s), \qquad \nabla_w \hat{V}_\pi(s; w) = \phi(s), \qquad \|w\|_2 = 1

Here the feature vector φ(s) encodes the game state features described above, and the evaluation function returns w · φ(s) as the evaluation of the current game state. A minimal sketch of this update follows.
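A minimal NumPy sketch of this regularized, normalized TD update, assuming a feature extractor that returns φ(s) as a vector; the function name and hyperparameter values below are illustrative placeholders rather than our tuned settings.

```python
import numpy as np

ETA = 0.01      # learning rate (assumed value)
GAMMA = 0.95    # discount factor (assumed value)
LAMBDA = 0.001  # regularization strength (assumed value)

def td_update(w, phi_s, phi_s_next, reward):
    """One TD(0) step with L2 regularization, then renormalize w."""
    v_s = np.dot(w, phi_s)            # V_hat_pi(s; w)
    v_next = np.dot(w, phi_s_next)    # V_hat_pi(s'; w)
    td_error = v_s - (reward + GAMMA * v_next)
    w = w - ETA * (td_error * phi_s + LAMBDA * w)  # gradient of V_hat is phi(s)
    return w / np.linalg.norm(w)      # keep ||w||_2 = 1
```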

5 Results and Analysis

5.1 General results

For each agent, we ran 200 games against the normal (random) enemy and 200 games against the AI enemy. We also played the game ourselves, 50 times against each enemy. The results are shown below.

Agent                           # of Games   Avg. Score   Highest Score   Std. Dev.
Reflex vs. normal enemy              200         1354           6984          2922
Reflex vs. AI enemy                  200         1320           7051          3253
AlphaBeta vs. normal enemy           200         6769          39784          5858
AlphaBeta vs. AI enemy               200         5864          30913          5028
Expectimax vs. normal enemy          200         7461          49106          7195
Expectimax vs. AI enemy              200         5261          27227          6316
Human player vs. normal enemy         50         3829           8172          2670
Human player vs. AI enemy             50         2539           4260          1646

Figure 3: Performance Comparison

The results indicate that human players score significantly worse than our AlphaBeta and Expectimax agents, but they also have the lowest standard deviation. This means that human players play more consistently, probably because we follow our own routines for this kind of game. The Expectimax agent comes out on top in both average score and highest score against the normal enemy, with a large lead; this matches our expectation, since Expectimax models the randomly moving enemy correctly and should therefore outperform the more pessimistic AlphaBeta against it. Against the AI enemy, the AlphaBeta and Expectimax agents perform very similarly.

The standard deviations of the AlphaBeta agent are lower, which is consistent with what we observed: the AlphaBeta agent rarely makes close contact with enemies, since it assumes they are always Min players, whereas the Expectimax agent sometimes sneaks up behind enemies. Furthermore, we expected the Expectimax agent to score much lower against the AI enemy, and the results confirm this expectation.

5.2 TD-learning

We trained the weight vector over thousands of rounds of games, but the performance of the trained agent is not as good as that of our hard-coded agent. Here we only list the performance of the trained Expectimax agent.

Agent                                    # of Games   Avg. Score   Highest Score   Std. Dev.
Trained Expectimax vs. normal enemy          100         2776           4724          1092
Trained Expectimax vs. AI enemy              100         1562           2286           557

[Figure 4: Player Stuck at Corner]

The overall performance of the trained agent is worse than both the human players and our hand-coded agent. This is because the trained agent keeps staying in one of the two top corners of the game board and barely moves.

6 Conclusion and Future Work

Our agent outperforms human players when using our hand-coded evaluation function. The agent can score as high as 49,000 points, whereas a human rarely scores more than 10,000 points. When facing the AI enemy, our agent achieves an average score of about 5,500 points, whereas human players only achieve about 2,500 points.

Our TD-learned weights do not perform very well and leave the agent stuck in a local minimum. One improvement would be to change the feature vector to help TD-learning escape this local minimum; in the long run, we believe a trained weight vector should be preferable to a hard-coded one. Our agent's performance is also not very stable at the moment. Another improvement would be to add more general features to the feature vector to obtain a more stable performance.