An AI for Slither.io


Jackie Yang (jackiey)

Introduction

Game playing is a very interesting topic in Artificial Intelligence today. Most recently emerging game-playing AIs target turn-based games, like the very popular AlphaGo (Silver et al. 2016), or games with only one or a few players, like a Breakout AI (Mnih et al. 2015). However, there are many other interesting games that do not fall into these categories, such as StarCraft. They offer real-time game play with a potentially massive number of players participating at once, which makes them much harder to solve. In this project, I focus on a small, easily modeled game that has all of these under-explored features.

Slither.io is a massively multiplayer browser game developed by Steve Howse ("Slither.io on the App Store"). The player controls a snake by changing its moving direction and eats food, represented by colored dots on the map, to grow larger, which is the goal of the game. When one player's snake runs into another player's snake, it dies and is converted into food dots. This creates intense competition between players, who try to make other players' snakes run into their bodies.

Task Definition

For this project, the objective is to create an AI that controls the snake (shown at the center of Figure 1), keeps eating, and avoids danger. Specifically, the input is all the food around the snake avatar (the colored dots around the user's snake in Figure 1) and all the danger (nearby snakes) around it, while the output is the direction in which the snake should go. The metric of success is the average maximum length reached by the AI-controlled snake.

Baseline and future improvement

The baseline for this project is a snake that wanders around in random directions. The oracle is straightforward: become the longest player in the arena. There is a real-time leaderboard in the game that lists the 10 longest players currently in the arena.
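
To make the baseline and the task interface concrete, here is a minimal sketch of the random-wander controller described above; the State fields and names are illustrative assumptions, not the actual data the browser script provides.

    import math
    import random
    from dataclasses import dataclass, field

    @dataclass
    class State:
        # Illustrative observation for one turn (field names are assumptions)
        foods: list = field(default_factory=list)    # [(x, y), ...] nearby food dots
        dangers: list = field(default_factory=list)  # [(x, y), ...] points on nearby snakes
        length: float = 10.0                         # current snake length

    def baseline_policy(state: State) -> float:
        # Baseline: ignore the state and steer in a uniformly random direction.
        # Returns the heading angle, in radians, that the snake should move toward.
        return random.uniform(0.0, 2.0 * math.pi)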

Figure 1: Slither.io

Related Work

I found some related work on building a Slither.io AI. Slither.io-bot is a very popular implementation of a Slither.io AI. It uses a rule-based policy to choose an action for the snake. The rules can be divided into two parts: food finding and collision avoidance. Food finding simply locates the nearest food and heads toward it. For collision avoidance, once the bot detects any other snake entering a circle defined by a static parameter, it turns the other way. Currently, this rule-based AI is nowhere near human players. The major reason is that it can easily be killed by other snakes that surround it with their bodies: as long as the surrounding circle is larger than the rule-based AI's threshold, the AI will not react at all. My solution is to use a machine learning algorithm to dynamically tune that parameter, so the bot avoids this situation without becoming too timid. This training could be done with a reinforcement learning algorithm such as Q-learning, and I go for a simple model such as a linear classifier so the AI runs fast.
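
As a rough sketch of how such a rule-based controller works (the geometry below is a simplified illustration under my own assumptions, not the actual Slither.io-bot code):

    import math

    def angle_to(src, dst):
        # Heading angle from src to dst, in radians
        return math.atan2(dst[1] - src[1], dst[0] - src[0])

    def rule_based_policy(head, foods, dangers, radius_avoid_size=150.0):
        # head:    (x, y) of our snake's head
        # foods:   list of (x, y) food dots
        # dangers: list of (x, y) points on other snakes' bodies
        # radius_avoid_size: the static avoidance radius (the parameter tuned later)
        if dangers:
            nearest = min(dangers, key=lambda p: math.dist(head, p))
            if math.dist(head, nearest) < radius_avoid_size:
                # Collision avoidance: steer directly away from the nearest danger
                return angle_to(nearest, head)
        if foods:
            # Food finding: head toward the nearest food dot
            target = min(foods, key=lambda p: math.dist(head, p))
            return angle_to(head, target)
        return 0.0  # nothing nearby: keep going (arbitrarily, to the right)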

Infrastructure

The infrastructure running this AI consists of a browser, a browser automation script, and the AI itself, all of which are already built. The input data looks like this:

    collisionpoints = '0' : { 'xx' : 27557.36809910213, 'yy' : 30773.149294393803, 'snake' : ...

which lists all the obstacles, along with the length of the current snake. The output is simply the set of parameters for the rule-based AI.

Approach (new)

Working on this project throughout the semester, I gradually realized that this is a rather difficult problem to tackle. I tried quite a few approaches. I will first describe the general infrastructure shared by all of these designs and then describe each of the AIs I tried.

General

The general approach I planned for this problem is Q-learning with a neural network, as in (Mnih et al. 2015). The Q-learning update is

    \hat{Q}_{opt}(s, a) \leftarrow (1 - \eta) \underbrace{\hat{Q}_{opt}(s, a)}_{prediction} + \eta \underbrace{\bigl( r + \gamma \hat{V}_{opt}(s') \bigr)}_{target}

where

    \hat{V}_{opt}(s') = \max_{a' \in Actions(s')} \hat{Q}_{opt}(s', a')

Because the state space of s is much larger than in most tabular Q-learning settings, I replace the table Q_{opt}(s, a) with a neural network. \hat{Q}_{opt}(s, a) is then updated with stochastic gradient descent: train \hat{Q}_{opt} on the input (s, a) with the regression target (1 - \eta) \hat{Q}_{opt}(s, a) + \eta (r + \gamma \hat{V}_{opt}(s')), with \hat{V}_{opt}(s') defined as above.
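
A minimal sketch of this update with a neural-network function approximator, written with PyTorch; the network size, learning rate, and the 16 discrete steering actions are assumptions for illustration rather than the exact model used in the project.

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        # Small MLP approximating Q_opt(s, a) over a fixed set of discrete actions
        def __init__(self, state_dim, n_actions):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, n_actions),
            )

        def forward(self, s):
            return self.net(s)

    def q_update(model, optimizer, s, a, r, s_next, eta=0.5, gamma=0.95):
        # One SGD step toward the smoothed target
        # (1 - eta) * Q(s, a) + eta * (r + gamma * V(s')) described above.
        with torch.no_grad():
            v_next = model(s_next).max()                               # V_opt(s') = max_a' Q_opt(s', a')
            target = (1 - eta) * model(s)[a] + eta * (r + gamma * v_next)
        pred = model(s)[a]                                             # prediction Q_opt(s, a)
        loss = (pred - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage sketch (dimensions are assumptions):
    # model = QNetwork(state_dim=32, n_actions=16)
    # optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)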

However, there are still several difficulties to solve:

1. The competing-with-humans nature of this game makes game play really slow; how can training be sped up?
2. How can a continuous action be produced from the discrete policy that Q-learning provides?
3. How should a proper feature vector be selected?

The first problem is shared by all of the AIs I designed. The human part of this game makes game play really slow and significantly slows down training. For example, if we consider 0.5 seconds of game time as one turn for the AI, then we can generate at most 172800 = 60 × 60 × 24 / 0.5 samples in a day. I solve this problem by letting the AI play several slither.io games simultaneously with a shared predictor. All of these agents evaluate situations and choose actions using a single predictor, and they all feed their experience back to that same predictor to improve the policy (more precisely, the function approximation of Q) collaboratively.

Another problem is that the training process itself is slow, and because of the online, multiplayer nature of slither.io, we cannot pause the game while the predictor is training. I solve this by building two identical models: the game uses one of them to predict the best action while the other receives feedback in a separate thread. After each epoch, the program swaps the two models atomically to avoid hazards between threads and then copies the new parameters to the old model asynchronously, so that the latest-trained model produces the best actions while the just-swapped-out model accepts new feedback. With this parallel method, the bottleneck of human players can be overcome without building a self-play system, which could be inaccurate and would not represent how human players actually play this game.
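
A sketch of this double-buffered predictor, assuming the two models are PyTorch modules as in the sketch above; the threading details here are illustrative, not the project's exact implementation.

    import threading

    class DoubleBufferedPredictor:
        # Keep two identical models: one serves actions to the game loop,
        # the other is trained on incoming feedback in a background thread.
        def __init__(self, model_factory):
            self.serving = model_factory()   # picks actions for the live games
            self.training = model_factory()  # updated with new feedback
            self._lock = threading.Lock()

        def predict(self, state):
            with self._lock:
                return self.serving(state)

        def train_step(self, update_fn, transition):
            # update_fn(model, s, a, r, s_next) wraps the Q-learning step and its optimizer
            update_fn(self.training, *transition)

        def swap(self):
            # At the end of an epoch, atomically swap the two models, then sync the
            # freshly trained parameters into the model that will now be trained.
            with self._lock:
                self.serving, self.training = self.training, self.serving
            self.training.load_state_dict(self.serving.state_dict())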

For the remaining problems, I designed several AIs to tackle them:

Rule-based AI with Q-learning parameter tuning

This idea emerged while I was thinking of ways to solve the second problem: how to play a continuous game when Q-learning is turn-based. I figured that I could use a rule-based controller for continuous control and Q-learning to tune the parameters of that rule-based AI. So I built an AI in which a rule-based component handles short-term tactics and a reinforcement-learning component gives high-level strategic guidance. Namely, I adopt the same rule-based AI described in Related Work and extract two of its parameters as the high-level instruction: radiusavoidsize and fastresponsesize, which determine how close danger must be before the snake starts avoiding it and how close it must be before the snake makes an emergency maneuver to avoid imminent danger. Notably, I divide game play into 2-second turns for the Q-learning AI.

To design a proper feature vector, I focus on representing the game state as well as possible so the AI can decide what those parameters should be. I select the most important factors for that decision: the distances and directions of the 10 nearest dangers. As the snake grows larger, it makes sense to avoid danger at a greater range, since a large snake is harder to turn around. So I also include the snake's length as an input, both as a feature for the neural-network predictor and as a quantity used to compute the reward.

Q-learning AI with hand-extracted features and neural network function approximation

After a few days of training and exploring, I found the improvement was not that prominent, and it was hard to distinguish real improvement from the many uncertainties in each game. I chose to build an AI from the ground up using Q-learning. To make the AI as responsive as possible, I reduced its turn time to 0.5 seconds. With the experience from the previous AI and another ten-odd days of training, I came up with the following feature vector. The new feature vector not only takes food locations into consideration, but also greatly improves interpretability. I realized that in the previous implementation, the function approximator would have had a hard time discovering the relationship between the group of danger directions and their distances. In the new design, I divide the map into 16 directions around the snake and build an array of the distance to the nearest threat in each of those 16 directions. To further help the AI, I build a matrix describing the relationship between those 16 directions: whether the nearest danger in each pair of directions belongs to the same snake. In this way, the AI has an idea of where the danger is and where each snake is. During training, I also noticed that the AI sometimes tries to kill other snakes, but for lack of information it sometimes goes for their tails instead of their heads. So I added another 16-element boolean vector indicating whether the danger in each direction is a snake head. For the food vector, I further divide each of the 16 directions into 4 regions by distance, giving a 64-element vector of the amount of food at different directions and distance ranges.
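
A sketch of how this feature vector could be assembled. The sizes follow the description above (16 danger distances, a 16x16 same-snake matrix, 16 head flags, and a 16x4 food histogram); the input format, helper names, and the clipping distance are assumptions.

    import math
    import numpy as np

    N_DIRS = 16          # directions around the snake head
    N_FOOD_RINGS = 4     # distance rings for the food histogram
    MAX_DIST = 2000.0    # assumed clipping distance for normalization

    def direction_bin(head, point):
        angle = math.atan2(point[1] - head[1], point[0] - head[0]) % (2 * math.pi)
        return int(angle / (2 * math.pi) * N_DIRS) % N_DIRS

    def build_features(head, dangers, foods):
        # dangers: list of (x, y, snake_id, is_head); foods: list of (x, y)
        dist = np.full(N_DIRS, MAX_DIST)
        nearest_snake = np.full(N_DIRS, -1)
        is_head = np.zeros(N_DIRS)
        # 16-element vector: distance to the nearest danger in each direction
        for x, y, snake_id, head_flag in dangers:
            b = direction_bin(head, (x, y))
            d = math.dist(head, (x, y))
            if d < dist[b]:
                dist[b], nearest_snake[b], is_head[b] = d, snake_id, float(head_flag)
        # 16x16 matrix: is the nearest danger in directions i and j the same snake?
        same = np.zeros((N_DIRS, N_DIRS))
        for i in range(N_DIRS):
            for j in range(N_DIRS):
                if nearest_snake[i] >= 0 and nearest_snake[i] == nearest_snake[j]:
                    same[i, j] = 1.0
        # 16x4 histogram: food counts per direction and distance ring
        food_hist = np.zeros((N_DIRS, N_FOOD_RINGS))
        for x, y in foods:
            b = direction_bin(head, (x, y))
            ring = min(int(math.dist(head, (x, y)) / MAX_DIST * N_FOOD_RINGS), N_FOOD_RINGS - 1)
            food_hist[b, ring] += 1
        return np.concatenate([dist / MAX_DIST, same.ravel(), is_head, food_hist.ravel()])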

Q-learning AI with raw image input and convolutional neural networks

Although I tried hard to hand-pick the feature vectors in the previous designs, the AI was still not as well informed as a human player. For example, the rendered game screen shows whether the player is in a snake-crowded area or not; this information is very useful to human players when deciding whether to rush wildly or play cautiously. To tackle this problem, I followed the method of the Breakout AI paper (Mnih et al. 2015) and fed the raw images directly to the Q-learning algorithm. As in that paper, I use a convolutional neural network to take advantage of the 2-D, image-shaped data, and I stack 4 consecutive frames as the input to the learning algorithm to give it a better sense of velocity.
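
A sketch of a convolutional Q-network for this method, following the general architecture of Mnih et al. (2015); the exact layer sizes, the 84x84 input resolution, and the 16 steering actions are assumptions rather than the project's exact network.

    import torch
    import torch.nn as nn

    class ConvQNetwork(nn.Module):
        # DQN-style network: input is 4 stacked grayscale frames,
        # output is one Q-value per discrete steering direction.
        def __init__(self, n_actions=16):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                nn.Linear(512, n_actions),
            )

        def forward(self, frames):
            # frames: (batch, 4, 84, 84) stack of the last four screen captures
            return self.head(self.conv(frames / 255.0))

    # Q-values for a single stacked observation
    q_values = ConvQNetwork()(torch.zeros(1, 4, 84, 84))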

Results

The results are shown below, using the snake's length and the number of turns as metrics:

                    Baseline   Rule-based AI   Method 1   Method 2   Method 3
    Length average     13.49          948.18     979.98      39.03      13.92
    Length stdev        8.98         1303.60     879.30      55.10       5.22
    Turns average      59.09          300.30     703.39      55.40      63.56
    Turns stdev       109.34          343.31     741.14      73.05      89.08

Note that the Rule-based AI and Method 1 use longer turns of 2 seconds; all other turns are 0.5 seconds. Method 1 did improve on the rule-based AI, but it is quite surprising that Methods 2 and 3 performed poorly. I assume this is due to the complexity of the game. A Q-learning AI is very unlikely to discover that rushing to a cluster of food is a good idea, because going into a cluster of food usually gets a poorly-performing AI killed, which yields a very high penalty. Meanwhile, a well-trained Q-learning AI is very timid: I observed that the trained AIs in Methods 2 and 3 often rush to a corner of the map and stay there. They never get the chance to move into a crowded area and fight for food with other snakes, simply because of the low probability of random exploration.

Discussion

During training, I found two figures very intriguing. Figure 2 shows the loss as a function of training epoch, while Figure 3 shows the length of the snake at death, smoothed with a 100-element sliding window, during the Method 3 training.

Figure 2: Loss and training epochs

Figure 3: Length of each death

These figures show that although the loss keeps decreasing, the average score does not keep increasing. I believe this supports my point above: the snake gains a better awareness of the situation and learns that crowded places are dangerous, but it never gets enough chances to try different eating tactics and simply gives up on all the crowded places. I think this explains the poor performance of both Method 2 and Method 3.

I can think of a few preliminary ideas to address this problem in the future:

1. Learn from human players: Let the AI watch human players play for a couple of rounds. Hopefully, the AI would pick up the organized tactics of human players and recognize that crowded areas are quite profitable.

2. Learn from opponents: Override the webpage so it renders the scene not only from the snake the bot controls but also from the opponents' snakes. Show the situations and movements of the other players and train the function approximation of Q on them. Hopefully, the bot can learn from others' experience.

3. Learn from itself: Build a private slither.io server populated only with bots. Since they all have very bad tactics, crowded areas might not be as dangerous, and the AI might be able to try out more tactics instead of hiding.

References

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, et al. 2015. "Human-Level Control Through Deep Reinforcement Learning." Nature 518 (7540): 529-33.

Silver, David, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, et al. 2016. "Mastering the Game of Go with Deep Neural Networks and Tree Search." Nature 529 (7587): 484-89.

"Slither.io on the App Store." https://itunes.apple.com/us/app/slither.io/id1091944550?mt=8.