Applying Modern Reinforcement Learning to Play Video Games. Computer Science & Engineering Leung Man Ho Supervisor: Prof. LYU Rung Tsong Michael



Outline
- Term 1 Review
- Term 2 Objectives
- Experiments & Results
- Online Evaluation Platform
- Future Work

Term 1 Review - Background
- "Reinforcement learning is learning what to do" - Prof. Richard S. Sutton
- Often modelled as a Markov Decision Process (MDP)
  - S: a finite set of states
  - A: a finite set of actions
  - T(s' | s, a): transition model
  - R_a(s, s'): reward model
  - γ: discount factor for future rewards
- Objective: maximize the discounted future reward
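The objective on this slide can be made concrete with a small sketch. It assumes a finite episode whose rewards are already known; the function name and the reward values are illustrative only and do not come from the project.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t].

    Computed backwards so each step is one multiply and one add.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode: the only reward arrives on the third step.
rewards = [0.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.5))  # 0.25
```

With γ close to 1 the agent values late rewards almost as much as immediate ones; with a small γ it becomes short-sighted.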

Term 1 Review - Motivation
- Explore the boundary of modern RL
- Selected a challenging, unexplored and meaningful video game
- Why a video game? Why is it meaningful?
- "At DeepMind, our mission is to solve intelligence and use that to solve complex real world problems, but in order to do that, we need to test our algorithmic ideas in challenging environments." - BlizzCon on DeepMind x Starcraft II

Term 1 Review - Little Fighter 2 (LF2)
- Developed by CUHK alumni
- Visual fighting game
- Very popular in HK
- Game
  - HP & MP
  - 7 keys: {up, down, left, right, attack, jump, defense}
  - Special abilities for each character, triggered by key sequences
  - Exploitable game objects

Term 1 Review - Methods
- NeuroEvolution of Augmenting Topologies (NEAT)
  - Proposed in 2002
  - Evolutionary method
- Deep Q-Network (DQN)
  - Proposed in 2013
  - Value-based method
- Actor Critic using Kronecker-Factored Trust Region (ACKTR)
  - Proposed in 2017
  - Actor-critic method

Term 1 Review - Summary
- Implemented the game environment
- Experimented with RL algorithms
- Experimented with different feature extractions and reward shaping
- Experimented with various training curricula
- Demo: https://www.youtube.com/watch?v=1lpvosnhaxe

Term 2 Objectives
- Focus on what worked
- AlphaGo-style self play (the proper way)
- Feature Augmentation
  - Frame Stacking
  - Action History
- Online AI Evaluation Platform

Experiments & Results - Overview
- Phase 1: Static agent task
- Phase 2: In-game AI
- Phase 3: Self play
- Phase 4: Proper self play
- Phase 5: Feature Augmentation

Proper self play
- Motivation
  - Inspired by AlphaGo
  - Continuous learning -> more general strategy
  - Avoid catastrophic forgetting
  - Symmetry breaking
- Solution: Opponent sampling
  - Create a snapshot agent every K steps
  - Switch opponent every Q steps
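The opponent sampling scheme above might be sketched as follows. This is a minimal illustration, not the project's code: the class name `OpponentPool`, the uniform sampling over snapshots, and the use of a single step counter for both K and Q are all assumptions, since the slides do not state how opponents are drawn or in which units K and Q are measured.

```python
import copy
import random


class OpponentPool:
    """Keeps frozen snapshots of the learner and samples opponents from them."""

    def __init__(self, snapshot_every, switch_every, seed=0):
        self.snapshot_every = snapshot_every  # K on the slide
        self.switch_every = switch_every      # Q on the slide
        self.snapshots = []
        self.current = None
        self.rng = random.Random(seed)

    def step(self, step_count, learner):
        """Call once per training step (counting from 1).

        Returns the opponent to play against, or None while no snapshot
        exists yet (the caller could fall back to the learner itself).
        """
        if step_count % self.snapshot_every == 0:
            # Deep copy so later training updates do not alter the snapshot.
            self.snapshots.append(copy.deepcopy(learner))
        needs_switch = self.current is None or step_count % self.switch_every == 0
        if self.snapshots and needs_switch:
            # Assumption: opponents are drawn uniformly from all snapshots.
            self.current = self.rng.choice(self.snapshots)
        return self.current


# Hypothetical stand-in for a policy: a dict holding one weight.
learner = {"weights": [0.0]}
pool = OpponentPool(snapshot_every=4, switch_every=2)
for t in range(1, 9):
    learner["weights"][0] += 1.0  # pretend training update
    opponent = pool.step(t, learner)
```

Because old snapshots stay in the pool, the learner keeps meeting earlier versions of itself, which is one way to address the catastrophic forgetting mentioned in the motivation.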

Proper self play - Result
- Tested on MLP-DQN with various parameters
- Double 128: best (K, Q) = (50000, 10)
- Triple 256: best (K, Q) = (100000, 20)
- At first glance, not much difference?

Proper self play - Result
- Naive self play vs In-game AI 1
  - Weird and uninteresting policy
- Proper self play vs In-game AI 1
  - General playing style
  - Diverse skills: tracking, jump kick, tackling
  - Aggressive

Proper self play - Result
- Tested on MLP-ACKTR
- Significant improvement
- Most general self play agent

Feature Augmentation
- Frame Stacking
- Action History

Frame Stacking
- Motivation
  - Inspired by the original DQN paper
  - Capture dynamic information
  - Necessary for some Atari games
- Implementation
  - Environment wrapper
  - Maintain a state deque of size 4
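A frame-stacking wrapper of the kind described above might look like the sketch below. It assumes a Gym-style environment with `reset()` and `step(action)` returning `(obs, reward, done, info)`, and observations that are flat feature lists; `CounterEnv` is a hypothetical toy environment used only to make the example runnable.

```python
from collections import deque


class CounterEnv:
    """Hypothetical toy environment whose observation is [t]."""

    def reset(self):
        self.t = 0
        return [self.t]

    def step(self, action):
        self.t += 1
        return [self.t], 0.0, self.t >= 3, {}


class FrameStack:
    """Environment wrapper that concatenates the k most recent observations."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self):
        obs = self.env.reset()
        # Fill the deque with the first observation so the state size is fixed.
        for _ in range(self.k):
            self.frames.append(obs)
        return self._state()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return self._state(), reward, done, info

    def _state(self):
        # Flatten the stacked frames, oldest first.
        return [x for frame in self.frames for x in frame]


env = FrameStack(CounterEnv(), k=4)
first = env.reset()
second, _, _, _ = env.step(0)
third, _, _, _ = env.step(0)
print(first, second, third)  # [0, 0, 0, 0] [0, 0, 0, 1] [0, 0, 1, 2]
```

The stacked state is k times larger than a single observation, which is the 4x dimensionality cost weighed in the analysis.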

Frame Stacking - Result & Analysis
- [Plots: In-game AI 0, In-game AI 1, In-game AI 2]
- No observable positive effects

Frame Stacking - Result & Analysis
- Information gain is too sparse
- Too much redundancy between frames
- Not worth the 4x increase in dimensionality

Action History
- Motivation
  - Inspired by the aleju/mario-ai project
  - Improve action coordination
  - Special attacks discovery
- Implementation
  - Environment wrapper
  - Maintain an action history deque of size k
  - Append k one-hot vectors to the state
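The action history wrapper can be sketched along the same lines. The sketch assumes a Gym-style interface, integer action indices, and flat list observations; `ToyEnv`, the all-zero vector for "no action yet", and the parameter names are illustrative choices rather than the project's implementation.

```python
from collections import deque


class ToyEnv:
    """Hypothetical environment with a constant one-feature observation."""

    def reset(self):
        return [0.5]

    def step(self, action):
        return [0.5], 0.0, False, {}


class ActionHistory:
    """Environment wrapper that appends the last k actions as one-hot vectors."""

    def __init__(self, env, num_actions, k):
        self.env = env
        self.num_actions = num_actions
        self.k = k
        self.history = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        self.history.clear()
        # Assumption: slots without an action yet are encoded as all zeros.
        for _ in range(self.k):
            self.history.append(None)
        return self._state(obs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.history.append(action)
        return self._state(obs), reward, done, info

    def _one_hot(self, action):
        vec = [0.0] * self.num_actions
        if action is not None:
            vec[action] = 1.0
        return vec

    def _state(self, obs):
        state = list(obs)
        for a in self.history:  # oldest action first
            state.extend(self._one_hot(a))
        return state


env = ActionHistory(ToyEnv(), num_actions=3, k=2)
s0 = env.reset()
s1, _, _, _ = env.step(2)
s2, _, _, _ = env.step(0)
print(s2)  # [0.5, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```

Each remembered action adds `num_actions` entries to the state, so the state grows by k times the number of actions.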

Action History - Result & Analysis
- [Plots: In-game AI 0, In-game AI 1, In-game AI 2]
- Deeper topology does not help
- Action-2: better against in-game AI 0 and 1
- Action-4: significantly better against in-game AI 0

Action History - Result & Analysis
- Action-2 vs In-game AI 1
  - Learned an entirely different policy
  - One-Turn-Kill
  - Fastest strategy against in-game AI 1
- Action-4 vs In-game AI 0
  - Fire blast special attack
  - Win rate: 50%
  - Best DQN agent against in-game AI 0

Action History - Result & Analysis
- Improves action coordination
- Special attacks discovery
- A tradeoff between dimensionality and the above

Online AI evaluation platform
- Motivation
  - Cannot objectively measure AI skills
  - Benchmarking against a fixed set of in-game AIs leads to biased comparison
  - Performance against other RL agents could be unrepresentative
- Idea: online platform for humans to interact with the RL agent
- Key problems
  - Data collection is very expensive
  - Users come and go with various skill levels

Features
- Accurate rating prediction with sparse data
- Matchmaking
- Concurrent game sessions management
- Error tolerance
- Low latency
- Informative UI

TrueSkill
- A modern rating algorithm
  - Microsoft Research (Cambridge, UK)
  - Bayesian inference
  - Significant improvement over Elo
  - More data efficient
- Applications
  - Xbox Live
  - OpenAI Dota AI tournament
- Rating structure
  - The mean skill of the player: μ
  - The degree of uncertainty: σ
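To illustrate how a (μ, σ) rating changes after a match, here is a simplified sketch of the two-player TrueSkill update. It assumes no draws and no skill dynamics term, and it uses the commonly cited defaults μ = 25, σ = 25/3, β = 25/6; the function name `update_1v1` is hypothetical and this is not the TrueSkill API the platform used.

```python
import math

MU0 = 25.0           # default mean skill
SIGMA0 = MU0 / 3     # default uncertainty
BETA = SIGMA0 / 2    # assumed performance noise per game


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and the loser.

    Simplified: two players, no draws, no dynamics noise added to sigma.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift; larger for upsets
    w = v * (v + t)         # shrink factor for the variance, in (0, 1)
    new_winner = (
        mu_w + sigma_w ** 2 / c * v,
        sigma_w * math.sqrt(1 - sigma_w ** 2 / c ** 2 * w),
    )
    new_loser = (
        mu_l - sigma_l ** 2 / c * v,
        sigma_l * math.sqrt(1 - sigma_l ** 2 / c ** 2 * w),
    )
    return new_winner, new_loser


# Two new players with default ratings; the first one wins.
human, agent = update_1v1((MU0, SIGMA0), (MU0, SIGMA0))
print(human, agent)
```

The winner's μ rises, the loser's μ falls, and both σ values shrink because each game adds evidence about the players' skills. This shrinking uncertainty is what lets the method produce usable ratings from relatively few games.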

Technology Stack
- Frontend
  - Language: ECMAScript 2015 (ES6)
  - Framework: VueJS 2.0
  - CSS Library: Vuestic Admin
  - Module bundler: Webpack
- Backend
  - Language: Python 3
  - Framework: Flask
  - TrueSkill API

Deployment
- Google Cloud Platform
  - Zone: Taiwan
  - n1-standard-2
    - 2 virtual CPUs
    - 7.5GB memory
    - 30GB SSD storage
- Docker
  - OS-level virtualization
  - Painless deployment
  - Designed two Docker images

Demo time http://104.199.146.210:8080/#/dashboard

Future Work - Diversify play style
- Motivation
  - Agents don't use special abilities (except one trained ACKTR agent)
  - No information in features regarding special abilities
  - Limited dynamics
- Ideas
  - Deep Recurrent Q-Network (DRQN)

Future Work - Launching online AI evaluation platform
- Motivation
  - Collect real data
- Milestones
  - Pilot testing
  - Load test
  - Promotion

Q&A

In-game AI task - Provided targets
- In-game AI 0
  - Uses all special abilities
  - Good at close and long range
  - Unfair comparison
  - Challenging to mid-level players
- In-game AI 1
  - Moves away from target
  - Launches jump kicks from angles
  - Challenging to mid-level players
- In-game AI 2
  - Mainly close range
  - Moves back and forth and attacks
  - Challenging to amateur-level players