Integrating Learning in a Multi-Scale Agent
Ben Weber
Dissertation Defense, May 18, 2012
Introduction
AI has a long history of using games to advance the state of the field. [Shannon 1950]
Real-Time Strategy Games
Building human-level AI for RTS games remains an open research challenge. [StarCraft II, Blizzard Entertainment]
Task Environment Properties
Property                         Chess          StarCraft       Taxi Driving
Fully vs. partially observable   Fully          Partially       Partially
Deterministic vs. stochastic     Deterministic  Deterministic*  Stochastic
Episodic vs. sequential          Sequential     Sequential      Sequential
Static vs. dynamic               Static         Dynamic         Dynamic
Discrete vs. continuous          Discrete       Continuous      Continuous
Single vs. multiagent            Multi          Multi           Multi
[Russell & Norvig 2009]
Motivation
RTS games present complex environments and complex tasks. Professional players demonstrate a broad range of reasoning capabilities. Human behavior can be observed, emulated, and evaluated. [Langley 2011, Mateas 2002]
Hypothesis
Reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities.
Research Questions
What competencies are necessary for expert StarCraft gameplay? Which competencies can be learned from demonstrations? How can these competencies be integrated in a real-time agent?
Overview
StarCraft, Multi-Scale AI, Learning from Demonstration, Integrating Learning, Evaluation.
StarCraft
Expert gameplay: 300+ APM and an evolving meta-game. Exhibited capabilities: estimation, anticipation, adaptation. [Flash, pro-gamer]
StarCraft Gameplay
Expand, advance the tech tree, attack the opponent, manage the economy, and produce units.
Gameplay Scales in StarCraft
Individual, squad, and global scales; examples include worker harassment, aggressive mine placement, and supporting a siege line.
State Space
Considering only unit type and location, the number of possible states is (Types * X * Y)^Units. On a 256x256 tile map with 100 unit types and up to 1,700 units: (100 * 256 * 256)^1700 > 10^11,500 states.
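As a sanity check of that bound, the count can be computed in log space; this sketch uses only the figures from the slide (100 unit types, a 256x256 map, up to 1,700 units).

```python
# Verify the state-space bound in log space, since the integer itself
# has over 11,000 digits.
import math

unit_types = 100
tiles = 256 * 256          # positions on a 256x256 tile map
max_units = 1700

per_unit = unit_types * tiles              # (type, x, y) choices per unit
log10_states = max_units * math.log10(per_unit)
print(f"states > 10^{int(log10_states)}")  # states > 10^11588
```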
Decision Complexity
The set of possible actions that can be executed at a particular moment: O(2^W(A * P) + 2^T(D + S) + B(R + C)), where W is the number of workers, A the number of worker assignment types, P the average number of workplaces, T the number of troops, D the number of movement directions, S the number of troop stances, B the number of buildings, R the average number of research options, and C the average number of unit types that can be produced. [Aha et al. 2005]
Decision Complexity
Assuming unit actions can be selected independently, the complexity reduces to O(W(A * P) + T(D + S) + B(R + C)). Even so, 50 worker units on a 256x256 tile map yield more than 1,000,000 possible actions.
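A sketch of the reduced estimate under the independence assumption; the per-unit choice counts below are illustrative assumptions (e.g., treating every map tile as a potential workplace), not values from the dissertation.

```python
# Independent-action estimate: O(W(A*P) + T(D+S) + B(R+C)).
def decision_complexity(W, A, P, T, D, S, B, R, C):
    workers   = W * (A * P)   # each worker: assignment type x workplace
    troops    = T * (D + S)   # each troop: movement direction or stance
    buildings = B * (R + C)   # each building: research or unit production
    return workers + troops + buildings

# 50 workers that can each be sent to any tile of a 256x256 map already
# push the estimate past one million actions (assumed parameter values).
print(decision_complexity(W=50, A=1, P=256 * 256, T=20, D=8, S=3, B=10, R=4, C=6))
```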
StarCraft
Complex gameplay, real-world properties, highly competitive, and abundant sources of expert gameplay.
Research Question #1 What competencies are necessary for expert StarCraft gameplay?
Multi-Scale AI
Multiple scales: actions are performed across multiple levels of coordination. Interrelated tasks: performance in each task impacts other tasks. Real-time: actions are performed in real time.
Reactive Planning
Provides useful mechanisms for building multi-scale agents. Advantages: efficient behavior selection, interleaved plan expansion and execution. Disadvantages: lacks deliberative capabilities. [Loyall 1997, Mateas 2002]
Agent Design
Implemented in the ABL reactive planning language. Architecture: an extension of the McCoy & Mateas integrated agent framework that partitions gameplay into distinct competencies and uses a blackboard for coordination. [McCoy & Mateas 2008]
EISBot Managers
Strategy Manager, Income Manager, Production Manager, Tactics Manager, and Recon Manager, responsible for tasks such as gathering resources, constructing buildings, attacking the opponent, and scouting the opponent.
Multi-Scale Idioms
Design patterns for authoring multi-scale AI: message passing, daemon behaviors, managers, unit subtasks, and behavior locking.
Idioms in EISBot
[Diagram: behavior tree rooted at initial_tree spanning the Strategy, Income, and Tactics Managers, with subgoals (Form Squad, Attack Enemy, Pump Probes, Squad Attack, Squad Retreat, Dragoon Dance), daemon behaviors (Squad Monitor), and message passing via WMEs (Timing Attack WME, Probe Stop WME)]
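EISBot itself is written in ABL, so the following is only a minimal Python sketch of the message-passing idiom: managers coordinate by posting and consuming working-memory elements (WMEs) on a shared blackboard, with the Tactics Manager acting daemon-style, waking only when a matching WME appears. The class and attribute names are illustrative.

```python
# Sketch of manager coordination via a blackboard of WMEs.
class Blackboard:
    def __init__(self):
        self.wmes = []

    def post(self, wme_type, **attrs):
        self.wmes.append({"type": wme_type, **attrs})

    def take(self, wme_type):
        """Remove and return the first WME of the given type, or None."""
        for wme in self.wmes:
            if wme["type"] == wme_type:
                self.wmes.remove(wme)
                return wme
        return None

class StrategyManager:
    def update(self, bb):
        # Request an attack via message passing instead of commanding units.
        bb.post("TimingAttackWME", frame=8000)

class TacticsManager:
    def update(self, bb):
        # Daemon-style behavior: act only when the triggering WME exists.
        msg = bb.take("TimingAttackWME")
        if msg is not None:
            print(f"Forming squad; attack at frame {msg['frame']}")

bb = Blackboard()
for manager in (StrategyManager(), TacticsManager()):  # one decision cycle
    manager.update(bb)
```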
Multi-Scale AI
StarCraft gameplay is multi-scale. Reactive planning provides mechanisms for multi-scale reasoning. Idioms are applied in EISBot to support StarCraft gameplay.
Research Question #2 Which competencies can be learned from demonstrations?
Learning from Demonstration
Objective: emulate capabilities exhibited by expert players by harnessing gameplay demonstrations. Methods: classification and regression model training, case-based goal formulation, and parameter selection for model optimization.
Strategy Prediction
Tasks: identify opponent build orders and predict when buildings will be constructed. [Figure: Spawning Pool construction timing vs. game time (minutes)] [Hsieh & Sun 2008]
Approach
Feature encoding: each player's actions are encoded in a single vector, and vectors are labeled using a build-order rule set. Features describe the game cycle when a unit or building type is first produced by a player: f(x) = t, the time when x is first produced by P, and f(x) = 0 if x was not (yet) produced by P.
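A sketch of that encoding, assuming a replay is available as a list of (game cycle, unit type) production events; the unit names and cycle values are made up for illustration.

```python
# Encode a replay as the cycle when each unit type is first produced.
def encode_replay(events, unit_types):
    features = {u: 0 for u in unit_types}   # 0 = not (yet) produced
    for cycle, unit_type in events:
        if unit_type in features and features[unit_type] == 0:
            features[unit_type] = cycle     # keep only the first production
    return [features[u] for u in unit_types]

types = ["Drone", "SpawningPool", "Zergling", "Hydralisk"]
events = [(120, "Drone"), (2100, "SpawningPool"), (2900, "Zergling")]
print(encode_replay(events, types))  # [120, 2100, 2900, 0]
```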
Strategy Prediction Results
[Figure: precision and recall vs. game time (0-12 minutes) for the NNge, Boosting, Rule Set, and State Lattice predictors]
Strategy Learning
Task: learn build orders from demonstration. Trace algorithm: converts replays to a trace representation and formulates goals based on the most similar situation: q = argmin_{c ∈ L} distance(s, c), g = s + (q' - q), where q' is the case one planning window ahead of q in the trace. [Ontañón et al. 2010]
Trace Retrieval: Example
Consider a planning window of size 2, with current state S = <3, 0, 1, 1> and trace T1 = <2, 0, 0.5, 1>, T2 = <3, 0, 0.7, 1>, T3 = <4, 1, 0.9, 1>, T4 = <4, 1, 1.1, 2>.
Step 1: The system retrieves the most similar case, q.
Step 2: q = T2 is retrieved.
Step 3: The difference between q and the case a planning window ahead is computed: T4 - T2 = <1, 1, 0.4, 1>.
Step 4: The goal is computed: g = S + (T4 - T2) = <4, 1, 1.4, 2>.
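The retrieval and goal formulation steps above can be written compactly; this sketch reproduces the worked example, using squared Euclidean distance as an assumed similarity metric.

```python
# Case-based goal formulation over a trace, as in the example above.
def formulate_goal(s, trace, window):
    distance = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    # Steps 1-2: retrieve the most similar case q (restricted so that a
    # case `window` steps ahead still exists in the trace).
    i = min(range(len(trace) - window), key=lambda j: distance(s, trace[j]))
    # Step 3: difference between q and the case a planning window ahead.
    diff = [a - b for a, b in zip(trace[i + window], trace[i])]
    # Step 4: the goal is the current state plus that difference.
    return [x + d for x, d in zip(s, diff)]

s = [3, 0, 1, 1]
trace = [[2, 0, 0.5, 1], [3, 0, 0.7, 1], [4, 1, 0.9, 1], [4, 1, 1.1, 2]]
print(formulate_goal(s, trace, window=2))  # [4, 1, 1.4, 2] (up to rounding)
```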
Strategy Learning Results
Opponent modeling with a window size of 20. [Figure: prediction error (RMSE) vs. number of actions performed by the player for the Null, IB1, Trace, and MultiTrace models]
State Estimation
Task: estimate enemy positions given prior observations. Particle model: apply the movement model, remove visible particles, and reweight the remaining particles. [Thrun 2002, Bererton 2004]
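A minimal sketch of one particle-model update, assuming particles are (x, y, weight) tuples on the tile grid; the diffusion step size and decay rate here are arbitrary placeholders, not EISBot's learned parameters.

```python
# One update of the particle model: move, prune visible, reweight.
import random

def update_particles(particles, visible_tiles, step=2.0, decay=0.95):
    updated = []
    for x, y, w in particles:
        # Apply the movement model: diffuse the particle slightly.
        x += random.uniform(-step, step)
        y += random.uniform(-step, step)
        # Remove particles on visible tiles: a unit there would be seen.
        if (int(x), int(y)) in visible_tiles:
            continue
        # Reweight: confidence decays as the observation grows stale.
        updated.append((x, y, w * decay))
    return updated

random.seed(0)
particles = [(10.0, 12.0, 1.0), (40.0, 7.0, 1.0)]
print(update_particles(particles, visible_tiles={(40, 7), (41, 7)}))
```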
Parameter Selection
Free parameters: trajectory weights and decay rates. State estimation is represented as an optimization problem with parameter weights as input and particle model error as output; replays are used to implement the particle model error function.
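Viewed as black-box optimization, parameter selection can be sketched as a grid search; `particle_model_error` below is a hypothetical stand-in for the replay-derived error function, shown here as an arbitrary smooth surrogate.

```python
# Grid-search the particle model's free parameters against the
# replay-derived error function (surrogate used here for illustration).
import itertools

def particle_model_error(trajectory_weight, decay_rate):
    # Placeholder: the real function replays games and measures
    # threat prediction error for the given parameters.
    return (trajectory_weight - 0.7) ** 2 + (decay_rate - 0.9) ** 2

grid = itertools.product([i / 10 for i in range(11)], repeat=2)
best = min(grid, key=lambda params: particle_model_error(*params))
print("best (trajectory_weight, decay_rate):", best)
```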
State Estimation Results
[Figure: threat prediction error vs. game time (minutes) for the Null Model, Perfect Tracker, Default Model, and Optimized Model]
Learning from Demonstration
Anticipation: classification and regression models. Adaptation: case-based goal formulation. Estimation: model optimization.
Research Question #3 How can these competencies be integrated in a real-time agent?
Agent Architecture
Integration Approaches
Augmenting working memory, external plan generation, and external goal formulation. [Diagram: external components connected to ABL working memory]
Augmenting Working Memory
Supplementing working memory with additional beliefs.
External Plan Generation
Generating plans outside the scope of ABL.
External Goal Formulation
Formulating goals outside the scope of ABL.
Goal-Driven Autonomy
A framework for building self-introspective agents: GDA agents monitor plan execution, detect discrepancies, and explain failures. Implementations: hand-authored rules, case-based reasoning. [Molineaux et al. 2010, Muñoz-Avila et al. 2010]
GDA Subtasks Expectation generation Discrepancy detection Explanation generation Goal formulation
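A sketch of how the four subtasks compose into a single GDA control loop; the subtask bodies are hypothetical placeholders, with the point being the monitor, detect, explain, and reformulate control flow.

```python
# GDA cycle assembled from the four subtasks.
def generate_expectation(plan, goal):
    return plan[goal]                       # expected outcome of the active plan

def detect_discrepancy(state, expectation):
    return None if state == expectation else (state, expectation)

def explain(discrepancy):
    observed, expected = discrepancy
    return f"expected {expected}, observed {observed}"

def formulate_goal(explanation):
    return "defend_base"                    # placeholder goal choice

def gda_cycle(state, plan, active_goal):
    expectation = generate_expectation(plan, active_goal)  # expectation generation
    discrepancy = detect_discrepancy(state, expectation)   # discrepancy detection
    if discrepancy is None:
        return active_goal                                 # plan is on track
    explanation = explain(discrepancy)                     # explanation generation
    return formulate_goal(explanation)                     # goal formulation

plan = {"timing_attack": "enemy_army_small"}
print(gda_cycle("enemy_army_large", plan, "timing_attack"))  # defend_base
```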
Implementation
Integrating Learning
ABL agents can be interfaced with external learning components. Applying the GDA model enabled tighter coordination across capabilities. EISBot incorporates ABL behaviors, a particle model, and a GDA implementation.
Evaluation
Claim: reproducing expert-level StarCraft gameplay involves integrating heterogeneous reasoning capabilities. Experiments: ablation studies and a user study.
GDA Ablation Study
Agent configurations: Base, Formulator, Predictor, GDA. Free parameters: planning window size, look-ahead window size, discrepancy period. [Diagram: Discrepancy Detector, Explanation Generator, Goal Formulator, and Goal Manager exchanging discrepancies, explanations, and goals]
GDA Results
Overall results from the GDA experiments:
Agent       Win Ratio
Base        0.73
Formulator  0.77
Predictor   0.81
GDA         0.92
User Study
Experiment setup: matches hosted on ICCup, 3 trials. Testing script: 1. Launch StarCraft, 2. Connect to server, 3. Host match, 4. Announce experiment. [Dennis Fong, pro-gamer]
ICCup Score
[Figure: ICCup score on Tau Cross vs. number of games played (0-50) for the Base, Formulator, Predictor, and GDA agents]
ICCup Results
Agent       Longinus  Python  Tau Cross  Overall
Base        942       599     669        737
Formulator  980       718     1078       925
Predictor   1111      555     1145       937
GDA         952       860     1293       1035
EISBot Ranking
Rankings achieved by the complete GDA agent:
                     Longinus  Python  Tau Cross  Average
Percentile ranking   32nd      8th     66th       48th
Evaluation
Ablation studies: optimized particle model and complete GDA model. Integrating additional capabilities into EISBot improved performance; EISBot performed at the level of a competitive amateur StarCraft player.
Conclusion
Objective: identify and realize capabilities necessary for expert-level StarCraft gameplay in an agent. Approach: decompose gameplay, learn capabilities from demonstrations, integrate learned gameplay models, and evaluate against humans and agents.
Contributions
Idioms for authoring multi-scale agents, methods for learning from demonstration, and integration approaches for ABL agents.
Integrating Learning in a Multi-Scale Agent
Ben G. Weber, Ph.D. Candidate, Expressive Intelligence Studio
bweber@soe.ucsc.edu
Funding: NSF Grant IIS 1018954
References
Aha, Molineaux, & Ponsen. 2005. Learning to Win: Case-Based Plan Selection in a Real-Time Strategy Game. Proceedings of ICCBR.
Bererton. 2004. State Estimation for Game AI Using Particle Filters. Proceedings of the AAAI Workshop on Challenges in Game AI.
Hsieh & Sun. 2008. Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games. Proceedings of IJCNN.
Langley. 2011. Artificial Intelligence and Cognitive Systems. AISB Quarterly.
Loyall. 1997. Believable Agents: Building Interactive Personalities. Ph.D. thesis, CMU.
Mateas. 2002. Interactive Drama, Art and Artificial Intelligence. Ph.D. thesis, CMU.
References
McCoy & Mateas. 2008. An Integrated Agent for Playing Real-Time Strategy Games. Proceedings of AAAI.
Molineaux, Klenk, & Aha. 2010. Goal-Driven Autonomy in a Navy Strategy Simulation. Proceedings of AAAI.
Muñoz-Avila, Aha, Jaidee, Klenk, & Molineaux. 2010. Applying Goal Driven Autonomy to a Team Shooter Game. Proceedings of FLAIRS.
Ontañón, Mishra, Sugandh, & Ram. 2010. On-line Case-Based Planning. Computational Intelligence.
Russell & Norvig. 2009. Artificial Intelligence: A Modern Approach.
Shannon. 1950. Programming a Computer for Playing Chess. Philosophical Magazine.
Thrun. 2002. Particle Filters in Robotics. Proceedings of UAI.