
Charles University in Prague
Faculty of Mathematics and Physics

BACHELOR THESIS

Pavel Šmejkal

Integrating Probabilistic Model for Detecting Opponent Strategies Into a Starcraft Bot

Department of Software and Computer Science Education

Supervisor of the bachelor thesis: Mgr. Martin Černý
Study programme: Computer science
Specialization: General computer science

Prague 2016

I declare that I carried out this bachelor thesis independently, and only with the cited sources, literature and other professional sources.

I understand that my work relates to the rights and obligations under the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that the Charles University in Prague has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 paragraph 1 of the Copyright Act.

In... date... signature

I thank Martin Černý for his creative guidance, his perseverance, and for always finding the time for me. I thank Ivan Holobradý for his consultations and for the opinion of a professional player. And last but not least, I thank Jana Šmejkalová and my whole family for their help and support.

Title: Integrating Probabilistic Model for Detecting Opponent Strategies Into a Starcraft Bot

Author: Pavel Šmejkal

Department: Department of Software and Computer Science Education

Supervisor of the bachelor thesis: Mgr. Martin Černý, Department of Software and Computer Science Education

Abstract: Recent research in artificial intelligence (AI) for real-time strategy (RTS) games has shown a great need for a computer-controlled agent (bot) to be able to adapt its strategy in response to the opponent's actions. While some progress has been made in detecting opponents' strategies offline, there has not been much success in using this information to guide in-game decisions. We present a version of UAlbertaBot enhanced by an existing probabilistic algorithm for supervised learning from replays and strategy prediction. As we show in a simulated StarCraft: Brood War AI tournament, the bot that adapts its strategies proves superior to a bot that chooses strategies at random. Our work exposes the importance of scouting and strategy adaptation. With further improvement of the strategies, a bot capable of competing with human players may be created.

Keywords: RTS, StarCraft, probabilistic model, opponent modelling, AI

Contents

1 Introduction
2 RTS background
  2.1 Introduction to RTS
  2.2 RTS match schema
  2.3 RTS Specifics
  2.4 Scouting in RTS
  2.5 Why are RTS games interesting
3 Related work
  3.1 High-level decision making
  3.2 Scouting
4 General approach
  4.1 Motivation
  4.2 Strategy prediction
    4.2.1 A brief introduction to Bayesian networks
    4.2.2 Prediction model
    4.2.3 Datasets
    4.2.4 Model improvement
    4.2.5 Countering
  4.3 Advanced scouting
    4.3.1 When to scout - Information value decision making
    4.3.2 How to scout
5 Implementation
  5.1 Bot architecture
  5.2 High level decision making
    5.2.1 Prediction manager
    5.2.2 Strategy manager
  5.3 Scouting
    5.3.1 Map grid
    5.3.2 Advanced scouting manager
    5.3.3 Scouting unit
6 Experiments
  6.1 Model improvement
    6.1.1 Dataset and clustering
    6.1.2 Solutions
  6.2 Scouting model
  6.3 Bot in game evaluation
7 Conclusions and future work
  7.1 Future work
Bibliography
List of Tables
List of Abbreviations
List of figures
Attachments
  CD / Online zip attachment structure

1 Introduction

In contrast to traditional games such as chess and Go, where even the best human players are nowadays beaten by the top artificial intelligence (AI) agents, in real-time strategy (RTS) games even average human players dominate the current top AI agents. In games, we usually call these agents bots.

An RTS is a two-player video game where the goal of each player is to eliminate the opponent's units and structures. One accomplishes that by gathering resources (usually with weak civil units), expanding one's base, building an army and advancing one's technology. This general scheme is applicable to the vast majority of RTS games, to name some common examples: Age of Empires II, Dune II, Planetary Annihilation, Warcraft III, StarCraft II and many others. This means that most of the research done on one of them is easily applicable to many other RTS games due to their similarity.

The research in RTS games is also motivated by regular RTS competitions, mainly in Blizzard Entertainment's 1998 game StarCraft: Brood War (StarCraft). These are for example the Artificial Intelligence and Interactive Digital Entertainment (AIIDE) StarCraft AI competition or the IEEE CIG StarCraft AI competition, as discussed in (Ontanon, Synnaeve, Uriarte, Richoux, Churchill, Preuss 2013).

StarCraft is still used for research purposes mainly because we do not have access to the source code of any RTS game of similar quality. Therefore, if we wanted to build a complex bot for one, we would need to reverse engineer the game and read/write directly into its memory. Reverse engineering is very difficult and takes a long time to do properly. Fortunately, for StarCraft this has been done, and a library for communicating with the game, called BWAPI, is freely available to everyone.

Another way is to use a scripting language, available for example in StarCraft II, as done in (Stiegler, Livingstone 2013). Despite impressive-looking results like Automaton, this environment is not sufficient for the complex algorithms currently used in StarCraft: Brood War.

As we said earlier, human players still play much better than bots. This is mainly due to the complexity and diversity of the tasks one must perform to play an RTS game competitively. These include spatial and temporal reasoning, prediction of the opponent's actions, adaptation to the current situation, planning, micromanagement of units (moving individual units e.g. to set up a battle formation or to avoid enemy fire) and scouting (gathering information by sending units to enemy territory). We also cannot forget that we are talking about real-time games, and therefore the actual reasoning must be done in real time. Most modern games update 24 to 60 times a second, and every one of these frames changes (even if slightly) the game state. In each of these frames, both the player and the opponent can issue new orders to their units. The resulting branching factor of RTS games is about 10^6, in contrast to chess with approximately 35 or Go with about 360 (Synnaeve 2012). RTS games are therefore beyond the scope of any simple search approach, and we need to decompose the problem and solve one sub-problem at a time (Ontanon, Synnaeve, Uriarte, Richoux, Churchill, Preuss 2013). Although research has been done on many of these sub-problems, such as micromanagement of units (Churchill, Buro 2013), build order planning (Churchill, Buro 2011), strategy prediction (Preuss, Kozakowski, Hagelback, Trautmann 2013) and spatial reasoning (Synnaeve, Bessiere 2012), the resulting bots using these complex techniques are still often beaten by fairly simple scripted AIs or average human players.

In this thesis we address the sub-problem of adaptation to the opponent's strategy. We do this by predicting our opponent's strategy and then choosing our own strategy to counter it. We also improve current approaches by introducing an advanced scouting methodology, another very important task in a partially observable environment, yet one lacking in the majority of current bots.

We do this by implementing the Bayesian model from (Synnaeve, Bessière 2011) into UAlbertaBot. We also discuss our attempt to improve this Bayesian model. Then we present our improvements to the bot's scouting and strategy modules.

The rest of the thesis is organized as follows: First we explain the basic concepts of RTS games important for our work. Next, we take a look at work related to our topic. Then we lay out our general approach and our solution to the problem, including our own improvements. We conclude by evaluating and discussing the results of the individual systems and of the bot as a whole. The bot itself is evaluated in a small StarCraft tournament.

2 RTS background

In this chapter we briefly introduce the RTS genre. Then we illustrate different aspects of an RTS match. Afterwards we discuss a few properties of RTS games that are important for our work, and we conclude by describing the applications and importance of RTS research.

2.1 Introduction to RTS

RTS is a sub-genre of strategy video games and can itself be divided into several types. Most of them share the same basic rules and schema, but there are of course exceptions and irregularities, especially in the advanced mechanics. In this work we talk about RTS games of the StarCraft type, which represent the classical RTS. Most of the concepts are applicable to other types of RTS as well, but sometimes a little modification is needed.

Competitive RTS games are played by two players, one versus one. Each player starts with just a few weak units and a base building, on one of several predefined spawn points (starting locations). There may be more than two spawn points on a map, and therefore at the beginning of the match players may not know where their opponent is. Maps in RTS games tend to be symmetrical for balance reasons, and all spawn points should be balanced in the same way (similar amounts of resources in symmetrical locations etc.). This means no player starts with an unfair advantage. The game ends when one player destroys all buildings and units of the opponent. This is usually achieved by producing military units and attacking.

To be able to create new units and build buildings, a player needs to have the appropriate resources to pay for every new unit and building. There are usually several types of resources, e.g. gold, wood etc., and players gather them throughout the game from various places. The rate at which a player gathers resources is called economy. A player with a strong or weak economy gathers resources at a high or low rate, respectively. A player may spend resources to produce civil units to strengthen his economy, to produce military units to be able to defend or attack the opponent, or to research upgrades for either economy or military. Upgrades are permanent improvements of a player's units and buildings. A player might upgrade, for example, the attack power of his ground units or improve the rate at which wood is gathered. The last way to spend resources is to build buildings.

Different buildings allow the player to produce various units, research new upgrades, or produce more units simultaneously. Upgrades and advanced buildings combined are called technology. An example of an RTS game interface and a player's base is shown in Figure 2.1.

Figure 2.1: Example of the interface and the buildings in a player's base from the popular RTS game Warcraft III: Frozen Throne from Blizzard Entertainment. We can see civil units gathering wood and gold (resources) and basic buildings that produce both military and civil units.

In RTS games there are also usually several races (or factions) a player can play as. These races may differ in their playstyle, units, buildings and technologies. A match-up is a certain combination of races playing against one another.

2.2 RTS match schema

An RTS match can be divided into three parts: early, mid and late game. In the early game players choose an opening. Similar to chess, an opening is a standardized sequence of moves. In RTS it means a player builds certain buildings and produces certain units in a recognizable sequence. We may divide openings into three basic classes: aggressive, economic and technological.

In aggressive openings (also called rushes) the player builds many basic units with basic resources as fast as possible and attacks as soon as possible. He then hopes the enemy will not have his defenses prepared.

When rushing, proper micromanagement of units is crucial, because a single unit may be the difference between a win and a loss.

In economic openings the player hopes for a longer game and tries to build a stronger economy than the opponent. A stronger economy will secure the player a better position in the later stages of the game. When building an economy, a strong defense is key to survival, since the player needs to be able to deal with rushes and overall aggression.

Technological openings are in between the other two. They are based on advancing the tech tree in a certain direction faster than usual, mainly because an advanced technology unit may be able to win the game if the enemy is not prepared to deal with it.

If the player dedicates too many resources to any type of opening, he may not be able to recover if the enemy fends him off or exploits a weak defense. It is also worth noting that in the early game players have very limited resources, and spending them on an unnecessary building or unit may lead to a lost match. When choosing an opening, players need to think about the mid game, because the mid game is strongly connected to the early game.

In the mid game, resources are still fairly limited and players usually commit to a single branch of the tech tree. This represents their strategy and allows them to build certain advanced types of units (e.g. flying units, mechanical units, cavalry etc.). No branch of the tech tree dominates all other branches, and every unit has strengths as well as weaknesses. This is called the rock-paper-scissors mechanics and is one of the key rules of most competitive RTS games. This means that in the mid game players try to build a solid economy and a good army composition to hurt their opponent and to efficiently counter the opponent's units.

In the late game players usually have huge armies and all important technology researched. Players try to have the best army composition their race allows against their enemy's race. This is sometimes called a heaven composition and usually consists of many different types of units. If the game goes all the way to the late game, we call it a macro game. There is usually no time limit for an RTS game, and one game may last as long as several hours, although games from 10 to 40 minutes are more common.

2.3 RTS Specifics

RTS games have several specific features that other strategy games may not have. Here we discuss the aspects that are important to our work.

Fog of war: Fog of war is a feature of the vast majority of RTS games. It is a special concept of partial observability where, by default, a player does not have vision of the map, and every friendly unit/building reveals a small radius around itself, giving vision over a certain small region. This means a player can in principle see the whole map, and therefore what exactly the opponent is doing, as opposed to, for example, poker, where seeing the other player's hand would be considered cheating. Figure 2.2 explains the concept of fog of war in StarCraft II.

Simultaneous moves: Both players play at the same time, and the faster a player is, the more orders he can issue to his units. This means that a skillful player can move his units around the battlefield and at the same time build new buildings and produce new units in his base. This skill is measured in actions per minute (APM). Actions include selecting a unit/building, ordering it, training a unit, etc. Professional StarCraft players can perform more than 300 APM, and modern bots have thousands of APM.

Figure 2.2: The concept of fog of war illustrated on a single unit in the popular RTS game StarCraft II: Heart of the Swarm. The orange radius around the unit shows the border of its vision. Everything outside this radius looks like it did the last time we saw it. Enemy units outside the circle are unseen.

2.4 Scouting in RTS

Scouting is very important in RTS games, because there are several distinct paths (strategies) an opponent may take, and the player's response may differ significantly for each one of them. For instance, if an opponent builds a strong economy, a player may want to harass him or build an economy as well. In contrast, if the player sees a weak economy and a large army, he needs to prepare his defenses.

In RTS games, scouting is the visual gathering of information about an opponent. By scouting, a player wants to gather information about things such as the opponent's build order, army position, economic situation etc. Proper scouting may considerably improve the model of the enemy (and the player's response to it). Scouting is done by sending a small force (typically a single unit called a scout) to enemy (or neutral) territory that is under fog of war. This should be a high-mobility (or invisible) unit, so that it is able to evade enemy defenses and gather all the information needed as fast as possible (in the best case without the enemy noticing).

The first scout is usually sent out soon after the game starts, to discover the enemy's starting location. If the players are spawned close to each other, the so-called rush distance (how long it takes units to rush from one base to the other, and thus how much time the opponent will be ahead in production) is smaller, and rushes are more of a threat. Other scouting is usually done at times when a player anticipates a strategic decision from his opponent or when a player generally wants more information. Scouts are sent to typical locations around the player's base and to the enemy's base, the most important place for inferring a strategy.

2.5 Why are RTS games interesting

RTS games usually consist of a story-driven campaign and skirmishes versus AI opponents or other players. Campaigns tend to be divided into separate missions, where in each mission a new game mechanic or unit is revealed to the player while further developing the story. This way the player enjoys the story while learning how to play the game. In fact, most missions are 1v1 skirmishes where a player plays against a scripted AI. The story usually introduces the goal of a mission as well as the potential threats, the enemies. Players therefore do not expect the fight to be necessarily fair if it fits the story; e.g. a small player force with the goal to sabotage several established enemy bases should not be a fair encounter.

Thus campaigns are played for the epic story rather than for a fair fight.

What makes RTS games so interesting, replayable and competitive are skirmishes, especially 1v1. We can see the skirmish AI as training before players are confident enough to play against human opponents, or just as a way to further enjoy the game in single-player. Players expect a fair but challenging fight in a skirmish, where what they usually get is a scripted AI that can be easily tricked. Higher difficulties of this AI may be challenging at first, for example in games like Planetary Annihilation or Age of Empires II, but as players get better at the game, the AI tends to fall off very fast, because its behavior is predictable and exploitable and the AI overall does not adapt very well. Another property of the scripted behavior is that if a player learns to win against the AI, he does not necessarily learn to play the game, and playing against other human players may remain as difficult as before. This means that in RTS a good AI actually improves the gameplay, as opposed to many other genres; e.g. in first-person shooters, an AI with inhuman reaction time may shoot a player before the player realizes what is happening, which does not improve the gameplay but brings frustration.

Sometimes game developers implement unfair AI difficulties even in skirmish. These bots usually cheat by having no fog of war and an increased resource gathering rate. But even these bots are not able to beat experienced human players, which emphasizes the bots' inability to predict an opponent and to adapt, even when they have full vision and a resource advantage.

RTS games are often balanced over time and may evolve as time passes. When game developers release a patch or an extension to the game, where units are added or their strength is altered, the developers also need to revise the scripted AI, since new strategies become viable and the meta game (the current way to play the game based on recent trends, i.e. strategies that are currently considered strong) changes. This may impose a lot of work and fine-tuning even with a modular design.

For these reasons, a self-learning artificial intelligence able to adapt to the current meta game, challenging for experienced players to play against, would be much appreciated by the gaming industry, as well as being a great achievement in artificial intelligence research.

We can say that RTS is the next big challenge in game AI after Go.

3 Related work

"Skynet had good macro, generally good micro, but bad strategic decisions and [lack of scouting/prediction]." - Oriol Vinyals, a former professional StarCraft player, about defeating the 2011 StarCraft AI competition winner, a bot called Skynet (Buro, Churchill 2012).

3.1 High-level decision making

Although RTS bots have a huge advantage over human players in their speed and ability to multitask, and therefore are quite capable of micromanaging units and spending resources efficiently, they still have not fully grasped one of the most important aspects of RTS games: high-level strategic decision making and the closely related opponent modelling. Research in this field has followed several directions.

The first direction is supervised learning based on analyzing replays (game logs). In this approach, a dataset of replays from experienced players is analyzed and labeled. Each replay is put into one of a few predefined brackets, and since a replay is practically a sequence of all actions that happened in a single game, one can see the labels as the meaning of these sequences. These replay-label pairs are then used for supervised learning of various systems, which can later be used to infer a label for new replays. Both the labeling and the learning/inference from the data are difficult tasks, since labeling thousands of replays requires an automated approach, and for the inference a machine learning algorithm is needed. (Weber, Mateas 2009) presented a data mining approach to opponent modeling using decision trees for the labeling. (Synnaeve, Bessière 2011) built on that work and used a Bayesian machine learning approach with a clustering algorithm for the labeling (the model we utilize in this work and describe in further detail in the upcoming chapters). This model was also implemented in a StarCraft bot called BroodwarBotQ (Synnaeve 2012).

Replays can also be used as cases in case-based approaches such as case-based reasoning and case-based planning. (Cadena, Garrido 2011) use case-based reasoning utilizing a small number of replays to learn the cases, where a case consists of a game state represented by a vector of counts of buildings/units, and the solution to a case is the sequence of the next five buildings/units/upgrades the bot should produce. This work was also implemented as a StarCraft bot, with approximately a 60% win rate against the built-in StarCraft AI.

Annotated game logs were used as cases in case-based planning in (Ontañón, Mishra, Sugandh, Ram 2007). Their bot sees the game as a planning problem with an unexpanded goal: win the game. The system tries to adapt by executing a partially expanded plan, where goals are expanded only when needed, to achieve the best adaptation. Every time an unexpanded goal is reached, the system looks up the best behavior in the case base for the current game state and current goal and expands the goal. Note, however, that in this approach manual annotation was used, and it utilized a small number of game logs; only two replays were used in the actual experiment.

A different approach was shown by (Preuss, Kozakowski, Hagelback, Trautmann 2013). They avoid using game logs for teaching their fuzzy logic system and utilize expert knowledge instead. The game state is represented by a set of variables. The bot chooses one of several predefined strategies based on the usefulness of each strategy. Usefulness is computed by taking the game state variables and applying fuzzy rules that are defined for each strategy.

Another high-level decision making system, based on goal generation and management using evolutionary learning, was proposed by (Young, Hawes 2012). This system generates goals based on the current motivation, which can be seen as a state where the agent wants to be. The system implements dynamic goal reprioritization to account for the fact that it may manage up to forty goals generated this way, with a high potential for conflict. Each goal is assigned a priority profile which determines what will happen in case of a conflict. These priority profiles were both designed by humans and learned by an evolutionary algorithm; it turned out that the learned profiles performed better against other bots.

3.2 Scouting

Because in practice we are still in an environment affected by fog of war, the quality of our decision making depends on the amount of information we have from the game. This puts a great emphasis on the quality of our scouting technique.

Hostetler et al. (2012) present work that addresses the problem of unit count inference during the early stages of the game, using a Bayesian network model with evolving unit counts represented as a Markov process. A dataset of StarCraft replays was used for learning and cross-evaluation of the predictive power of the model.

There was no implementation and evaluation in the actual game environment of StarCraft. The potential in-game power of the system was, however, shown on a single human-played game reconstructed from a replay; all actions were performed by the players, and the system was only an observer.

(Si, Pisan, Tan 2014) present scouting strategies for mapping terrain under fog of war while ignoring any enemy forces. They assume the topology of the map itself is unknown at the start of the game and one needs to discover it by scouting. This may be useful in RTS games where the map is procedurally generated and potentially asymmetrical; this is, however, not our case, since we know the whole map in advance.

A potential field was used for scouting in (Wang, Nguyen, Thawonmas, Rinaldo 2013) to avoid enemy military units. This work is, however, more in the domain of micromanaging units than in high-level decision making and planning. When there are enemy military units near the player's scouting unit, a potential field is triggered to keep the scout alive for as long as possible.

Although, as we can see, some research has been done in the field of scouting, we are not aware of any work solving the problem of determining the ideal times to send out scouts, a problem addressed in this work.

4 General approach

In this chapter we start by explaining in detail our motivation to solve this problem. Then we present the model we build our thesis upon. We conclude by discussing our improvements to both the model and the bot in general.

4.1 Motivation

Strategy in RTS games is usually determined by specific units created at certain times and/or by using these units in special ways (e.g. secretly transporting units into the enemy's base etc.). Players are usually not committed to a single strategy for the whole game from the beginning. To play RTS games competitively, players need to adapt their strategy to what their opponents are doing. This is called countering an opponent. The simplest form of countering is producing units that are strong against the enemy's units. Current RTS bots tend to lack this essential skill: they preselect their strategy at random or based on their previous matches with a given opponent. Their approach is similar to solving the so-called n-armed bandit problem described in (Russell, Norvig 2009, p. 841). This can be exploited by identifying which strategy a bot is using and selecting an appropriate counter strategy. Finding and performing common strategies and counter strategies is usually not as difficult as accurately predicting, or at least identifying, the opponent's actual strategy. Therefore, we decided to improve upon current bots and design a bot able to predict its opponent's strategy and respond with the corresponding counter strategy.

4.2 Strategy prediction

In RTS games there is no single moment when a strategy is selected. Strategies change and blend, and the same build order may have different meanings at different game times. Even if players stick to a single strategy for the whole game, it is difficult to say, based on their build order, when they are actually using it, despite the fact that they are using it the whole time. This is because many strategies share parts of the build order, and sometimes a single building built at the right time makes the difference between two different strategies. This means we must repeatedly infer our opponent's strategy throughout the game based on the changing game state (our observations and the game time). When we talk about prediction, we simply mean doing this inference based on the current observations but assuming a different time.

To predict a strategy, we used the machine learning approach presented in (Synnaeve, Bessière 2011). The model consists of a Bayesian network (Russell, Norvig 2009, p. 510), and its parameters are learned from a dataset of labeled replays of experienced human players. We used two datasets: one produced by clustering, from (Synnaeve, Bessière 2011), and another produced by rule-based labeling, from (Weber, Mateas 2009). Note that the model is designed to predict an opening rather than a general strategy. This is, however, not a limitation for us because, as we stated earlier, an opening is an early game strategy that greatly influences the strategy in the later stages of the game. We will therefore use the words opening and strategy interchangeably.

4.2.1 A brief introduction to Bayesian networks

Bayesian networks are one of several ways to deal with uncertainty in artificial intelligence. In a purely logical approach (first-order logic is usually used), variables may have only the crisp values true or false and nothing in between. In the real world, however, we usually deal with more complex variables and conditions that cannot be classified this easily. For example, take our bot, which tries to determine which strategy its opponent is using. If we used the pure logic approach, in the best case scenario we would get a single true value for a single strategy. In most cases, however, we probably would not get any true value, since there would not be any strategy that our opponent certainly uses; the opponent's actions would not fully satisfy the predicates of any strategy. If we relaxed the logical predicates for the strategies, we would in contrast get too many true values, since our opponent's actions would probably satisfy more than one of them. In both cases, however, some values are less false (or less true) than others, and we would like to distinguish between them. This is why other methods such as fuzzy logic or Bayesian networks exist in artificial intelligence.

When working with probability, the simple way is to work with a joint distribution. Given random variables X_1, ..., X_n, where X_i ∈ {v_i1, ..., v_im}, the joint probability distribution of these variables is P(X_1, X_2, ..., X_n). Having this distribution, we can perform inferences such as P(X_i = v_i) or P(X_i | X_1 = v_1, ..., X_{i-1} = v_{i-1}, X_{i+1} = v_{i+1}, ..., X_n = v_n).

A full joint distribution for three binary random variables X, Y, Z can be written as a table with one probability for each of the eight combinations of truth values (one row for X and one for ¬X, and one column for each combination of Y/¬Y and Z/¬Z). If we want to infer, for example, P(Y = true), we use simple marginalization (a sum over all worlds where the statement is true):

P(Y = true) = Σ_{x,z} P(X = x, Y = true, Z = z)

which for the probabilities in our example table comes out to P(Y = true) = 0.2.

The problems with the full joint distribution are that we need to store the whole table, which has size O(d^n), where d is the maximum number of values of a random variable and n is the number of random variables, and that it may not even be possible to get all the probabilities needed to fill the whole table. A Bayesian network is simply a way to represent the full joint distribution and to keep the power it provides without the need to store the whole distribution, with the ability to fill in only some of the probability tables. To do this, conditional independence and the Bayes rule (equation (4.1)) are used:

P(A | B) = P(B | A) · P(A) / P(B)    (4.1)

Bayesian networks are usually described as oriented graphs where nodes are variables and arcs are dependencies between variables. A variable depends only on the variables from which arcs lead to it. This also determines what the probability tables of all the variables will look like. For variables with no arcs leading to them, we only need to know the probability of the variable itself, e.g. for a variable X we need a table describing P(X). For variables with arcs leading to them, we need conditional probability tables where the variable is conditioned on the variables from which the arcs lead, e.g. for a variable X with arcs leading from variables Y and Z, the distribution we need is P(X | Y, Z). For more information about Bayesian networks, please refer to (Russell, Norvig 2009, p. 510).
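The marginalization can be made concrete with a small C++ sketch; the eight probabilities below are our own illustrative values, chosen only so that P(Y = true) comes out to 0.2 as in the example above:

```cpp
// Minimal sketch (hypothetical numbers, not from the thesis): marginalizing
// a full joint distribution P(X, Y, Z) over three binary variables.
#include <cstdio>

int main() {
    // joint[x][y][z] = P(X=x, Y=y, Z=z); the values sum to 1.
    double joint[2][2][2] = {
        {{0.30, 0.10}, {0.05, 0.05}},  // X = false
        {{0.25, 0.15}, {0.05, 0.05}}   // X = true
    };
    // P(Y = true): sum over all worlds where Y is true.
    double pY = 0.0;
    for (int x = 0; x < 2; ++x)
        for (int z = 0; z < 2; ++z)
            pY += joint[x][1][z];
    std::printf("P(Y = true) = %.2f\n", pY);  // 0.05+0.05+0.05+0.05 = 0.20
}
```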

4.2.2 Prediction model

Now we will introduce the prediction model. The full joint distribution of this model is P(T, BuildTree, O_1 ... O_N, Op^t, Op^{t-1}, λ). A graph representation of this model is shown in Figure 4.1.

Figure 4.1: Original Bayesian network representation of the model.

Let us now explain each random variable used in the model. The following description is adapted from (Synnaeve, Bessière 2011).

BuildTree (a set of buildings): all possible build trees for the given race. For instance, {b_1, b_2} and {b_1, b_2, b_3} are two different build trees.

N observations: O_i ∈ {0, 1}, i ∈ {1, ..., N}; O_k is 1 (true) if we have seen (observed) the k-th building (it stays true even if the building has been destroyed). Each building has a unique natural number assigned to it.

Opening: Op^t ∈ {opening_1, ..., opening_m} takes the various opening values (depending on the race). Each opening is generally a unique string of characters.

Last opening: Op^{t-1} ∈ {opening_1, ..., opening_m}, the opening value of the previous time step (allows filtering, i.e. taking the previous inference into account).

λ ∈ {0, 1}: a coherence variable (restraining BuildTree to the values possible with regard to the observations). E.g. if we saw building A, all build trees not including A are filtered out.

Time: T ∈ {1, ..., P}, the time in the game (1 second resolution).

Now let us decompose and discuss each part of the joint distribution in equation (4.2):

P(T, BuildTree, O_{1..N}, Op^t, Op^{t-1}, λ) =
    P(Op^t | Op^{t-1}) · P(Op^{t-1}) · P(BuildTree | Op^t) ·
    P(O_{1..N}) · P(λ | BuildTree, O_{1..N}) · P(T | BuildTree, Op^t)    (4.2)

P(Op^t | Op^{t-1}) is used to avoid wild prediction switching. A filter is used so that the previous prediction impacts the current one. A functional Dirac is used:

P(Op^t | Op^{t-1}) = 1 if Op^t = Op^{t-1}
                   = 0 otherwise

This does not prevent the model from switching predictions; it just uses the previous prediction posterior P(Op^{t-1}) to average P(Op^t). This part of the equation is optional; we do not have to take the previous prediction into account.

P(Op^{t-1}) is copied from one inference to another. The first P(Op^{t-1}) is initialized with the uniform distribution, and then in each subsequent time step P(Op^t) is copied to P(Op^{t-1}). A prior on openings for the given match-up could also be used for the initialization instead of the uniform distribution.

P(BuildTree | Op^t) = Categorical(|BuildTree|, p_op) is learned from the labeled replays, where BuildTree represents a single replay and Op^t represents the label of this replay. This is the first part of the dataset used.

P(O_{1..N}) is unspecified; a uniform distribution is used.

P(λ | BuildTree, O_{1..N}) is a functional Dirac that restricts BuildTree to the values that can co-exist with the observations:

P(λ = 1 | buildtree, o_{1..N}) = 1 if buildtree can exist with o_{1..N}
                               = 0 otherwise

A BuildTree value buildtree is compatible with the observations if it covers them fully, i.e. it contains all the observed buildings, but it can have more buildings than that. For instance, buildtree = {pylon, gate, core} is compatible with o_{#core} = 1 but is not compatible with o_{#forge} = 1, where pylon, gate, forge and core are names of buildings and #X is the unique building identifier of building X.

P(T | BuildTree, Op^t) = N(µ_{bt,op}, σ²_{bt,op}) are bell shape distributions (discretized normal distributions). There is one bell shape per couple (opening, buildtree). The time is discretized to a resolution of one second. The parameters of these discrete Gaussian distributions are learned from the labeled replays. This is the second part of the dataset used.

This model is designed to be able to answer the question: "What is the probability of a given opening at a given time with given observations?" This corresponds to equation (4.3):

P(Op | T = t, O_{1..N} = o_{1..N}, λ = 1)
    ∝ P(Op) · P(o_{1..N}) · Σ_{bt ∈ BuildTrees} P(λ = 1 | bt, o_{1..N}) · P(bt | Op) · P(t | bt, Op)    (4.3)

If we run this inference for all possible openings, we get a probability distribution over these openings at the given time with the given observations.

The predictive capabilities of this model were empirically shown in (Synnaeve, Bessière 2011). They show that the model is able to predict the opponent's strategy with a recognition rate of 63%-68% five minutes into the game and about 75%-76% ten minutes into the game. This is, however, the performance with perfect information. With 50% of the features missing, the prediction accuracy did not drop below 50%. This is mainly due to the hierarchical structure of tech trees in RTS games: if an advanced building is seen, we can assume all its prerequisites are present as well, even if we do not see them. It was also shown that the Weber and Mateas dataset performed slightly better, by several percentage points.
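The following C++ sketch shows how the inference of equation (4.3) could look in code. The type layout and names are our own simplification, not UAlbertaBot code; the learned tables P(bt | Op) and the (µ, σ) timings are assumed to have been loaded from the labeled replay dataset.

```cpp
// A minimal sketch of equation (4.3): score each opening by summing over the
// build trees compatible with the observations, then normalize.
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct BuildTree {
    std::set<int> buildings;                                  // building ids in this tree
    std::map<std::string, double> pGivenOp;                   // learned P(bt | Op)
    std::map<std::string, std::pair<double, double>> timing;  // learned (mu, sigma) per opening
};

// Discretized bell shape P(t | bt, op).
double pTime(double t, double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    double d = (t - mu) / sigma;
    return std::exp(-0.5 * d * d) / (sigma * std::sqrt(2.0 * kPi));
}

// Coherence variable lambda: bt is compatible if it covers all observations.
bool compatible(const BuildTree& bt, const std::set<int>& observed) {
    for (int b : observed)
        if (!bt.buildings.count(b)) return false;
    return true;
}

// P(Op | T = t, O = o, lambda = 1), filtered by the previous posterior.
std::map<std::string, double> predictOpenings(
        const std::vector<BuildTree>& trees,
        const std::vector<std::string>& openings,
        const std::set<int>& observed, double t,
        const std::map<std::string, double>& prevPosterior) {
    std::map<std::string, double> post;
    double norm = 0.0;
    for (const std::string& op : openings) {
        double s = 0.0;
        for (const BuildTree& bt : trees) {
            if (!compatible(bt, observed)) continue;  // P(lambda = 1 | bt, o) = 0
            const auto& ms = bt.timing.at(op);
            s += bt.pGivenOp.at(op) * pTime(t, ms.first, ms.second);
        }
        s *= prevPosterior.at(op);  // the Dirac filter averages with P(Op^{t-1})
        post[op] = s;
        norm += s;
    }
    for (auto& kv : post)  // normalize into a distribution over openings
        kv.second = norm > 0.0 ? kv.second / norm : 1.0 / openings.size();
    return post;
}
```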

More details about this model can be found in the original paper (Synnaeve, Bessière 2011).

4.2.3 Datasets

Like G. Synnaeve, we used two datasets with his prediction model. The first dataset was created by (Weber, Mateas 2009). In their work, they gathered StarCraft replays from the popular web sites GosuGames.net, TeamLiquid.net and ICCup.com, including replays from tournaments and from top-ranked players. From each of these replays, two temporal build order vectors were extracted, one for each player. Each vector represents a single player's actions for an entire game. Formally, each build order b for a player P is defined as follows (Weber, Mateas 2009):

b(x) = t if t is the time when x is first produced by P
     = 0 if x was not (yet) produced by P

where x is a unit type, a building type or a unit upgrade. These feature vectors were labeled using rule sets based on expert domain knowledge. The rule sets label replays with regard to the order of production of the features. In the end, each replay is labeled by a single strategy label.

The second dataset is original work of Synnaeve and Bessière. They used the same set of replays as Weber and Mateas but a different labeling algorithm. A temporal vector representing when a given unit, building or upgrade type was first seen was extracted from those replays as well. They then developed a set of openings roughly corresponding to those of Weber and Mateas and, for each of these openings, a set of features representing the opening; these are the expert knowledge parts. Then a clustering on the times of these features was performed. E.g. if we have an opening called ReaverDrop with features {Reaver, Shuttle}, we perform a clustering of the replays on the times we first saw each of these features. If more than one label was assigned to a single replay, the label of the opening that happened earliest in the game was chosen.
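For illustration, here is a minimal C++ sketch of the b(x) extraction; the event type is our own hypothetical stand-in for a replay parser's output:

```cpp
// Temporal build-order vector b(x) from Weber & Mateas: b(x) = t if feature x
// was first produced at time t, 0 (an absent key here) if never produced.
#include <map>
#include <string>
#include <vector>

struct ProductionEvent { std::string feature; int frame; };  // unit/building/upgrade

std::map<std::string, int> buildOrderVector(const std::vector<ProductionEvent>& events) {
    std::map<std::string, int> b;
    for (const auto& e : events) {
        auto it = b.find(e.feature);
        if (it == b.end() || e.frame < it->second)
            b[e.feature] = e.frame;  // keep only the first occurrence
    }
    return b;
}
```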

4.2.4 Model improvement

Imagine a game scenario where our opponent chooses an aggressive opening and destroys one of our buildings as well as some of our units. We successfully defend this attack and destroy several enemy units as well. Since the rush did not work out as a winning scenario, our opponent needs to transition to a different, more economy- or tech-based strategy. Now let us assume that both we and our opponent lost roughly the same amount of resources in buildings and units, and that it will take us time M to gather these lost resources back. Let us say that our opponent chooses to transition to some opening called, for example, DTRush. The model has learned to detect this opening at a standard time T, but our opponent has lost resources in the initial aggression, so he can perform the opening at soonest at time T + M. At that time, however, the model will not anticipate this opening and may not warn us against it. This means that if an opponent is delayed in his execution of a certain opening, the model may not be accurate. A similar problem was mentioned in (Synnaeve 2012).

These scenarios where a player is delayed in the execution of an opening are quite common. Therefore, they are included in the replays the dataset consists of. They may, however, have been misclassified as different openings (and probably were, since the earliest opening performed in a game has priority during classification), but even if they were not misclassified, the model should be more robust and resistant to delays if we account for them.

The solution we present is a simple improvement of the Bayesian model that takes the resources lost so far into account. The improved model can be seen in Figure 4.2.

Figure 4.2: Improved Bayesian model. The Time node has been replaced by a Cartesian product of Damage × Time.

In our approach we define the amount of delay through the total damage done by both players, where damage is defined as the amount of resources lost by both players. (If player one lost more resources than player two, we assume that these buildings/units were destroyed by the military units of player two; player two has therefore already spent resources to produce those units, and even though he did not lose them, he cannot use those resources to transition to a different opening, so this next opening is delayed as well.) We implement this by adding Damage as a second dimension to time, replacing Time in the original model by a Cartesian product of Damage and Time.

When labeling replays for this improved model, we need to consider not only the time a certain feature occurred but also the damage that had been done in the game up to that time. For example, a standard timing for the feature DarkTemplar is 4 minutes, with no damage considered. If we saw a DarkTemplar at 6 minutes, we would consider it late and unimportant. With damage included, a DarkTemplar at 4 minutes with 0 damage is still standard, and at 6 minutes with 0 damage it is still late. However, a DarkTemplar at 6 minutes with (high enough) damage done may be considered standard as well.
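A minimal sketch of how the timing part of the improved model could be represented in code; the bucketed discretization of damage and the concrete bucket width are our own assumptions, not the thesis's exact parametrization:

```cpp
// Damage x Time idea: the learned bell shapes are indexed by a damage bucket
// as well as by (opening, buildtree), so a delayed opening with a matching
// amount of damage still scores as "on time".
#include <array>
#include <cmath>
#include <utility>

constexpr int kDamageBuckets = 5;  // hypothetical: 0-500, 500-1000, ... resources lost

// One (mu, sigma) pair per damage bucket for a fixed (opening, buildtree).
using TimingByDamage = std::array<std::pair<double, double>, kDamageBuckets>;

int damageBucket(double resourcesLostByBoth) {
    int b = static_cast<int>(resourcesLostByBoth / 500.0);
    return b < kDamageBuckets ? b : kDamageBuckets - 1;
}

// P(t, d | bt, op): discretized Gaussian over time, selected by damage.
double pTimeDamage(double t, double damage, const TimingByDamage& table) {
    const double kPi = 3.14159265358979323846;
    const auto& ms = table[damageBucket(damage)];
    double d = (t - ms.first) / ms.second;
    return std::exp(-0.5 * d * d) / (ms.second * std::sqrt(2.0 * kPi));
}
```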

4.2.5 Countering

Now that we know what our opponent is doing (we infer his strategy), it is time to act upon this knowledge, because knowing what the opponent is doing is rather useless without proper adaptation. That is why we developed custom counter strategies, based on unit composition, for all strategies recognized by the Bayesian model.

The preferred way to determine strategy-counter strategy pairs would be based on the win rates of our strategies versus the recognized openings from the dataset. The problem is that we do not have these win rates. To get them, we would need to implement these standard recognized openings, but our implementation and execution of these openings would dramatically affect the win percentages, giving us little useful information. To avoid implementing the strategies ourselves, we could play against other bots. This would, however, in the end favor our bot dramatically, since we would learn how to counter these specific bots. This may not seem like a problem at first glance, but our later evaluation would be rather unfair, and it is important to note that other bots usually do not use these standard openings but custom strategies. Therefore, we decided to determine the counter selection based on expert domain knowledge and on consulting with a semi-professional StarCraft II player.

Our counter strategies are mostly based on choosing the right unit composition against the enemy's unit composition. This is a valid approach in RTS games because of the rock-paper-scissors mechanics. Every tech tree has its advantages as well as disadvantages, and each type of unit is strong against some units and weak against others. To illustrate this concept, we can imagine a simplified scenario where cavalry is very strong against archers, flying units cannot be attacked by ground cavalry, and archers in turn can easily shoot down flying units. This concept is especially strong when an enemy chooses a single branch of a tech tree and sticks to it, disregarding our own strategy. In our bot we always start with a universal strategy from which we can easily switch to any other strategy once we are certain of the enemy's strategy.

4.3 Advanced scouting

We have described how to predict a strategy and how we developed our counter strategies. In practice, however, we are still in an environment under fog of war, and therefore the precision of our predictions and responses is directly determined by our knowledge of the current game state, i.e. of our opponent. In this chapter we will describe how to improve this knowledge in game by scouting, a concept we introduced in chapter 2.4. We start by describing our approach to determining when to scout using the previously introduced probabilistic model, and we conclude by showing our algorithm for the in-game execution of scouting.

4.3.1 When to scout - Information value decision making

In this work we use scouting mainly to improve our model of the enemy, which in turn improves our prediction of the enemy's strategy. For these purposes, knowledge of the enemy's build tree is crucial. Having constant vision of the opponent's buildings may seem like the best way to go; however, it is very costly, because we would need to have a unit in the enemy's base all the time. Not only does the unit need to avoid the enemy's harassment (the player needs to move it all the time), the player also constantly risks losing it, and this risk may outweigh the information value.

A slightly better approach may be to scout on a regular basis, e.g. every minute (in the limit we get the previous scenario). Every time we send out a scout, we call it a scouting mission. We lower the risk, but some scouting missions are better than others: some discover new buildings and others do not, and furthermore some buildings are more important than others (e.g. a new technology building tells us more about strategy than a fourth instance of the same production building). We also cannot forget the interval. Setting this parameter too low leads to many missions, most of which are not informative at all (high risk). Setting it too high leads to a few potentially highly informative missions, but we risk not seeing important buildings soon enough to adapt to them.

We designed our advanced scouting approach based on the observation described by the following example. If a player builds a technology building early in the game, he dedicates resources that might have been used to boost economy or defense (things we consider standard). However, seeing this building changes our prediction and gives much higher probability to certain technology openings. Thus, we can value the importance of the discovery of a certain building by looking at how much it would change our current prediction of the opponent's strategy. If the change is small, the building fits into our picture or is not important at the time. If the prediction changes a lot, we would improve it by knowing that this building exists.

If we combine this concept with the prediction model, we are able to determine the probability that our prediction would change if we had perfect information. This probability, p(opc), can be computed by a marginalization over all possible build trees our enemy may have at this time, as shown in equation (4.4). We simply calculate the probability of a given build tree at the given time and add this probability to the sum only if this build tree would change our prediction:

p(opc) = Σ_{bt ∈ BuildTrees} P(λ = 1 | bt, o_{1..N}) · P(BT = bt | T = time) · (1 − eq(bt, CurrOp))    (4.4)

eq(bt, op) = 1 if the most probable opening predicted from bt is op
           = 0 otherwise

CurrOp in (4.4) stands for the current most probable opening. P(λ | bt, o_{1..N}) is used to filter only the build trees compatible with our observations, as discussed in chapter 4.2.2. All parts of equation (4.4) are known except P(BT | T). To get P(BT | T), we utilize the Bayesian model we have, but first we need to modify the expression to be compatible with our network. Using the Bayes rule and marginalization, we get equation (4.5):

P(BT | T) = α · Σ_{op ∈ Openings} P(T | OP = op, BT) · P(BT | OP = op)    (4.5)

where α is a normalization constant. We can easily derive equation (4.5) from the joint distribution, as shown in equation (4.6); the last step uses the fact that P(OP) is uniform here:

P(BT, T) = Σ_{op ∈ Openings} P(BT, T, OP = op)
         = Σ_{op ∈ Openings} P(T | BT, op) · P(BT | op) · P(op)    (4.6)
         ∝ Σ_{op ∈ Openings} P(T | BT, op) · P(BT | op)

This allows us to ask the model for the probability that our current prediction would change if we sent out a scouting mission right now, and thus to weigh the information value against the risk of losing a scouting unit. E.g. if p(opc) = 30%, we probably have a good model of our enemy: it seems that the enemy does not have enough resources to surprise us with a technology switch, or we assume he will not switch technology because he indicates dedication to his current one.
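A self-contained C++ sketch of equations (4.4)-(4.6), using the same simplified data layout as the earlier prediction sketch; folding the normalization of P(BT | T) over the compatible build trees is our own simplification:

```cpp
// p(opc): probability that perfect information would change the current
// most probable opening CurrOp.
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct BuildTree {
    std::set<int> buildings;                                  // building ids
    std::map<std::string, double> pGivenOp;                   // P(bt | op)
    std::map<std::string, std::pair<double, double>> timing;  // (mu, sigma)
};

double pTime(double t, double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    double d = (t - mu) / sigma;
    return std::exp(-0.5 * d * d) / (sigma * std::sqrt(2.0 * kPi));
}

bool compatible(const BuildTree& bt, const std::set<int>& observed) {
    for (int b : observed)
        if (!bt.buildings.count(b)) return false;
    return true;
}

// Equations (4.5)-(4.6): P(bt | T = t) up to normalization; P(op) is uniform.
double pBtGivenT(const BuildTree& bt, double t) {
    double s = 0.0;
    for (const auto& kv : bt.pGivenOp)  // sum over openings
        s += pTime(t, bt.timing.at(kv.first).first,
                   bt.timing.at(kv.first).second) * kv.second;
    return s;
}

// Most probable opening if the enemy's build tree were known to be bt.
std::string mostProbableOpening(const BuildTree& bt, double t) {
    std::string best;
    double bestP = -1.0;
    for (const auto& kv : bt.pGivenOp) {
        double p = kv.second * pTime(t, bt.timing.at(kv.first).first,
                                     bt.timing.at(kv.first).second);
        if (p > bestP) { bestP = p; best = kv.first; }
    }
    return best;
}

// Equation (4.4), with the normalization constant computed on the fly.
double pOpeningChange(const std::vector<BuildTree>& trees,
                      const std::set<int>& observed, double t,
                      const std::string& currOp) {
    double changed = 0.0, total = 0.0;
    for (const BuildTree& bt : trees) {
        if (!compatible(bt, observed)) continue;   // P(lambda = 1 | bt, o) = 0
        double p = pBtGivenT(bt, t);
        total += p;                                // alpha
        if (mostProbableOpening(bt, t) != currOp)  // 1 - eq(bt, CurrOp)
            changed += p;
    }
    return total > 0.0 ? changed / total : 0.0;
}
```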

A simple improvement to this concept is to use the whole probability distribution over openings the model gives us, instead of the strict equality eq(bt, Op). Another improvement may be to consider the probability that we would win against the most probable predicted opening when computing eq(bt, Op). Let us say we currently assume our opponent has opening X, and the knowledge that he has build tree bt would change this assumption to opening Y. If our current counter strategy S has a good enough win rate against Y, there is no reason to switch the strategy and therefore no reason to send out a scouting mission. But as we discussed in chapter 4.2.5, we do not have these win rates, and getting them is very difficult. This concept is inspired by the information gathering agent from (Russell, Norvig 2009, p. 629).

4.3.2 How to scout

Now that we know when the ideal time is to send out our scouting missions, what remains is the actual execution of scouting. First we need to ask where to send our scout. In RTS games it is usually possible for players to build structures almost anywhere on the map, so this looks like a difficult decision. It is, however, very rare for players to hide their buildings outside of their base. This is mainly because the effects of a building usually become apparent soon after it has been built, e.g. we fight the new units this building can produce. And if a player builds structures outside of his base, it is much harder for him to defend both his base and the buildings scattered across the map once an opponent eventually discovers them. This is a simple tradeoff between hiding information for a little bit longer and being able to defend one's structures. Players tend to choose the latter: instead of building far from their base, they simply put buildings on the best spots inside their base, and thus the main focus of our scouting is there.

Next we need to decide how to get our scout to the enemy's base. The simplest approach would be to send it there directly using the shortest path. This usually works in the early game, when there are just a few military units around and a scout can rush through them to gain the intel needed and then rush out, hoping not to be destroyed. But when more military units appear, they tend to destroy the scout before it can gather any valuable information, and this simple approach no longer works. When designing an advanced approach, we need to keep in mind three important things:

1. Where does our enemy have military units?
2. Where does our enemy expect us to attack from?
3. What is the shortest ground path between our base and the enemy's base?

We are trying to avoid the destruction of our scout, especially before it even gathers any information. This can happen when it runs into our enemy's army; if we keep in mind where our enemy has his army, we can avoid it. If we also know where our enemy expects an attack from, we may assume he will build static defenses such as turrets and bunkers in those places, and we should avoid them as well. Lastly, if an enemy wants to attack us, there is no reason to prolong the time before the attack by going around the map; therefore, we can expect that at least a part of the route he takes with his attack squad will be on the shortest path between our bases. With these things in mind, we know the most dangerous spots on the map that we should definitely avoid, and we can also determine the approach vector to the enemy's base for our scout.

Our algorithm works in two steps. First, once we are sure where our enemy's base is located, we perform a one-time static analysis of the map. Second, every time we send out a scouting mission, we perform a search to find a path based on our map analysis.

Our static analysis divides the map into a grid, and each tile is assigned a natural number representing how dangerous it is for our scout to be there. A tile on the map may be safe, near danger, or not safe, and not safe tiles are divided into four categories:

- War path: tiles on the shortest path between our base and our enemy's base.
- Around war path: tiles adjacent to the war path.
- Choke point: the entrance to our enemy's base.
- Base defense: tiles on the edge of our enemy's base that face our base.
- Near danger: tiles adjacent to at least one tile that is not safe.
- Safe: all other tiles.

This technique is easily extensible by adding new tile types and new rules. To assign the Base defense danger, we need to know where our enemy's base ends. For this we utilize the fact that RTS maps are often divided into polygons connected via narrow corridors called choke points, as shown in Figure 4.3. This division is performed before our analysis, and the border of our enemy's base is the border of its polygon. Without this division we could use the same approach, but an estimation of the enemy's base size would be needed.
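A minimal C++ sketch of the static analysis; the grid layout, the precedence of the danger categories, and the 8-neighborhood are our own assumptions:

```cpp
// Classify each tile, with more dangerous categories overriding less
// dangerous ones, then mark tiles adjacent to danger.
#include <vector>

enum class Danger { Safe = 0, NearDanger, BaseDefense, ChokePoint, AroundWarPath, WarPath };

struct Tile { int x, y; };

class DangerMap {
public:
    DangerMap(int w, int h) : w_(w), h_(h), grid_(w * h, Danger::Safe) {}

    void mark(const Tile& t, Danger d) {
        Danger& cell = grid_[t.y * w_ + t.x];
        if (d > cell) cell = d;  // keep the most dangerous label
    }

    // Mark the shortest base-to-base path and its surroundings; choke point
    // and base defense tiles are marked by the caller via mark().
    void markWarPath(const std::vector<Tile>& path) {
        for (const Tile& t : path) {
            mark(t, Danger::WarPath);
            for (const Tile& n : neighbors(t)) mark(n, Danger::AroundWarPath);
        }
    }

    void finalize() {  // safe tiles adjacent to any unsafe tile become NearDanger
        std::vector<Danger> out = grid_;
        for (int y = 0; y < h_; ++y)
            for (int x = 0; x < w_; ++x)
                if (grid_[y * w_ + x] == Danger::Safe)
                    for (const Tile& n : neighbors({x, y}))
                        if (grid_[n.y * w_ + n.x] > Danger::NearDanger)
                            out[y * w_ + x] = Danger::NearDanger;
        grid_ = out;
    }

    Danger at(int x, int y) const { return grid_[y * w_ + x]; }

private:
    std::vector<Tile> neighbors(const Tile& t) const {
        std::vector<Tile> r;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
                if ((dx || dy) && t.x + dx >= 0 && t.x + dx < w_ &&
                    t.y + dy >= 0 && t.y + dy < h_)
                    r.push_back({t.x + dx, t.y + dy});
        return r;
    }
    int w_, h_;
    std::vector<Danger> grid_;
};
```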

Figure 4.3: A choke point (red) connecting two regions (green), guarded by three Protoss units. The only way for ground units to get from one region to the other is through the choke point; ground units cannot climb the rocks around it.

The second step of our algorithm is a simple graph pathfinding performed on the statically analyzed map. This can be done by any shortest-path graph algorithm, such as A*. The result of our algorithm is depicted in Figure 4.4. Our approach improves the basic scouting idea by avoiding unnecessary danger. However, if we wanted to match human players, we would need to further improve this approach by adding a real-time analysis on top of our static one. This would allow us to react better to the current game state by analyzing things such as the currently expected army position and more precise building placement, and in this way improve the approach vectors to the enemy's base.
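A sketch of this second step as Dijkstra's algorithm over the analyzed grid, with each tile's danger level converted to a traversal cost (the concrete penalty values, e.g. safe = 1, near danger = 5, war path = 100, are our own assumption); A* with a distance heuristic would work equally well:

```cpp
// Cheapest path from `start` to `goal` as a list of tile indices; cost[i] is
// the danger-derived cost of entering tile i on a w x h grid.
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

std::vector<int> scoutPath(const std::vector<double>& cost, int w, int h,
                           int start, int goal) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(w * h, inf);
    std::vector<int> prev(w * h, -1);
    using QItem = std::pair<double, int>;  // (distance, tile index)
    std::priority_queue<QItem, std::vector<QItem>, std::greater<QItem>> q;
    dist[start] = 0.0;
    q.push({0.0, start});
    while (!q.empty()) {
        auto [d, u] = q.top();
        q.pop();
        if (u == goal) break;
        if (d > dist[u]) continue;  // stale queue entry
        int ux = u % w, uy = u / w;
        const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
        for (int k = 0; k < 4; ++k) {
            int nx = ux + dx[k], ny = uy + dy[k];
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
            int v = ny * w + nx;
            double nd = d + cost[v];
            if (nd < dist[v]) { dist[v] = nd; prev[v] = u; q.push({nd, v}); }
        }
    }
    std::vector<int> path;  // reconstruct goal -> start, then reverse
    for (int v = goal; v != -1; v = prev[v]) path.push_back(v);
    if (path.back() != start) return {};  // goal unreachable
    return {path.rbegin(), path.rend()};
}
```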

Figure 4.4: Result of the static map analysis algorithm on a StarCraft 2 map called Destination. The red path is the shortest path from our base to the enemy's base. Tile colors: red = war path, orange = around war path, yellow = choke point, blue = base defense, grey = near danger, transparent = safe. The green path is one scouting path possibly determined by our algorithm.

5 Implementation

In this chapter we show our implementation of the general concepts discussed in chapter 4. We decided to implement the concepts in a StarCraft: Brood War bot. This is mainly because most RTS research is done in StarCraft: the game is well balanced, we can build on previous work, annual AI competitions are held (so we can use other bots for evaluation), and the datasets we use are made from StarCraft replays. Another very important reason is that for StarCraft there is a standardized way to implement bots, using an API called BWAPI, which is a very rare thing for popular RTS games and the reason we do not use, for example, StarCraft II.

5.1 Bot architecture

When implementing our algorithms and the probabilistic model, we chose not to implement our bot from scratch, since doing so would only have slowed our progress and diminished the overall results because of our own simple implementation of other areas such as micromanagement. Fortunately, there are many open source bots of various qualities to choose from. In this work we use UAlbertaBot as our basis, because it is well documented and uses advanced micromanagement (Churchill, Buro 2013) and build order planning techniques (Churchill, Buro 2011); it is therefore a great candidate for our implementation, since we can focus only on the areas of our interest. The implementation of our bot is publicly available as open-source software, with instructions on how to build and run it.

The architecture of UAlbertaBot is highly modular, hierarchical and easily extensible, as long as one sticks to the bot's basic design pattern. The core idea is that the bot is divided into several subsystems (called managers) and a single Game Commander class, which takes care of updating these managers on every frame and calling event handlers on them. Each manager handles a certain area of RTS gameplay, such as micromanagement, strategy or scouting. Some managers and tools are not updated by the Game Commander but are instead designed according to the Singleton design pattern. These managers tend to hold global information (such as the enemy base location) used by other managers.
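The following C++ sketch illustrates this design pattern in a simplified form; the interfaces are our own approximation, not the real UAlbertaBot or BWAPI declarations:

```cpp
// A GameCommander that forwards per-frame updates and game events to its
// managers, plus a singleton manager holding shared information.
#include <memory>
#include <vector>

namespace BWAPI { class Unit; }  // stand-in for the real BWAPI type

class Manager {
public:
    virtual ~Manager() = default;
    virtual void update() = 0;                // called every frame
    virtual void onUnitShow(BWAPI::Unit*) {}  // event handlers
    virtual void onUnitDestroy(BWAPI::Unit*) {}
};

// Singleton holding global information shared across managers
// (e.g. enemy base location, known enemy units).
class InformationManager {
public:
    static InformationManager& instance() {
        static InformationManager im;
        return im;
    }
private:
    InformationManager() = default;
};

class GameCommander {
public:
    void addManager(std::unique_ptr<Manager> m) { managers_.push_back(std::move(m)); }
    void onFrame() {  // the game loop calls this every frame
        for (auto& m : managers_) m->update();
    }
    void onUnitShow(BWAPI::Unit* u) {
        for (auto& m : managers_) m->onUnitShow(u);
    }
private:
    std::vector<std::unique_ptr<Manager>> managers_;
};
```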

enemy base location, etc.) used by other managers. The hierarchical structure ensures that there are no unnecessary dependencies between modules, meaning that a module communicates only with modules directly above or below it in the hierarchy. This way it is easy to take a part of this hierarchy and provide one's own implementation, or to simply add a new module depending only on the Game Commander. Cross-module communication is still possible through the global singleton modules, such as the Information Manager, where data can be easily shared.

Let us now briefly introduce several modules important for our work. We describe these modules as they are in the original architecture; our improvements to them are discussed in the following chapters.

Production Manager - Manages the current build order - a queue of buildings, units and upgrades we want to produce next. If an event occurs and a new build order is needed, the Production Manager asks the Strategy Manager for the next build order goal - a set of buildings, units and upgrades we want to produce according to our strategy - and then asks the build order search system for a new build order. Otherwise, if there are items left in the build order, it tries to produce them.

Strategy Manager - Takes care of high level decision making. When asked, it returns the next build order goal based on the current strategy. It also decides what our current strategy is, whether we should scout, whether we should attack, etc.

Combat Manager - Given units from the Game Commander and decisions from the Strategy Manager, performs tactical decisions - it creates squads and issues orders to these squads.

Scout Manager - Takes care of a single scouting mission at the beginning of the game. The purpose of this mission is to localize the enemy's base and detect very early rushes. The first scout unit is always a worker unit given to the Scout Manager by the Game Commander. The Scout Manager then micromanages the worker scout in the enemy's base, aiming to stay inside for as long as possible without getting killed. After that, the manager is idle for the rest of the game.

Information Manager - Works as a database of information about the current game state. Stores information such as the enemy base location, whether the enemy has invisible units, etc.

Map Grid - A map representation by a grid of cells. Each cell holds information such as when it was last visited or how many units are in it.

Following this scheme, we decided to add a Prediction Manager, where we implement the Bayesian model from chapter 4.2.2, and an Advanced Scouting Manager and Scout Unit, where we implement our scouting algorithm from chapter 4.3.2. We also modified the Strategy Manager, adding to it our implementation of countering and of the scouting decision making from chapter 4.3.1. Map Grid was also modified to support our advanced scouting pathfinding from chapter 4.3.2. Figure 5.1 shows the higher levels of the architecture with our added modules 20.

Figure 5.1 - Part of the UAlbertaBot architecture. Each box represents a single class in the architecture. Arrows display a hierarchy of classes where upper classes issue orders to lower ones. Dashed boxes are the classes we developed ourselves (Prediction Manager, Advanced Scouting Manager, Scout Unit) or the ones we modified (Strategy Manager, Map Grid).

5.2 High level decision making

In this chapter we will focus on our implementation of the Bayesian model into UAlbertaBot, specifically on the concepts from chapters 4.2 and 4.3.1, implemented in the Strategy and Prediction Manager. We will show the detailed architecture of these two managers and their interaction with each other as well as with the rest of the system.

20 More details about the original architecture can be found at
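Before describing the two managers in detail, the following declaration sketches the interface the rest of this chapter assumes for the Prediction Manager. The class shape, names and signatures here are our own illustrative choices and do not reproduce the actual code; only the two queries themselves come from the model.

```cpp
#include <map>
#include <set>
#include <string>

// Illustrative interface of the two queries the Prediction manager answers.
class PredictionManager {
    std::set<std::string> observations;    // enemy units/buildings seen so far
public:
    static PredictionManager& instance();  // singleton, as in UAlbertaBot

    void onUnitSeen(const std::string& enemyUnitType);  // add an observation

    // Equation (4.3): P(opening | observations, time) for every opening.
    std::map<std::string, double> openingDistribution(int gameFrame) const;

    // Equation (4.4): the probability that the prediction would change if we
    // had perfect information about the enemy base right now.
    double probabilityPredictionChanges(int gameFrame) const;
};
```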

5.2.1 Prediction manager

The Prediction Manager has two basic functions for which it provides an interface. The first is to compute a probability distribution over all possible openings - this corresponds to the concepts from chapter 4.2.2, specifically to equation (4.3). The second is to compute the probability that our current prediction would change if we had perfect information, which is a concept from chapter 4.3.1, specifically equation (4.4).

Our implementation of the Prediction Manager is heavily based on the implementation of the original model from (Synnaeve, Bessière 2011) by Gabriel Synnaeve 21. From his BroodwarBotQ we copied a module called ETechEstimator, which roughly corresponds to our Prediction Manager. We then refactored, debugged and modified this code to our needs and implemented it into UAlbertaBot as a singleton class. However, most of the original structure remained unchanged.

When initialized, the Prediction Manager loads the serialized probability tables of one of the two datasets, as explained in chapter 4.2.3. The actual tables we use are taken from BroodwarBotQ, since if we computed them ourselves we would end up with the same results. During the game the Prediction Manager keeps a set of observations, and whenever a new enemy unit is seen it is added to this set. When asked, the Prediction Manager computes the probability distribution over all possible openings by directly implementing equation (4.3). If we want to know the probability that our prediction would change with perfect information, the Prediction Manager again works as expected and implements equation (4.4).

5.2.2 Strategy manager

The Strategy Manager is a module handling high level decision making in our bot. It is implemented as a singleton class and it is not updated on a regular basis, but rather asked for decisions by other managers.

The first function of the Strategy Manager is to determine what strategy our bot should use at a given time. The Strategy Manager asks the Prediction Manager for the probability distribution over all openings and chooses a counter strategy for the opening with the currently highest probability. All openings we are able to detect through the model are shown in Table 1. Note that the set of openings is bound to the dataset used; however, the two datasets we use have

roughly the same set of openings. The strategies our bot plays are shown in Table 2 with the corresponding openings they counter. Our bot always starts with a universal opening consisting mainly of gateway units. This allows us to easily transition to any other strategy once we are confident enough about our opponent's strategy.

It is very important to note that the model we use converges well with the number of buildings observed, i.e. at a single time frame, the more buildings we see, the more precise the prediction may be. The model does not, however, converge with time. This means that we are not more certain of the enemy's opening the longer we play the game, and we definitely cannot say that at the end of the game the opening distribution converges to a single most probable opening. To account for this and to avoid wild strategy switching, we detect whether the probability of a single opening has reached a defined threshold - in our implementation 50%. If this threshold is reached, we choose our strategy to counter this opening and we do not switch our strategy anymore. If no opening reaches this threshold, then whenever the Production Manager needs a new build order goal we return one based on the current prediction, and the strategy may change between two requests from the Production Manager.
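A minimal sketch of this lock-in rule follows. The class shape and names are illustrative, and counterFor stands for the Table 2 mapping; only the 50% threshold is taken from our implementation.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Sketch of the threshold-based strategy choice described above.
class StrategyChooser {
    static constexpr double lockThreshold = 0.5;   // 50% lock-in threshold
    std::string lockedOpening;                     // empty until an opening locks in

    // Table 2: mapping from a detected opening to our counter strategy.
    std::string counterFor(const std::string& opening) const;
public:
    // Called whenever the Production manager requests a new build order goal;
    // openingDist is assumed non-empty (equation (4.3) over all openings).
    std::string chooseStrategy(const std::map<std::string, double>& openingDist) {
        if (lockedOpening.empty()) {
            auto best = std::max_element(openingDist.begin(), openingDist.end(),
                [](const auto& a, const auto& b) { return a.second < b.second; });
            if (best->second >= lockThreshold)
                lockedOpening = best->first;       // lock: never switch again
            return counterFor(best->first);        // may differ between requests
        }
        return counterFor(lockedOpening);          // stable after lock-in
    }
};
```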

Table 1 - A list of openings we detect, with a brief description.

Protoss:
Speed zealot - Zealots (basic melee unit) with upgraded attack and speed. Dangerous especially to weak ranged units, which cannot run away.
DT rush - Rush with hi-tech invisible units. Usually wins the game if one does not have a detector.
Dragoons - Dragoons (basic ranged units) with upgraded range. Strong against various types of units due to their speed and range.
Reaver drop - Reaver (slow high-damage ranged unit) transported by a Shuttle (transport unit unable to attack) past the enemy's defenses. Can easily destroy many units gathering resources and devastate the economy.
Flying attack - Attack with various flying units. Dangerous if one does not have enough units able to shoot flyers down.
High Templars - High Templars are hi-tech units with an ability dealing damage in an area over time. Especially devastating against concentrated armies.
Two gate - Rush composed of basic units (Zealot, Dragoon) without any upgrades.

Terran:
Bio - Fast attack with Marines (basic ranged unit) and Medics (units able to heal biological units).
Two factories - Attack led by Siege Tanks (strong long-range units). Especially devastating due to their splash damage and the longest range in the game.
Vultures - Vultures (fast mechanical units able to plant invisible mines) used to harass the enemy early into the game. Can easily kite slower units with shorter range, leading them into an invisible minefield.
Fast expansion - Defensive opening used to secure an economic advantage if undetected.
Dropship - A Dropship (transport unit unable to attack) can transport all kinds of units past enemy defenses. Especially dangerous when multiple squads are dropped simultaneously into different enemy bases.

Zerg:
Fast Mutalisks - Small force of Mutalisks (highly mobile flying units) used to harass the enemy early into the game. Dangerous if one cannot attack air units or is unable to defend all bases at once.
Mutalisks - Bigger force of Mutalisks used similarly to Fast Mutalisks, but with a more devastating effect. Usually combined with a ground attack from a different side.
Hydralisks - Hydralisks (ground ranged units with high damage) with upgraded movement speed and attack range. Strong due to high mobility and range.
Speedlings - Zerglings (basic small melee units) with upgraded movement speed. Very strong if they manage to surround an enemy unit. Usually combined with Hydralisks.
Lurkers - Lurkers cannot attack unless burrowed. When burrowed they cannot move, but they are invisible. Very strong for defending or for slowly sieging the enemy's base.

Table 2 - Our counter strategies with the openings they counter and a brief description.

Gateway units - counters Two gate (P), Fast DT (P), Fast Mutalisks (Z), Speedling (Z); default. A force of Zealots and Dragoons with an optional detector and upgrades. Used as the default strategy as well, because transitions from it are easy and gateway units are always useful.
Robotic facility - counters Speed Zealot (P), Dragoons (P), Bio (T), Fast expansion (T), Hydralisks (Z). A strategy utilizing the strength of the Reaver's splash attack, combined with gateway units.
Stargate - counters High Templar (P), Two factories (T), Vultures (T), Lurkers (Z). With this strategy we counter enemy strategies that are based around units unable to attack flying units.
Anti-air - counters Flying attack (P), Reaver drop (P), Dropship (T), Mutalisks (Z). This strategy consists of heavy air defense to counter enemy flyers, complemented by gateway units.
High templars - counters nothing. High Templars are advanced spell casters very often used by human players, but UAlbertaBot does not have the micro capabilities to handle High Templars.

The second function of the Strategy Manager is to decide whether we should scout at the current time. Most of the work is, however, done by the Prediction Manager, as it is asked for the probability that the opening prediction would change with perfect information. Given this probability and the actual unit we want to scout with, the Strategy Manager decides whether the probability is high enough and whether the unit is good enough. E.g. if we had only ground units to scout with later in the game, the Strategy Manager would advise not to scout, because the scout would probably die before gaining any information due to the expected strong enemy defenses. The last bit of information the Strategy Manager needs for its decision is the last time we saw the enemy base. This is important because we do not want to send scouts if, for example, we are currently fighting in the enemy base and already have all the desired vision over it.

5.3 Scouting

In this section we present our implementation of the scouting algorithms from chapter 4.3.2. This was done mainly in the modules Advanced Scouting Manager, Scout Unit and Map Grid.

5.3.1 Map grid

Map Grid is a singleton module representing the map by a grid of square cells, where each cell holds information such as how many units are in it or when was the

last time this cell was seen by one of our units. We added information about the level of danger, and Map Grid is now capable of performing the static map analysis (4.3.2). The static analysis is performed only once: the first time the procedure finishes successfully, we remember the result and do not perform the analysis again when called.

To assign danger levels we use the following algorithm. We perform the analysis only if we know the enemy base location, which is discovered by the worker scout at the start of the game. First we get the enemy base polygon and assign the Base defense danger level to the half of the tiles on the enemy base polygon that is closer to our base. This is because we need a way into the enemy's base and the enemy expects our attack to come from the direction of our base. Adjacent cells are assigned the Near danger level. Then we determine the shortest ground path between the two bases and assign the tiles on this path the War path danger level; adjacent tiles are assigned the Around war path level, and tiles adjacent to those are assigned the Near danger level. Last, we get the entrances to our enemy's base and assign the Choke point danger level to them; the Near danger level is once again assigned to adjacent tiles. Note that when assigning Near danger we never override higher danger levels - it is only applied to safe tiles - whereas the other danger levels may override each other in the process, but only a higher danger overrides a lower one, due to the order in which we assign the levels.

5.3.2 Advanced scouting manager

The Advanced Scouting Manager (ASM) is an implementation of the concepts found in chapter 4.3.2. It takes care of determining the best scouting unit and performing the actual scouting mission if needed. The ASM is implemented according to the pattern of the other managers which work with units. That is, it registers for updates in the Game Commander, and when updated, a set of available units is passed to it. From this set the ASM may select any number of units for its scouting missions. Note that the Game Commander is responsible for the order in which managers get to draw from the set of available units. E.g. if we first passed the available units set to the Combat Manager, we could expect it to take all military units and thus no units would remain for the rest of the managers (the ASM would not be able to scout). First the ASM determines the best scouting unit in the set of given units. Then it asks the Strategy Manager whether a scouting mission is required with the given unit (e.g. the Strategy Manager probably will not want to scout with a weak ground unit later into the game - see 4.3.2). If a scouting mission

with this unit is approved, the ASM takes this unit from the available units and creates a Scout Unit from it. Then a static map analysis is performed, if not done already, and based on this analysis A* pathfinding is done, yielding a path for the Scout Unit.

5.3.3 Scouting unit

A Scouting Unit (SU) represents a unit (potentially a squad of units) on a scouting mission. Given a path, the SU performs a scouting mission following this path. The SU works as a finite state machine (FSM); its diagram can be seen in Figure 5.2.

Figure 5.2 - A diagram of the finite state machine used as the logic of the Scout Unit.

The SU starts in the Going to scout state, where it follows the previously given path. When it finishes this path it must be in the enemy's base, and therefore the state is switched to In enemy base. Then an exploration of the enemy's base is performed, and when finished, the state is switched to Coming from mission and the unit follows the same path it came by. When it reaches the initial point of the path - the position it originally came from - it must be safe in our base and it remains Idle. When the unit is Idle it is no longer scouting, and the ASM can decide whether it will keep the unit for a future mission or whether the unit will be returned to the pool of units available to all managers.
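A minimal sketch of this state machine follows. The state names match Figure 5.2; everything else, including the three predicates driving the transitions, is our illustrative assumption.

```cpp
// Sketch of the Scout unit's finite state machine from Figure 5.2.
enum class ScoutState { GoingToScout, InEnemyBase, ComingFromMission, Idle };

class ScoutUnitFsm {
    ScoutState state = ScoutState::GoingToScout;
public:
    // Called every frame; the predicates are assumed to be provided by the
    // path-following and exploration logic of the Scout unit.
    void update(bool reachedPathEnd, bool explorationDone, bool backAtStart) {
        switch (state) {
            case ScoutState::GoingToScout:      // follow the precomputed path
                if (reachedPathEnd) state = ScoutState::InEnemyBase;
                break;
            case ScoutState::InEnemyBase:       // explore the enemy base
                if (explorationDone) state = ScoutState::ComingFromMission;
                break;
            case ScoutState::ComingFromMission: // retrace the same path home
                if (backAtStart) state = ScoutState::Idle;
                break;
            case ScoutState::Idle:              // ASM may reuse or release the unit
                break;
        }
    }
    bool idle() const { return state == ScoutState::Idle; }
};
```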

6 Experiments

In this chapter we will describe a few experiments we have done, not only with our improved bot but also with its subsystems. This is mainly because it is sometimes easier to create standalone applications and experiments that show the subsystems on their own than to evaluate them only through actual gameplay and win rates.

6.1 Model improvement

We have not performed any experiments comparing the old model with our improved version, because we were not able to create a meaningful dataset for our new model. We will, however, show the steps of our attempt to create such a dataset and discuss the issues of our approach as well as possible solutions and improvements.

6.1.1 Dataset and clustering

To gather our base dataset we used the same approach as (Weber, Mateas 2009) and used StarCraft replays from popular sites, including tournament replays and replays from top players. To label these replays we decided to use a method from (Synnaeve, Bessière 2011) based on clustering. In our case, however, we needed to extract more information from the game logs - namely the damage done. To determine the current damage at a given time, every time a unit is destroyed we add its damage value to a running total. To combine all resources into a single damage value we used the following formula:

Damage = Minerals + Gas × gasmult + Supply × supplymult

where Minerals, Gas and Supply are values based on the unit destroyed, and gasmult = 4/3 and supplymult = 50 are constants based on expert domain knowledge (Synnaeve, Bessière 2012). This way we get, for each match and each feature, a vector of pairs where each pair represents the current damage and the time when we first saw the given feature 22.

22 We have made this process automated; it is free software and, together with the unlabeled dataset, available at

This gave us a dataset of unlabeled replays, which we decided to label using a clustering method based on the one used in (Synnaeve, Bessière 2011). We tried

several clustering methods in the R library MClust (Fraley et al. 2012), but all yielded similar results.

Figure 6.1 - Clustering on a single feature: Terran Expansion. Each point represents the damage and time when the expansion was built in a single match. The X axis represents time in game frames; the Y axis represents damage done. Ellipses describe the variance of each cluster and are centered on the cluster centers. Blue dots are classified as late expansion and red squares are classified as early expansion. The green area bounds points that are clearly better candidates for the red cluster than the points in the red cluster below them.

The problems of our approach stem from the simple nature of clustering. As shown in Figure 6.1, we cluster games into two categories. The first category consists of games where the Terran player built an early expansion, and the second of games where this did not happen. As explained before, we define the earliness of a feature based on time (the less time, the earlier) and damage (the more damage done, the earlier). This means the matches where an expansion was built early by the Terran player are the ones on the left (with low time) and at the top (with high damage). The clustering captured the games with low time, but failed to capture the games with high damage and went for games with low damage instead. What we would like to see are games where the time is low and the damage is high. If we have two matches where a feature occurred at the same time, we consider the occurrence earlier in the match where more damage was done - this is clearly not the case

for our clustering. In the figure we can see that the points in the top right corner have much higher damage than the red-clustered points below them. However, these blue points are much closer to the center of the blue cluster than to the red cluster and are classified as blue, i.e. as games without an early expansion. This approach therefore does not yield a new dataset, since the clusters are very similar to the ones in the original dataset.

6.1.2 Solutions

There are several solutions to this problem that are worth noting as a basis for future work. The first approach, which we have also tried, is to aggregate the replays based on damage into a few sets, e.g. three sets: low, medium and high damage. Then we perform all the clustering three times. E.g. when we cluster on Terran expansion, we first take the replays where that happened with low damage, then with medium and finally with high damage. The clustering itself is the original clustering that does not involve any damage, only time. This is a valid approach that gives us three datasets of labeled replays and solves the problem of clustering with an added dimension, since we always cluster on time only. To use such a dataset, we would need to monitor the damage done in the game as we play and switch datasets accordingly, which again would not be a problem. The issue here is that since we would split the dataset in three, we would have only a few replays for some combinations of damage and opening, which would make the dataset very inaccurate when these cases occur in game.

Other ideas are to improve the labeling itself. Instead of using simple clustering methods, a more complex algorithm may be used to overcome this limitation. Or one may invent a more complex metric combining both time and damage done at a certain ratio. The complex-metric approach may work with simple clustering; the metric itself is, however, not trivial to balance. Since we would need to tweak the metric for our particular case (dataset), we would expect that on a different dataset it would not work as expected.

6.2 Scouting model

Generally, many things may happen in a real game, and evaluating whether our scouting prediction is correct from measures such as win rate is not very accurate. This is why we decided to develop the Prediction tester, an open-source tool built on

the basis of our Prediction Manager. In the Prediction tester we can easily test what our Prediction Manager would say about a game state, without actually needing to play the game and bring about such a state. The Prediction tester also includes automated plotting in R and is freely available as a Git repository 23.

When testing the Prediction Manager we focused on the main point of the Advanced scouting approach: we do not want to scout if we are certain of what our enemy is doing and we anticipate no new buildings in his base. Therefore, if our Prediction Manager has a set of observations that is expected at the current time, we do not want to scout. In our experiment we set the observations of the Prediction Manager to a fixed build tree that is specific to a certain opening. Then, every second, we let the Prediction Manager infer the probability that the opening prediction changes with perfect information, and plot the results. We expect to see a fairly high probability before the build tree is common in game. At the time it is common to have this build tree, we expect to see much lower probabilities. Afterwards the probability may rise again, because this build tree is no longer common and our enemy probably has more buildings - he may even have switched his strategy. This is very similar to the way human players may think about scouting. If we saw a certain build tree and we know the time this build tree is common, we will not bother scouting. On the other hand, later, when the build tree we saw is no longer common and we expect our enemy to have more buildings, we will scout again. We can see two different plots in Figure 6.2 and Figure 6.3 - one versus a Zerg player indicating a Mutalisk opening and the other versus a Protoss player indicating a Dark Templar opening, both from the Protoss perspective.
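A sketch of this experiment loop follows. The query parameter stands in for equation (4.4) with the observations fixed to one build tree (e.g. the Mutalisk tree from Figure 6.2); the function name, signature and the approximate frame rate are our assumptions, only the once-per-second sampling comes from the experiment described above.

```cpp
#include <cstdio>
#include <functional>

// Sample the probability that the prediction would change with perfect
// information once per game second, with the observation set held fixed.
void runScoutingExperiment(const std::function<double(int /*frame*/)>& probPredictionChanges,
                           int gameLengthSeconds) {
    const int framesPerSecond = 24;  // StarCraft: Brood War logic speed (approx.)
    for (int sec = 0; sec < gameLengthSeconds; ++sec) {
        double p = probPredictionChanges(sec * framesPerSecond);
        std::printf("%d\t%f\n", sec, p);  // dumped for plotting (we plot in R)
    }
}
```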

Figure 6.2 - Graph showing the probability that our prediction would change with perfect information, for a build tree indicating the Mutalisk opening in the PvZ matchup. The X axis represents time in seconds; the Y axis represents the probability. The build tree is {Hatchery, Overlord, Extractor, Expansion, Spawning pool, Lair, Spire}. The blue part of the graph is not interesting, since it is too early for this build tree to exist, and we should ignore it. The green part of the graph represents the time when this build tree is common, and we can see the large dive in probability.

Figure 6.3 - Graph showing the probability that our prediction would change with perfect information, for a build tree indicating the Dark Templar opening in the PvP matchup. The X axis represents time in seconds; the Y axis represents the probability. The build tree is {Nexus, Pylon, Assimilator, Gateway, Cybernetics core, Citadel of Adun, Templar archives}. The blue part of the graph is not interesting, since it is too early for this build tree to exist, and we should ignore it. The green part of the graph represents the time when this build tree is common, and we can see the large dive in probability.

The results of our experiment show what we expected. When the build tree is expected from our enemy, the probability that we will gain anything is very low: for the Mutalisk build tree it went from about 75% to about 15%, and for the Dark Templar build tree it went from about 78% to about 5%. This shows that our Prediction Manager would advise not to scout at these times, because most likely no new information would be gained.
