Optimizing an Evolutionary Approach to Machine Generated Artificial Intelligence for Games

Optimizing an Evolutionary Approach to Machine Generated Artificial Intelligence for Games
Master's Thesis, MTA, Medialogy, Aalborg University


Medialogy, Aalborg University

Title: Optimizing an evolutionary approach to machine generated artificial intelligence for games
Theme: Master's Thesis
Project Period: Spring Semester 2016
Project Group: MTA
Participants: Andrei Vlad Constantin, Richard Alan Cupit, Konstantinos Monastiridis
Supervisor: Martin Kraus
Copies: 1
Number of Pages: 90
Date of Completion: May 24, 2016

Abstract: This thesis presents an investigation into how to effectively optimize the production of machine generated game AI, exploring the behavior tree model and evolutionary computation. The optimization methods focus on providing a proof of concept, through a series of studies, that a system can be designed and implemented which is capable of producing game AIs with alternative behaviors within a playthrough of a game. The construction of these behaviors should be informed by the evaluation of previous behaviors, as well as show a quantifiable improvement in performance. The studies evaluate the performance of a generated AI for the game XCOM 2, a turn-based tactics video game. The AIs are evaluated by running combat simulations against the standard AI implemented by its developers. Ultimately, the results of the process led to a user experiment, in which the most successful machine generated game AI won 50% of matches.

The content of this report is freely available, but publication (with reference) may only be pursued with the agreement of the author.


Contents

List of Figures
List of Tables

1 Introduction
2 Background
  2.1 Game AI
    2.1.1 Perspectives
    2.1.2 History
    2.1.3 Modern Video Game AI
  2.2 Behavior Trees
    2.2.1 Overview
    2.2.2 Uses in game industry and research
  2.3 Evolutionary Algorithms
    2.3.1 Genetic Algorithms
    2.3.2 Genetic Programming
    2.3.3 Uses in game industry and research
  2.4 Evolving Behavior Trees
  2.5 Platform of Application
    2.5.1 Turn Based Tactic Games
    2.5.2 XCOM 2
3 Project Statement
4 Design and Implementation
  4.1 Mod Implementation
    4.1.1 Game Systems
    4.1.2 Normalization
      Environmental Cover
      Default AI
  4.2 Genetic Algorithm Implementation

    4.2.1 Chromosome Design
      Example Chromosome
    4.2.2 Implementation
      WatchMaker Framework
      Generational Evolution Engine
5 Experiment and Results
  5.1 Pilot Test
    Design
    Analysis of Data
  5.2 Study One
    Design
    Analysis of Data
  5.3 Study Two
    Design
    Analysis of Data
  5.4 Study Three
    Design
    Analysis of Data
  5.5 Final Evaluation
    User Testing
6 Discussion and Conclusion
  6.1 Discussion
    Dynamic Elitism
    Fitness Function
    Chromosome Structure
    Unit Conditions and Decisions
  6.2 Conclusion
  6.3 Future Directions
    Further Development
    Alternative Directions
Bibliography
Appendices
  A. Extra Content
  B. Unit Condition Implementation
  C. Unit Decision Implementation
  D. Questionnaire
  E. Classifying Evaluation Matches

List of Figures

2.1 Sequence checking for Ammunition and, if so, the agent Reloads and the Sequence returns Success
2.2 The Selector will return Success when either of the depicted Actions returns Success
2.3 Visual representation of a string encoded chromosome, holding the solution variables
2.4 Visual representation of a string encoded chromosome, holding random variables
2.5 Single-point crossover performed on a string
2.6 Two-point crossover performed on a string
2.7 Visualization of Uniform Crossover. The H characters represent the positive result of a coin toss
2.8 Visualization of Input (top) and Output (bottom) string from a genetic mutation operator. The character H represents a successful coin toss
4.1 XCOM 2 unit movement UI
4.2 An example decision tree structure, with example string representations
4.3 Complete example chromosome
4.4 Call to instantiate an Evolution Engine of type string, using the Generational Evolution Engine interface
4.5 Example fitness evaluator code
5.1 Chromosome encoding for the first action point
5.2 Example of crossover producing undesirable offspring
5.3 Average fitness % and win % of candidates per generation of the Pilot Test
5.4 Graph showing the number of Unit Conditions contained within candidate solutions that won a minimum of one game

5.5 Graph showing the number of Unit Decisions contained within candidate solutions that won a minimum of one game
5.6 Chromosome decision structure for Study One
5.7 Chromosome encoding for the first action point
5.8 Average fitness % and win % of candidates per generation for Study One. Fitness average does not include modifier
5.9 Graph showing the number of Unit Conditions contained within candidate solutions that won a minimum of one game in Study One
5.10 Graph showing the number of Unit Decisions contained within candidate solutions that won a minimum of one game in Study One
5.11 Average fitness % and win % of candidates per generation for Study 2. Fitness average does not include modifier
5.12 Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches in Study 2
5.13 Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches in Study 2
5.14 Example chromosome structures showing the irrelevance of the order of the Unit Conditions
5.15 Average fitness % and win % of candidates per generation for Study 3. Fitness average does not include modifier
5.16 Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches in Study 3
5.17 Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches in Study 3
5.18 Playtime differences between XCOM 2 and XCOM Enemy Unknown/Enemy Within
5.19 Correlation between participants' play times in descending order and combat outcome statistics
5.20 Most evolved candidate AI: BCDgdcdaihfADEigidfdeg UC&D breakdown
5.21 Classification error rates for a trained K-nearest neighbor classifier, evaluating both sets
5.22 Confidence matrix of a K-nearest neighbor classifier evaluating the allonlyresims set

List of Tables

4.1 Example Unit Condition characters for the GA to choose from, and their identifiers
4.2 Example Unit Decision characters for the GA to choose from, and their identifiers
5.1 Set of Unit Conditions used in the pilot test
5.2 Set of Unit Decisions used in the pilot test
5.3 Pilot Test's solution space size
5.4 Candidates that won multiple games over the course of the evolution
5.5 Set of Unit Conditions used in Study One
5.6 Set of Unit Decisions used in Study One
5.7 Solution space size for Study One
5.8 Success of elite candidates produced by the first 3 generations of Study One
5.9 Elite candidate performance during Study One
5.10 Elite candidate performance during Study One
5.11 Set of Unit Conditions used in Study Two
5.12 Set of Unit Decisions used in Study Two
5.13 Size of solution space for Study Two
5.14 Elite candidate performance during Study Two
5.15 Combat performance information about candidates that failed to win consecutive matches
5.16 Study One's solution space size
5.17 Elite candidate performance during Study Three
5.18 Combat performance information about candidates that failed to win consecutive matches
5.19 Combat performance information about candidates that failed to win consecutive matches
5.20 Results from the Wilcoxon rank sum test, comparing candidates from Study 3 against those from Study 2

5.21 Results from the User Testing performed on the 5 best BTs evolved and on the Default AI
5.22 Results from the User Testing performed on the 5 best BTs evolved and on the Default AI

Chapter 1. Introduction

In recent years, the game industry has shown tremendous growth, creating a multi-billion-dollar industry that reaches millions of consumers [4]. This, together with the continued increase in computational power, has resulted in advances in every aspect of video game software. A single-player video game usually involves a player competing against enemies or obstacles. Hence, there is a need for the further development of game Artificial Intelligence (AI). AI is applied in different aspects of games, from movement to Non-Player Character (NPC) behaviors and reactions, and thus has a large impact on gameplay. This impact makes game AI a crucial part of game development.

However, developing a strong game AI is an arduous task. The AI agents need to behave realistically, as well as seem as human as possible. Furthermore, they need to be able to react to random events and make decisions dynamically, depending on a player's reactions. These demanding attributes have made game AI development an area that the game industry has focused on less, in comparison to areas such as graphics, animations or physics. At the same time, this lack of focus makes game AI an interesting research topic for both academic and industry use.

Many game AI strategies have been developed to cater to the often disparate needs of games from different genres. Behavior trees (BTs) have been proposed as a new approach to designing game AI. Their advantages over traditional AI approaches are simplicity of design and implementation, scalability as games grow larger and more complex, and modularity that aids reusability and portability. The popularization of BTs within the gaming industry [10] and the scientific community has led to research that employs techniques favored by fields of traditional AI, such as evolutionary algorithms.

This research shows that diverse sets of BTs can be generated [14], which are able to solve the problem of defeating an opponent for specific game genres, by generating strategies that might not be immediately obvious to a developer manually implementing an AI [18], or by simply creating strategies that solve specific situational needs. Such research has the potential to expedite the production of game AIs, to create sets of AIs which present diverse behaviors, or even to use the generated behaviors as the sole method by which a game alters the challenge presented to players. However, these AIs still need to be pre-computed and implemented into a game; they cannot adjust their behavior based on the actions of a player, and thus still offer a predictable experience.

Methods for creating machine generated game AIs are usually expensive, requiring a large amount of sample data. This has resulted in very little research exploring how to optimize these methods to such a degree that they are able to generate AIs within a single playthrough of a game. If this were possible, it could give developers new ways to challenge players.

This report presents an investigation into how to effectively optimize the production of machine generated game AIs, using techniques from a sub-field of machine learning - evolutionary computation - in which candidates are evaluated on their performance by some metric. The optimization methods proposed focus on providing a proof of concept, through a series of studies, that such a system can be designed and implemented. The system should be capable of producing game AIs with alternative behaviors within a playthrough of a game, and the construction of these behaviors should be informed by the evaluation of previous behaviors, and show a quantifiable improvement in performance. The studies conducted evaluate the performance of a generated AI for the game XCOM 2, a turn-based tactics (TBT) game. The AIs will be evaluated by running simulated game scenarios against the standard AI behavior implemented by the developers of the game.

Chapter 2. Background

2.1 Game AI

In the video games industry, developers typically draw upon existing methodologies from the field of artificial intelligence to create behaviors for NPCs, which attempt to simulate the observed behavior of some known entity. However, a distinction should be made between what is considered general purpose AI - which encompasses many scientific disciplines attempting to solve the problem of creating a genuine intelligence - and game AI, which often refers to a broad set of algorithms that also employ techniques from control theory, computer graphics and computer science. Traditionally, the development of game AIs was driven by providing the illusion of intelligence to players, and focused on generating interesting or challenging gameplay, distinguishing it from the fields of general AI. Workarounds are employed to circumvent the limited intelligence of game AIs; for example, the difficulty of a game can be increased by making the player face more and more enemies.

Game AI/heuristic algorithms are utilized in a wide array of game systems. The most obvious is in the control of any NPCs in the game, with scripting currently being the most common method. Pathfinding is another common use for AI [17], widely seen in real-time strategy games. Pathfinding is the method for determining how to get an NPC from one point in a level to another, taking into consideration the terrain, obstacles and possibly visibility. The concept of emergent AI has also been explored in games such as the Halo series (Bungie Studios), Black and White (Lionhead Studios) and F.E.A.R. (Monolith Productions). The AI entities in these games are able to "learn" and adapt their behavior by analyzing the consequences of a player's actions, rather than the input driving the actions.

While these choices are part of a limited pool, they do often give the desired appearance of an intelligence on the other side of the screen.

2.1.1 Perspectives

In recent years, game developers have presented an increasing awareness of scientific AI methods, and there is a growing interest in computer games within the academic community. There are significant differences between the various application domains of AI, which serve to prove that game AI can be viewed as a distinct sub-field of general AI. However, a note must be made about the fact that some game AI problems cannot be solved without workarounds. As an example, calculating the position of an obscured object based on previous observations is considered a very difficult problem when the AI is deployed in a robotics simulation, but in a computer game, the NPC can simply look up the position in the game's scene graph. This, however, can also lead to unrealistic behavior, making it not always desirable.

2.1.2 History

Games featuring a single-player campaign with AI-controlled enemies started appearing in the 1970s, with AI unit movement being based on stored patterns. Later, around the 1980s, the success of arcade video games like Space Invaders (Tomohiro Nishikado, 1978) popularized the idea of AI opponents. Over the course of the next 10 years, this concept developed via the addition of features such as difficulty levels, distinct movement patterns, game events dependent on player input, unit formations, individual enemy personalities and leader-follower hierarchies, to name only a few of the advancements featured by the games of that decade.

In the 1990s, the emergence of new game genres and the general growth of the industry led to the creation of formal AI tools like Finite State Machines (FSM). For example, real-time strategy games tasked the AI with many objectives, including incomplete information, pathfinding problems, real-time decisions and economic planning, among other things. Although the first games featuring this new AI implementation had major issues with the system, later games exhibited more sophisticated AI, thus confirming the benefits of using the method and leading to further development of the concept.

2.1.3 Modern Video Game AI

Once these formal AI models became popularized, research and development shifted towards improving the behavior of computer controlled units. One example of the more beneficial and efficient features found in contemporary game AI is the ability to hunt player units.

Many of the initial AIs exhibited what was perceived as machine-like behavior, which makes sense considering the binary nature of yes/no decisions. If the player was present in a specific area, the AI would react either entirely offensively or defensively. By contrast, in this hunting state, the AI will look for realistic cues, such as sounds made by the other units or footprints they may have left [24]. These developments ultimately allowed for more complex sets of rules, leading to richer gameplay experiences, because the player is encouraged to actually consider how to approach, or whether to avoid, an enemy.

Another valuable breakthrough in game AI was the development of a "survival instinct" for AI controlled units. In-game, the computer can recognize the shifting state of different objects or events in the environment and determine whether it is beneficial or detrimental to its survival. The AI can then search for an advantageous or safe position before engaging in a scenario that would leave it otherwise vulnerable, such as reloading or throwing a grenade. This can be achieved by set markers that tell the AI when to act in a certain manner. Alternatively, an AI could contain a condition to check its avatar's health throughout a game; further commands can then be set so that it reacts in a specific way at a certain percentage of health. The more creative the conditions are, the more interesting the AI behaviors that can be achieved. However, the conditions and actions making up behaviors like these are usually optimized to make the AI seem more human. Even so, that is a very difficult task and there is still room for improvement in this area. Unlike a human player, the AI must be programmed for all the possible scenarios, which severely compromises its ability to surprise the player unless optimized at doing so, perhaps via the help of other aspects of the game [24].

2.2 Behavior Trees

Hierarchical, state-based techniques are simple and intuitive, so they can provide good solutions. Nevertheless, when they increase in size they become too complicated, and editing them can be risky, as simple reconfigurations could make the whole AI system break down. Furthermore, those methods lack flexibility, meaning that changes in design could require extensive programming work. Behavior trees (BTs) can help to avoid these problems, providing a means to describe sophisticated behaviors through a simple hierarchical decomposition using basic building blocks.

2.2.1 Overview

A BT is a mathematical model of plan execution used in various fields of computer science and in video games to generate game AI. This model of creating behavior is a type of finite state machine, with the purpose of switching between a given set of tasks in a modular fashion.

The most beneficial aspect of BTs is their ability to create very complex tasks out of simple tasks, regardless of how the simple tasks are implemented. BTs also share features with hierarchical state machines, with the main building block of a behavior being a task rather than a state. A high level of readability makes BTs accessible to developers with varying levels of coding experience and less prone to errors, which has seen them embraced by the game developer community [1][2].

Behavior trees provide a hierarchical way of organizing behaviors in a descending order of complexity. They are made of nodes, with the outgoing node of a connected pair being the parent, and the incoming node being the child. The child-less nodes are called leaves, and the unique parent-less node is the Root. Each node in a BT, with the exception of the Root, is one of several possible types: Composite node (Selector, Sequence, Parallel and Decorator), Condition node or Action node. There is no limit on how many children a node can have. The execution of a BT always begins from the root, which periodically sends ticks to its child with a certain frequency. A tick is a signal that enables the execution of a child. When the execution of a node in the BT is allowed, it returns to the parent a status of running, if its execution has not finished yet, success, if it has achieved its goal, or failure in any other case [20][3].

Condition Nodes

A condition node checks whether a certain condition has been met or not. In order to accomplish this, the node must have a target variable (e.g. "Does the player have ammunition?") and a criterion on which to base the decision (e.g. "Is the player's ammunition enough for shooting?"). These nodes return SUCCESS if the condition has been met and FAILURE otherwise. Conditions do not return RUNNING, nor do they change the values of the system.

Action Nodes

Action nodes perform computations that change the system state. This can be, for example, shooting an enemy with a specific weapon. An action node returns SUCCESS if its action completes - in this example, the player shoots. It returns FAILURE if, for any reason, the action could not be finished, such as when the player has no ammunition, and returns RUNNING while the computation is still executing.

Composite Nodes

In order to remain relatively simple to work with, while maintaining versatility, composite nodes are often employed. These flow-control nodes define the way in which the tree will be computed.

The execution order will change according to the type and attributes of the composite node, but also according to the values returned by its children. The two simplest composite nodes are the Selector node and the Sequence node.

Sequence Node

Sequence nodes will test their child nodes in a defined order - executing them sequentially from left to right. Sequence nodes will return SUCCESS if and only if all of their children return SUCCESS, and FAILURE if at least one child node returns FAILURE. In programmatic terms, a Sequence works identically to the logical AND function. Figure 2.1 below depicts a simple example of a Sequence sub-tree. Here, the Sequence node has two children, one condition ("Needs Reload?") and one action ("Reload"). In this sub-tree, the unit checks its available ammunition, and if that check returns SUCCESS, the agent performs the "Reload" action, thus returning SUCCESS, and so the whole Sequence returns SUCCESS.

Figure 2.1: Sequence checking for Ammunition and, if so, the agent Reloads and the Sequence returns Success.

Selector Node

A Selector node is the operational opposite of a Sequence node. Execution order remains unchanged, but Selector nodes will return SUCCESS immediately when one of their children returns SUCCESS. Selector nodes will return FAILURE only when all of their children return FAILURE. Analogously, a Selector is the behavior tree counterpart to the logical OR function. In figure 2.2, an example Selector is depicted. Here, the Selector node will return SUCCESS when one of its children returns SUCCESS. If the agent fails at moving to cover, then the Selector will try to execute the next action and the agent will attempt to use the ability "Hunker Down".

Figure 2.2: The Selector will return Success when either of the depicted Actions returns Success.
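To make these semantics concrete, the following is a minimal sketch of Sequence and Selector nodes in Java (the language also used for the GA framework later in this report). The Status enum, Node interface and class shapes are illustrative and not taken from any particular BT library.

    import java.util.List;

    enum Status { SUCCESS, FAILURE, RUNNING }

    interface Node {
        Status tick();
    }

    // Ticks children left to right; fails (or keeps running) as soon as one
    // child does, succeeding only if every child succeeds - a logical AND.
    class Sequence implements Node {
        private final List<Node> children;
        Sequence(List<Node> children) { this.children = children; }
        public Status tick() {
            for (Node child : children) {
                Status s = child.tick();
                if (s != Status.SUCCESS) return s;
            }
            return Status.SUCCESS;
        }
    }

    // Ticks children left to right; succeeds (or keeps running) as soon as one
    // child does, failing only if every child fails - a logical OR.
    class Selector implements Node {
        private final List<Node> children;
        Selector(List<Node> children) { this.children = children; }
        public Status tick() {
            for (Node child : children) {
                Status s = child.tick();
                if (s != Status.FAILURE) return s;
            }
            return Status.FAILURE;
        }
    }

Under this formulation, the sub-tree of figure 2.1 is simply new Sequence(List.of(needsReload, reload)), and new node types can be added without disturbing existing ones, which is the modularity argument made above.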

Decorator Nodes

The purpose of decorator nodes is to add functionality to any modular behavior, without necessarily knowing what that behavior does. In a sense, a decorator takes the original behavior and adds new features to it. This increases the readability and expressiveness of behavior trees. An example of a Decorator node could be one that inverts the result state of its child, similar to the NOT operator. There is no default algorithm for decorators; it depends on their purpose.

Parallel Nodes

The parallel node ticks all children at the same time, allowing them to work in parallel - a way to use concurrency in behavior trees. Parallel nodes return SUCCESS if the number of succeeding children is larger than a local constant S (this constant may be different for each parallel node), return FAILURE if the number of failing children is larger than a local constant F, or return RUNNING otherwise.

2.2.2 Uses in game industry and research

Behavior Trees were first introduced in the game industry around 2004, most notably for Halo 2 by Damian Isla [10] and Façade by Andrew Stern and Michael Mateas [16] - both building upon prior work in the field of robotics and intelligent virtual agents. In later years, BTs became popular in the gaming industry, as they could easily be used to implement game AI of different levels of complexity. Thus, they were used in AAA games such as the Halo game series (Bungie Studios), Spore (Maxis Studios) and Black and White (Lionhead Studios).

Destroy All Humans! 2 (Pandemic Studios) is an open world game where players are free to roam around and experience the game as they see fit. This aspect makes the production of an AI even more challenging [12].

Although this non-linear gameplay has been shown to immerse players, it also makes it difficult for the developers to control, limit and pre-script the scenarios which players encounter. The way Pandemic Studios' developers tackled this issue was by modifying the classic behavior tree formalism of a hierarchical finite state machine (HFSM) into a more modular, "puzzle piece" system that was more flexible and easier to use. In their implementation, everything that characters can do in the game is constructed by putting together states. A state can have further sub-states, which get activated if the parent state is activated as well. Each sub-state is a smaller part of the parent state, responsible for a more specific job. By using this division, they were able to reuse, override or delete sub-states, making the behaviors in the game more dynamic.

In Crysis, a first-person shooter by Crytek, the developers expanded the use of behavior trees by implementing a system for coordinated tactical actions among the NPCs [23]. In their implementation, they created a two-phased process. In the first phase, ideal candidates for the coordinated tactics are marked, but the action does not begin. The second phase starts the coordinated action when the minimum number of candidate NPCs have been marked as ideal. This implementation eliminates the chance of high priority actions being overridden, but also allows coordinated actions between NPCs, making the gameplay more interesting and sophisticated.

Driver: San Francisco, part of the Driver franchise (Ubisoft, 2011), is a mission-based action-adventure game. The game offers a variety of missions that required creating AI drivers with different personalities and goals, such as reckless racers, cops or getaway drivers. Each driver had a specific goal, which was in charge of generating and updating the paths that controlled their car. These goals were built using an extension of traditional behavior trees, called Hinted-execution Behavior Trees (HeBT) [19], which allow AI behaviors to be modified dynamically. HeBTs give developers an extra layer of control over their trees and allow them to create and test new features in a plug-in fashion. Agents were able to take hints about their behaviors. Hints were pieces of information suggesting changes in how the agents should react. In the BTs that were created for each agent, a priority of actions was given to each Selector. This priority was able to change depending on the given hint, without having to redesign the whole branch [19]. Going one step further, the developers added a new type of Condition, called Hint Conditions, in order to improve the way Sequences worked with the new type of Selectors. Hint Conditions worked by being able to bypass certain conditions depending on the hint given, resulting in the preferred behavior. With the method described, the developers were able to tweak and modify traditional BTs with few actual changes.

Stephan Delmer describes in his research [25] how BTs can be very helpful in the development of game AI for different game genres.

More specifically, he points out the requirement for a highly sophisticated AI when it comes to Real Time Strategy games. In this game genre, the AI should be able to both micro- and macro-manage operations in real time. Delmer mentions that human players solve this problem by putting their decisions in a hierarchy, and continues by suggesting a new method of BTs that imitates this hierarchical human approach: the Hierarchical Behavior Tree System, or HBT. The AI must command all the aspects of the game, and although a single tree could achieve this, its construction would be overly complicated and difficult to maintain. The proposed method tackles this problem by splitting the decision making process into sub-trees for each aspect of the AI's behavior.

2.3 Evolutionary Algorithms

Evolutionary Algorithms (EA) are inspired by Darwinian principles of evolution and natural selection [9]. Over the course of many generations, species change to better suit their environmental needs. This is driven through the process of non-random natural selection, in which individuals of a species who are better suited to an environment - due to traits inherited from their parents or random mutation - are more likely to survive and produce offspring than individuals which are not [8]. EAs employ a simplified model of biological evolution in order to solve problems. To solve a target problem with an EA, one must create an environment in which a potential solution (individual) can evolve. This environment should be shaped by attributes which help define the problem, and encourage the evolution of good solutions [11].

2.3.1 Genetic Algorithms

Genetic Algorithms (GA) are a class of EAs, and are implemented as a computer simulation of the evolutionary process [9]. The most common of these, the Generational Genetic Algorithm, has been shown to generate useful solutions to optimization and search problems [13] without having to enumerate every possible candidate within the solution space.

Methodology

Once an initial population of individuals is generated, the Generational Genetic Algorithm follows an iterative process, in which each individual within a generation is rated by a fitness function with respect to the problem being solved [8]. Once an entire population has been evaluated, a new generation can be created. During procreation, parent candidate solutions are obtained by applying a chosen selection strategy to the rated generation.

This strategy discriminates in relation to candidates' fitness ratings. Offspring candidate solutions are generated using genetic operators, which recombine and mutate the parent candidate solutions and form a new generation of solutions.

Generational Genetic Algorithm Outline

1. Initialization: Generate a population of individuals. These can be random, created, or a combination of both.

2. Evaluation: Evaluate each individual using a fitness function.

3. Selection: Parent candidates for the next generation are chosen according to some selection strategy, favoring those with a higher fitness rating.

4. Evolution: Generate a new population by using genetic operators, such as crossover and mutation, on pairs of selected individuals to produce offspring.

5. Iteration: Perform steps 2-4 until a solution is found that meets a termination condition.

A skeleton of this loop is sketched below.
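The five steps map directly onto a loop. The following Java skeleton is a minimal sketch, not the implementation used in this project; the abstract methods stand in for the problem-specific parts introduced in the remainder of this section.

    import java.util.ArrayList;
    import java.util.List;

    abstract class GenerationalGA {
        abstract String randomCandidate();                        // 1. Initialization
        abstract double fitness(String candidate);                // 2. Evaluation
        abstract String selectParent(List<String> rated);         // 3. Selection
        abstract String[] breed(String parentA, String parentB);  // 4. Evolution
        abstract boolean terminated(List<String> population);     // 5. Iteration

        String run(int populationSize) {
            List<String> population = new ArrayList<>();
            for (int i = 0; i < populationSize; i++) {
                population.add(randomCandidate());
            }
            while (!terminated(population)) {
                List<String> next = new ArrayList<>();
                while (next.size() < populationSize) {
                    // Pick two rated parents and add their offspring.
                    for (String child : breed(selectParent(population), selectParent(population))) {
                        if (next.size() < populationSize) next.add(child);
                    }
                }
                population = next;
            }
            return best(population);
        }

        // Returns the fittest member of the final population.
        String best(List<String> population) {
            String best = population.get(0);
            for (String c : population) {
                if (fitness(c) > fitness(best)) best = c;
            }
            return best;
        }
    }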

Chromosome

In a GA, chromosomes are a set of parameters which define candidate solutions to a target problem. The standard representation of a chromosome is an array of bits which represent information that potentially provides partial solutions to a problem; however, other data structures can be used as well. Chromosome design is introduced here in the form of a string of characters, as this closely follows the methodology employed in this report. For example, imagine that the solution to a given problem could be represented by the string "Am I a problem?", with each character representing a part of the solution to a larger problem. In this case, a chromosome could be represented by a string of a fixed length, with each element able to take on some range of character values (e.g. alphabetic/ASCII). What these character values represent within the context of the problem is not relevant to this discussion. Figure 2.3 below shows a visual representation of each element of this chromosome.

Figure 2.3: Visual representation of a string encoded chromosome, holding the solution variables.

Population

A population is a set of individuals containing potential solutions to a target problem. Each generation of the algorithm will produce a new population of these individuals. Often the initial generation will be comprised of randomly generated candidate solutions. An example of what a random individual might look like within the introduced context can be seen in figure 2.4 below.

Figure 2.4: Visual representation of a string encoded chromosome, holding random variables.

Fitness Function

The design of a GA's fitness function is critical in arriving at a solution to a problem, as it determines how optimal a candidate solution (chromosome) is at solving the problem. This is quantified by a numerical measure, which provides the fitness value by which candidate solutions are evaluated. The form of a fitness function is dependent upon the nature of the target problem. Following on from the above example of matching string values, if a character within a candidate solution is an exact match for the character at the same index within the target string, the candidate solution could gain a fitness value of 1, and 0 if not - meaning that the maximum fitness value would be the length of the string itself. Alternatively, the characters at every index of the candidate and target strings could be compared, but rather than checking for an exact match, the comparison could measure the distance between their ASCII values. For example, if a character in the target string is "I", represented by the ASCII value 73, and the candidate solution has the character "P" (ASCII value 80) at the same index, then a quantifiable distance measure is available. This distance could then be a negative representation of an individual's fitness, with the ultimate solution's fitness value being 0.
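Chapter 4 implements the GA using the Watchmaker framework. Assuming Watchmaker's FitnessEvaluator interface, the ASCII-distance measure just described might look as follows; the class name and the equal-length assumption are this sketch's, not the report's.

    import java.util.List;
    import org.uncommons.watchmaker.framework.FitnessEvaluator;

    // Sums the per-character ASCII distance between a candidate and the
    // target string (candidates are assumed to have the target's length).
    public class StringDistanceEvaluator implements FitnessEvaluator<String> {
        private final String target;

        public StringDistanceEvaluator(String target) {
            this.target = target;
        }

        public double getFitness(String candidate, List<? extends String> population) {
            int distance = 0;
            for (int i = 0; i < target.length(); i++) {
                distance += Math.abs(candidate.charAt(i) - target.charAt(i));
            }
            return distance;
        }

        // False tells the framework that lower fitness values are better,
        // matching the distance measure above (a perfect match scores 0).
        public boolean isNatural() {
            return false;
        }
    }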

Selection Strategy

Selection strategies decide which individuals will be used to form the chromosomes of a new generation. Individuals are either copied entirely to a new generation, or paired with others to become parents to offspring candidate solutions, which will populate an evolved generation. All selection strategies favor the fitness value of individuals when choosing a parent. How a selection strategy utilizes the fitness value to select parents creates selection pressure. The GA process of iteratively selecting parent candidates affords individuals the opportunity to be parents and reproduce multiple times, the likelihood of which can increase with fitness value. Generally, when selection pressure is high, the fittest individuals are selected more often, breeding out those who are unfit.

Roulette Wheel Selection is a fitness-proportionate selection technique, where selection occurs based on the ratio of an individual's fitness to the fitness of all others within a population. The probability of selecting a candidate c from a population of n candidates can be summarized by the following equation:

$$P_{sel}(c) = \frac{\mathrm{Fitness}(c)}{\sum_{i=1}^{n} \mathrm{Fitness}(c_i)} \tag{2.1}$$

Unlike a real roulette wheel, where each section generally has the same sized slice of the whole wheel, and thus the same probability of being chosen, this fitness-proportionate method figuratively allows individuals to take up as much of the wheel as the ratio of their fitness to others dictates, making it possible, and indeed probable, that one or more individuals will be selected to be parents more than once. This is desirable, as in nature fitter individuals might be expected to breed more than those who are less fit. However, it can lead to premature convergence on less optimal solutions compared with other selection strategies.
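A minimal sketch of equation 2.1 in code, assuming natural fitness values (higher is better) that have already been computed for each candidate:

    import java.util.List;
    import java.util.Random;

    class RouletteWheelSelection {
        // Returns the index of the selected candidate; each candidate owns a
        // slice of the wheel proportional to its share of the total fitness.
        static int select(List<Double> fitnessValues, Random rng) {
            double total = 0;
            for (double f : fitnessValues) total += f;
            double spin = rng.nextDouble() * total;
            double cumulative = 0;
            for (int i = 0; i < fitnessValues.size(); i++) {
                cumulative += fitnessValues.get(i);
                if (spin < cumulative) return i;
            }
            return fitnessValues.size() - 1; // guard against rounding error
        }
    }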

Rank Selection is similar to Roulette Wheel Selection, except that the selection probability is based on fitness ranks rather than on the fitness value. Individuals are ranked based on their absolute fitness values, and the probability of selection is based on this rank. This means that it makes no difference to the selection preference whether the highest ranked candidate is 100% or 0.01% fitter than the next ranked; the selection probabilities will be the same. This strategy tends to avoid premature convergence, as it applies less selection pressure for large fitness differentials during early generations. However, as an optimal solution is approached in later generations, this method begins to apply higher selection pressure by amplifying small fitness differences.

Tournament Selection is almost the default selection strategy for GAs, as it works well for a wide range of problems. Each time a parent candidate is to be selected, a random sample of individuals from a population is ranked by fitness, and the fittest of those is selected. The size of the sample taken from a population determines the selection pressure of this strategy: the more samples taken, the higher the chance of selecting fitter individuals. When the sample size is chosen to be 2 individuals, the selection pressure is often applied by a fixed probability. During selection, a random number in the range 0 to 1 is generated, and if it is lower than the fixed probability - which is usually greater than 0.5, to favor fitter individuals - then the fitter individual is selected. Both variants provide a simple way to control selection pressure.

Elitism is effectively a form of truncation selection, where a defined number of the fittest individuals are copied directly to the next generation. Due to the way that offspring candidates are generated from the parent candidates, ideal candidate solutions can sometimes be lost, and the offspring can be weaker than the parents. GAs will often rediscover these candidate solutions later on, but it is not guaranteed. Elitism is designed to combat this, and can have a large impact on performance by ensuring that the algorithm does not waste time re-discovering partial solutions that were previously lost. Individuals which are preserved through generations via elitism are still eligible for selection as parents when breeding the rest of the offspring candidates of a new generation.
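A sketch of the two-candidate tournament variant described above, where a fixed probability p (greater than 0.5) decides whether the fitter of the randomly sampled pair is chosen:

    import java.util.List;
    import java.util.Random;

    class TournamentSelection {
        static int select(List<Double> fitnessValues, double p, Random rng) {
            int a = rng.nextInt(fitnessValues.size());
            int b = rng.nextInt(fitnessValues.size());
            int fitter = fitnessValues.get(a) >= fitnessValues.get(b) ? a : b;
            int weaker = (fitter == a) ? b : a;
            // With probability p take the fitter candidate, otherwise the weaker.
            return rng.nextDouble() < p ? fitter : weaker;
        }
    }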

Genetic Operators

Genetic operators are used to produce the offspring candidates which form the next generation. This is a process that maintains genetic diversity, which is necessary for successful evolution.

Crossover

This genetic operator is what distinguishes GAs from many other evolutionary algorithms [15], due to selected candidates not always being copied to the next generation. Crossover takes two selected candidates and recombines them into offspring candidates. How this mixing and matching occurs depends on the format of a chromosome, but there are some commonly used methods.

Single-Point Crossover

A single-point crossover method generates a point within the chromosome of the parent candidates, and swaps the values before that point. Continuing the string example, for a string of length L, a number would be generated between 0 and L-1, and this crossover point would represent a character's index within the string. All characters stored at the indices below the crossover point would be swapped between the parent candidates to form two offspring. See figure 2.5.

Figure 2.5: Single-point crossover performed on a string.

If the number generated is exactly 0 or L-1, then no crossover would occur. It is possible to define custom ranges for crossover point generation, such that crossover occurs more centrally within the data structure, should this be desirable. A problem with single-point crossover is that it can inhibit evolution when there is linkage between elements of a chromosome. If neighboring elements rely on each other more than others to form partial solutions, then the crossover point generated can favor breaking up certain sections of the chromosome [15], due to the length of those sections.
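A sketch of single-point crossover on the string chromosomes used in this chapter; the crossover point is kept away from the degenerate ends so that both offspring genuinely mix their parents:

    import java.util.Random;

    class SinglePointCrossover {
        // Assumes both parents have the same length, of at least 2.
        static String[] apply(String p1, String p2, Random rng) {
            int point = 1 + rng.nextInt(p1.length() - 1);
            // Swap all characters below the crossover point (figure 2.5).
            String child1 = p2.substring(0, point) + p1.substring(point);
            String child2 = p1.substring(0, point) + p2.substring(point);
            return new String[]{child1, child2};
        }
    }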

Two-Point Crossover

Two-point crossover swaps all elements of the parent candidates between the two points generated, in order to produce offspring. See figure 2.6. This method helps reduce the bias created by chromosome linkage; however, some longer sections of linked elements are still more likely to be broken up than others. The number of crossover points can be increased further in multi-point crossover methods, where every second section formed between the crossover points is swapped.

Figure 2.6: Two-point crossover performed on a string.

Uniform Crossover

In order to treat each element fairly with respect to linkage, crossover can occur on each element of a chromosome independently. This is known as uniform crossover, where a coin toss with a fixed probability determines whether crossover should occur at each element of the parent candidates' chromosomes, as seen in figure 2.7 below.

Figure 2.7: Visualization of Uniform Crossover. The H characters represent the positive result of a coin toss.
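A sketch of uniform crossover, where the per-element coin toss of figure 2.7 decides whether the parents exchange a character:

    import java.util.Random;

    class UniformCrossover {
        static String[] apply(String p1, String p2, double headsProbability, Random rng) {
            char[] c1 = p1.toCharArray();
            char[] c2 = p2.toCharArray();
            for (int i = 0; i < c1.length; i++) {
                if (rng.nextDouble() < headsProbability) { // an "H" in figure 2.7
                    char tmp = c1[i];
                    c1[i] = c2[i];
                    c2[i] = tmp;
                }
            }
            return new String[]{new String(c1), new String(c2)};
        }
    }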

Mutation

Mutation is a tool to prevent premature convergence and to help maintain diversity within a population. In the early generations of a GA, a lot of information is discarded because it performs badly within the context of a particular candidate solution. As the algorithm begins to produce fitter candidates, these pieces of discarded information could potentially offer new combinations that might be desirable. The mutation operation iterates through every individual within a newly created generation, and at each element of its chromosome a coin toss with a certain probability is conducted to determine whether a mutation occurs. An example can be seen in figure 2.8 below; the operation is structured like the per-element coin toss of uniform crossover shown above, but re-rolls an element's value rather than swapping it between parents.

Figure 2.8: Visualization of Input (top) and Output (bottom) string from a genetic mutation operator. The character H represents a successful coin toss.

2.3.2 Genetic Programming

Genetic Programming is, in essence, an adaptation of the Genetic Algorithm, with further changes to accommodate the handling of a different data structure. The main difference between Genetic Programming and Genetic Algorithms is the representation of the solution: Genetic Programming creates computer programs, classically in the Lisp or Scheme programming languages, as the solution, whereas Genetic Algorithms create a string of numbers that represents the solution [11]. Genetic Programming uses four steps to solve problems: (1) generate an initial population of random programs; (2) execute each program and assign it a fitness value according to how well it solves the problem; (3) create a new population from the best current programs using genetic operators; and (4) iterate until a termination condition is met, with the best program produced designated as the solution.

2.3.3 Uses in game industry and research

This section will discuss how genetic algorithms have been employed to generate solutions to game-related AI problems. Bullen et al. [5] pointed to the ever-increasing complexity of the game industry requiring more sophisticated AIs in games, focusing specifically on the development of Non-Player Characters (NPCs). Achieving high quality AI NPCs is an arduous task, involving many parameters that have to be defined and tuned according to the nature of the game in order to achieve the desired behaviors. Bullen et al. created a prototype system that evolved NPCs using Genetic Algorithms. In their experiment, they used the game Unreal Tournament 2004 as a research platform and put two groups of NPCs in competition with each other; one group evolved over time using a GA, and the other was a fixed control group. Through their experiment, the evolved NPCs kept improving their performance, successfully proving that Genetic Algorithms could be used to evolve NPCs that can compete in a commercial game.

In a similar manner, Cole et al. [7] used GAs to tune bot behavior in Counter-Strike (Valve Corporation, 2000), a popular first person shooter game. In Counter-Strike, the behavior of the NPCs is based on hard-coded parameters. By increasing the number of these parameters, the NPCs' behaviors become more sophisticated and realistic.

But tuning these parameters is an arduous, time-consuming and costly task. To solve this problem, a GA was implemented to fine-tune variables dealing with NPC weapon selection and aggressiveness. In their experiment, they compared the evolved NPCs with NPCs whose parameters had been manually tuned. Their results showed that the evolved NPCs performed as well as those with manually tuned parameters.

Genetic Algorithms have also been used in turn-based games. Byrne et al. [6] researched the use of genetic algorithms as a method for creating game AI in a turn-based fighting game called Toribash (Nabi Studios). They pinpointed that a problem with developing game AI is that it is usually quite restricted and prone to errors when the game environment changes. Thus, they suggested that the use of GAs can solve this problem, by being able to adjust the AI behaviors depending on changes in the game environment. Their goal was to develop a GA capable of producing realistic AI behaviors. The use of GAs could potentially increase the replayability of the game and reduce human intervention and development time. In Toribash, one controls more than 20 joints of a rag doll, giving each one of four possible behaviors (extend, contract, relax or hold). There are more than 4 trillion possible move combinations, making GAs well suited to searching for solutions. Byrne et al.'s results show that GAs are capable of successfully evolving moves for Toribash, and more broadly that GAs are a viable tool for developing game AI.

2.4 Evolving Behavior Trees

As discussed above, behavior trees are a tree-based structure with condition, action and composite nodes. Their formalism implies that behavior trees could be evolved successfully using Genetic Algorithm techniques. Perez et al. [22] investigated the use of Genetic Programming as a means for developing game AI in dynamic game environments. More specifically, they applied Grammatical Evolution (GE) [21], a grammar-based form of Genetic Programming, to evolve controllers for the Mario AI Benchmark based on a Behavior Tree representation. GE works by using GAs to evolve integer strings, which are then used to select from a grammar defining the syntax of possible solutions to the problem at hand. Afterwards, these solutions are evaluated and their fitness is fed back into the system. Perez's implementation came fourth in the Mario AI Championship, strengthening the idea that Genetic Programming systems can be used successfully to evolve game AI, and also that Genetic Algorithms can be combined with Behavior Trees and other AI methods to produce novel solutions.
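To illustrate the genotype-to-phenotype mapping GE performs, the sketch below decodes an integer chromosome against a toy grammar; the grammar, the node names and the decoding limits are hypothetical and far simpler than the grammar used by Perez et al.

    import java.util.Map;

    class GrammarMapper {
        // A toy grammar: each non-terminal maps to its alternative productions.
        static final Map<String, String[]> GRAMMAR = Map.of(
            "<bt>",   new String[]{"Sequence(<bt>, <bt>)", "Selector(<bt>, <bt>)", "<leaf>"},
            "<leaf>", new String[]{"Shoot", "MoveToCover", "Reload"});

        // Each choice point consumes one codon; codon % ruleCount picks the rule.
        static String decode(int[] codons) {
            String expr = "<bt>";
            int used = 0;
            // Bound the expansions so recursive rules cannot loop forever;
            // real GE implementations limit codon wrapping in a similar way.
            for (int step = 0; expr.contains("<") && step < 100; step++) {
                int start = expr.indexOf('<');
                int end = expr.indexOf('>', start);
                String[] rules = GRAMMAR.get(expr.substring(start, end + 1));
                String chosen = rules[codons[used++ % codons.length] % rules.length];
                expr = expr.substring(0, start) + chosen + expr.substring(end + 1);
            }
            return expr;
        }
    }

Decoding {0, 2, 1, 2, 0}, for example, yields Sequence(MoveToCover, Shoot); evolving the integer string therefore evolves the tree.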

The work of Lim et al. [14] specifically deals with evolving behavior tree structures. It used Genetic Programming (GP) to evolve AI controllers for the game DEFCON (Introversion Software). It starts with a set of hand-crafted trees encoding feasible behaviors for each of the game's parts, and separate GP runs are then used for each part, creating new behaviors from the original set. The final combined tree, after evolution, was pitted against the standard AI controller that comes with the game, and achieved a success rate superior to 50%. This hints at the possibility that such an approach is indeed feasible in the development of automated players for commercial games, and puts it on the map of viable methods for developing game AI.

In his thesis, Oakes [18] claims that interest in AI from the game industry has consistently increased in recent years, and mentions that there is a need for more complex and sophisticated game AI, as players' expectations are also on the rise. Therefore, he highlights the importance of AI development as a research topic. His research focused on applying GAs to evolve strategies for a turn-based strategy game using Behavior Trees, which he then tested against each other as well as against the default AI of the game. To evaluate these strategies, he used Battle for Wesnoth (David White, 2003), an open-source game. Oakes' results show that the evolved strategies can successfully compete with each other, but also win against the default game AI.

2.5 Platform of Application

2.5.1 Turn Based Tactic Games

Certain game genres require AIs which are able to produce complex, seemingly intelligent behaviors (strategy, tactics, shooters), often requiring the AI implementation to be performed via some behavioral model (e.g. Behavior Trees). One of these genres is turn-based tactics, or TBT. The gameplay of turn-based tactics games can be broken down into two major components - a turn-based timekeeping mechanic and tactical combat scenarios. TBT games lean towards employing military tactics and focus on their intricate and planned-out execution. The genre is inspired by tactical and miniature war-gaming, and due to the static nature of turn-based gameplay, dice or random number generators are often used to emulate variables that can be perceived as based on chance. A few examples are attributes such as unit attack hit chance or attack critical hit chance.

The specific mechanics and encounter depth of a given TBT can vary greatly; however, the gameplay generally centers around two opponents (player or AI), each controlling a team of units, with the winning condition being to eliminate the opposing team. Each opponent takes turns to issue instructions to each of their units, to move or use abilities, and once every controlled unit has no more actions available for the active turn, control is passed to the opponent.

2.5.2 XCOM 2

The XCOM game series is a science fiction video game franchise that began with the turn-based tactics/strategy video game UFO: Enemy Unknown, created by Mythos Games and MicroProse in 1994. In 2012, the series was rebooted under the title XCOM: Enemy Unknown, belonging to the same TBT/strategy game genre, published by 2K Games and developed by Firaxis Games, with an expansion entitled Enemy Within released in 2013. XCOM 2 was released in 2016 as the direct sequel to Enemy Unknown/Within, and bundled with the game was the Source Development Kit (SDK) that Firaxis used for making it.

The campaign mode sees players in command of a mobile military base, fighting against alien overlords controlled by an AI. The base forms the ground of the game's strategy layer, which essentially houses the game's progression systems. Progression manifests itself by allowing players to conduct research into upgrades for items and abilities, so that the units commanded by the player in the tactical layer of the game become more powerful or produce alternative gameplay.

While the strategic base management layer is an important aspect of XCOM 2, the true core of the game lies in the gameplay provided by the tactical missions. These involve the player commanding a squad of units and leading them into combat against alien units. Generally, the player's squad of units will have to fight several pods (the alien equivalent of a squad) in order to complete a mission. The environment of the tactical layer is traversed using a tile-based grid layout, representing fixed positions in the game world that units can be moved to. Combat scenarios begin when a unit under the control of the player enters the sight range of an alien unit, activating the entire pod, which then assumes a defensive position. The combat scenario continues with the player and AI taking turns to issue orders to controlled units, until either has no remaining units (the size of a player's squad is variable and tied to the strategic layer of the game). Each unit within a squad gains 2 action points (AP) when the player's turn begins, and the turn ends when all available action points have been exhausted or the player manually ends the turn. The APs are used to issue orders to units, and this is where tactics are employed in the game.

Players must move units to tile positions on the map that give them some kind of advantage, so that they can use the units' abilities to deal damage, heal allies, etc. The placement of units is important because of the way the game decides whether an ability selected by the player is successfully executed, as many abilities in the game are based on some element of chance. For example, when a player wants a controlled unit to shoot at an alien unit, the chance for the player unit to hit is based on a set of calculations, including the unit's base hit chance, ideal range, and whether or not the target is in cover - thus obstructing the view of the shot. However, this environmental cover is dependent on the tile-based levels, with movement being done by point-and-click interaction with the tiles. From the AI's perspective, all tiles are evaluated based on an internal weighting system with regard to the purpose of the movement.

Since a player can devise tactics that rely on multiple units working together, the AI features two systems that attempt to compensate for this advantage, thus retaining the illusion of intelligence and presenting a more appropriate challenge to the player. One of them is the leader-follower system, which allows the multiple pods of enemies present in a level to organize themselves and better tackle the player's advance, while the other is the AI's "hunting" ability - tracking alerts triggered by the player or narrative level events - which, in combination with the first system, confers on the AI opponent an awareness of the level without using cheats.

Because of all these different layers of strategic and tactical considerations, the developers decided to employ behavior trees in the implementation of XCOM's AI system. As such, several types of trees were crafted manually, providing different but complementary behaviors for the various enemy units present in the game. Furthermore, the AI system takes advantage of the implementation of level encounters featuring multiple groups of enemies on the same map, which remain aware of potential alerts triggered by the player or map narrative events. This allows the otherwise highly individualized behaviors to group together and present emergent team behavior.


Chapter 3. Project Statement

Recent academic contributions to the field of game AI show genetic algorithms to be capable of producing useful AIs for games which use a BT interface to design their behaviors. Genetic algorithms are expensive search heuristics, and studies generally present methodologies which evolve AI solutions before they can be tested against a player, due to the volume of simulations required. It is proposed that a methodology can be developed to evolve candidate AIs that sufficiently challenge a player, and that these can be produced within relatively few evaluations.

Based on the work done previously with genetic algorithms and BT-driven game AI, the methodology will be developed by simulating game sessions between candidate AIs and a default AI of the chosen test environment. It is expected that the methodology should be optimized sufficiently to produce solutions within a normal-difficulty playthrough of a game, such that it could be employed to learn from human players as they engage with every candidate AI of every generation, each with its own distinct behavior - creating a highly dynamic and potentially engaging experience.

The development platform for the project is the turn-based tactics game XCOM 2. This genre of games features some of the most complex gameplay rules, emphasizing the necessity for tactical and strategic player thinking. The gameplay provides a worthy test of an AI's ability to defeat human opponents, in a scenario where the latter have ample time to consider their options before taking action. As one of the most recent releases in the genre, XCOM 2 matches all the required criteria of this project, as well as being of AAA production value and providing a free SDK, making it an appropriate choice of development platform.

Considering all the above, the project's problem statement is thus:

Can a system be developed to generate game AI behaviors for a TBT game, which are capable of challenging human players? How can this process be optimized to the extent that BTs can be produced alongside a player's progression?

Chapter 4. Design and Implementation

The evolution of AIs for XCOM 2 using a genetic algorithm required several elements to be designed and implemented in order to create an environment capable of producing solutions. The results of that process provide the basic structure of the proposed methodology created to develop the system used for this project.

4.1 Mod Implementation

4.1.1 Game Systems

To provide an environment in which to evolve AIs, several changes needed to be implemented in XCOM 2. These changes were made possible by creating a mod for the game, using the SDK provided.

AI vs AI

By default, XCOM 2 does not provide functionality which allows the AI to play against itself. However, the source code accessed through the SDK is extensive, and almost all mechanics can be altered or recreated. There was no simple fix to this problem; however, a workaround was created based on the implementation of the Panic effect. When a player's controlled units are in combat, certain events can cause this effect to trigger, removing control of the unit from the player for 1-3 turns. During this time, control over the unit is given to a specific BT contained in the AI configuration files. Therefore, an ability was created to run a modified version of the panic effect, such that units normally under the control of a player instead run a specified behavior, created by the evolution strategies employed, from the AI configuration files.

Manual simulation

The configuration file which contains the definition of behaviors available to the AI (and indeed all configuration files) is only read when the game is launched. Hence, any modifications made after launch would require a game restart to take effect. This meant that the AI configuration files needed to be manually altered to contain the current generated AI to be evaluated, and thus the entire process of simulating matches could not be fully automated.

Data Export

To effectively evaluate the generated AIs, combat data was needed. XCOM 2 shipped with an analytics system which is used to display various fun statistics to a player at the end of a tactical mission. The implementation of this interface element is contained within the MissionSummary class, which was overridden within the mod in order to output the information used to evaluate the generated AIs.

Normalization

XCOM 2 is a complex game, with numerous types of units, abilities and mission types, and a pseudo-random map generation system. It has various systems working together to allow its current set of AIs to produce a desired challenge to players. It was necessary to restrict a large portion of the game content to provide a controllable, stable and fair environment within which to evolve AIs.

Units

The generated AIs and their opposition will each control a team of 6 units, and these units will be identical with the exception of the definition of the BT driving their behavior. Setup of the units and their associated armor, weapons and statistics was achieved by editing the relevant configuration files. For example, XComGameData_CharacterStats.ini provides access to units' health and combat statistics, such as critical strike chance. There are many unit variables that can be assigned values, but most of them are not relevant to this discussion. The main item of interest for this report is that the health of each of the units was initially set to 4.

Abilities

The abilities available to both sets of units are also identical, with the ones selected being the basic building blocks of the game, featuring within most of the fallback AI behaviors set up within the AI configuration file. It was felt that these should be

sufficient to provide a level of complexity that would support the proof of concept and define manageable solution space sizes.

Attack/Shoot

Attack/Shoot is a generic ability available to all units active in the game world. Its action is tied to the weapon equipped on a unit, as this defines the range of the attack, the damage it deals, etc. All units are equipped with a standard first-tier assault rifle, resulting in them doing 3 damage with a successful shot and 5 damage with a critical strike. Whether a shot hits or lands as a critical strike is determined not only by the statistics of the unit using the ability, but also by its target and their relative positions in the environment. For example, a target unit could have an ability which increases its defensive statistics, or it might be located on a tile which has environmental cover between itself and the attacking unit, greatly reducing hit chance. In fact, if there is no cover between two units, they are considered to be flanking each other, and shots taken against flanked units have a greatly increased critical strike chance (50%). A unit can only shoot once per turn; however, it can do so after using a first action point for something else, as attacking requires a minimum of 1 AP and consumes any remaining APs.

Move

Move is another generic ability available to all units, as it facilitates a unit's traversal of the game world. The tactical mission game environments are comprised of an array of tiles arranged in a grid formation and as such, a pathfinding algorithm is employed to find routes to all target tiles. The UI available to players (figure 4.1) gives a good overview of how movement is handled. The current active unit, pictured bottom-left, is considering a move to a tile location which provides environmental cover from two directions (indicated by the blue shield overlay).

Figure 4.1: XCOM 2 unit movement UI.

A unit can only move through a defined number of tiles per action point. The blue line (see figure 4.1) surrounding the active unit shows all tiles it can move to using a single AP. The outer yellow line represents the tiles it can move to at the cost of consuming both APs available for that turn. The variable which defines the mobility of a unit was set to 12.

When the XCOM 2 AI wants to move a unit, it has to evaluate each tile it can reach and assign it a value. This calculation is dependent on information such as the cover provided by the tile, the distance to the tile, how many enemies are visible from the tile, whether moving to that location affords a flanking position, and various other considerations. The AI configuration file contains a set of profiles, which have weight values intended to represent different tactical movements. For example, an aggressive movement strategy might care less about the cover value of a destination tile than a defensive one would. The strategies implemented are: Defensive, Standard, Aggressive, Fanatic, Hunting, Advance Cover, and Flanking. These will form the basis of the movement options available to the evolution environment.

Overwatch

Overwatch is an ability which, like shooting, consumes all remaining APs when used but only requires 1 AP to activate. It is designed to allow a unit to watch an area of the map (within the unit's line of sight) during the opponent's subsequent turn, such that if the opponent moves a unit through this area, the overwatching unit will take a reaction shot at the moving target. Reaction shots taken with Overwatch suffer a hit chance reduction and cannot deal critical damage. If a unit using Overwatch takes damage, the effect is removed.

Hunker Down

Hunker Down is a defensive ability that consumes both APs when used but only requires 1 AP for activation. With this ability, a unit gains a large boost to its defensive statistics at the cost of not being able to use offensive abilities. To be able to hunker down, a unit must be on a tile which has environmental cover in at least 1 direction.

Map Implementation

XCOM 2 features a procedural content generation (PCG) system to generate maps in its tactical missions. The implementation of this PCG system is quite deep and difficult to restrict, as it was designed to create maximum variability and reusability. This means most of the assets are compatible with each other and can be combined to create a large number of playable levels. Each map is considered a plot, and it holds the data for any enemy encounters, victory conditions, level narrative elements, objectives, etc. Each plot has some predetermined positions that can be filled by parcels. These smaller level elements - parcels - can be any combination of even smaller map elements, but usually form some sort of blueprinted structure, ranging from a small park to a house or even large buildings.

The fact that this process cannot be altered or bypassed implies a lack of control over the test environment and might lead to noisy data. Due to the random nature of PCG, combined with how influential terrain is as a game mechanic (cover distribution, line of sight, height, etc.), it was decided to reduce the variability of the terrain by setting up a single map, designed to be more consistent in its content. This map was modified in the level editor provided with the XCOM 2 SDK, using assets that the PCG algorithm can select from which are less likely to impact combat scenarios. For example, assets designed for missions taking place in the wilderness are less likely to include explosive assets, such as gas canisters, which can damage XCOM 2 units. Any map in XCOM 2 must contain at least one parcel. The map implemented for the evolution environment contained a single small-sized parcel positioned in the centre of the map, with each team set up to spawn on either side of it.

Environmental Cover

Environmental cover has been mentioned at various points of the report so far, and it clearly plays an important role in the defensive tactics employed in the game. Environmental elements such as trees often permanently occupy a given tile of a map. If a unit is located at a tile which is adjacent to a cover tile, the unit receives a cover value in the direction between the two tiles. A unit's defense value is greatly influenced by the cover mechanic within

XCOM 2. There are 3 types of cover value that a unit can have, represented by the numbers 0 (not in cover), 1 (in low cover) and 2 (in high cover). The values are used as modifiers in a unit's defense calculation, providing a 20% penalty to attack hit chance per unit of cover (20% in low cover and 40% in high cover).

Default AI

To evaluate the generated AIs, an opponent AI was required. This Default AI was created from the default behaviors constructed for the basic alien units. The default behaviors are not designed to be overly challenging to a human opponent, as the game uses other means to provide that. As such, the generated candidates are expected to perform better against the default AI than against human opponents. The behaviors used to construct the default AI, and its definition, are found in the AI configuration file, which contains all the BT nodes used to construct the game's AI.

AI configuration file

Following the BT formalism, the behavior nodes contained within the DefaultAI.ini configuration file are employed with a modular approach. The file contains a vast array of condition nodes, small action sequences and selectors, tactical weights, and various other items. Within the INI files, each item is defined as a Behavior and is given a name, for example Move_Defensive. This name can then be referenced by various sequences and selectors to execute the associated instructions. These Behavior names are how the GA will generate AI solutions, using a data structure to represent selected Behaviors from the configuration file. Additional Behaviors can also be added and as such, the output of the GA has to consider the syntax of the INI file, after evaluating performance and creating the new AIs.

4.2 Genetic Algorithm Implementation

The design of the environment in which a GA will evolve candidate AIs for XCOM 2 encompasses more than balancing the settings and variables which define a game state. The environment in which the evolution takes place must be designed so as to encourage the generation of good solutions. This section will describe the process of how the AIs' behavior tree structures are encoded into a chromosome, and the implementation process of the GA itself.

Chromosome Design

As stated previously, the data structure of a chromosome for this project will be based on a fixed-length string of characters. The exact length and structure of the string is altered between experiments; however, two things are always required: the elements from which candidate solutions can be formed, and the structure in which they will be placed. The elements available for the GA to generate a candidate will be represented by two distinct sets - Unit Conditions and Unit Decisions.

Unit Conditions

Unit Conditions are a set of variables which reference condition nodes contained within the XCOM 2 AI behavior configuration files. These enable an AI generated by the GA to consider information about a controlled unit's current active match scenario. The Unit Conditions available to the GA for each experiment will be represented by a range of consecutive capitalized characters, depending on how many are needed.

Representative String Character | Example Condition Identifier
"A" | UnitHasHighHealth
"B" | UnitIsFlankingAnEnemy
"C" | UnitHasAmmo

Table 4.1: Example Unit Condition characters for the GA to choose from, and their identifiers.

Unit Decisions

Unit Decisions represent a set of XCOM 2 behaviors which end in a single action node, or one of several action nodes. They are designed to be the decision that a generated AI makes after considering a number of Unit Conditions. They often contain more than one action node within a selector, to prevent a unit from exiting a behavior without performing an action. For example, if an action node that gives the instruction to shoot is reached but the AI's currently controlled unit has no ammo, these behaviors could allow the unit to move or reload instead. In the SDK implementation, these are represented as selector nodes, and each denominator in the name of the decision is the counterpart to a basic unit action. These basic actions are considered as such because they cannot be further decomposed; their implementation is done in the original source code.

Representative String Character | Example Decision Identifier
"a" | Shoot
"b" | Move
"c" | OverwatchOrShootOrReload
"d" | MoveAggressiveOrFlanking

Table 4.2: Example Unit Decision characters for the GA to choose from, and their identifiers.

The representation of Unit Conditions and Unit Decisions (UC&Ds) as capitalized and lower-case characters respectively does limit the maximum potential size of each set. Within the stripped-down state of the test environment, and with the attempts to keep string complexity as low as possible, this limit was never reached. Given that the evaluation of the chromosomes requires the observation of entire match simulations, the representation of partial solutions as recognizable characters enabled informed real-time analysis.

Example Chromosome Implementation

The structure within which the sets of characters will be placed within the string representing the chromosome is altered between experiments; however, an example that works very similarly will be presented. A decision structure is required in order to define the form of the string. An example decision structure can be seen in figure 4.2, and this can be encoded in many ways, using a character's index within the string to always represent the value of a specific node in the decision structure. Unit Conditions and Unit Decisions are mutually exclusive for each index of the string.

Figure 4.2: An example decision tree structure, with example string representations.

Each AI generated will need to contain behaviors for the 2 action points that an XCOM unit has available per turn. As such, each chromosome will have its length

doubled to encode two separate decision structures. Further considerations, such as restricting certain ordering situations or the duplication of characters, will be addressed for each experiment.

Figure 4.3: Complete example chromosome.

WatchMaker Framework

The WatchMaker Framework for Evolutionary Computation 1 is an object-oriented framework for implementing evolutionary/genetic algorithms in Java. This framework is a useful prototyping tool, as it enables users to freely design custom implementations for the individual elements of a GA, such as genetic operators. The central component of the Watchmaker Framework is its Generational Evolution Engine, for which a number of interfaces are implemented.

Generational Evolution Engine

As the AIs will be encoded into strings, the Generational Evolution Engine interface was used with string as its defined type. The arguments to the method call (figure 4.4) are elements of a GA that need to be either selected from pre-built implementations, or specifically designed and implemented for the context of the problem area. The framework provides common implementations found in Evolutionary Computation, as well as interfaces to quickly implement custom variants. The custom implementations developed for these experiments were iterated upon for each experiment; therefore an overview will be provided here, with specific information presented when required.

1 WatchMaker Framework,

Figure 4.4: Call to instantiate an Evolution Engine of type string, using the Generational Evolution Engine interface.

The arguments to the method call (figure 4.4) are references to the following classes:

Candidate Factory - A class which contains a method that returns random candidate solutions in the form of a string.

Evolutionary Operator Pipeline - A class which pipelines a number of classes that perform evolutionary operations on candidate solutions:

Crossover - A class which recombines candidate solutions into offspring.

Mutation - A class which randomly mutates candidate solutions.

Fitness Evaluator - A class which returns an integer value to the evolution engine, representing the fitness of a candidate solution. This part of the algorithm will remain consistent between experiments from an implementation perspective, as the fitness evaluation takes place within XCOM 2. Once all candidates of a population have been evaluated, their fitness values are stored. When the evolution engine needs the fitness for a candidate, this class returns its associated value.

Figure 4.5: Example fitness evaluator code.

Selection - A class which allows the evolution engine to implement a chosen selection strategy. The GA used a pre-built implementation of Roulette-Wheel Selection for the studies conducted throughout this project, chosen due to its fitness-proportionate nature encouraging faster convergence.

Experimental Procedure

As discussed, the evaluation of a candidate solution within this experimental scenario is conducted outside of the GA framework. This necessitates a slightly convoluted experimental procedure:

1. Generate an initial population of n candidate solutions.
2. Copy a single candidate (generated behavior tree) into the modified AI configuration file.
3. Launch XCOM 2, simulate a match of generated AI vs Default AI, and record data.
4. Repeat steps 2 and 3 for each candidate within a given generation.
5. Compile results and make them available to the GA.
6. Evolve the population to form a new generation of candidate solutions.

A sketch of how these pieces can be wired together is shown below.
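The following is a minimal, self-contained sketch against the Watchmaker API. It substitutes the framework's stock StringFactory, StringCrossover and StringMutation for the custom operators described above (which additionally enforce the chromosome's structural restrictions), and stands in for the in-game evaluation with a lookup table of recorded fitness values; the alphabet, class names and values shown are illustrative rather than the project's actual implementation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.uncommons.maths.random.MersenneTwisterRNG;
import org.uncommons.maths.random.Probability;
import org.uncommons.watchmaker.framework.CandidateFactory;
import org.uncommons.watchmaker.framework.EvolutionaryOperator;
import org.uncommons.watchmaker.framework.FitnessEvaluator;
import org.uncommons.watchmaker.framework.GenerationalEvolutionEngine;
import org.uncommons.watchmaker.framework.factories.StringFactory;
import org.uncommons.watchmaker.framework.operators.EvolutionPipeline;
import org.uncommons.watchmaker.framework.operators.StringCrossover;
import org.uncommons.watchmaker.framework.operators.StringMutation;
import org.uncommons.watchmaker.framework.selection.RouletteWheelSelection;
import org.uncommons.watchmaker.framework.termination.GenerationCount;

public class EvolutionRunner {
    // Fitness values recorded manually from the XCOM 2 match simulations.
    static final Map<String, Double> recordedFitness = new HashMap<>();

    // Lookup evaluator: the real evaluation happens inside XCOM 2, so this
    // class simply returns the stored value for each candidate (cf. figure 4.5).
    static class RecordedFitnessEvaluator implements FitnessEvaluator<String> {
        public double getFitness(String candidate, List<? extends String> population) {
            return recordedFitness.getOrDefault(candidate, 0.0);
        }
        public boolean isNatural() {
            return true; // higher fitness values are better
        }
    }

    public static void main(String[] args) {
        char[] alphabet = "ABCDEFGHIJKLabcdefghijklmno".toCharArray(); // illustrative
        CandidateFactory<String> factory = new StringFactory(alphabet, 24);

        List<EvolutionaryOperator<String>> operators = new ArrayList<>();
        operators.add(new StringCrossover());                           // single-point
        operators.add(new StringMutation(alphabet, new Probability(0.03)));
        EvolutionaryOperator<String> pipeline = new EvolutionPipeline<>(operators);

        GenerationalEvolutionEngine<String> engine = new GenerationalEvolutionEngine<>(
                factory, pipeline, new RecordedFitnessEvaluator(),
                new RouletteWheelSelection(), new MersenneTwisterRNG());

        // Terminate after a single generation: in the real procedure, each
        // candidate must be copied into the AI configuration file and simulated
        // in-game before its fitness is known, so generations are run one at a time.
        String best = engine.evolve(40, 4, new GenerationCount(1));
        System.out.println("Best candidate so far: " + best);
    }
}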


Chapter 5
Experiment And Results

This chapter will demonstrate the results of the development of the system employed to optimize the process of evolving Behavior Trees via a Genetic Algorithm for the turn-based tactics game XCOM 2. The development process involved conducting a pilot test, three studies and a final user test. Differences in implementation for each individual study, and the associated results, will be presented and discussed chronologically.

5.1 Pilot Test

The complete test environment required several modifications and custom implementations, therefore it was important to check that everything was working as expected. A pilot test was conducted to evaluate the entire experiment process, in order to analyze whether the test environment is fair, and whether the various elements of the process were performing as expected. Particular consideration was given to the following:

- The Fitness evaluation method
- Unit Health and Damage
- Map and Mission setup
- Initial evaluation of the sets of Unit Conditions and Unit Decisions
- Implementation of the GA

Design

The test will attempt to evolve a simple chromosome design over 4 generations of 40 candidate solutions, following the procedure described in the genetic algorithm experimental design section.

Chromosome

The chromosome used in the pilot test was designed to be simpler than the example described previously. The primary reason for this was that the simpler representation made it possible to evaluate Unit Conditions and Decisions in real time, to see if their implementations were evaluating and acting upon the game state correctly, according to their value and location within the chromosome's structure. The behavior for each action point an XCOM 2 unit has available per turn was defined by six successive pairs of Unit Conditions and Unit Decisions. Each Unit Condition would be checked in sequence, and whenever one returned true, its associated Unit Decision would be used to enable a unit controlled by a candidate AI to perform a specific action within the game. The encoding shown in figure 5.1 is the representation of a single action point; the final chromosome will contain 2 of these representations concatenated sequentially, for AP1 and AP2 respectively.

Figure 5.1: Chromosome encoding for the first action point.

To remove the possibility of performing redundant Unit Condition checks, and to reduce the size of the solution space, the formation of candidates was restricted to not allow a Unit Condition to appear twice within a single action point. If the example action point encoding shown in figure 5.1 were to have the character contained at index 0 ("A") placed at any of the other indices available to Unit Conditions, it would be redundant, as it would either never be reached (the same Unit Condition at index 0 would call its associated Unit Decision), or do nothing (return false). The selection of Unit Decisions has no such restrictions; the same value could be inserted into each index of the chromosome associated with a Unit Decision.

Unit Conditions and Unit Decisions

The set of Unit Conditions available to the GA for the pilot test is shown below (table 5.1). Details on how they were implemented in the XCOM 2 SDK can be found in Appendix B.

Representative String Character | Behavior tree identifier
"A" | HasHighHP
"B" | HasWounds
"C" | HasKillShot
"D" | IsFlanked
"E" | OneEnemyVisible
"F" | MultipleEnemiesVisible
"G" | OneOrMoreOverwatchingTeammates
"H" | NoAllyIsHunkerDown
"I" | AnyAllyIsHunkerDown
"J" | NoOverwatchingTeammates
"K" | AllShotPercentagesAtOrAbove50
"L" | AllShotPercentagesBelow50

Table 5.1: Set of Unit Conditions used in the pilot test.

The Unit Decisions available to the GA are shown below (table 5.2); their implementation within the AI configuration files is available in Appendix C.

Representative String Character | Behavior tree identifier
"a" | SelectMoveStandard
"b" | SelectMove_Defensive
"c" | SelectMove_Aggressive
"d" | SelectMove_AdvanceCover
"e" | SelectMove_Flanking
"f" | SelectMove_Fanatic
"g" | SelectMove_Hunter
"h" | TryShootOrReload
"i" | ConsiderHunkerDown
"j" | TryOverwatchOrReload
"k" | TryShootIfIdealOrReload
"l" | TryShootIfFavorableOrReload
"m" | TryShootIfFavorableOrOverwatch
"n" | TryHunkerDownOrShootIfFavorable
"o" | ConsiderHunkerDownOrOverwatch

Table 5.2: Set of Unit Decisions used in the pilot test.
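To illustrate how a chromosome string maps back onto configuration-file behaviors, a small decoding sketch follows. Only a subset of tables 5.1 and 5.2 is included, and the example uses three condition/decision pairs per action point rather than six, purely for brevity; the class is illustrative, not the project's code.

import java.util.LinkedHashMap;
import java.util.Map;

public class ChromosomeDecoder {
    // Subset of the pilot-test lookup tables (tables 5.1 and 5.2); the
    // identifiers are the behavior names referenced in the AI configuration file.
    static final Map<Character, String> CONDITIONS = new LinkedHashMap<>();
    static final Map<Character, String> DECISIONS = new LinkedHashMap<>();
    static {
        CONDITIONS.put('A', "HasHighHP");
        CONDITIONS.put('D', "IsFlanked");
        CONDITIONS.put('F', "MultipleEnemiesVisible");
        DECISIONS.put('b', "SelectMove_Defensive");
        DECISIONS.put('h', "TryShootOrReload");
        DECISIONS.put('k', "TryShootIfIdealOrReload");
    }

    // Prints the condition/decision pairs for one action point: the pilot-test
    // encoding alternates (Condition, Decision) pairs, checked in sequence.
    static void printActionPoint(String ap) {
        for (int i = 0; i + 1 < ap.length(); i += 2) {
            System.out.printf("if %s then %s%n",
                    CONDITIONS.get(ap.charAt(i)), DECISIONS.get(ap.charAt(i + 1)));
        }
    }

    public static void main(String[] args) {
        String chromosome = "AhDbFk" + "FkDbAh"; // 3 pairs per AP here for brevity
        printActionPoint(chromosome.substring(0, 6));  // action point 1
        printActionPoint(chromosome.substring(6));     // action point 2
    }
}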

Solution Space

The size of the solution space is dependent upon a combination of the structure of the chromosome and the number of Unit Conditions and Decisions.

Size of set of Unit Conditions | c = 12
Size of set of Unit Decisions | d = 15
Amount of Unit Conditions per AP | e = 6
Amount of Unit Decisions per AP | f = 6
Total permutations per AP | [c*(c-1)*(c-2)*(c-3)*(c-4)*(c-5)] * d^f = 7.5*10^12
Size of solution space | (Total permutations per AP)^2 = 5.7*10^25

Table 5.3: Pilot test's solution space size.

Evolutionary Operators

Crossover

A single-point crossover method was employed for the pilot test. For a string of length S (candidate solution), an integer value is generated between 2 and S - 3, which provides the index at which two parent candidates will recombine. With the restriction of no repeated characters within an action point, this crossover method could potentially cause undesired solutions to be created (see figure 5.2).

Figure 5.2: Example of crossover producing undesirable offspring.

To fix this, after crossover is performed the offspring are analyzed, and if a duplicate is found (as can be seen with the character "G" at index 6 of the offspring shown in figure 5.2), then the character found at the same index (index 6) of the parent candidate which did not provide that duplicate character (the character "E" at index 6 of parent 1 in figure 5.2) will be chosen instead. It is possible that this character could also be undesirable, if the alternative character also appeared earlier; in this case, a random value is generated until a desirable one is found.
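The repair step can be sketched as follows. This is an approximation of the procedure just described - it first tries the other parent's gene at the clashing index, then falls back on random values - assuming the pilot-test layout of six condition/decision pairs per 12-character action point; all names are illustrative.

import java.util.Random;

public class CrossoverRepair {
    static final String CONDITION_ALPHABET = "ABCDEFGHIJKL"; // pilot-test Unit Conditions
    static final Random RNG = new Random();

    // True if the Unit Condition at 'index' already occurs at an earlier
    // condition slot (even offsets) of the same action point.
    static boolean isDuplicate(char[] genes, int index, int apStart) {
        for (int i = apStart; i < index; i += 2) {
            if (genes[i] == genes[index]) return true;
        }
        return false;
    }

    // After single-point crossover, scan each 12-gene action point and repair
    // duplicated Unit Conditions.
    static String repair(String offspring, String otherParent) {
        char[] genes = offspring.toCharArray();
        for (int apStart = 0; apStart < genes.length; apStart += 12) {
            for (int i = apStart; i < apStart + 12; i += 2) { // condition slots only
                if (isDuplicate(genes, i, apStart)) {
                    genes[i] = otherParent.charAt(i);          // try the other parent first
                    while (isDuplicate(genes, i, apStart)) {   // still clashing: randomize
                        genes[i] = CONDITION_ALPHABET.charAt(
                                RNG.nextInt(CONDITION_ALPHABET.length()));
                    }
                }
            }
        }
        return new String(genes);
    }

    public static void main(String[] args) {
        // The duplicated "A" in the first action point is repaired.
        String offspring   = "AhBbChDdEeAf" + "FhGbAhBdCeDf";
        String otherParent = "EhFbGhAdBeCf" + "AhBbChDdEeGf";
        System.out.println(repair(offspring, otherParent));
    }
}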

Mutation

The mutation operator was set up such that it could iterate through the entire string and mutate any gene to a value from the set type (Unit Conditions or Decisions) associated with the index currently being operated on. The probability that a mutation would occur for any given index was arbitrarily chosen to be 0.03 (3%). Again, undesirable candidates can be produced by the mutation operator, and the same correction procedure described for the crossover operator was applied.

Selection

Roulette selection was used to choose parents for each subsequent generation of candidates. To keep good candidates intact from generation to generation, elitism was employed to retain the 10% of candidates with the highest fitness value. As each generation was comprised of 40 candidates, the number of elite candidates per population was 4.

Fitness Evaluation

The aim of the project is to optimize the generation of successful candidate AIs, which should thus perform well in their evaluation matches by meeting the winning conditions of the game. To this end, the fitness function (equation 5.1) will return a value based on the health of the units which compete. If a generated AI wins a match, then all the enemy units have been killed, and fitness is represented as all of the health taken from the opposing team. However, this would mean that all winning AIs have the same fitness value. Instead, the AI team's own health is also used to create a more representative fitness: the health remaining for all living units is added to the fitness value, thus having both survival and offense as components of the computation. Both Team_DamageDone and Team_DamageTaken can range from 0 to 24. This process is also used for AI teams which lose their matches, which attain fitness for every point of damage done to the enemy team, providing a way to distinguish between the performance of unsuccessful candidates.

Fitness = Team_DamageDone + (24 - Team_DamageTaken)    (5.1)

According to this equation, a fitness value above 24 would represent victory, while a score of 48 would mean flawless victory and one of 0 would mean total defeat.
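A direct transcription of equation 5.1, with the pilot test's 6 units of 4 health per team, might look like the following sketch:

public class PilotFitness {
    // Equation 5.1: damage dealt to the enemy team plus the team's own
    // remaining health. Both terms range from 0 to 24 here.
    static int fitness(int damageDone, int damageTaken) {
        int teamMaxHealth = 6 * 4; // six units with 4 health each
        return damageDone + (teamMaxHealth - damageTaken);
    }

    public static void main(String[] args) {
        System.out.println(fitness(24, 0));  // 48: flawless victory
        System.out.println(fitness(24, 15)); // 33: victory with losses
        System.out.println(fitness(10, 24)); // 10: defeat, partial credit for damage
    }
}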

Analysis of Data

Overview

Despite the simplicity of the decision structure housed within this chromosome, the GA was able to continually produce higher fitness values and more winning AIs for each generation, culminating in a 50% win rate after 4 generations. The average fitness % is the average fitness of all candidates in a generation, represented as a percentage of the total attainable fitness, while the win % is the number of wins represented as a percentage of the population size (see figure 5.3).

Figure 5.3: Average fitness % and win % of candidates per generation of the Pilot Test.

Successful Candidates

With the way fitness was evaluated, and with only the 4 most elite candidates being retained in a new generation, many candidates which won their matches would not be able to play another match, but they would still have a high chance of breeding into the next generation. In fact, of the 36 candidates which actually managed to win a single game, only 10 ever made it through to play another match. Of the 10 candidates who played more than a single game, 5 played enough games to eventually lose a match, and those candidates each won 2 of the 3 games they played in total (see table 5.4). Obviously these are small sample sizes,

however there was an observed tendency for candidates to win a game due to a series of favorable dice rolls and specific game situations. For example, as units only had 4 health, they could be eliminated within a single action point when receiving a critical shot. Favorable dice rolls would often have heavy implications for the overall match situation.

Candidate | Matches Played | Matches Won | Average Fitness
KaIdFhLnBeAbJcKmFcCcHdDh | 3 | 2 | -
HeBhCkEeFfKeGlEbIkKeJeDf | 3 | 2 | -
efijkdjlmegihghidellehkai | 3 | 2 | -
DeCbFcEdHmAkJcKmBmHfLdFa | 3 | 2 | -
LbBkIfJjHlDeIkCiJmHeLdFa | 3 | 2 | -

Table 5.4: Candidates that won multiple games over the course of the evolution.

Chromosome

Despite the simple decision structure of the chromosome and the sets of Unit Conditions and Decisions available to form a candidate solution, the solution space is still large. However, the candidate solutions generated showed improvement over the 4 generations. The subsequent studies in this paper will evaluate decision structures which are more complex, and as such, consideration will be taken with regard to the size of the sets of Unit Conditions and Decisions.

Unit Conditions

The graph below (figure 5.4) displays the number of times a particular Unit Condition appears within candidate solutions which managed a victory in at least 1 match. Interestingly, the conditions relating to a unit's health ("A" and "B") appear the least overall - contrary to expectation. It is presumed that these have a weaker presence in the successful candidates because their values within a game state are highly variable, and within such a simple, rigid decision structure this is not desirable. This idea is supported by the fact that many of the Unit Conditions which had a strong presence in victorious candidate solutions would return true the majority of the time, leading to their associated Unit Decisions always being called.

Figure 5.4: Graph showing the number of Unit Conditions contained within candidate solutions which won a minimum of one game.

As an example, MultipleEnemiesVisible ("F") appears 49 times within the 36 candidates which won an evaluation match; this equates to 49 of the 72 action points (68%). In general, this Unit Condition will return true until the enemy team has been reduced to a single remaining unit, a scenario in which the candidate is still highly likely to win the match.

Unit Decisions

The Unit Decisions found in candidate solutions achieving 1 or more victories can be seen in figure 5.5. The fact that the GA was evolving solutions that favored Unit Conditions which would generally return true led to shooting-based Unit Decisions being associated with them. This is not initially clear from the data shown in figure 5.5: although there are shooting-based variants which are well represented ("k", "m"), there are also movement-based variants ("b", "e") which are well represented within the strings of successful candidate solutions.

Figure 5.5: Graph showing the number of Unit Decisions contained within candidate solutions which won a minimum of one game.

However, a closer inspection of the 36 successful candidates shows that the shooting-based Unit Decisions were favored in the early elements of the first AP represented by the string. For example, the best represented of the shooting-based variants ("k") appears in AP1 of successful candidates a total of 25 times, 19 of those within the first two Unit Decision elements of the strings. When Unit Conditions generally return true, these slots are more likely to be reached than those appearing later in the string.
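The occurrence counts behind figures 5.4 and 5.5 amount to a character-frequency tally over the winning candidates' strings; a minimal sketch follows, with shortened example strings taken from table 5.4.

import java.util.Map;
import java.util.TreeMap;

public class OccurrenceCount {
    public static void main(String[] args) {
        // Shortened example winners; the real strings are 24 characters long.
        String[] winners = { "KaIdFhLn", "HeBhCkEe", "DeCbFcEd" };
        Map<Character, Integer> counts = new TreeMap<>();
        for (String w : winners) {
            for (char gene : w.toCharArray()) {
                counts.merge(gene, 1, Integer::sum);
            }
        }
        // Upper-case keys are Unit Conditions, lower-case keys Unit Decisions.
        System.out.println(counts);
    }
}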

5.2 Study 1

The purpose of this study was to investigate whether the evolution environment implemented can produce candidate solutions capable of winning a specified percentage of matches within a certain number of generations. It is expected that changes to the environment - informed by analysis of the pilot test - will produce solutions which are more consistent in their results, and adaptive to varying match scenarios.

Design

The pilot test showed that candidates were evolving towards effective solutions, but that the structure of the chromosome, the selection of elements it could contain, and the influence of chance were guiding these solutions towards answering specific match situations and maximizing the positive impact of RNG. The resulting modifications to the evolution environment, and specific decisions regarding the design of this study, will be described.

After 3 generations had been evaluated, a decision was taken to adapt the fitness function and elitism methods. Despite this creating an amount of unreliability when analyzing the data set as a whole, it was felt that the changes were necessary. The nature of these changes and the rationale behind them will be discussed within the relevant sections.

Chromosome

The decision structure housed by the chromosome is based on a binomial structure similar to the one shown in the GA implementation section 4.2. The difference is that it has another level of depth, such that a third Unit Condition is checked along each path, resulting in 8 Unit Decisions being available for each action point (figure 5.6). A restriction was enforced to not allow the repetition of Unit Conditions along any given decision path. This structure, due to each Unit Decision being arrived at by considering 3 consecutive Unit Conditions, is expected to encourage the evolution of candidate solutions which are flexible to changing match scenarios. An important consideration here was to try to encourage the candidates generated by the GA to make use of both action points available per turn.

Figure 5.6: Chromosome decision structure for Study One.

In the pilot test, candidates evolved towards solutions which used Unit Conditions that consistently returned either true or false, and which led to Unit Decisions that were dependent upon chance to be successful. With the new structure, the candidates generated should be far less likely to arrive at a single Unit Decision, after checking the status of 3 successive Unit Conditions, over a succession of turns. Despite the limitations of the structure used in the pilot test, candidate solutions which were able to win matches were being produced after just 4 generations of 40 candidates, suggesting that there was room to increase the size of the solution space. The solution space's dimensions are dependent on a combination of the structure of the chromosome and the size of the sets of Unit Conditions and Decisions (UC&D). Despite this structure affording more depth in the decision process, the actual length of the string representing a candidate solution has only increased from 24, in the pilot test, to 30. However, due to the multiplicative nature of the relationship between the decision structure and the UC&D, upon which the solution space's dimensions are formed, the complexity could still increase by a large amount.

Figure 5.7: Chromosome encoding for the first action point.

Unit Conditions and Unit Decisions

Many of the Unit Conditions used in the pilot test were constructed using Inverter nodes, which simply modify the return value of existing condition nodes,

flipping true to false and vice versa. With the binomial decision structure, Unit Conditions which were constructed this way were no longer necessary. For example, HasHighHP checked if a unit's health was above a threshold, whereas HasWounds checked if the health was below the same threshold. Having these checks along the same path would provide an AI with no additional information about the game state. Their removal from the set of Unit Conditions available to the GA for this study reduced its size from 12 to 7.

Representative String Character | Behavior tree identifier
"A" | HasHighHP
"B" | HasKillShot
"C" | OneEnemyVisible
"D" | NoOverwatchingTeammates
"E" | NoAllyIsHunkerDown
"F" | AllShotPercentagesAtOrAbove50
"G" | IsFlanked

Table 5.5: Set of Unit Conditions used in Study 1.

The set of Unit Decisions increased in size from 15 to 20, to give the GA the opportunity to build its own solutions, rather than be overly guided by the Unit Decisions created for this project. The behaviors added end in a single action node, and are represented by the characters "p", "q", "r", "s" and "t" (see table 5.6). The Unit Decisions represented by the characters "m" and "n" (see table 5.6) received a concatenated Reload ability added to their selector node, in order to reduce the situations in which each node in a selector would return false and a unit would do nothing for an action point.

Representative String Character | Behavior tree identifier
"a" | SelectMoveStandard
"b" | SelectMove_Defensive
"c" | SelectMove_Aggressive
"d" | SelectMove_AdvanceCover
"e" | SelectMove_Flanking
"f" | SelectMove_Fanatic
"g" | SelectMove_Hunter
"h" | TryShootOrReload
"i" | ConsiderHunkerDown
"j" | TryOverwatchOrReload
"k" | TryShootIfIdealOrReload
"l" | TryShootIfFavorableOrReload
"m" | TryShootIfFavorableOrOverwatch
"n" | TryHunkerDownOrShootIfFavorable
"o" | ConsiderHunkerDownOrOverwatch
"p" | TryShoot
"q" | TryShootIfIdeal
"r" | TryShootIfFavorable
"s" | TryReload
"t" | TryOverwatch

Table 5.6: Set of Unit Decisions used in Study 1.

Solution Space

Size of set of Unit Conditions | c = 7
Size of set of Unit Decisions | d = 20
Amount of Unit Decisions per AP | f = 8
Tree depth p | p in (0, 1, 2)
Number of condition indices i at depth p | i_0 = 2^0, i_1 = 2^1, i_2 = 2^2
Total Unit Condition permutations per AP | Tc = c^(i_0) * (c-1)^(i_1) * (c-2)^(i_2) = 7 * 36 * 625 = 157,500
Total Unit Decision permutations per AP | Td = d^f = 2.56*10^10
Total permutations per AP | Tp = Tc * Td = 157,500 * (2.56*10^10) = 4.03*10^15
Size of solution space | Ssp = Tp^2 = 1.62*10^31

Table 5.7: Solution space size for Study 1.
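Written out, the counting in table 5.7 follows directly from the binomial structure - one root condition, two depth-1 conditions and four depth-2 conditions per action point (with no repeats along a path), with eight decision leaves - squared for the two action points:

T_c = c^{2^0} (c-1)^{2^1} (c-2)^{2^2} = 7 \cdot 6^2 \cdot 5^4 = 157{,}500

T_d = d^{8} = 20^{8} \approx 2.56 \times 10^{10}

S_{sp} = (T_c \cdot T_d)^2 \approx (4.03 \times 10^{15})^2 \approx 1.62 \times 10^{31}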

Evolutionary Operators

Crossover

Uniform crossover replaced the single-point crossover method used by the GA to produce a new generation of candidates. Due to the amount of linkage between elements of the chromosome (paths of Unit Conditions directly relate to specific Unit Decision indices), uniform crossover was chosen to treat each element fairly. Similarly to the pilot test, crossover can produce undesirable candidates by placing more than one Unit Condition within the indices of the string forming a decision path. A similar solution to the problem was implemented for this iteration: if a duplicated Unit Condition is found at an index of an offspring candidate, the values of both parents at that index are checked to see if they are a desirable option, and if not, a new random value is generated until a unique one is found.

Mutation

Mutation follows closely what has been presented before. However, due to the chance of random Unit Conditions being generated when crossover produces undesirable candidates, two probabilities were used to control mutations. The indices containing Unit Conditions have a lower mutation probability than those containing Unit Decisions, as having to generate random Unit Conditions is a similar process to mutation. The probabilities used were 0.01 (1%) and 0.02 (2%) respectively.

Selection

Roulette selection was again used and, for the first 3 generations of the study, a 10% elitism (10 candidates) model was employed. After those 3 generations, it was observed that many candidates were winning matches without being able to try again, something which was also happening in the pilot test. Additionally, those candidates who did win and were selected by elitism were not performing as well as expected.

Generation | Number of elite candidates | Elite candidates which won their subsequent games | Winning candidates in previous generation
1 | 10 | 6 (60%) |
2 | 10 | 6 (60%) |
3 | 10 | 4 (40%) | 30

Table 5.8: Success of elite candidates produced by the first 3 generations of Study 1.

It seemed as though many of the elite candidates were produced due to the RNG element of XCOM 2, and were unable to repeat their successes over

subsequent matches (table 5.8). This could inhibit the evolution of good solutions, as assigning a high fitness value to a candidate which is not a good solution in most cases allows that solution to deposit its UC&Ds throughout the population. If a candidate does win through a series of favorable outcomes, it will have the opportunity to play again and be assigned another fitness value - again distributing potentially unwanted UC&Ds throughout the population.

It was felt that every candidate should have the opportunity to play another match if it is victorious, meaning that the actual number of elite candidates taken from each generation should be variable - this method is termed Dynamic Elitism for the rest of the paper. This way, each winner is able to prove its fitness across multiple matches, reducing the impact RNG can have on the evolution process. As the GA converges on a set of solutions, it is possible for this method to produce generations that are completely comprised of elite candidates. As no further evolution is possible in this case, the number of elite candidates should not exceed 50% of the population size.

Setup

Unit health was adjusted, due to the observed impact that the previous value had on the solutions produced during the pilot test. As a result, each unit had its health pool doubled from 4 to 8. All other unit settings remained unchanged.

Fitness Evaluation

The basic fitness evaluation method was essentially unchanged, except that it could now produce larger values due to the increase in unit health pools. This meant that a candidate would receive a score of at least 48 should it win a match, and the maximum possible fitness would be 96. However, the evaluation process went through several changes. When Dynamic Elitism was introduced at generation 3, thought was given to how this would work with the standard fitness evaluation. Firstly, given the variable nature of the fitness values assigned to candidates (the actual fitness values are likely to be quite different, even when winning consecutive matches), the fitness should be averaged across generations for candidates selected via elitism, to lessen the impact of RNG. If a candidate wins 2 or more consecutive matches, it indicates that the solution has a higher probability of achieving those wins by being a good solution rather than by chance, and its fitness value should reflect this; otherwise, any candidate that wins due to favorable circumstances could have an unrepresentative fitness associated with it and be favored over candidates with a fitness averaged over multiple matches.
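As a minimal sketch of the Dynamic Elitism rule introduced above (assuming the winners of the previous generation arrive sorted by fitness; all names are illustrative):

import java.util.ArrayList;
import java.util.List;

public class DynamicElitism {
    // Every winner is carried over unchanged so it can prove itself again,
    // but elites never exceed half the population, so that evolution can
    // still act on the remainder.
    static List<String> selectElites(List<String> winnersByFitness, int populationSize) {
        int cap = populationSize / 2;
        int count = Math.min(winnersByFitness.size(), cap);
        return new ArrayList<>(winnersByFitness.subList(0, count));
    }

    public static void main(String[] args) {
        List<String> winners = new ArrayList<>();
        for (int i = 0; i < 60; i++) winners.add("candidate" + i);
        // With 100 candidates per generation, at most 50 elites are kept.
        System.out.println(selectElites(winners, 100).size()); // prints 50
    }
}

The fitness modifier described next addresses the unrepresentative-fitness problem noted above.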

To address this, a modifier was added to the fitness scores of multiple winners. If a given candidate managed to win each of its first 5 games with a score of 48 (the minimum possible score for a win), then its average and representative fitness value would be 48. This means that any candidate winning its first match is likely to have a higher fitness value. It was decided that a candidate winning 5 consecutive games should never have a lower fitness than a candidate winning its first game. This modifier is not intended to represent the fitness of a candidate from an analytical point of view; it is more of a tool to give potentially good solutions an environment in which to thrive. Any candidate which loses a match loses its modifier value, to allow other solutions the chance to evolve. As the fitness value is how the GA determines elite candidates, solutions which won a few games in a row would always be considered elites. The fitness modifier is calculated by the following formula:

Fitness_mod = (Number of consecutive wins - 1) * 12    (5.2)

This is applied up to a maximum of 5 consecutive wins, at which point the modifier remains constant; otherwise, fitness values could become so large that a single candidate could dominate the selection process. The value 12 was calculated from the maximum number of successive wins considered by the fitness modifier, and the 48 fitness required to guarantee that a candidate winning its 5th consecutive match would always have a higher fitness than a first-time winner. A record of the unmodified fitness values of candidates is retained for analysis.

Procedure

Overall, the procedure for this study follows precisely what has been described previously, except that the population size was increased to 100 for each generation produced, a 150% increase on the number generated for the pilot test. As the chromosome decision structure used here is more complicated than the one used in the pilot test, it was decided that, to allow for more variation between candidates - and thus more potential directions in which to converge - the population size should be larger. The 100 candidates are evaluated over 6 generations in total, resulting in 600 evaluation matches being simulated.

Analysis of Data

The introduction of the Dynamic Elitism method and the adjusted fitness evaluation was initially encouraging. There was a jump between generations 3 and 4 in both

the number of wins and average fitness - not including the fitness modifier (figure 5.8). For generation 5, however, the number of winning candidates fell from 45 to 37.

Figure 5.8: Average fitness % and win % of candidates per generation for Study One. Fitness average does not include the modifier.

When Dynamic Elitism was introduced for generation 3, the number of elite candidates increased to 36, of which 19 were successful in the matches simulated during generation 4. Generation 4 then produced 45 winning candidates; that is 26 non-elite candidates winning, out of a non-elite pool of 64. The fact that, of those 45 elite candidates, only 16 were able to win their subsequent match suggests that chance is having an impact here. The Dynamic Elitism model currently used does allow candidates who win to continue to prove themselves, but it also allows candidates who achieved match victories through favorable dice rolls to breed and take up space in subsequent generations; the lower number of victories in generation 5 could result from the impact of these fortunate candidates.

Generation | Number of elite candidates | Elite candidates which won their subsequent games | Winning candidates in previous generation
1 | 10 | 6 (60%) |
2 | 10 | 6 (60%) |
3 | 10 | 4 (40%) | 30
4 | 36 | 19 (53%) | 36
5 | 45 | 16 (36%) | 45

Table 5.9: Elite candidate performance during Study One.

Given these fortunate candidates are likely to drop out in subsequent generations, the evolutionary process can still evolve towards an optimal solution; however, the process could potentially be made more efficient by adjusting the evaluation procedure to reduce the opportunities for lucky candidates to become elite candidates.

The GA produced several candidate AIs which were capable of winning multiple matches (table 5.10). 17 candidates won 3 or more matches (7 with 1 loss, 5 undefeated), including 3 candidates that won 4 and were undefeated, indicating that the decision structure allowed the GA to generate more robust solutions than during the pilot test.

Consecutive wins | Number of defeats | Number of candidates

Table 5.10: Elite candidate performance during Study One.

Unit Conditions and Unit Decisions

When analyzing the occurrences of Unit Conditions within the strings which represent the decision structure of a candidate solution, their index within the string must also be considered. A Unit Condition at index 0 of a string is part of all 8 paths within the decision structure, those at indices 1 and 2 are in 4 paths, and those at indices 3, 4, 5 and 6 are in 2 paths (for action point 1 only). This fact is reflected in the overview of Unit Conditions contained within the 17 candidate solutions which won 3 or more matches (figure 5.9), where an appearance of a Unit Condition in the string was weighted by the number of paths associated with its index divided by 2 (4, 2, 1).

Figure 5.9: Graph showing the number of Unit Conditions contained within candidate solutions which won a minimum of one game in Study One.

The Unit Condition "E" (a check to see if any other friendly units are using the ability Hunker Down) is poorly represented within the dataset. Part of the reason for this could be attributed to the fact that only 2 of the 20 Unit Decisions have the Hunker Down action node contained within their behavior, and one of those - "o" - was also poorly represented within the same dataset (figure 5.10). This would result in the Unit Condition "E" returning false the majority of the time, making it less useful. Another Unit Condition with a smaller representation is "B" (returns true if there is a visible enemy unit that can be killed with the normal hit damage AI units can do). This Unit Condition is the 2nd least represented of the Unit Conditions for AP2 and overall. The check that this Unit Condition performs, and its return value, have no real influence on the Unit Decision it results in, as the targeting system used when instructing a unit to shoot is based on a set of weights that includes hit chance being above 50% and health remaining equal to or below the minimal damage of 3; as such, an AI unit might not specifically shoot at an enemy unit who can be killed with one shot.

Figure 5.10: Graph showing the number of Unit Decisions contained within candidate solutions which won a minimum of one game in Study One.

At first glance, figure 5.10 shows a high-variance distribution of Unit Decisions within the successful candidates, and in many ways the variance is higher than it appears, as many of the Unit Decisions are based around the same action nodes. The Unit Decisions "p" and "h" both instruct a unit to shoot if it has a visible target, the only difference being that "h" will instruct a unit to reload if no targets are available. The fact that "p" is favored on the 1st AP further indicates that RNG could still be influencing the evolution of some of the candidates. It is expected that AI units will prefer to move into position on AP1 and use abilities on AP2 - and it can be seen that all of the movement-based Unit Decisions (characters "a" through "e") are better represented for AP1, with the exception of "e" (move to try and flank an enemy unit). Additionally, "l", "m" and "r" are based around telling a unit to shoot if it has a favorable shot, each of these being well represented. It appears as though the shooting-based Unit Decisions are heavily favored by the GA, although this could be due to there being more of those than movement-based variants available when forming candidates. The shooting-based Unit Decision "q", which instructs a unit to shoot if it has an ideal shot (a chance to hit greater than 70%), is extremely poorly represented. This is most likely due to the hit chance requirement, as it has no backup action to instruct a unit to perform if no ideal shot is available. The nature of the shooting-based Unit Decisions should also be considered: when a unit tries to shoot, it is because there is a visible target that it has a chance to hit. So if a shooting action node does not fail, there is a chance of a tangible payoff with respect to the fitness evaluation, as fitness is given for each point of damage done to enemy units. The payoff from a movement-based variant is wholly

dependent on what follows after that movement. This could slow down the process of identifying optimal partial solutions that should contain movement-based Unit Decisions.

5.3 Study 2

The data gathered in Study 1 showed that the evolution environment generally produced more consistent candidate solutions than those produced during the pilot test. This study will investigate a modified dynamic elitism method, and a further refined set of Unit Conditions and Decisions, to see if they can help the GA to produce a similar number of candidate solutions capable of winning at least 3 matches in a row, but with a smaller population size, and thus fewer evaluations, than it was able to within the evolution environment of Study 1.

Design

The implementation of the evolution environment followed closely that which was used during Study 1. The alterations, and why they were made, will be discussed here.

Unit Conditions and Unit Decisions

Two of the 7 Unit Conditions available to the GA in Study 1 were removed. They were the least represented within the successful candidates from that study and - as explained in the analysis of that data - they had limited meaningful impact on the decision-making process in the general case.

Representative String Character | Behavior tree identifier
"A" | HasHighHP
"B" | OneEnemyVisible
"C" | NoOverwatchingTeammates
"D" | AllShotPercentagesAtOrAbove50
"E" | IsFlanked

Table 5.11: Set of Unit Conditions used in Study 2.

The set of Unit Decisions went through a heavy revision based on the analysis of Study 1. Many of the similar variants were condensed, the balance between movement-based and shooting-based variants was evened out, and each variant was constructed to not return a failure in any circumstance.

Representative String Character | Behavior tree identifier
"a" | SelectMove_Defensive
"b" | SelectMove_Aggressive
"c" | SelectMoveFlankingOrAggressive
"d" | TryShootOrReloadOrOverwatch
"e" | TryOverwatchOrReload
"f" | TryShootIfFavorableOrReloadOrOverwatch
"g" | TryShootIfIdealOrReloadOrOverwatch
"h" | ConsiderHunkerDownOrMoveDefensive
"i" | TryShootIfIdealOrMoveFlankingOrMoveAggressive

Table 5.12: Set of Unit Decisions used in Study 2.

The movement-based variants best represented within the successful candidates generated during Study 1 are retained, represented by the characters "a", "b" and "c". The flanking movement behavior was previously able to fail should no flanking position be available; now, the aggressive movement behavior is called should a flanking movement not be possible. The multiple TryShoot and TryShootIfFavorable variants available in Study 1 have been condensed into Unit Decisions "d" and "f", given that each of them was very similar, and no subsequent condition checks used within the variants were removed when condensing. To help with balancing the types of Unit Decisions, movement behaviors were used to create some variation and choice. For example, "i" instructs a unit to shoot only if it has a hit chance greater than 70%, and otherwise to try to take up a flanking position, or move aggressively. For "h", units are instructed to attempt to use the Hunker Down ability; should a unit not be in cover, and thus not able to use the ability, it should move defensively.

Solution Space

Size of set of Unit Conditions | c = 5
Size of set of Unit Decisions | d = 9
Amount of Unit Decisions per AP | f = 8
Tree depth p | p in (0, 1, 2)
Number of condition indices i at depth p | i_0 = 2^0, i_1 = 2^1, i_2 = 2^2
Total Unit Condition permutations per AP | Tc = c^(i_0) * (c-1)^(i_1) * (c-2)^(i_2) = 5 * 16 * 81 = 6480
Total Unit Decision permutations per AP | Td = d^f = 4.3*10^7
Total permutations per AP | Tp = Tc * Td = 6480 * (4.3*10^7) = 2.78*10^11
Size of solution space | Ssp = Tp^2 = 7.7*10^22

Table 5.13: Size of solution space for Study 2.

Selection

Dynamic Elitism was designed to help candidates that win their matches become elite candidates, able to continue contributing to the evolution of an optimal solution. However, the continued impact of RNG leading to candidates winning with poor or highly situational solutions could be reduced, improving the efficiency of the dynamic elitism model, by adjusting the previous fitness evaluation model.

Setup

All unit variables remained constant, with the exception of the health pool. AI and enemy units will now have 10 hit points, up from 8. This change was made due to observations during previous studies that a good turn for either side would essentially determine the outcome of the match. 8 hit points equated to 1 critical hit and 1 non-critical hit being required to kill a unit; with 10 hit points, killing a unit in two shots requires both of them to be critical hits, which is less likely to occur.

Fitness Evaluation

The basics of the fitness evaluation remained the same: all damage done, and all health remaining at the end of a match, contribute 1 fitness point each. As before, due to the variability in performance, fitness is averaged across generations when candidates are carried over due to elitism. Further adjustments were required for this study: the minimum baseline fitness for a victory increases to 60, and the maximum to 120. The fitness modifier - introduced to help candidates who have won multiple matches in succession be more likely to be chosen during selection - also factors in the increase in health, with the modifier values being 20 for 2 wins, 40 for 3 wins, and 60 for 4 wins. As before, should a candidate lose a match at any time, the modifier is lost and its fitness is only represented by the average fitness of all matches played across generations.

Any candidate solution winning its first match now has to be evaluated a second time. A candidate winning both matches will have its base fitness values averaged, and then the modifier value for 2 wins is applied. Candidates winning and then losing have their base fitness averaged. This is intended to supplement the Dynamic Elitism selection model by reducing the chance of poor solutions becoming elite candidates. The modifier is also applied immediately, as a candidate could otherwise win with a very high fitness, narrowly lose a match, and still have a higher average fitness than a candidate who narrowly won 2 matches.
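As a sketch of this bookkeeping (values from the text: a 60-120 base range for wins, and modifiers of 20, 40 and 60 for 2, 3 and 4 consecutive wins, dropped on a loss; the class and example numbers are illustrative):

public class Study2Fitness {
    // Streak modifier: 20 per consecutive win beyond the first, capped at 60.
    static int modifier(int consecutiveWins) {
        if (consecutiveWins < 2) return 0;
        return Math.min(consecutiveWins - 1, 3) * 20;
    }

    // Reported fitness: base fitness averaged over all matches played, plus the
    // modifier while the win streak is alive (a loss resets consecutiveWins to 0).
    static double reportedFitness(int[] baseFitnessPerMatch, int consecutiveWins) {
        double sum = 0;
        for (int f : baseFitnessPerMatch) sum += f;
        return sum / baseFitnessPerMatch.length + modifier(consecutiveWins);
    }

    public static void main(String[] args) {
        // First-match winners are re-evaluated: two narrow wins (60, 64) beat a
        // high win followed by a loss (110, 30) once the modifier is applied.
        System.out.println(reportedFitness(new int[]{60, 64}, 2));  // 82.0
        System.out.println(reportedFitness(new int[]{110, 30}, 0)); // 70.0
    }
}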

Data Logging

In addition to the fitness value and match wins, which were logged during the previous candidate evaluation phases of the GA, the XCOM 2 mod was set up to output additional combat information, in an attempt to attain a deeper understanding of candidate performance.

Damage per hit: This value represents the average amount of damage done on a successful hit, for all units of a team, within a single match. It is calculated by taking the total damage done to the enemy units and dividing it by the number of shots taken by the AI units. It should be noted that Overwatch ability damage contributes to damage done, but not to shots taken, meaning this value can be inflated by behaviors that make liberal use of Overwatch.

Turn count: The number of turns taken before a match ended, with a turn being considered a succession of one AI turn and one AI opponent turn.

Accuracy: Defined as the number of attacks that resulted in a hit, divided by the number of shots taken. This is calculated from the totals of all units at the end of a match.

Cover per turn: This value takes each active unit's cover modifier per turn and calculates an average; these per-turn averages are then averaged over an entire match. As such, this value provides an indication of the effect of favoring cover in candidate solutions. (A sketch of these computations is given after the procedure description below.)

Procedure

The experiment procedure generally follows the same process as before. For this study, candidate solutions which win their first match are re-simulated, and their associated average fitness and modifiers are manually recorded. The population size was reduced from 100 to 50, as the performance of the GA is expected to improve sufficiently to produce comparable candidates. Additionally, the decision to re-evaluate candidate solutions which win their first match means that the number of evaluations will be dynamic, so any potential optimization improvements will have to be offset against this.
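As forward-referenced above, a minimal sketch of how these four values could be computed from per-match totals follows; the field and method names are ours, not those of the mod:

    /** Sketch of the per-match combat statistics described above (names are ours). */
    public class CombatLog {
        double totalDamageDone;       // includes damage from Overwatch shots
        int shotsTaken;               // Overwatch shots are NOT counted here
        int shotsHit;                 // attacks that resulted in a hit
        double[] perTurnCoverAverage; // average cover modifier of active units, per turn

        /** Overwatch inflates this value: its damage counts, its shots do not. */
        double damagePerHit() {
            return totalDamageDone / shotsTaken;
        }

        double accuracy() {
            return (double) shotsHit / shotsTaken;
        }

        /** Per-turn cover averages, averaged again over the whole match. */
        double coverPerTurn() {
            double sum = 0;
            for (double turnAverage : perTurnCoverAverage) sum += turnAverage;
            return sum / perTurnCoverAverage.length;
        }
    }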

Analysis of Data

The changes made to the evolution environment were designed to produce stable candidates more efficiently than those produced in Study 1, by reducing the impact of poor solutions on the GA. The changes should initially produce fewer winning candidates and thus lower fitness values (when viewed as a percentage of the total fitness possible, given the increase in unit health), but these values are expected to rise more quickly due to the GA being able to converge on more optimal solutions.

The values in figure 5.11 indicate that these values did indeed decrease. In generation 0, only 4% of candidate solutions were considered winners (won their first 2 matches), compared to the 20% achieved in generation 0 during Study 1. Average fitness was less affected, with the average fitness in this study being 25% of the maximum attainable, compared to 29% in Study 1.

Figure 5.11: Average fitness % and win % of candidates per generation for Study 2. Fitness average does not include the modifier.

The generational change in fitness value and match wins is also more consistent, with both increasing from generation to generation, unlike the performance observed during Study 1, where these values could stagnate or even fall between subsequent generations. An 18% increase in wins and a 19% increase in the average fitness percentage can be seen between generations 0 and 5. This compares favorably with the candidates produced during Study 1, which showed increases of 16% and 17% respectively.

Winning Candidates

There were 30 candidates generated which won consecutive matches, and of those, 14 won 3 or more (table 5.14), compared with the 17 of 38 produced during Study 1. Although fewer stable candidates were produced, they came from an evolving population that was 50% smaller.

Consecutive wins    Number of defeats    Number of candidates

Table 5.14: Elite candidate performance during Study 2.

The breakdown of Unit Conditions contained within the 14 candidates winning 3 or more matches (figure 5.12) shows each of them to be reasonably evenly distributed, when looking at both the totals and the appearances per action point, although "A", "B" and "E" are slightly better represented than "C" and "D". This indicates that each of these Unit Conditions has some value, and should be able to contribute towards producing stable candidate solutions.

Figure 5.12: Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches in Study 2.

Unit Decisions went through a large revision for this study, and their representation within the stable candidate solutions produced (figure 5.13) shows that these changes had an immediate effect on how the GA utilized them to form stable solutions. It can be seen that most of the Unit Decisions are now being favored

for a particular action point. Each of the movement-based variants ("a", "b", "c") is now heavily favored for AP1, with the shooting-based variants ("d", "f", "g") favoring AP2. It can also be seen that each of the Unit Decisions is well represented within the sample space, though some have low total occurrences within the stable candidates and tend to heavily favor one action point over the other ("a", "b", "h"), indicating that it is the placement of these Unit Decisions in one action point or the other that gives them their value.

Figure 5.13: Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches in Study 2.

In stark contrast to this is the Unit Decision represented by the character "i". This newly created variant, introduced for this study, instructs a unit to shoot if it has a 70% or greater chance to hit an enemy unit, or else move to a flanking position, or, if that is not possible, move aggressively. Given that this Unit Decision ends in either movement or shooting action nodes, it is perhaps not surprising that it is well represented for both AP1 and AP2, although the condition gating the shooting action node ensures that the movement action nodes are run more often.

The extent to which "i" was favored could also indicate another potential method to help the GA produce better candidate solutions. Perhaps more of these variants could be created when trying to ensure that a given Unit Decision is not able to return false (no action executed). They could offer alternatives to the current set, which attempts to group similar or related action nodes together (Shoot, Overwatch, Reload), to create Unit Decisions that are better able to handle changing game situations.

Re-evaluation

There were 36 candidate solutions which, after successfully winning their initial matches, failed to win their subsequent matches. Table 5.15 displays combat information about how these candidates performed, broken down between the two matches and compared against the combat information of all matches played.

                                  Damage/Hit    Turn Count    Accuracy    Average Cover/turn
Match 1 (Win)
Match 2 (Loss)
Difference (absolute)
Difference %
Average of all played matches
Average of all matches won
Average of all matches lost

Table 5.15: Combat performance information about candidates that failed to win consecutive matches.

As expected, accuracy, damage and the inclination towards favoring cover all fall between matches 1 and 2; however, the biggest difference can be seen in the accuracy of the candidates, which was reduced by 29.4% on average. This is likely due to the fact that the average accuracy attained during each first match was 86%, a 19% increase over the average accuracy attained by candidates during all matches played, and a 10% increase over the average of all matches candidates won. During match 2, candidates on average had an accuracy that was 5% higher than the average for all lost matches, and only 6% lower than the average for all matches played. This data could indicate that favorable RNG impacted the results of the matches which were won, rather than poor luck affecting the subsequent re-evaluation matches. Additionally, it can be seen that the other items of data extracted do not seem to have a strong correlation with the match results, with all average values being similar regardless of the outcome.

5.4 Study 3

This section details the investigation of a restriction to the structure of the chromosome used by the GA, designed to reduce the size of the solution space. It is expected that this restriction should produce more successful and stable candidates than those found in Study 2.

Design

The only redesigned elements of the evolution environment were the chromosome structure and the associated evolutionary operators.

Chromosome

This iteration of the chromosome focused on further restricting the decision structures housed within, primarily to reduce the solution space, but also to provide a more rigid structure for the GA to work with, such that a crossed-over Unit Decision has a better chance of still being meaningful at its location within a generated offspring candidate. Within the previous binomial decision structure, the only restriction was that no character representative of a Unit Condition could appear more than once along any path. That restriction remains, with the added restriction that each tree is made from exactly 3 Unit Conditions. Using only 3 Unit Conditions means that no matter the order in which they are placed within a binomial tree, provided no Unit Condition appears more than once along any given path, the ordering of the characters makes no difference to the outcomes of the decision paths (figure 5.14).

Figure 5.14: Example chromosome structures showing the irrelevance of the order of the Unit Conditions.

As order is not important, the size of the current set of Unit Conditions (r) and the number of Unit Conditions required to form a decision structure (n) can be used with the equation shown in 5.3 to calculate the total number of possible decision structure combinations. Using the same set of Unit Conditions as previously (r = 5), the total number of possible combinations of Unit Conditions is 10. Because combinations such as ABC, CBA or BAC are considered the same, each potential combination is restricted to alphabetical order when considered as a decision structure, reducing the sample space for this part of the chromosome to 10, from 6480 during Study 2.

C(r, n) = r! / (n! (r - n)!)    (5.3)

Evolutionary Operators

Crossover: The crossover operator for the Unit Decision indices of the chromosome remains unchanged; however, some implementation changes were required to handle the new decision structure. Although only 10 variations are possible, it was decided to continue with the principle of uniform crossover, rather than simply swapping each parent candidate's entire decision structure. As such, uniform crossover is applied to each of the 3 characters, following the rule of no repeated characters. Similarly to before, if no valid character is available from the associated indices of either parent candidate, a random valid Unit Condition is selected. Once crossover has been applied to an entire string, the decision structure sections are sorted into alphabetical order, such that each takes the form of one of the 10 possible decision structures (a sketch is given after the solution space table below).

Mutation: The actual implementation of the mutation operator remains unchanged. However, given the limited number of potential decision structures available, and the fact that crossover has a chance to essentially mutate them when it has to generate random Unit Conditions, the mutation probability for the decision structure elements of the chromosome is set to 0. The chance for mutation to occur on Unit Decisions remains unchanged at 2%.

Solution Space

Size of set of Unit Conditions: c = 5
Size of set of Unit Decisions: d = 9
Number of Unit Decisions per AP: f = 8
Total Unit Condition permutations per AP: Tc = 10
Total Unit Decision permutations per AP: Td = d^f = 9^8 ≈ 4.3*10^7
Total permutations per AP: Tp = Tc * Td = 10 * (4.3*10^7) = 4.3*10^8
Size of solution space: Ssp = Tp^2 ≈ 1.8*10^17

Table 5.16: Size of the solution space for Study 3.
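The constrained crossover forward-referenced above can be sketched as follows. The implementation details here - the class name, the coin-toss ordering, the random fallback loop - are our own reconstruction of the description, not the project source:

    import java.util.Arrays;
    import java.util.Random;

    /** Sketch of the Study 3 crossover for the 3-character decision structures. */
    public class DecisionStructureCrossover {
        private static final char[] CONDITIONS = {'A', 'B', 'C', 'D', 'E'};
        private final Random rng = new Random();

        /** Uniform crossover of two 3-character condition strings, e.g. "ABD" x "BCE". */
        public String crossover(String parent1, String parent2) {
            char[] child = new char[3];
            StringBuilder used = new StringBuilder();
            for (int i = 0; i < 3; i++) {
                // A coin toss picks the preferred donor parent for this index.
                boolean fromFirst = rng.nextBoolean();
                char preferred = fromFirst ? parent1.charAt(i) : parent2.charAt(i);
                char fallback  = fromFirst ? parent2.charAt(i) : parent1.charAt(i);
                char pick;
                if (used.indexOf(String.valueOf(preferred)) < 0)     pick = preferred;
                else if (used.indexOf(String.valueOf(fallback)) < 0) pick = fallback;
                else pick = randomUnused(used); // neither parent offers a valid character
                used.append(pick);
                child[i] = pick;
            }
            Arrays.sort(child); // canonical alphabetical ordering of the structure
            return new String(child);
        }

        private char randomUnused(StringBuilder used) {
            char c;
            do { c = CONDITIONS[rng.nextInt(CONDITIONS.length)]; }
            while (used.indexOf(String.valueOf(c)) >= 0);
            return c;
        }
    }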

Analysis Of Data

The average percentage of maximum fitness and the win percentages per generation (figure 5.15) show a general improvement over those attained during Study 2. In particular, a 50% increase in win percentages can be observed; for example, the win percentage for generation 5 in studies 2 and 3 was 22% and 44% respectively. The average fitness percentage was improved to a lesser degree, with each generation returning a 5-10% improvement compared with Study 2.

Figure 5.15: Average fitness % and win % of candidates per generation for Study 3. Fitness average does not include the modifier.

There is a large increase in both metrics between generations 1 and 2, followed by a decrease in average fitness and no increase in win percentage between generations 2 and 3. Similar occurrences were observed in the overview of results obtained during Study 1; there, this was attributed to candidates benefiting from favorable RNG, which led to the decision to re-evaluate candidates winning for the first time. Here it appears that the offspring candidates generated from the candidates found in generation 2 were not good solutions in general. 15 winning candidates were found in generation 2: 11 coming from the offspring candidates produced from generation 1, and 4 being elite winners. 7 of the 11 candidates won their 3rd matches as elite candidates during generation 3, and 2 of the 4 elites from generation 2 also won. As generation 3 had a total of 15 winning candidates, the total wins coming from newly formed offspring candidate solutions was only 6, with 9 coming from elite candidates.

Winning Candidates

There were a total of 46 candidates which won 2 or more consecutive matches, 21 of which did so without losing a match. Of these, 26 candidates were able to win 3 or more matches, with 14 suffering 1 defeat and 12 remaining undefeated (table 5.17). This shows an improvement in the successful candidates produced during this study when compared to those of Study 2.

Consecutive wins    Number of defeats    Number of candidates

Table 5.17: Elite candidate performance during Study 3.

In addition, these candidates also show improved stability, as there were candidates winning 4, 5, 6, 7, and 8 consecutive matches, indicating that their behaviors are consistent across various scenarios and less sensitive to random events or chance.

The Unit Conditions contained in the successful candidates show a fairly even distribution (figure 5.16) between the representative characters "B", "C" and "D", while those represented by "A" and "E" feature more prominently, favoring AP1 and AP2 respectively. In general, the distribution of Unit Conditions within candidates did not alter much compared with Study 2.

Figure 5.16: Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches in Study 3.

The distribution of Unit Decisions (figure 5.17) was also similar to Study 2, with movement-based variants preferred for AP1, and shooting-based variants preferred for AP2. These similarities in distributions between studies could indicate that the restriction of the chromosome structure was successful in helping to reduce the solution space, and that the reduction did not have an obvious impact on how the solutions were formed.

Figure 5.17: Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches in Study 3.

Re-evaluation

The results show that 22 candidate solutions won their first matches and suffered defeat when they were re-evaluated. The combat data obtained during the evaluation of these candidates can be seen in table 5.18. On the surface, Damage per Hit and Average Cover continue to show little variability in relation to the outcome of a given match, although there is a larger decrease in the average turns taken for these candidates than was seen in Study 2.

                                  Damage/Hit    Turn Count    Accuracy    Average Cover/turn
Match 1 (Win)
Match 2 (Loss)
Difference (absolute)
Difference %
Average of all played matches
Average of all matches won
Average of all matches lost

Table 5.18: Combat performance information about candidates that failed to win consecutive matches.

During Study 2 it was suggested that the accuracy values obtained from candidates which won the first and lost the second of their two matches indicated that the RNG had been favorable during match 1, and that the accuracy attained during match 2 was what should be expected given the population means. Here it can be seen that the average accuracy during match 1 was in line with the population means, while the average accuracy from match 2 fell below the population means of even all lost matches. This contradicts the assertion from Study 2.

Final Evaluation

The data indicates that Study 3 was able to produce more successful and stable candidates than Study 2. To support this, the 5 fittest candidates from the final generation of each study were further evaluated, up to a total of 10 matches each, against the default AI. The overview of the results of these evaluations in table 5.19 shows that the fittest candidates from Study 3 won more matches in general, with the average number of wins per candidate increasing from 6.8 to 8.2 between studies 2 and 3 respectively, supporting the suggestion that there is an improvement in candidate performance.

Study 2                                    Study 3
Candidate    Wins    Average Fitness       Candidate    Wins    Average Fitness
Average                                    Average

Table 5.19: Results of the extended evaluation of the 5 fittest candidates from studies 2 and 3.

To see if the improvement could be described as statistically significant, a Wilcoxon rank sum test was conducted on the average fitness of each candidate. This test investigates the null hypothesis that the medians of each set of samples are equal, with the alternative hypothesis stating that the median of Study 2 is less than the median of Study 3. The resulting p-value is the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. To be statistically significant, the p-value should be less than the alpha value of 0.05.

           alpha value    p-value    Null hypothesis
Fitness                               True
Wins                                  True

Table 5.20: Results from the Wilcoxon rank sum test, comparing candidates from Study 3 against those from Study 2.

The rank sum test was conducted in Matlab 1 for both the wins candidates achieved and their fitness (see table 5.20). In both cases, the null hypothesis that the medians of each sample set were equal could not be rejected; however, the p-values of 0.06 and 0.07 are very close to the significance level defined by the alpha value, indicating that there was an improvement, even if it was not statistically significant.

1 Mathworks,
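The test itself is straightforward to reproduce. The thesis used Matlab's ranksum; for reference, an equivalent check can be sketched in Java with Apache Commons Math, where the same test appears under the name Mann-Whitney U. Note that this method returns a two-sided asymptotic p-value, whereas the test above used a one-sided alternative, and the arrays below are hypothetical placeholders rather than the thesis data:

    import org.apache.commons.math3.stat.inference.MannWhitneyUTest;

    public class RankSumCheck {
        public static void main(String[] args) {
            // Hypothetical placeholder values; the per-candidate averages from
            // table 5.19 are not reproduced here.
            double[] study2Fitness = {50.0, 55.0, 58.0, 60.0, 62.0};
            double[] study3Fitness = {58.0, 61.0, 64.0, 66.0, 70.0};

            MannWhitneyUTest test = new MannWhitneyUTest();
            // Asymptotic two-sided p-value (the thesis used a one-sided test).
            double p = test.mannWhitneyUTest(study2Fitness, study3Fitness);
            System.out.println("two-sided p = " + p);
        }
    }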

5.5 User Testing

Motivation

Having identified that the methodology used during Study 3 produced the most successful and stable candidates, a selection of the most evolved of these were pitted against real players, and their performance analyzed. This was necessary in order to judge the potential of the proposed method as another useful tool for developing game AI.

Expectations

Based on the observations from the previous studies, and the fact that the AIs trained against the default AI - which itself is not representative of the challenge posed to players by XCOM 2 - the generated AIs are not expected to be extremely difficult for human players to defeat. However, they should provide superior performance to the default AI. Additionally, it is expected that the candidate AIs should be able to win at least 1 game against the test participants.

Setup

The 5 candidate AIs with the highest average fitness in generation 5 of Study 3 were selected to play against human opponents - although there were candidates in previous generations which may have won more games before eventually losing. These were chosen as they are most representative of the quality of solutions produced at the most evolved point of the methodology, and thus of any potential applications. To validate the performance of these candidate AIs relative to the default AI, the latter will be the 6th AI that the players face.

Since the mod was initially configured to perform simulations between computer controlled opponents, changes had to be made in order to allow human players to regain control of their units. The effect responsible for running behavior trees on the player units was adjusted to no longer perform this functionality. Along with this change, the fitness evaluation was adjusted to track the AI opponent's units rather than the player's units.

In terms of physical setup, it was decided to perform the experiment in a comfortable environment for the players, one that would provide optimal circumstances for concentration. Inspired by competitive chess tournaments, efforts were made to reduce background noise, provide sufficient lighting, and ensure personal comfort. As such, a well lit and somewhat soundproofed room was utilized, inside which the two simultaneous participants were positioned back to back. Two computers running the experiment game platform were provided for the participants, and the test would then begin.

Participant Demographics

The pool of 10 participants featured ages ranging from 23 to 31, fitting the general target audience for testing video game products. This segment comprises young adults who have had contact with computer technologies in some capacity and are capable of understanding the basic game mechanics required for participation in the experiment. All participants reported playing games among other leisure activities and could recall at least one turn-based tactics game that they had played in the past.

Figure 5.18: Playtime differences between XCOM 2 and XCOM Enemy Unknown/Enemy Within.

Of all participants, 40% had not played XCOM 2 at all, 30% had played somewhere between 1 and 20 hours, 10% between 20 and 60 hours, and 20% had more than 60 hours of experience (figure 5.18). Because the gameplay is quite similar between XCOM Enemy Unknown/Enemy Within and XCOM 2, it was potentially relevant to ask about experience with the prequel as well. The percentages are mostly the same, with the exception of participant 5, who had more experience with XCOM EU/EW than with XCOM 2.

Methodology

Two participants would play simultaneously, positioned back to back, using two computers capable of running the game. The first step was filling out a questionnaire (see Appendix D.), which gathered data about age, past TBT gaming experience, and past XCOM EU/EW and XCOM 2 play times. The first two items are designed to give an impression of a participant's overall experience, while the last two are aimed specifically at potential correlations between experience and competence.

Upon completing the questionnaire, the participants would begin their series of 6 games, always starting with the Default AI and then randomly switching between the remaining behavior trees until all had been played once. The Default AI was encountered first so as not to bias the data in favor of the candidate AIs, as test participants who are less familiar with XCOM 2 could potentially gain increased aptitude as their experience with the game increases. A short break was allowed between games, partly due to the data recording that was performed and partly due to the intensity of the matches, which appeared to have some exhausting effects on test participants, primarily due to match lengths exceeding 30 minutes.

Results and Analysis

Overall, the behavior trees scored fitness values above 0, which suggests that, regardless of their performance, all the AIs managed to present some degree of challenge to the players. Additionally, most matches took around 10 turns to reach a resolution, with the exception of games in which the AI won, which took around 12 turns.

AI                          Victory Rate    Average Fitness    Average Winner Fitness    Average Turns    Average Winner Turns
Default                     0%
ACEcagcficiBCEdagafdfc      20%
ADEfcccbfbfBCEdbeggffg      10%
BCDgdcdaihfADEigidfdeg      50%
BCEcaccaaidADEdbgdbbge      0%
ACEiaabficbADEiaggfgfg      0%

Table 5.21: Results from the user testing performed on the 5 best BTs evolved and on the Default AI.

Out of the 5 candidate behavior trees tested, 2 lost all games played and the other 3 managed to win at least 1 match against the test participants. The default AI failed to win any match, as expected. The average fitness of the AIs which failed to win a match (the Default AI and 2 candidate AIs) was 30 (50% of the number required to win). 2 of the 3 candidate AIs with victories achieved an average of around 40 points, with the most successful candidate AI achieving an average fitness of 60 points. The victory percentages achieved by the 3 candidate AIs which managed to win at least one match were 10% and 20% respectively, while the most successful one boasts a 50% victory rate.

From the test participants' perspective, there is a strong correlation between the reported play times of XCOM games and AI victory rates. Players with a reported playtime above 20 hours achieved 100% victories, while players below that mark were the most susceptible to the tactics employed by the candidate AIs, with each of them suffering at least 1 defeat to a candidate AI. A similar correlation can be observed in relation to the average enemy fitness, which shows higher AI fitness values for the inexperienced players and lower values for the experienced ones (figure 5.19).

Figure 5.19: Correlation between participants' play times, in descending order, and combat outcome statistics.

A two-tailed Wilcoxon rank sum test was conducted to determine if the candidate AIs showed a statistically significant increase in fitness value compared to the Default AI (table 5.22). The null hypothesis states that the default and candidate AIs provide equivalent performance, and the alternative hypothesis states that the candidate AIs show increased performance. The alpha value was 0.05. The candidate AI represented by the string BCDgdcdaihfADEigidfdeg achieved a p-value below the alpha value, rejecting the null hypothesis. This is the only candidate AI which can be said to have shown a statistically significant improvement in performance compared to the default AI.

AI                          Mean Fitness    Standard Deviation    Variance    p-value
Default
ACEcagcficiBCEdagafdfc
ADEfcccbfbfBCEdbeggffg
BCDgdcdaihfADEigidfdeg
BCEcaccaaidADEdbgdbbge
ACEiaabficbADEiaggfgfg

Table 5.22: Statistical comparison between the 5 best BTs evolved and the Default AI from the user testing.

Discussion and Conclusions

Ultimately, the results confirm that there was a general improvement in the performance of the candidate AIs over the Default AI when analyzed with respect to

their fitness, with 3 of the 5 candidate AIs achieving higher fitness values on average, and the 2 with lower average fitness having quite similar values. Despite this, only 1 candidate could be confirmed to perform significantly better than the default, meaning it is not possible to completely confirm the expectations of this test.

Figure 5.20: UC&D breakdown of the most evolved candidate AI: BCDgdcdaihfADEigidfdeg.

It is clear from the breakdown of the most successful candidate AI (figure 5.20) that the success of its behavior may come from how many of its Unit Decisions are damage-dealing or offensive abilities. From the AI's perspective, the fitness value has always promoted dealing damage as the scoring factor, and that is exactly the result seen here. Over the entire process of optimizing this AI generation technique, the largest optimization criterion was the fitness calculation, which was based on the AI's capability of defeating its opponent - something that can only be done by dealing damage, as dictated by the environment.
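The candidate strings quoted in this chapter follow the Study 3 chromosome layout: 3 Unit Condition characters followed by 8 Unit Decision characters, once per action point. A minimal sketch splitting the most successful candidate into these segments (the class and method names are ours):

    /** Splits a Study 3 chromosome into its Unit Condition and Decision segments. */
    public class ChromosomeDecoder {
        public static void main(String[] args) {
            String chromosome = "BCDgdcdaihfADEigidfdeg"; // most successful candidate
            // Layout per action point: 3 Unit Conditions + 8 Unit Decisions = 11 chars.
            for (int ap = 0; ap < 2; ap++) {
                int base = ap * 11;
                String conditions = chromosome.substring(base, base + 3);
                String decisions  = chromosome.substring(base + 3, base + 11);
                System.out.printf("AP%d: conditions=%s decisions=%s%n",
                        ap + 1, conditions, decisions);
            }
            // Output: AP1: conditions=BCD decisions=gdcdaihf
            //         AP2: conditions=ADE decisions=igidfdeg
        }
    }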

Chapter 6

Discussion and Conclusion

6.1 Discussion

The development of the methodology described in this paper produced a system which showed consistent improvements in the quality of the AIs generated and the number of evaluations required to generate them. Each iteration highlighted unforeseen issues or potential directions for optimizing the process, and generally the implementation was adjusted to address these. Despite the overall increase in the quality of the generated AIs, it is difficult to infer from the data what impact, if any, some of these adjustments made towards their intended goal.

Many decisions which instruct an XCOM 2 unit to perform an action are subject to the outcome of a dice roll. This introduces variability into the evaluation of candidates, from a combat performance perspective, and means that the true quality of a candidate AI would only reveal itself over a series of evaluations. Typically, in evolutionary algorithms, a fitness value is a fixed evaluation of the fitness of a candidate; this application, however, required consideration of the variable nature of fitness.

Dynamic Elitism

The Dynamic Elitism method introduced during Study 1 was designed to allow candidate AIs to play again should they win, even if their fitness value was below what was needed to be considered an elite candidate. Allowing each candidate AI that won a game to be considered an elite candidate created further issues, as either AI might receive favorable dice rolls during an evaluation match, and win when their contained behavior normally wouldn't.

Re-simulating a match each time a generated AI wins its first match, while helping produce promising results over time, proved to be a poorly considered adjustment that could be improved. The assumption made was that winning successive matches indicated that a candidate AI was sufficiently likely to be of better

than average quality. Not only does this solution not account for candidate AIs receiving poor dice rolls, it also, by its nature, reduces the efficiency of the whole methodology. An improvement could be made by setting up a system that reviews the combat data from a match and flags those matches whose data indicates an anomaly, as only those should need to be re-simulated.

Fitness Function

Another issue was that candidate AIs winning multiple successive matches reduce the potential impact that chance can have on the average fitness attained, making it more representative of the solution's quality, but this was not reflected in the evaluation of candidates. The fitness function was adjusted to favor successive winners, but how well this performed was difficult to analyze. Overall, it is a worthwhile consideration and did not noticeably impede the production of higher quality solutions. However, higher values of the modifier applied to winning candidates meant faster algorithmic convergence; therefore, its application should be considered in the context of the intended effect.

The fitness evaluation method presented other issues. In its basic form, it rewards solutions that instruct units to deal damage, but there is no way to tell whether the fitness gained from doing that damage was the result of a well positioned (within the AI decision structure) movement action, or an effect of volume-of-fire tactics. The fitness function should evaluate with respect to what is desired. If a generic best AI behavior is desired, perhaps the fitness function could reward candidates whose combat data shows particular trends that can be attributed to tactical behavior. For example, aggressive movement could be a desired trait of an ideal solution, so the rate at which a candidate AI moves to flanking positions could be represented within the fitness function (a sketch of this idea follows below). This could in turn lead to a potential simultaneous evolution of different species of behaviors that cooperate as a result of their evolution environment, rather than relying on other game systems to allude to this emergent group intelligence.
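As a purely hypothetical illustration of the trend-based reward suggested above - the weight and the flanking-rate feature are inventions for the example, not part of the implemented system:

    /** Hypothetical fitness rewarding a desired tactical trend (illustrative only). */
    public class TrendAwareFitness {
        // The weight below is arbitrary; tuning it would itself be a design task.
        private static final double FLANKING_WEIGHT = 10.0;

        static double evaluate(double damageDone, double healthRemaining,
                               double flankingMovesPerTurn) {
            double base = damageDone + healthRemaining;                 // existing scheme
            double trendBonus = FLANKING_WEIGHT * flankingMovesPerTurn; // reward aggression
            return base + trendBonus;
        }
    }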

Chromosome Structure

The binomial decision structure employed was limited in scope, having only 16 potential decision paths, and was not expected to produce solutions which could out-maneuver an experienced human player, even considering that the solution spaces produced initially were large. Each restriction made to the structure of the chromosome reduced the sample space, but at the same time reduced the maximum potential quality of a candidate solution. The decisions taken to restrict the decision structures were made with logical consideration, to always provide as much flexibility as possible. The 10 potential structures available in Study 3 produced the highest overall number of successful candidates, with some of those being capable of defeating novice human opponents, as presented in the user test. It is likely that, should the methods from studies 2 and 3 be evaluated for further generations, the candidates from Study 2 would begin to show improved performance as well.

Unit Conditions and Decisions

Another contributor to the size of the solution space was the sets of Unit Conditions and Decisions. The initial sets were constructed from what was commonly used in the default AI configuration file, and perhaps a more careful selection and design of additional entries would have led to a higher quality analysis of the methodological aspects discussed above. These sets exist entirely within the context of XCOM 2 and, for an exhaustive genetic algorithm search, the entire set of behaviors in the configuration file could be considered, with a clear chance of achieving a quality solution. However, in the context of this project, this was unrealistic, and the restriction to the more select options used in studies 2 and 3 should have been identified earlier in the process. Nonetheless, there are no actual limitations imposed by the system in regard to the choice of UC&Ds, so the overall potential for optimization in this area is entirely dependent on the design and implementation of the game.

6.2 Conclusion

Optimization of the methods used to generate AIs for XCOM 2 was a complicated problem. Even in the restricted version of the game created for evaluating candidate AIs, an intimate knowledge of the game's systems was required in order to conceive of ways to streamline the process. The research presented showed that, in its current setup, the system was capable of producing a candidate able to defeat novice human opponents, and that this candidate was formed after approximately 300 match evaluations. In campaign mode, players of XCOM 2 can expect to encounter around 200 pods of enemies, and as such the system is currently incapable of providing adequate AIs within this context. There are many games, however, in which 300 encounters barely scratches the surface, and the research presented here shows that it could be possible to generate AIs during the natural progression of some games. But context is vital to the success of a GA employed to solve this kind of problem, as the evolution environment is shaped by it. The way any given game works impacts everything, and as such it cannot be said that this project was able to provide a complete solution to its problem statement. However, it is felt that the dynamic elitism and solution space reduction methods showed encouraging results with respect to their goals, and could potentially be used in any future work on how to optimize the

evolution of AIs with a GA, for a game where outcomes are heavily dependent on RNG.

Chapter 7

Future Directions

7.1 Further Development

In XCOM 2, a complete rendering of its combat scenarios is required, due to the game's calculations being performed based on physics as well as unit statistics. A game in which the visual layer is just a rendering of actions, with the calculations performed in the background, would likely provide a more suitable platform for this type of research.

Any further work with this system would require solving the problem of fully automating the evaluation process, so that simulations would continue until a certain termination condition was met. This would provide the freedom to quickly explore and iterate through different strategies, and allow the development of a wide array of potential applications.

It was suggested in the discussion section that the re-simulation of winning candidates could be improved by identifying anomalous combat statistics, and only re-evaluating those matches. An investigation was conducted using the combat data gathered during studies 2 and 3 of this project, where pattern classification methods were applied to begin trying to identify matches which should have been re-evaluated (see Appendix E.). It showed some potential. With an automated evolution procedure, training classifiers to identify anomalous data could be possible.

Evolving AIs from a set of completely randomly formed characters is perhaps not going to be useful in a commercial application. The initial generations are likely to offer very little challenge to players, but with a system capable of automating the evolution procedure, there are potential workarounds should they be required. For example, a game could ship with an array of semi-evolved candidate solutions; each time a player begins a campaign, the initial generation is selected from this array, and from there it evolves organically. This would allow developers to determine the initial challenge offered to players, and allow for the production of AIs alongside the natural progression of a game.

The evolution of AIs for this project required evaluation against a fixed Default

AI. Though sufficient for the scope of this work, evolving solutions against a single AI is likely to produce solutions which are over-trained on the problem presented, which could result in the generated AIs not being adaptive enough to handle alternative tactics. Thus, any system which aims to produce AIs capable of defeating human beings employing a variety of strategies should consider varying the tactics of the opponents faced by the candidate AIs.

7.2 Alternative Directions

Video game genres that favor cyclic gameplay could incorporate the evolution of BTs as part of the cycle, so that with each new iteration the AI opponents use different strategies based on the same root. Adding in some of the improvements mentioned earlier could lead to interesting, ever-changing gameplay, creating an infinite problem for players to solve. Furthermore, it could be interesting to allow players to interact with this mechanic. This might not sound like a very active game mechanic, but it does not have to be the main one either. Extrapolating from this concept of player-generated BTs leads to something resembling bot competitions for strategy games, an idea that has seen few concrete game implementations, likely due to its niche popularity. Perhaps a game that plays similarly to XCOM 2 while the player is engaged, but is about evolving AIs through simulated battles while they are away from the game, could be worth developing within the context of Massively Multiplayer Online games, given the high computational demands of such a design. The potential of using this mechanic as a way of constantly shifting the meta-strategy of the game presents yet another intriguing possibility for developers to create diversity. This could take advantage of the best of both worlds, allowing for intense tactical combat scenarios and deeply strategic AI evolution, as well as allowing the player to perhaps teach tactics to an AI.


Appendices

A. Extra Content

On the attached CD, readers can find additional content relevant to understanding the implementation of this project's test platform. We provide the Java code utilized for generating behavior trees, as a stand-alone Eclipse IDE 1 project, and the modification files used to alter XCOM 2, as a custom Visual Studio IDE 2 project. Additionally, we provide the source code of XCOM 2, for any additional relevant code references that may have been omitted, as well as the unmodified configuration (*.ini) files.

Unfortunately, due to XCOM 2 being a commercial game, actual use of the mod is dependent on ownership of the game and its installation being present on the target computer via a distribution platform such as Steam. However, as mentioned earlier, the relevant *.uc and *.ini files can still be viewed using any simple text editor, such as Notepad++ 3.

Given the large number of folders provided, we recommend using any available search functionality to navigate to the files referenced in the following appendix sections, as well as to find the relevant sections of code within those files.

The bonus content CD also features the raw experiment result sheets, under the folder Experiment results; the files can be explored with any Microsoft Office Excel type software. Furthermore, the introductory audio-video production accompanying the project can be found on the bonus content CD.

1 The Eclipse Foundation,
2 Microsoft Corporation,
3 Notepad++,

B. Unit Condition Implementation

The implementation of the conditions can be explored in the file XGAIBehavior.uc, which is part of the XCOM 2 source code, in the XCOM 2 source code folder.

C. Unit Decision Implementation

The implementation of the actions is spread across a number of files, but they all share a naming convention for easy identification - X2Action_*ActionIdentifier*.uc - and they are also part of the XCOM 2 source code, in the XCOM 2 source code folder.

D. Questionnaire

After answering the following items, we will ask you to play a total of 6 XCOM 2 matches, versus various computer controlled opponents. There are no restrictions on how you play the game; however, this is a modified version of XCOM 2 which features only a small selection of the game's mechanics.

1. Candidate Number
2. Age
3. Have you played any turn-based tactics games in the past? If yes, then name one or a few.
4. Please provide us with an estimation of your XCOM Enemy Unknown / Enemy Within play time. Mark only one oval.
   None / 1-20 Hours / 20-60 Hours / 60+ Hours
5. Please provide us with an estimation of your XCOM 2 play time. Mark only one oval.
   None / 1-20 Hours / 20-60 Hours / 60+ Hours

E. Classifying Evaluation Matches

The decision to re-simulate matches has been shown, in general, to help with the number of successful and stable candidates produced on a per generation basis, but it has a cost in terms of optimizing the procedure which generates evolved AIs. It would be more efficient if it were possible to tell from the combat data of an evaluated candidate whether it needed to be re-simulated, regardless of whether it won or lost.

Multivariate classification techniques make it possible to find patterns in data of high dimensionality, and although only 4 combat data items are available to analyze, it could still be possible for a classification technique to find useful patterns in the data. The idea is that the candidates who lost their re-evaluation matches have essentially been flagged as potentially having produced abnormal evaluation results for their contained behavior, due to favorable or unfavorable RNG. If a classifier is trained on the data from matches which were not flagged, learning what makes a candidate likely to win or lose, it could then be used to decide whether any given candidate should be re-evaluated.

To see if this might be possible, 2 sets of combat data were extracted from the entire set of all matches played by all candidates. The first set (allnoresims) contains the combat data for all matches in which candidates did not lose their re-evaluation match (the all prefix refers to the set containing data from both studies 2 and 3). The second set (allonlyresims) contains the combat data for all matches of candidates which lost their second evaluation match. In classification terms, these sets are the training and test sets, each match instance is a sample, and the combat data provides the feature vectors. The class labels classify each match as either a win or a loss (1 and 0 respectively).

A classifier based on the K-nearest neighbors algorithm was trained on the allnoresims dataset. Using only the 4 combat data items, it was able to find a classification which would correctly label a candidate as having won its match or not 74% of the time (figure 1).

Figure 1: Classification error rates for a trained K-nearest neighbor classifier, evaluating both sets.

This value is a little optimistic, given that the classifier was tested on the same dataset it was trained on; using cross-validation shows the value to be closer to 70%. Ideally, additional combat data would be extracted from XCOM 2 to strengthen the feature vectors available to the classifier.
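For illustration, a minimal k-nearest-neighbors classifier over the 4 combat features might look as follows; this is a sketch in Java rather than the Matlab analysis actually performed, and all names are ours:

    import java.util.Arrays;
    import java.util.Comparator;

    /** Minimal k-nearest-neighbors sketch over the 4 combat features (illustrative). */
    public class MatchClassifier {
        private final double[][] features; // one row per match; 4 combat data items
        private final int[] labels;        // 1 = win, 0 = loss

        MatchClassifier(double[][] features, int[] labels) {
            // In practice the features should be normalized to comparable ranges,
            // as raw turn counts and accuracy percentages live on different scales.
            this.features = features;
            this.labels = labels;
        }

        /** Predicts win or loss for one match by majority vote among the k nearest. */
        int predict(double[] sample, int k) {
            Integer[] order = new Integer[features.length];
            for (int i = 0; i < order.length; i++) order[i] = i;
            Arrays.sort(order, Comparator.comparingDouble(i -> distance(features[i], sample)));
            int wins = 0;
            for (int i = 0; i < k; i++) wins += labels[order[i]];
            return 2 * wins > k ? 1 : 0; // majority vote (use odd k to avoid ties)
        }

        private static double distance(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(sum);
        }
    }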


More information

Texas Hold em Inference Bot Proposal. By: Brian Mihok & Michael Terry Date Due: Monday, April 11, 2005

Texas Hold em Inference Bot Proposal. By: Brian Mihok & Michael Terry Date Due: Monday, April 11, 2005 Texas Hold em Inference Bot Proposal By: Brian Mihok & Michael Terry Date Due: Monday, April 11, 2005 1 Introduction One of the key goals in Artificial Intelligence is to create cognitive systems that

More information

Artificial Intelligence. Cameron Jett, William Kentris, Arthur Mo, Juan Roman

Artificial Intelligence. Cameron Jett, William Kentris, Arthur Mo, Juan Roman Artificial Intelligence Cameron Jett, William Kentris, Arthur Mo, Juan Roman AI Outline Handicap for AI Machine Learning Monte Carlo Methods Group Intelligence Incorporating stupidity into game AI overview

More information

Capturing and Adapting Traces for Character Control in Computer Role Playing Games

Capturing and Adapting Traces for Character Control in Computer Role Playing Games Capturing and Adapting Traces for Character Control in Computer Role Playing Games Jonathan Rubin and Ashwin Ram Palo Alto Research Center 3333 Coyote Hill Road, Palo Alto, CA 94304 USA Jonathan.Rubin@parc.com,

More information

A comparison of a genetic algorithm and a depth first search algorithm applied to Japanese nonograms

A comparison of a genetic algorithm and a depth first search algorithm applied to Japanese nonograms A comparison of a genetic algorithm and a depth first search algorithm applied to Japanese nonograms Wouter Wiggers Faculty of EECMS, University of Twente w.a.wiggers@student.utwente.nl ABSTRACT In this

More information

FPS Assignment Call of Duty 4

FPS Assignment Call of Duty 4 FPS Assignment Call of Duty 4 Name of Game: Call of Duty 4 2007 Platform: PC Description of Game: This is a first person combat shooter and is designed to put the player into a combat environment. The

More information

CS 354R: Computer Game Technology

CS 354R: Computer Game Technology CS 354R: Computer Game Technology Introduction to Game AI Fall 2018 What does the A stand for? 2 What is AI? AI is the control of every non-human entity in a game The other cars in a car game The opponents

More information

Creating a Poker Playing Program Using Evolutionary Computation

Creating a Poker Playing Program Using Evolutionary Computation Creating a Poker Playing Program Using Evolutionary Computation Simon Olsen and Rob LeGrand, Ph.D. Abstract Artificial intelligence is a rapidly expanding technology. We are surrounded by technology that

More information

Meta-Heuristic Approach for Supporting Design-for- Disassembly towards Efficient Material Utilization

Meta-Heuristic Approach for Supporting Design-for- Disassembly towards Efficient Material Utilization Meta-Heuristic Approach for Supporting Design-for- Disassembly towards Efficient Material Utilization Yoshiaki Shimizu *, Kyohei Tsuji and Masayuki Nomura Production Systems Engineering Toyohashi University

More information

The Genetic Algorithm

The Genetic Algorithm The Genetic Algorithm The Genetic Algorithm, (GA) is finding increasing applications in electromagnetics including antenna design. In this lesson we will learn about some of these techniques so you are

More information

BIEB 143 Spring 2018 Weeks 8-10 Game Theory Lab

BIEB 143 Spring 2018 Weeks 8-10 Game Theory Lab BIEB 143 Spring 2018 Weeks 8-10 Game Theory Lab Please read and follow this handout. Read a section or paragraph completely before proceeding to writing code. It is important that you understand exactly

More information

PROFILE. Jonathan Sherer 9/10/2015 1

PROFILE. Jonathan Sherer 9/10/2015 1 Jonathan Sherer 9/10/2015 1 PROFILE Each model in the game is represented by a profile. The profile is essentially a breakdown of the model s abilities and defines how the model functions in the game.

More information

CS 480: GAME AI TACTIC AND STRATEGY. 5/15/2012 Santiago Ontañón

CS 480: GAME AI TACTIC AND STRATEGY. 5/15/2012 Santiago Ontañón CS 480: GAME AI TACTIC AND STRATEGY 5/15/2012 Santiago Ontañón santi@cs.drexel.edu https://www.cs.drexel.edu/~santi/teaching/2012/cs480/intro.html Reminders Check BBVista site for the course regularly

More information

Evolving Adaptive Play for the Game of Spoof. Mark Wittkamp

Evolving Adaptive Play for the Game of Spoof. Mark Wittkamp Evolving Adaptive Play for the Game of Spoof Mark Wittkamp This report is submitted as partial fulfilment of the requirements for the Honours Programme of the School of Computer Science and Software Engineering,

More information

Opponent Modelling In World Of Warcraft

Opponent Modelling In World Of Warcraft Opponent Modelling In World Of Warcraft A.J.J. Valkenberg 19th June 2007 Abstract In tactical commercial games, knowledge of an opponent s location is advantageous when designing a tactic. This paper proposes

More information

CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS

CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS GARY B. PARKER, CONNECTICUT COLLEGE, USA, parker@conncoll.edu IVO I. PARASHKEVOV, CONNECTICUT COLLEGE, USA, iipar@conncoll.edu H. JOSEPH

More information

The secret behind mechatronics

The secret behind mechatronics The secret behind mechatronics Why companies will want to be part of the revolution In the 18th century, steam and mechanization powered the first Industrial Revolution. At the turn of the 20th century,

More information

Enhancing Embodied Evolution with Punctuated Anytime Learning

Enhancing Embodied Evolution with Punctuated Anytime Learning Enhancing Embodied Evolution with Punctuated Anytime Learning Gary B. Parker, Member IEEE, and Gregory E. Fedynyshyn Abstract This paper discusses a new implementation of embodied evolution that uses the

More information

IMGD 1001: Fun and Games

IMGD 1001: Fun and Games IMGD 1001: Fun and Games by Mark Claypool (claypool@cs.wpi.edu) Robert W. Lindeman (gogo@wpi.edu) Outline What is a Game? Genres What Makes a Good Game? Claypool and Lindeman, WPI, CS and IMGD 2 1 What

More information

Game Designers. Understanding Design Computing and Cognition (DECO1006)

Game Designers. Understanding Design Computing and Cognition (DECO1006) Game Designers Understanding Design Computing and Cognition (DECO1006) Rob Saunders web: http://www.arch.usyd.edu.au/~rob e-mail: rob@arch.usyd.edu.au office: Room 274, Wilkinson Building Who are these

More information

The Behavior Evolving Model and Application of Virtual Robots

The Behavior Evolving Model and Application of Virtual Robots The Behavior Evolving Model and Application of Virtual Robots Suchul Hwang Kyungdal Cho V. Scott Gordon Inha Tech. College Inha Tech College CSUS, Sacramento 253 Yonghyundong Namku 253 Yonghyundong Namku

More information

The Development Of Selection Criteria For Game Engines In The Development Of Simulation Training Systems

The Development Of Selection Criteria For Game Engines In The Development Of Simulation Training Systems The Development Of Selection Criteria For Game Engines In The Development Of Simulation Training Systems Gary Eves, Practice Lead, Simulation and Training Systems; Pete Meehan, Senior Systems Engineer

More information

LANDSCAPE SMOOTHING OF NUMERICAL PERMUTATION SPACES IN GENETIC ALGORITHMS

LANDSCAPE SMOOTHING OF NUMERICAL PERMUTATION SPACES IN GENETIC ALGORITHMS LANDSCAPE SMOOTHING OF NUMERICAL PERMUTATION SPACES IN GENETIC ALGORITHMS ABSTRACT The recent popularity of genetic algorithms (GA s) and their application to a wide range of problems is a result of their

More information

Game Artificial Intelligence ( CS 4731/7632 )

Game Artificial Intelligence ( CS 4731/7632 ) Game Artificial Intelligence ( CS 4731/7632 ) Instructor: Stephen Lee-Urban http://www.cc.gatech.edu/~surban6/2018-gameai/ (soon) Piazza T-square What s this all about? Industry standard approaches to

More information

Mehrdad Amirghasemi a* Reza Zamani a

Mehrdad Amirghasemi a* Reza Zamani a The roles of evolutionary computation, fitness landscape, constructive methods and local searches in the development of adaptive systems for infrastructure planning Mehrdad Amirghasemi a* Reza Zamani a

More information

IMGD 1001: Fun and Games

IMGD 1001: Fun and Games IMGD 1001: Fun and Games Robert W. Lindeman Associate Professor Department of Computer Science Worcester Polytechnic Institute gogo@wpi.edu Outline What is a Game? Genres What Makes a Good Game? 2 What

More information

A Genetic Algorithm for Solving Beehive Hidato Puzzles

A Genetic Algorithm for Solving Beehive Hidato Puzzles A Genetic Algorithm for Solving Beehive Hidato Puzzles Matheus Müller Pereira da Silva and Camila Silva de Magalhães Universidade Federal do Rio de Janeiro - UFRJ, Campus Xerém, Duque de Caxias, RJ 25245-390,

More information

TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS

TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS Thong B. Trinh, Anwer S. Bashi, Nikhil Deshpande Department of Electrical Engineering University of New Orleans New Orleans, LA 70148 Tel: (504) 280-7383 Fax:

More information

Genetic Programming of Autonomous Agents. Senior Project Proposal. Scott O'Dell. Advisors: Dr. Joel Schipper and Dr. Arnold Patton

Genetic Programming of Autonomous Agents. Senior Project Proposal. Scott O'Dell. Advisors: Dr. Joel Schipper and Dr. Arnold Patton Genetic Programming of Autonomous Agents Senior Project Proposal Scott O'Dell Advisors: Dr. Joel Schipper and Dr. Arnold Patton December 9, 2010 GPAA 1 Introduction to Genetic Programming Genetic programming

More information

Biologically Inspired Embodied Evolution of Survival

Biologically Inspired Embodied Evolution of Survival Biologically Inspired Embodied Evolution of Survival Stefan Elfwing 1,2 Eiji Uchibe 2 Kenji Doya 2 Henrik I. Christensen 1 1 Centre for Autonomous Systems, Numerical Analysis and Computer Science, Royal

More information

When placed on Towers, Player Marker L-Hexes show ownership of that Tower and indicate the Level of that Tower. At Level 1, orient the L-Hex

When placed on Towers, Player Marker L-Hexes show ownership of that Tower and indicate the Level of that Tower. At Level 1, orient the L-Hex Tower Defense Players: 1-4. Playtime: 60-90 Minutes (approximately 10 minutes per Wave). Recommended Age: 10+ Genre: Turn-based strategy. Resource management. Tile-based. Campaign scenarios. Sandbox mode.

More information

Towards Strategic Kriegspiel Play with Opponent Modeling

Towards Strategic Kriegspiel Play with Opponent Modeling Towards Strategic Kriegspiel Play with Opponent Modeling Antonio Del Giudice and Piotr Gmytrasiewicz Department of Computer Science, University of Illinois at Chicago Chicago, IL, 60607-7053, USA E-mail:

More information

Improving AI for simulated cars using Neuroevolution

Improving AI for simulated cars using Neuroevolution Improving AI for simulated cars using Neuroevolution Adam Pace School of Computing and Mathematics University of Derby Derby, UK Email: a.pace1@derby.ac.uk Abstract A lot of games rely on very rigid Artificial

More information

AI in Computer Games. AI in Computer Games. Goals. Game A(I?) History Game categories

AI in Computer Games. AI in Computer Games. Goals. Game A(I?) History Game categories AI in Computer Games why, where and how AI in Computer Games Goals Game categories History Common issues and methods Issues in various game categories Goals Games are entertainment! Important that things

More information

Grading Delays. We don t have permission to grade you (yet) We re working with tstaff on a solution We ll get grades back to you as soon as we can

Grading Delays. We don t have permission to grade you (yet) We re working with tstaff on a solution We ll get grades back to you as soon as we can Grading Delays We don t have permission to grade you (yet) We re working with tstaff on a solution We ll get grades back to you as soon as we can Due next week: warmup2 retries dungeon_crawler1 extra retries

More information

Automating a Solution for Optimum PTP Deployment

Automating a Solution for Optimum PTP Deployment Automating a Solution for Optimum PTP Deployment ITSF 2015 David O Connor Bridge Worx in Sync Sync Architect V4: Sync planning & diagnostic tool. Evaluates physical layer synchronisation distribution by

More information

Principles of Computer Game Design and Implementation. Lecture 20

Principles of Computer Game Design and Implementation. Lecture 20 Principles of Computer Game Design and Implementation Lecture 20 utline for today Sense-Think-Act Cycle: Thinking Acting 2 Agents and Virtual Player Agents, no virtual player Shooters, racing, Virtual

More information

A Numerical Approach to Understanding Oscillator Neural Networks

A Numerical Approach to Understanding Oscillator Neural Networks A Numerical Approach to Understanding Oscillator Neural Networks Natalie Klein Mentored by Jon Wilkins Networks of coupled oscillators are a form of dynamical network originally inspired by various biological

More information

Optimization of Tile Sets for DNA Self- Assembly

Optimization of Tile Sets for DNA Self- Assembly Optimization of Tile Sets for DNA Self- Assembly Joel Gawarecki Department of Computer Science Simpson College Indianola, IA 50125 joel.gawarecki@my.simpson.edu Adam Smith Department of Computer Science

More information

Federico Forti, Erdi Izgi, Varalika Rathore, Francesco Forti

Federico Forti, Erdi Izgi, Varalika Rathore, Francesco Forti Basic Information Project Name Supervisor Kung-fu Plants Jakub Gemrot Annotation Kung-fu plants is a game where you can create your characters, train them and fight against the other chemical plants which

More information

A Genetic Algorithm-Based Controller for Decentralized Multi-Agent Robotic Systems

A Genetic Algorithm-Based Controller for Decentralized Multi-Agent Robotic Systems A Genetic Algorithm-Based Controller for Decentralized Multi-Agent Robotic Systems Arvin Agah Bio-Robotics Division Mechanical Engineering Laboratory, AIST-MITI 1-2 Namiki, Tsukuba 305, JAPAN agah@melcy.mel.go.jp

More information

A Character Decision-Making System for FINAL FANTASY XV by Combining Behavior Trees and State Machines

A Character Decision-Making System for FINAL FANTASY XV by Combining Behavior Trees and State Machines 11 A haracter Decision-Making System for FINAL FANTASY XV by ombining Behavior Trees and State Machines Youichiro Miyake, Youji Shirakami, Kazuya Shimokawa, Kousuke Namiki, Tomoki Komatsu, Joudan Tatsuhiro,

More information

Machine Learning in Iterated Prisoner s Dilemma using Evolutionary Algorithms

Machine Learning in Iterated Prisoner s Dilemma using Evolutionary Algorithms ITERATED PRISONER S DILEMMA 1 Machine Learning in Iterated Prisoner s Dilemma using Evolutionary Algorithms Department of Computer Science and Engineering. ITERATED PRISONER S DILEMMA 2 OUTLINE: 1. Description

More information

Comprehensive Rules Document v1.1

Comprehensive Rules Document v1.1 Comprehensive Rules Document v1.1 Contents 1. Game Concepts 100. General 101. The Golden Rule 102. Players 103. Starting the Game 104. Ending The Game 105. Kairu 106. Cards 107. Characters 108. Abilities

More information

Operation Blue Metal Event Outline. Participant Requirements. Patronage Card

Operation Blue Metal Event Outline. Participant Requirements. Patronage Card Operation Blue Metal Event Outline Operation Blue Metal is a Strategic event that allows players to create a story across connected games over the course of the event. Follow the instructions below in

More information

Analyzing Games.

Analyzing Games. Analyzing Games staffan.bjork@chalmers.se Structure of today s lecture Motives for analyzing games With a structural focus General components of games Example from course book Example from Rules of Play

More information

Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax

Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax Tang, Marco Kwan Ho (20306981) Tse, Wai Ho (20355528) Zhao, Vincent Ruidong (20233835) Yap, Alistair Yun Hee (20306450) Introduction

More information

Evolutionary robotics Jørgen Nordmoen

Evolutionary robotics Jørgen Nordmoen INF3480 Evolutionary robotics Jørgen Nordmoen Slides: Kyrre Glette Today: Evolutionary robotics Why evolutionary robotics Basics of evolutionary optimization INF3490 will discuss algorithms in detail Illustrating

More information

Arkham Investigations An alternate method of play for Arkham Horror.

Arkham Investigations An alternate method of play for Arkham Horror. Arkham Investigations 1 Arkham Investigations An alternate method of play for Arkham Horror. Introduction While Arkham Horror is a great game, for connoisseurs of H.P. Lovecraft's work, it presents a rather

More information

Hierarchical Controller for Robotic Soccer

Hierarchical Controller for Robotic Soccer Hierarchical Controller for Robotic Soccer Byron Knoll Cognitive Systems 402 April 13, 2008 ABSTRACT RoboCup is an initiative aimed at advancing Artificial Intelligence (AI) and robotics research. This

More information

Comp 3211 Final Project - Poker AI

Comp 3211 Final Project - Poker AI Comp 3211 Final Project - Poker AI Introduction Poker is a game played with a standard 52 card deck, usually with 4 to 8 players per game. During each hand of poker, players are dealt two cards and must

More information

Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms

Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms Benjamin Rhew December 1, 2005 1 Introduction Heuristics are used in many applications today, from speech recognition

More information

Optimal Yahtzee performance in multi-player games

Optimal Yahtzee performance in multi-player games Optimal Yahtzee performance in multi-player games Andreas Serra aserra@kth.se Kai Widell Niigata kaiwn@kth.se April 12, 2013 Abstract Yahtzee is a game with a moderately large search space, dependent on

More information

Case-based Action Planning in a First Person Scenario Game

Case-based Action Planning in a First Person Scenario Game Case-based Action Planning in a First Person Scenario Game Pascal Reuss 1,2 and Jannis Hillmann 1 and Sebastian Viefhaus 1 and Klaus-Dieter Althoff 1,2 reusspa@uni-hildesheim.de basti.viefhaus@gmail.com

More information

GPU Computing for Cognitive Robotics

GPU Computing for Cognitive Robotics GPU Computing for Cognitive Robotics Martin Peniak, Davide Marocco, Angelo Cangelosi GPU Technology Conference, San Jose, California, 25 March, 2014 Acknowledgements This study was financed by: EU Integrating

More information

Statistical Analysis of Nuel Tournaments Department of Statistics University of California, Berkeley

Statistical Analysis of Nuel Tournaments Department of Statistics University of California, Berkeley Statistical Analysis of Nuel Tournaments Department of Statistics University of California, Berkeley MoonSoo Choi Department of Industrial Engineering & Operations Research Under Guidance of Professor.

More information

5.4 Imperfect, Real-Time Decisions

5.4 Imperfect, Real-Time Decisions 116 5.4 Imperfect, Real-Time Decisions Searching through the whole (pruned) game tree is too inefficient for any realistic game Moves must be made in a reasonable amount of time One has to cut off the

More information

Smart Grid Reconfiguration Using Genetic Algorithm and NSGA-II

Smart Grid Reconfiguration Using Genetic Algorithm and NSGA-II Smart Grid Reconfiguration Using Genetic Algorithm and NSGA-II 1 * Sangeeta Jagdish Gurjar, 2 Urvish Mewada, 3 * Parita Vinodbhai Desai 1 Department of Electrical Engineering, AIT, Gujarat Technical University,

More information

CS 441/541 Artificial Intelligence Fall, Homework 6: Genetic Algorithms. Due Monday Nov. 24.

CS 441/541 Artificial Intelligence Fall, Homework 6: Genetic Algorithms. Due Monday Nov. 24. CS 441/541 Artificial Intelligence Fall, 2008 Homework 6: Genetic Algorithms Due Monday Nov. 24. In this assignment you will code and experiment with a genetic algorithm as a method for evolving control

More information

Reactive Planning with Evolutionary Computation

Reactive Planning with Evolutionary Computation Reactive Planning with Evolutionary Computation Chaiwat Jassadapakorn and Prabhas Chongstitvatana Intelligent System Laboratory, Department of Computer Engineering Chulalongkorn University, Bangkok 10330,

More information

CS 229 Final Project: Using Reinforcement Learning to Play Othello

CS 229 Final Project: Using Reinforcement Learning to Play Othello CS 229 Final Project: Using Reinforcement Learning to Play Othello Kevin Fry Frank Zheng Xianming Li ID: kfry ID: fzheng ID: xmli 16 December 2016 Abstract We built an AI that learned to play Othello.

More information

Chapter 4 Summary Working with Dramatic Elements

Chapter 4 Summary Working with Dramatic Elements Chapter 4 Summary Working with Dramatic Elements There are two basic elements to a successful game. These are the game formal elements (player, procedures, rules, etc) and the game dramatic elements. The

More information

Agent Smith: An Application of Neural Networks to Directing Intelligent Agents in a Game Environment

Agent Smith: An Application of Neural Networks to Directing Intelligent Agents in a Game Environment Agent Smith: An Application of Neural Networks to Directing Intelligent Agents in a Game Environment Jonathan Wolf Tyler Haugen Dr. Antonette Logar South Dakota School of Mines and Technology Math and

More information

Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage

Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage Richard Kelly and David Churchill Computer Science Faculty of Science Memorial University {richard.kelly, dchurchill}@mun.ca

More information

Reactive Planning for Micromanagement in RTS Games

Reactive Planning for Micromanagement in RTS Games Reactive Planning for Micromanagement in RTS Games Ben Weber University of California, Santa Cruz Department of Computer Science Santa Cruz, CA 95064 bweber@soe.ucsc.edu Abstract This paper presents an

More information

Automated Software Engineering Writing Code to Help You Write Code. Gregory Gay CSCE Computing in the Modern World October 27, 2015

Automated Software Engineering Writing Code to Help You Write Code. Gregory Gay CSCE Computing in the Modern World October 27, 2015 Automated Software Engineering Writing Code to Help You Write Code Gregory Gay CSCE 190 - Computing in the Modern World October 27, 2015 Software Engineering The development and evolution of high-quality

More information

Coevolution and turnbased games

Coevolution and turnbased games Spring 5 Coevolution and turnbased games A case study Joakim Långberg HS-IKI-EA-05-112 [Coevolution and turnbased games] Submitted by Joakim Långberg to the University of Skövde as a dissertation towards

More information

USING VALUE ITERATION TO SOLVE SEQUENTIAL DECISION PROBLEMS IN GAMES

USING VALUE ITERATION TO SOLVE SEQUENTIAL DECISION PROBLEMS IN GAMES USING VALUE ITERATION TO SOLVE SEQUENTIAL DECISION PROBLEMS IN GAMES Thomas Hartley, Quasim Mehdi, Norman Gough The Research Institute in Advanced Technologies (RIATec) School of Computing and Information

More information