Creating Human-like AI Movement in Games Using Imitation Learning


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Creating Human-like AI Movement in Games Using Imitation Learning

CASPER RENMAN

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Creating Human-like AI Movement in Games Using Imitation Learning

May 31, 2017

CASPER RENMAN

Master's Thesis in Computer Science
School of Computer Science and Communication (CSC)
Royal Institute of Technology, Stockholm

Swedish Title: Imitation Learning som verktyg för att skapa människolik rörelse för AI-karaktärer i spel
Principal: Kristoffer Benjaminsson, Fast Travel Games
Supervisor: Christopher Peters
Examiner: Olov Engwall

Abstract

The way characters move and behave in computer and video games is an important factor in their believability, which in turn has an impact on the player's experience. This project explores Imitation Learning using limited amounts of data as an approach to creating human-like AI behaviour in games, and through a user study investigates what factors determine whether a character is human-like when observed through the character's first-person perspective. The idea is to create or shape AI behaviour by recording one's own actions. The implemented framework uses a Nearest Neighbour algorithm with a KD-tree as the policy which maps a state to an action. Results showed that the chosen approach was able to create human-like AI behaviour while respecting the performance constraints of a modern 3D game.

Sammanfattning

The way characters move and behave in computer and video games is an important factor in their believability, which in turn has an impact on the player's experience. This project explores Imitation Learning with limited amounts of data as an approach to creating human-like movement for AI characters in games, and through a user study investigates which factors determine whether a character is human-like when it is observed through its first-person perspective. The idea is to create or shape AI behaviour by recording one's own actions. The implemented framework uses a Nearest Neighbour algorithm with a KD-tree as the policy that maps a state to an action. The results showed that the chosen approach succeeded in creating human-like AI behaviour while respecting the computational constraints of a modern 3D game.

Contents

1 Introduction
   1.1 Artificial Intelligence in games
       Imitation Learning
       Human-likeness
   1.2 Objective
   1.3 Limitations
   1.4 Report outline
2 Background
   2.1 Imitation Learning
       Policy
       Demonstration
       State representation
       Policy creation
       Data collection
       Demonstration dataset limitations
   2.2 Related work
       Summary and state of the art
   2.3 Performance in games
   2.4 Measuring believability of AI
       Turing test-approach
       Automated similarity test
   2.5 Conclusion
3 Implementation
   3.1 Setting
   3.2 Method motivation
   3.3 Implementation
       Summary
       Recording movement and state representation
       Playing back movement
       Policy
       Feature extraction
       Avoiding static obstacles
       Avoiding dynamic obstacles
       KD-tree
       Discretizing the environment
       Additional details
       Storing data
       Optimization and measuring performance
   3.4 Overall implementation
4 Evaluation
   4.1 User study
       The set-up
       Participants
       Stimuli
       Procedure
       Hypothesis
   Results
       User study
       Imitation agent performance
   Discussion
       The imitation agent
       The user study
       Creating non-human-like behaviour
       Performance in relation to games
       Ethical aspects
5 Conclusions
   Future work
       Use outside of games
Bibliography

Chapter 1

Introduction

This chapter gives a brief overview of Artificial Intelligence in games, Imitation Learning and human-likeness. It also presents the objective, limitations and the outline of the project.

1.1 Artificial Intelligence in games

Computer and video games produce more and more complex virtual worlds. This introduces new challenges for the characters controlled by Artificial Intelligence (AI), also known as agents [20] or NPCs (Non-Player Characters), meaning characters that are not controlled by a human player. The way characters move and behave in computer and video games is an important factor in their believability, which has an impact on the player's experience. Being able to interact with NPCs in meaningful ways and feel that they belong in the world is important [4]. In Virtual Reality (VR) this is even more important, as the gaming experience is even more immersive. The goal of much game AI is more or less the same as that of attempts to beat the Turing test: to create believable intelligence [12].

A popular genre in computer and video games is the First-person shooter (FPS). In an FPS game the player experiences the game through the eyes of the character the player is controlling, also known as a first-person perspective. Typically a player is at most able to see the hands and arms of the character the player is controlling. The player can, however, see the whole bodies of other players' characters and of NPCs. This is visualized in Figure 1.1.

Figure 1.1: An example first-person perspective game scenario, seen from the eyes of the character that the player controls. The blue and red characters are NPCs.

AI in games is traditionally based on Finite State Machines (FSM), Behaviour Trees (BT) or other hand-coded techniques [27]. In these techniques, a programmer needs to explicitly define rules for what an agent should do in different situations. An example of such a rule could be: "if the character's health is low and the character sees a hostile character, the character should flee". These techniques work in the sense that the agent is able to execute tasks and adapt its behaviour to its situation, but the result is predictable and static [11]. For example, if a player sees an NPC react to a situation the same way it did in an earlier similar situation, the player can be quite sure that the NPC will always react like that given a similar situation. In 2006, Orkin [17] said: "in the early generations of shooters, such as Shogo (1998), players were happy if the A.I. noticed them at all and started attacking. ... Today, players expect more realism, to complement the realism of the physics and lighting in the environments." To add realism and unpredictability, and thereby increase the entertainment for the player, a promising approach is to have agents imitate human behaviour.

Imitation Learning

Imitation Learning (IL) is a technique where the agent learns from examples, or demonstrations, provided by a teacher [1]. IL is a form of Machine Learning (ML). ML has been defined as the field of study that gives computers the ability to learn without being explicitly programmed [14]. Unlike Reinforcement Learning algorithms, IL does not require a reward function to be specified. Instead, an IL algorithm observes a teacher perform a task and learns a policy that imitates the teacher, with the purpose of generalizing to unseen data [28]. IL is regarded as a promising technique for creating human-like artificial agents [3]. Some approaches have been shown to develop agents with good performance in non-trivial tasks using limited amounts of data and computational resources [3]. It is also a technique

which can be used to dynamically change gameplay to adapt to different players based on their play style and skill [7].

Human-likeness

Shaker et al. [24] distinguish character believability, where an agent is believable if someone who observes it believes that the agent is a human being, from player believability, where the agent is believable if someone observing it believes that a human is controlling it. It is player believability that is meant by human-like in this project.

1.2 Objective

The primary goal of this project is to describe a method for creating human-like agent movement using IL with limited amounts of data. The idea is to create an agent by recording one's own actions, shaping it with desired behaviours. Most related works in the field of IL in games aim to create competitive AI, meaning AI that is good at beating the game. This is not the case in this project. The goal is to create AI that lets an agent imitate a demonstrating human well, while respecting the performance requirements of a modern 3D game. The hope is that this will lead to a more unpredictable and human-like agent, which in turn could lead to better entertainment for a player playing the game. Lee et al. [9] say that human-like agent behaviour leads to raised emotional involvement of the player, which increases the player's immersion in the game. Whether it is more fun or not to play with a human-like agent will not be explored.

This project aims to answer the following question:

Q1: How can IL be used to create human-like agent behaviour, using limited amounts of data?

This question is further split into two sub-questions:

Q1.1: How can an agent that imitates demonstrated behaviour be created, using IL with limited amounts of data?

Q1.2: What determines whether a character is human-like, when observed through the character's first-person perspective?

The human-likeness of the agent will depend on how human-like the human is when recording itself. This means that non-human-like behaviour can also be created. Suppose that it is desired to create a behaviour for a dog in a game. A human would then record itself playing the game, role-playing a dog and behaving as it wants the dog to behave. If the intended behaviour is that the dog should flee when it sees something hostile, then so should the human when recording itself. The outcome should then be an agent that behaves like a dog.

1.3 Limitations

Agent movement means that the actions the agent can execute are limited to movement, including rotation, i.e. moving from one position to another. By contrast, actions that are not considered movement in this project include, for example, shooting, jumping or picking up items. The simulations will be done in a 3D environment, but the movement of the implemented agent will be limited to a 2D plane. This means that the agent will not be able to walk up a ramp or climb stairs, for example. The movement behaviour of the agent will be limited by the feature extractors implemented, as described in the implementation chapter. In theory, any behaviour which only requires the agent to be able to move could be implemented, such as path-finding and obstacle avoidance.

The project will use limited amounts of data, meaning that it should be possible to create agent behaviour using the framework created in this project by recording one's own actions for a couple of minutes. The motivation for this is that if game developers are to design their own agent behaviour for a game, there will not exist data for them to use. Some works listed in the related work section perform their experiments in big games such as Quake III, a first-person shooter video game, where a lot of saved data is available. This allows them to use complex algorithms which perform better with more data. Not requiring a lot of data is also thought to make the contributions of this work more attractive to the gaming industry, as it will require less time and effort to utilize.

1.4 Report outline

The report starts by presenting background information about the areas of Imitation Learning and measuring believability of AI, and related work. Following is the implementation chapter, which motivates the choice of methods and describes the implementation process. The evaluation chapter describes the user study which was conducted in order to evaluate the human-likeness of the resulting imitation agent. It also presents the results of the user study and a brief performance measurement of the imitation agent, as well as summarizes what was done in the project and discusses the results. Finally, conclusions are made in the conclusions chapter.

Chapter 2

Background

This chapter presents background knowledge and related works about Imitation Learning and measuring believability of AI-controlled characters. It also presents why heavy computations with long computational times are particularly bad in games.

2.1 Imitation Learning

The work by Argall et al. [1] is frequently cited and is a comprehensive survey of IL. The survey is the biggest source of background knowledge in the area of IL in this project. They describe IL as a subset of Supervised Learning, where an agent learns an approximation of the function that produced the provided labeled training data, called a policy. The dataset is made up of demonstrations of a given task.

Policy

A policy π is a function that maps a state x to an action u. A policy allows an agent to select an action based on its current state. Developing a policy by hand is often difficult. Therefore, machine learning algorithms have been used for policy development [1].

Demonstration

A demonstration is a sequence of state-action pairs that are recorded at the time of the demonstration of the desired behaviour [1]. This way of learning a policy through examples differs from learning it based on data collected through exploration, as in Reinforcement Learning [25]. A feature of IL is that it focuses the dataset on areas of the state space that are actually encountered during the execution of the behaviour [1]. This is a good thing in games, where computation time is very limited, as the search space of appropriate solutions is reduced.
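
To make the π : x → u terminology concrete, the sketch below expresses a policy in C# (the implementation language of this project) as a naive nearest-neighbour lookup over recorded demonstrations. All types and names are illustrative assumptions, not code from the thesis:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // A state x is a feature vector; an action u is whatever the agent can
    // execute (a string label is used here purely for illustration).
    class NearestNeighbourPolicy
    {
        private readonly List<(float[] State, string Action)> demonstrations = new();

        public void AddDemonstration(float[] state, string action) =>
            demonstrations.Add((state, action));

        // The policy pi: given a state x, return the action u whose recorded
        // state is closest to x (Euclidean distance over the feature vector).
        public string SelectAction(float[] x) =>
            demonstrations.OrderBy(d => Distance(d.State, x)).First().Action;

        private static float Distance(float[] a, float[] b) =>
            (float)Math.Sqrt(a.Zip(b, (p, q) => (p - q) * (p - q)).Sum());
    }

This linear scan costs O(n) per query; the implementation chapter later replaces it with a KD-tree-backed search.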

State representation

A state can be represented as either discrete, e.g. "can see enemy" or "cannot see enemy", or continuous, e.g. the 3D position and rotation of the agent.

Policy creation

Creating a policy can be done in different ways. A mapping function uses the demonstrated data to directly approximate the function mapping from the agent's state observations to actions (f : Z → A) [1]. This can be done using either classification, where the output is class labels, or regression, where the output consists of continuous values. A system model uses the demonstrated data to create a model; a policy is then derived from that model [1]. Plans use the demonstrated data together with user intention information to learn rules that associate pre- and post-conditions with each action. A sequence of actions is then planned using that information [1].

Data collection

The correspondence problem [16] has to do with the mapping between the teacher and the learner (see Figure 2.1). For example, a player playing an FPS game using a mouse and keyboard sends inputs which are processed by the game and translated into actions. An NPC in the same game is controlled by AI which sends commands to control the character several times per second, which is not directly equivalent to the keystrokes and mouse movements of a human player.

Figure 2.1: Visualization of the record mapping and embodiment mapping.

The record mapping is the extent to which the exact states/actions experienced by the teacher during demonstration are recorded in the dataset [1]. If there is no record mapping or a direct record mapping, the exact states/actions are recorded in the dataset. Otherwise some encoding function is applied to the data before storing it. The embodiment mapping is the extent to which the states/actions recorded within the dataset are exactly those that the learner would observe/execute [1]. If there is no embodiment mapping or a direct embodiment mapping, the recorded states/actions are exactly those that the learner will observe/execute. Otherwise there is a function which maps the recorded states/actions to actions to be executed.

Two data collection approaches are demonstration and imitation [1]. In demonstration, the teacher can operate the learner through teleoperation, where the record

mapping is direct. There is also shadowing, where the agent tries to mimic the teacher's motions by using its own sensors; here the record mapping is non-direct. Within imitation the embodiment mapping is non-direct, and the teacher execution can be recorded either with sensors on the teacher, where the record mapping is direct, or through external observation, where the record mapping is non-direct.

Demonstration dataset limitations

In IL, the performance of an agent is heavily dependent on the demonstration dataset. Low learner performance can be due to areas of the state space that have not been demonstrated. This can be solved either by improving upon the existing demonstrations by generalizing them or through acquisition of new demonstrations [1]. As mentioned, low performance can also be caused by low quality of the demonstration dataset [1]. Dealing with this involves eliminating parts of the teacher's executions that are suboptimal. Another solution is to let the learner learn from experience: if feedback is provided on the learner's actions, this can be used to update the policy [1]. The demonstration dataset limitations are not dealt with in this project, as they are considered out of scope. They are however mentioned as a possible extension in the Future work chapter.

2.2 Related work

This section gives an overview of the related work in the field of Imitation Learning in games, in chronological order.

Thurau et al. [26] in "Imitation In All Levels of Game AI" create bots for the game Quake II. Different algorithms are presented that learn from human-generated data. They create behaviours on different levels: strategic behaviour used to achieve long-term goals, tactical behaviour used for localized situation handling such as anticipating enemy movement, and reactive behaviour like jumping, aiming and shooting. The generated bots are compared to the existing Quake II bots. It is shown that Machine Learning can be applied on different behavioural layers, and it is concluded that Imitation Learning is well suited for generating behaviour for artificial game characters. The bots created with Imitation Learning outperformed the Quake II bots. It should however be taken into consideration that these results are thirteen years old at the time of writing this report.

Priesterjahn et al. [20] in "Evolution of Reactive Rules in Multi Player Computer Games Based on Imitation" propose a system in which the behaviour of artificial opponents is created by learning rules from observing human players. The rules are selected using an evolutionary algorithm, with the goal of choosing the best and most important rules and optimizing the behaviour of the agent.

The paper shows that limited learning effort is needed to create behaviour which is competitive in reactive situations in the game Quake III. After a few generations of the algorithm, the agent was able to behave in the same way as the original players. In the conducted experiments, the generated agent outperformed the built-in game agents. The world is simplified to a single plane. The plane is divided into cells in a grid, with the agent centered in the grid; the grid moves relative to the agent. Each frame, the agent checks whether each cell is empty or not and scores it accordingly. They limit the commands to moving and attacking or not attacking. A rule is a mapping from a grid to a command. Human players are recorded and a basic rule set is generated by recording the grid-to-command matches every frame of the game. An evolutionary algorithm is then used to learn the best rules and thus the best competitive behaviour.

Saunders et al. [21] in "Teaching Robots by Moulding Behavior and Scaffolding the Environment" teach behaviour to robots by moulding their actions within a scaffolded environment. A scaffolded environment is an environment which is modified to make it easier for the robot to complete a task while the robot is at a developmental stage. Robot behaviour is created by teaching state-action memory maps in a hierarchical manner, which during execution are polled using a k-Nearest Neighbour based algorithm. Their goal was to reproduce all observable movement behaviours. Their results show that the Bayesian framework leads to human-like behaviour.

Priesterjahn [19] in "Imitation-Based Evolution of Artificial Players in Modern Computer Games", which builds on [20], proposes the use of imitation techniques to generate more human-like behaviours in an action game. Players are recorded, and the recordings are used as the basis of an evolutionary learning approach. The approach is motivated by stating that to behave human-like, an agent should base its behaviour on how human players play the game and try to imitate them, as opposed to a pure learning approach based on the optimization of behaviour, which only optimizes the raw performance of the game agent. The author presents the results of the conducted experiments and explains that the imitation-based initialization has a big effect on the performance and behaviour of the evolved agents. The generated agents showed a much higher level of sophistication in their behaviour and appeared much more human-like than the agents evolved using plain evolution, though performing worse.

Cardamone et al. [3] in "Learning Drivers for TORCS through Imitation Using Supervised Methods" develop drivers for The Open Racing Car Simulator (TORCS) using a direct method, meaning the method uses supervised learning to learn driving behaviour from data collected from other drivers. They show that by using high-level information about the environment and high-level actions to be performed, the developed drivers can achieve good performance. High-level actions mean that they learn trajectories and speeds along the track, and let controllers achieve the target

values, as opposed to predicting or learning low-level actions such as pressing the gas pedal by an amount or rotating the wheel by an amount of degrees. It is also stated that the performance can be achieved with limited amounts of data and limited computational power. The learning methods used are k-Nearest Neighbour and Neural Networks with Neuroevolution. The performance is measured by how fast a driver completes a race, which means they want to create an AI that is good at playing the game. It is compared to the best AI driver.

Munoz et al. [15] in "Controller for TORCS Created by Imitation" create a controller for the game TORCS using Imitation Learning. They use three types of drivers to imitate: a human player, an AI controller created with Machine Learning and a hand-coded controller which performs a complete lap. The imitation is done on each of the drivers separately, and then a mix of the data is combined into new controllers. The aim of the work is to create competitive NPCs that imitate human behaviour. The learning method is feed-forward Neural Networks with Backpropagation. The performance of the driver is measured by how fast it completes a race. It is compared to other AI and human drivers. They conclude that it is difficult to learn from human behaviour, as humans do not always perform the same actions given the same situation. Humans also make mistakes, which is not good behaviour to learn if the goal is to create a driver that is good at playing the game.

Mehta et al. [13] in "Authoring Behaviors for Games using Learning from Demonstration" is similar to [21] in that behaviour is taught by demonstrating actions and annotating the actions with a goal. Here, the learning involves four steps:

Demonstration: Playing the game.

Annotation: Specifying the goals the teacher was pursuing for each action.

Behaviour learning: Using a temporal reasoning framework.

Behaviour execution: Done through a case-based reasoning (CBR) technique, case-based planning.

The goal of this project was to create a framework in which people without programming skills can create game AI behaviour by demonstration. The authors conclude that by using case-based planning techniques, concrete behaviours demonstrated in concrete game situations can be reused by the system in a range of other game situations, providing an easy way to author general behaviours.

Karpov et al. [8] in "UT^2: Believable Bot Navigation via Playback of Human Traces" create the UT^2 bot for the BotPrize competition, a Turing-like test where

computer game bots compete by attempting to fool human judges into thinking they are just another human player. UT^2 broke the 50% humanness threshold and won the grand prize in 2012. The bot has a component called the Human Trace Controller, which is inspired by the idea of direct imitation. The controller uses a database of recorded human games in order to retrieve and play back segments of human behaviour. The results show that using direct imitation allows the bot to solve navigation problems while moving in a human-like fashion. Two types of data are recorded, pose data and event data. The pose includes position, orientation, velocity and acceleration. An event is, for example, switching weapons, firing weapons or jumping. All of the pose and event data for a player in a particular game form a sequence. Sequences are stored so that preceding and succeeding event and pose data can be retrieved from any given pose or event. In order to be able to quickly retrieve the relevant human traces, they implemented an efficient indexing scheme for the data. The two most effective indexing schemes used were Octree-based indexing and Navigation Graph-based indexing using a KD-tree.

Ortega et al. [18] in "Imitating Human Playing Styles in Super Mario Bros" describe and compare different methods for generating game AI based on Imitation Learning. Three different methods for imitating human behaviour are compared: Backpropagation, Neuroevolution and Dynamic scripting. The game is in 2D. Similarity in playing style is measured by comparing the play trace of one or several human players with the play trace of an AI player. The methods compared are hand-coded, direct (based on supervised learning) or indirect (based on maximizing a similarity measure). The conclusion is that a method based on Neuroevolution performs best, both when evaluated by the similarity measure and by human spectators. Inputs were the game state, e.g. enemies, obstacles and distance to gaps, and outputs were actions.

Summary and state of the art

In 2006, Gorman et al. [6] stated that every particular game is different from the others, and claimed that it thus probably is impossible to suggest an ultimate approach. They said that "Currently, there are no generally preferred knowledge representation data structures and machine learning algorithms for the task of creating believable behaviour". They claim that believable characters should possess certain features that can hardly be achieved without observing and/or simulating human behaviour. Imitation Learning is listed as a proven human behaviour acquisition method.

Few of the works listed here have the sole aim of creating an agent that imitates demonstrated behaviour as well as possible, and no such works could be found. Most

have another aim, such as performing as well as a human, or performing well after being inspired by human behaviour. The most popular and successful approach in these works is Neural Networks with Neuroevolution, a form of Machine Learning that uses evolutionary algorithms to train Neural Networks [18]. The Human Trace Controller in the work by Karpov et al. [8], however, is the most recent and successful work found which aims to imitate demonstrated behaviour without doing it in a "beating the game" manner.

2.3 Performance in games

In games it is important to keep computation times low and the frame rate high and stable, usually measured in frames per second (FPS). The frame rate is the frequency at which frames (images) in a game (or video) are displayed. A high frame rate typically means about 60 FPS for normal computer games and about 90 FPS for VR games, in order to have objects on the screen appear to move smoothly. Games usually contain a function called the update or tick function, which runs once every frame. The game will wait for the update function to finish before processing the next frame. If the calculations made in the update function take longer than the time slot for one frame (in order to keep 90 FPS, one frame has 1000/90 ≈ 11 ms to run its calculations), the game will not be able to stay at its target FPS and will not run as smoothly.
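
As a concrete illustration of this budget (a sketch, not code from the thesis), the per-frame time slot and an overrun check can be written as:

    using System;
    using System.Diagnostics;

    class FrameBudget
    {
        static void Main()
        {
            const double targetFps = 90.0;         // VR target; 60 for normal games
            double budgetMs = 1000.0 / targetFps;  // ~11.1 ms per frame at 90 FPS
            Console.WriteLine($"Budget per frame: {budgetMs:F1} ms");

            var stopwatch = Stopwatch.StartNew();
            // ... all per-frame work (game logic, AI, rendering) would run here ...
            stopwatch.Stop();

            if (stopwatch.Elapsed.TotalMilliseconds > budgetMs)
                Console.WriteLine("Frame overran its budget; the frame rate drops below target.");
        }
    }

Everything the AI does each frame, including the nearest-neighbour classification described later, has to fit inside this slot together with all other game systems.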

2.4 Measuring believability of AI

Umarov and Mozgovoy [27] study current approaches to believability and effectiveness of AI behaviour in virtual worlds and give a good overview of different approaches. They discuss both measuring believability and various implementations for achieving it in games. It is stated that believability is not the only feature that makes AI-controlled characters fun to play with. A game should be challenging, so the agent should also be skilled or effective. However, they explain that the goals of believability and effectiveness are not always the same: a skilled agent is not necessarily believable, and a believable agent might be a weak opponent.

Turing test-approach

To evaluate the believability of an AI-controlled character, Umarov and Mozgovoy [27] refer to a Turing test-approach, where a human player (the judge) plays a game against two opponents, one controlled by a human and one controlled by an AI. The judge's task is to determine which one is human. A simplification of this test is also mentioned, where the judge instead watches a game between two players, each of which can be controlled either by a human or an AI. The judge's task is then to identify the game participants. Lee et al. [9] learn human-like behaviour via Markov decision processes in the 2D game Super Mario. They also evaluate the human-likeness by performing a modified Turing test [22].

Gorman et al. [6] performed an experiment to which [27] refers. Quake II agents were evaluated by showing a number of people a series of video clips as seen from the characters' first-person cameras. The task was to identify whether the active character was human. The different characters were controlled by a real human player, a Quake agent and a specifically designed imitation agent that tried to reproduce human behaviour using Bayesian motion modeling. The imitation agent was misidentified as a human 69% of the time and the Quake agent was mistaken for a human 36% of the time. "Sample evaluators' comments, quoted in (Gorman et al., 2006), indicate that quite simple clues were used to guess human players ('fires gun for no reason, so must be human', 'stand and wait, AI wouldn't do this', 'unnecessary jumping')".

Automated similarity test

One way to compare human player actions and agent actions is to compare velocity direction angle changes and frequencies of angles between player direction and velocity direction. Another is to compare pre-recorded trajectories of human players with those of agents [27].

2.5 Conclusion

This chapter presented Imitation Learning and the different challenges that it involves. Then related works were listed and the state of the art was determined. A direct imitation method seems like a good approach, as used by Karpov et al. [8]. Since no learning is done, the approach should give a lot of control, which is good as the computational performance of AI in games is important. The choice of method is described in detail in the next chapter. In order to evaluate the believability of the agent, a Turing test-approach is described as an option. The evaluation is described in Chapter 4.

Chapter 3

Implementation

This chapter describes the implementation of the Imitation Learning framework and thereby aims to answer Q1.1. Section 3.3 provides a summary of what was implemented. Throughout an iterative implementation process it was determined what to implement in order to create an agent with behaviour which could be evaluated. The agent created in this process will be referred to as the agent when no other type of AI-controlled character is in the same context; otherwise it will be referred to as the imitation agent.

3.1 Setting

The implementation was carried out in the Unity Pro game engine. Unity is a cross-platform game engine developed by Unity Technologies and used to develop video games for PC, consoles, mobile devices and websites.

3.2 Method motivation

To keep the complexity of the framework low, and to allow for quick evaluation and iteration, it was decided to go with a Nearest Neighbour (NN) classification approach as used by Cardamone et al. [3]. Policy creation is thus done through a mapping function. No learning is done, and the collected data represents the model. Argall et al. [1] state that regardless of learning technique, "minimal parameter tuning and fast learning times requiring few training examples are desirable". This speaks against more sophisticated algorithms such as Neural Networks, which require a lot of data to perform well. Cardamone et al. [3] claim that it is desirable to have the output of the agent be high-level actions, such as a target position and velocity, as opposed to low-level actions such as a certain key press for a certain

amount of time. Other classification techniques may perform as well as or better than Nearest Neighbour algorithms, but the focus of the thesis is not to compare or find the best classification algorithm. It is however important that the algorithm is fast, as there is not much time for heavy calculations in a game.

Karpov et al. [8] show that direct imitation, i.e. playing back recorded segments of human gameplay as they were recorded, allows a bot to solve navigation problems while moving in a human-like fashion. Their work passes the test of a structured and recognized competition aimed at measuring human-likeness, which gives it high credibility. It is also one of the most recent works. This project was therefore inspired by their solution. The implementation used imitation as the data collection approach, where the record mapping is direct and the embodiment mapping is indirect. This is described in more detail in the next section.

3.3 Implementation

Summary

An Imitation Learning framework was created which allows a human to create human-like agent behaviour by recording their own actions. Below is a summary of the implementation of the imitation agent. Details are described in the subsections following this summary.

Recording movement: The human is in control of the agent and the agent's state is continuously recorded.

Playing back movement: The agent moves by executing actions. An action is a set of states. An action is chosen by classifying the agent's state and weighing actions. Classification is done using a Nearest Neighbour algorithm.

Feature extraction: The agent uses sensors to sense the environment. Reading the sensors results in a feature vector that is a representation of the environment.

Avoiding static obstacles: If there is recorded data which corresponds to the agent's current state, the agent will be able to avoid obstacles by executing the nearest neighbour action. If that is not the case, static obstacles are avoided by checking in the Nearest Neighbour algorithm whether an action goes through a static obstacle. If it does, the action is not considered a near neighbour and is not chosen.

Avoiding dynamic obstacles: Dynamic obstacles are avoided like static obstacles, but a different feature extractor is utilized, which extracts different features. The dynamic obstacle avoidance was the last part of the implementation process.

KD-tree: A KD-tree is used to speed up the Nearest Neighbour algorithm.

Grid: The environment is discretized into a grid of cells. The grid is used in weighing actions: an action is weighted with the score of a cell. The grid can be manipulated to make the agent move to a destination.

Recording movement and state representation

Figure 3.1: Flowchart visualizing the record mode.

The agent can be in either Record or Playback mode. During recording, a human is in control of the agent from the agent's first-person perspective, using a mouse and keyboard. The record mapping was direct, meaning that the exact states/actions were recorded in the dataset. Data was recorded when the direction vector of the agent changed and the distance between the agent's current position and the last recorded position was bigger than a set threshold.

The policy that an IL algorithm is meant to learn maps a state x to an action u. Adopting this terminology, one record of data was structured as a state, and several states make up an action. A state consists of two parts. The first part is the agent's position, rotation and direction (i.e. the agent's forward vector), called the pose state; the state representation is thus continuous. The pose state also contains the time passed between the previous state and the current state. The second part is a feature vector of floats corresponding to a representation of the environment at the current pose state. This second part is called the sensor state. How the sensor state is created is explained in further detail in the section Avoiding static obstacles. Karpov et al. [8] similarly use sequences of states for representing the stored human traces, separating them into a pose state and an event state. The data is stored by writing all recorded states as binary data to a file. When more data is recorded, it is appended to the existing file.

Figure 3.2 shows one environment, or scene, used at an early stage of the implementation process. The aim here was to play back recorded data by having the agent move to the closest position in the recorded data.

Figure 3.2: The scene. Recorded trajectory data in black and the agent's trajectory in blue.
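
A minimal sketch of the recording rule and state record described above follows. The threshold value and the types are illustrative assumptions, and the rotation quaternion of the pose state is omitted for brevity; the thesis does not list its code:

    using System;
    using System.Collections.Generic;

    struct Vec3
    {
        public float X, Y, Z;
        public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
        public static float Distance(Vec3 a, Vec3 b) =>
            (float)Math.Sqrt((a.X - b.X) * (a.X - b.X) +
                             (a.Y - b.Y) * (a.Y - b.Y) +
                             (a.Z - b.Z) * (a.Z - b.Z));
    }

    // One record of data: the pose part plus the sensor part.
    class State
    {
        public Vec3 Position, Direction;
        public float DeltaTime;   // time since the previous recorded state
        public float[] Features;  // sensor state, e.g. distances to walls
    }

    class Recorder
    {
        private const float DistanceThreshold = 0.5f;  // assumed value
        private Vec3 lastRecordedPosition, lastDirection;
        public List<State> Recorded = new();

        // Called every frame while the human demonstrates. A state is stored
        // only when the direction vector has changed and the agent has moved
        // further than the threshold from the last recorded position.
        public void Tick(Vec3 position, Vec3 direction, float deltaTime, float[] features)
        {
            bool directionChanged = !direction.Equals(lastDirection);
            bool movedEnough = Vec3.Distance(position, lastRecordedPosition) > DistanceThreshold;
            if (directionChanged && movedEnough)
            {
                Recorded.Add(new State { Position = position, Direction = direction,
                                         DeltaTime = deltaTime, Features = features });
                lastRecordedPosition = position;
                lastDirection = direction;
            }
        }
    }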

Playing back movement

During Playback the agent moves on its own by executing actions. Executing an action means moving from one recorded pose state to the next, interpolating between states to achieve a position and rotation that approximate the recorded data. This interpolation/approximation is a form of embodiment mapping, as the agent maps the recorded data into movement. The embodiment mapping was therefore non-direct, meaning that the recorded states/actions were not exactly those that the agent would execute. To find an action to execute, the agent's sensor state is classified using a NN algorithm. The algorithm returns the nearest recorded action to the agent's current sensor state. This action is then applied relative to the agent's current pose state, so that the action's first state matches the agent's current rotation.

To create smooth rotation between states, the following was done. Suppose that the agent is at the first state a, where it has the correct rotation r1, and the next pose state is b, containing rotation r2. When moving from a to b, the rotation of the agent is set to the value of the interpolation between r1 and r2 by the distance traveled from a to b. Upon reaching b, the rotation is therefore r2. Slight errors in the imitation occur here, since the human most likely did not rotate at a constant speed when demonstrating. However, making the distance between states short made it hard to tell a difference when observing the agent. When the agent has finished executing an action, meaning it has reached the final pose state position of the action, the process is repeated by classifying the sensor state again.
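
The distance-based rotation interpolation described above can be sketched as follows. For brevity this version interpolates a 2D heading in degrees rather than the quaternions an actual implementation would use, and all names are illustrative:

    using System;

    class PlaybackInterpolation
    {
        // Interpolate the heading between two pose states by the fraction of
        // the distance already travelled from a to b, so that the rotation is
        // r1 at a and exactly r2 upon reaching b.
        static float InterpolateRotation(float r1, float r2, float travelled, float total)
        {
            float t = total <= 0f ? 1f : Math.Clamp(travelled / total, 0f, 1f);
            // Take the shortest way around the circle, as a quaternion slerp would.
            float delta = ((r2 - r1 + 540f) % 360f) - 180f;
            return r1 + t * delta;
        }

        static void Main()
        {
            // Halfway between a state with heading 350 degrees and one with 10 degrees:
            Console.WriteLine(InterpolateRotation(350f, 10f, 1f, 2f)); // prints 360, i.e. 0 degrees
        }
    }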

Policy

A policy is a function that maps a state to an action. The NN algorithm receives a state as input, efficiently finds the best action with the KD-tree data structure and returns it. Thus the NN algorithm together with the KD-tree can be said to be the policy.

Feature extraction

An IL algorithm learns a policy that imitates the teacher, with the purpose of generalizing to unseen data. In order to generalize, the agent had to sense its environment and represent it in a way which allows for recognizing similar states. The feature extraction process uses sensors on the teacher to sense the environment and represents it as a vector of floats, called the feature vector or simply the features. When recording or classifying a state, the sensor state is created by extracting features for the agent's current pose state.

Avoiding static obstacles

In many games, a desirable skill for an agent to have is to be able to avoid obstacles, so-called obstacle avoidance. In order to be able to avoid static (non-moving) obstacles such as walls, sensors were implemented similar to the ones used by the authors of [8] in [23]. They show a figure similar to Figure 3.3a, which represents the sensors they use on their Quake III bot. Their motivation was that there are more sensors near the front, so that the agent can better distinguish locations in front of it.

Figure 3.3: Sensors similar to those used by Schrum et al. [23] (a) were added to the agent (b).

The feature extractor creates the sensor state by ray casting in all sensor directions using Unity's function Physics.Raycast. The function returns information about what was hit, including the distance to the hit obstacle/collider. This results in a feature vector v containing the distances x1, ..., x6 to obstacles in the different directions.
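
In Unity this amounts to one raycast per sensor direction, roughly as in the sketch below. The exact sensor angles and maximum range are assumptions, not values from the thesis:

    using UnityEngine;

    public class StaticObstacleSensor : MonoBehaviour
    {
        // More sensors near the front, as motivated by Schrum et al.; the
        // angles (degrees relative to the agent's forward vector) are assumed.
        private static readonly float[] SensorAngles = { -90f, -45f, -15f, 15f, 45f, 90f };
        private const float MaxRange = 20f;  // assumed maximum sensor range

        // Returns the feature vector v = (x1, ..., x6): the distance to the
        // nearest obstacle in each sensor direction, or MaxRange if nothing is hit.
        public float[] ExtractFeatures()
        {
            var features = new float[SensorAngles.Length];
            for (int i = 0; i < SensorAngles.Length; i++)
            {
                Vector3 dir = Quaternion.AngleAxis(SensorAngles[i], Vector3.up) * transform.forward;
                features[i] = Physics.Raycast(transform.position, dir, out RaycastHit hit, MaxRange)
                    ? hit.distance
                    : MaxRange;
            }
            return features;
        }
    }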

Figure 3.4 shows how data could be recorded in one environment (Figure 3.4a) and played back in another (Figure 3.4b), showing that the approach generalizes to new environments.

Figure 3.4: Recorded traces in black, the chosen action in green, the chosen action applied to the agent in blue and sensors in white.

Figure 3.4b shows the agent currently in the top right corner. When classifying its state, it is determined that an action should be chosen as if the agent currently were in the lower left corner (the action is highlighted in green). This makes sense, as it is a similar situation. If there is recorded data which corresponds to the agent's current state, the agent will be able to avoid obstacles by executing the nearest neighbour action. However, that may not always be the case, as there probably will not be recorded data for every possible state. Therefore the NN algorithm checks whether actions go through a static obstacle; if so, they are not considered near neighbours and will not be chosen.

Avoiding dynamic obstacles

Another common task for game AI is to be able to avoid moving (dynamic) obstacles. A new feature extractor was created which sensed the environment in a different way. The area within a certain radius around the agent was sensed with the purpose of detecting moving obstacles, visualized in Figure 3.5a. To be able to recognize a state correctly, it was necessary to differentiate between obstacles moving in different directions. For example, if an obstacle is close and headed straight towards the agent, the agent should probably dodge the obstacle somehow. If the obstacle is headed away from the agent, however, no particular action needs to be taken. Intuitively, when an agent should avoid an obstacle, it is important to know:

How close is the obstacle to the agent?

Is the obstacle moving towards or away from the agent?

Will the obstacle hit the agent if the agent does not move?

What matters is being able to distinguish one state from another. The resulting extractor extracts three features per moving obstacle within the sensor. This is described in Algorithm 1 and visualized in Figure 3.5b.

Algorithm 1 Dynamic obstacle extractor

function ExtractFeatures(agent)
    sort obstacles in sensor by distance
    for each moving obstacle at index i in sensor do
        velocitySimilarity ← dot(agent.velocity, obstacle.velocity)
        sqrDist ← sqrDist(agent, obstacle)
        diffVector ← obstacle.position - agent.position
        velPosSimilarity ← dot(diffVector, agent.velocity)
        features[3i] ← velocitySimilarity
        features[3i + 1] ← sqrDist
        features[3i + 2] ← velPosSimilarity
    return features

Figure 3.5: The new sensor (a) and visualization of the vectors used in calculating the features for the dynamic obstacle extractor (b).

The velocitySimilarity is the dot product of the agent's velocity and the obstacle's velocity; it tells whether the obstacle is heading in the same direction as the agent or not. velPosSimilarity is the dot product of the diffVector and the agent's velocity; this value says whether the obstacle lies in the agent's current path or not. If this value is 1, the two vectors point in the same direction, which means that the agent is headed straight towards the obstacle. sqrDist can act as a weight for how critical the situation is. The proposed approach is by no means the correct or the best solution; different similar approaches were tried, but these values were able to distinguish the agent's state best out of the values tried.
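
A C# counterpart of Algorithm 1 could look like the sketch below (illustrative types, not code from the thesis). Since the thesis reads a velPosSimilarity of 1 as "same direction", the velocity and difference vectors are presumably normalized; this sketch leaves that to the caller:

    using System;
    using System.Linq;

    struct Vec2
    {
        public float X, Y;
        public Vec2(float x, float y) { X = x; Y = y; }
        public static Vec2 operator -(Vec2 a, Vec2 b) => new Vec2(a.X - b.X, a.Y - b.Y);
        public static float Dot(Vec2 a, Vec2 b) => a.X * b.X + a.Y * b.Y;
        public static float SqrDist(Vec2 a, Vec2 b) { var d = a - b; return Dot(d, d); }
    }

    class Body { public Vec2 Position, Velocity; }

    static class DynamicObstacleExtractor
    {
        // Three features per moving obstacle in the sensor, as in Algorithm 1:
        // how the obstacle moves relative to the agent, how close it is, and
        // whether it lies in the agent's current path.
        public static float[] ExtractFeatures(Body agent, Body[] obstaclesInSensor)
        {
            Body[] sorted = obstaclesInSensor
                .OrderBy(o => Vec2.SqrDist(agent.Position, o.Position))
                .ToArray();

            var features = new float[3 * sorted.Length];
            for (int i = 0; i < sorted.Length; i++)
            {
                Body obstacle = sorted[i];
                features[3 * i]     = Vec2.Dot(agent.Velocity, obstacle.Velocity);     // velocitySimilarity
                features[3 * i + 1] = Vec2.SqrDist(agent.Position, obstacle.Position); // sqrDist
                features[3 * i + 2] = Vec2.Dot(obstacle.Position - agent.Position,
                                               agent.Velocity);                        // velPosSimilarity
            }
            return features;
        }
    }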

Using this with recorded data containing around 100 actions demonstrating how to avoid a single obstacle, the agent was able to avoid a single obstacle efficiently. Attempts were also made with more obstacles at the same time. In many situations the agent would avoid obstacles well, but in some it would not. In theory, as with static obstacle avoidance, if there is data for every situation, the feature extractor separates different situations well and the quality of the data is good, then the agent should always be able to avoid obstacles. Good data is meant in the sense of the current goal behaviour: if the goal behaviour is obstacle avoidance, the data is good if the recorded human performed good, avoiding actions and did not walk into an obstacle while recording.

Figure 3.6: The agent avoiding an obstacle (blue square) moving in the opposite direction. The blue curve is the action trajectory the agent chose at t = 1 when it sensed the obstacle. At t = 2 the agent has moved further along the trajectory and the obstacle has moved further to the right.

KD-tree

It was decided to implement a data structure to make the NN algorithm more efficient. Karpov et al. [8] use a KD-tree as one of their approaches to efficiently retrieve recorded data, and KD-trees are a common way to make NN algorithms more efficient. Weber et al. [29] showed that if a nearest neighbour approach is used in a space of more than about ten dimensions, it is better to use a naive exhaustive search; the reason is that the work of partitioning the space becomes more expensive than the similarity measure. The number of features here was six (the distances to walls in six directions), which is less than ten, so a KD-tree should speed up the NN algorithm.

A KD-tree is a space-partitioning data structure for organizing points in k-dimensional space. During construction, as one moves down the tree, one cycles through the axes used to select the splitting planes that divide the space. In the case of a two-dimensional space, this could be the x and y coordinates (Figure 3.7). Points are inserted by selecting the median point from the list of points being inserted, with

respect to the coordinates in the axis being used. If one starts with the x axis, the points are divided into the median point with respect to the x coordinate and two sets: the points with an x coordinate less than the median's and the points with an x coordinate bigger than the median's. Then each of the two sets recursively does the same thing, cycling on to the next axis (y). In this project, this corresponds to cycling through the features representing distances to walls in different directions.

Figure 3.7: The points (7, 2), (5, 4), (2, 3), (4, 7), (9, 6), (8, 1), (2, 7) inserted in the KD-tree, with (7, 2) at the root.

Algorithm 2 describes the construction of the KD-tree.

Algorithm 2 Construction of the KD-tree

function BuildTree(actions, depth = 0)
    dimensions ← numFeatures(actions)
    axis ← depth % dimensions
    sort(actions) by comparing feature[axis] of the actions
    median ← median element in sorted actions
    if median is the only element then
        return TreeNode(median, null, null, axis)
    a ← actions before median
    b ← actions after median
    return TreeNode(median, BuildTree(a, depth + 1), BuildTree(b, depth + 1), axis)

The nearest neighbour algorithm using the KD-tree is described in Algorithm 3. The search time is on average O(log n).

Algorithm 3 The Nearest Neighbour algorithm

function NN(node, inputState, ref nearestNeighbour, ref nearestDist)
    if node is null then
        return
    searchPointAxisValue ← inputState[node.axis]
    dist ← ∞
    nodeAxisValue ← 0, index ← 0

    // Determine how near the current action is to the input
    for state s at index i in node.action do
        if dist(inputState, s) < dist then
            dist ← dist(inputState, s)
            nodeAxisValue ← node.action.state(i)[node.axis]
            index ← i
    if node.leftChild is null and node.rightChild is null then
        return

    // Apply the action on the current state
    appliedAction ← applyActionOnState(inputState, node.action)

    // Let the calling model weigh the action (it may e.g. go through an obstacle)
    weight ← weighAction(callingModel, appliedAction)
    dist ← weight

    // Determine the nearest side, to search it first
    if searchPointAxisValue < nodeAxisValue then
        nearestSide ← node.leftChild; furthestSide ← node.rightChild
    else
        nearestSide ← node.rightChild; furthestSide ← node.leftChild
    NN(nearestSide, inputState, nearestNeighbour, nearestDist)
    if dist < nearestDist then
        // Update the nearest neighbour as the recursion unwinds
        nearestNeighbour ← node.action
        nearestDist ← dist

    // Check whether it is worth searching the other side
    nearestAxisValue ← nearestNeighbour.state(index)[node.axis]
    splittingPlaneDist ← dist(inputState, splittingPlane)
    nearestNeighbourDist ← dist(inputState, nearestNeighbour)
    if splittingPlaneDist < nearestNeighbourDist then
        NN(furthestSide, inputState, nearestNeighbour, nearestDist)
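
Before the prose walk-through of Algorithm 3 below, here is a self-contained C# sketch of both construction and nearest-neighbour search for plain k-dimensional points. It deliberately leaves out the action weighting and obstacle checks of the thesis version, which are application-specific:

    using System;
    using System.Linq;

    class KdNode
    {
        public float[] Point;
        public KdNode Left, Right;
        public int Axis;
    }

    static class KdTree
    {
        // Algorithm 2: split on the median along the current axis, cycling axes.
        public static KdNode Build(float[][] points, int depth = 0)
        {
            if (points.Length == 0) return null;
            int axis = depth % points[0].Length;
            float[][] sorted = points.OrderBy(p => p[axis]).ToArray();
            int median = sorted.Length / 2;
            return new KdNode
            {
                Point = sorted[median],
                Axis = axis,
                Left = Build(sorted.Take(median).ToArray(), depth + 1),
                Right = Build(sorted.Skip(median + 1).ToArray(), depth + 1),
            };
        }

        // Algorithm 3, simplified: descend towards the query first, then unwind
        // and search the far side only if the splitting plane is closer than
        // the best match found so far.
        public static void Nearest(KdNode node, float[] query, ref float[] best, ref float bestDist)
        {
            if (node == null) return;
            float dist = Distance(node.Point, query);
            if (dist < bestDist) { bestDist = dist; best = node.Point; }

            float planeDist = query[node.Axis] - node.Point[node.Axis];
            KdNode nearSide = planeDist < 0 ? node.Left : node.Right;
            KdNode farSide = planeDist < 0 ? node.Right : node.Left;

            Nearest(nearSide, query, ref best, ref bestDist);
            if (Math.Abs(planeDist) < bestDist)  // worth searching the other side?
                Nearest(farSide, query, ref best, ref bestDist);
        }

        static float Distance(float[] a, float[] b) =>
            (float)Math.Sqrt(a.Zip(b, (p, q) => (p - q) * (p - q)).Sum());

        static void Main()
        {
            float[][] points =
            {
                new float[] { 7, 2 }, new float[] { 5, 4 }, new float[] { 2, 3 },
                new float[] { 4, 7 }, new float[] { 9, 6 }, new float[] { 8, 1 },
                new float[] { 2, 7 },
            };
            KdNode root = Build(points);
            float[] best = null;
            float bestDist = float.MaxValue;
            Nearest(root, new float[] { 9, 2 }, ref best, ref bestDist);
            Console.WriteLine($"({best[0]}, {best[1]})");  // prints (8, 1)
        }
    }

Note that which point ends up at the root depends on how the median of an even split is chosen, so this sketch does not necessarily reproduce Figure 3.7 exactly.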

Following is a short and slightly simplified explanation of the algorithm; an extended description can be found, for example, in the Wikipedia article on KD-trees. The algorithm recursively moves down the tree, starting from the root. When it reaches a leaf, that leaf is set as the current best. As the recursion unwinds, each node compares its distance to the input with the current best; if the distance is smaller, that node becomes the current best. Each node also checks whether a nearer neighbour could be on its other side: if the distance from the input search point to the node's splitting plane is smaller than the distance from the input search point to the current best, there might be a nearer neighbour on the other side, so that side is searched too. When the search reaches the root node again, the search is done.

Discretizing the environment

In games it is desirable to be able to tell an AI to go to a position. This diverges from Imitation Learning, as the sensor state is not used to decide what action to execute; instead an external input says what position to go to. It was decided to implement this anyway, for the sake of practical usability. One could argue that the agent still moves in a human-like fashion, as it executes actions the same way the actions were recorded, and the only way for the agent to move is by executing actions.

A first approach to making the agent go to a goal was to weigh the actions by how close an action would take the agent towards the goal. This worked to some extent, but the agent did not register where it had been or whether it had walked into a dead end. This resulted in it sometimes walking around in the same area for a long time without realising that it did not get closer to the goal. The phenomenon is shown in Figure 3.8. It was therefore concluded that some sort of path finding was needed and that it would help to be able to say whether a position on the map was good or bad, or close to the goal or not.

Figure 3.8: Problem with getting stuck. The blue lines show traces of the agent trying to get to the white goal.

Priesterjahn et al. [20] used a grid to represent a state in their Neuroevolution approach. Inspired by them, the map was discretized into a grid of cells where each cell has a score representing the distance from the cell to the goal. Actions were then weighted by the score of the cell that the action ended up in; a lower score means closer to the goal (greener in Figure 3.9a). As the agent moved around the map, the scores of the nine cells adjacent to the agent were increased, decreasing the chance of picking an action which ended up in one of those cells again. Spending time in a corner would result in those cells getting a higher score, which would lead to the agent not going there again. This is visualized in Figure 3.9.

Figure 3.9: The grid at t = 1, t = 2 and t = 3: as cells are visited, their scores are increased.

This approach solved the problem of the agent getting stuck in corners, or close to the goal but on the wrong side of a wall. However, this was more of an exploring approach, which could be used if the agent does not know where the goal is. Unless the agent is meant to be blind, the strategy would need to be improved by scoring cells which the agent can see. Telling the agent to go to a position means that the agent knows where the goal is. Therefore a better path-finding strategy was implemented: using the classic A* algorithm, the grid calculates the shortest path from the agent to the goal and scores each cell the shortest path touches with its path distance to the goal. All other cells are given a bad score. This is visualized in Figure 3.10.

Figure 3.10: Cells that touch the A* path from the agent to the goal are scored with a low score (green).

The grid is the tool a programmer/user would use to influence what the NN algorithm should consider a good action. In the NN algorithm, actions are weighted according to the cell score at the action's last pose state position.
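
The cell-scoring idea can be sketched as follows (illustrative, not the thesis code). For brevity it scores every reachable cell with its shortest-path distance to the goal, a slight generalization of scoring only the cells on the single A* path; on a uniform-cost grid, breadth-first search finds the same shortest paths that A* would:

    using System;
    using System.Collections.Generic;

    class ScoredGrid
    {
        private readonly int width, height;
        private readonly bool[,] blocked;
        public float[,] Score;
        private const float BadScore = 1000f;  // assumed penalty for unreachable cells

        public ScoredGrid(int width, int height, bool[,] blocked)
        {
            this.width = width; this.height = height; this.blocked = blocked;
            Score = new float[width, height];
        }

        // Score every cell with its path distance to the goal; lower is better
        // (greener in Figure 3.10). Unreachable cells get a bad score.
        public void ScoreTowards(int goalX, int goalY)
        {
            var dist = new Dictionary<(int, int), int> { [(goalX, goalY)] = 0 };
            var queue = new Queue<(int x, int y)>();
            queue.Enqueue((goalX, goalY));
            while (queue.Count > 0)
            {
                var (x, y) = queue.Dequeue();
                foreach (var (nx, ny) in new[] { (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1) })
                {
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    if (blocked[nx, ny] || dist.ContainsKey((nx, ny))) continue;
                    dist[(nx, ny)] = dist[(x, y)] + 1;
                    queue.Enqueue((nx, ny));
                }
            }
            for (int x = 0; x < width; x++)
                for (int y = 0; y < height; y++)
                    Score[x, y] = dist.TryGetValue((x, y), out int d) ? d : BadScore;
        }

        // The weight the NN algorithm would give an action: the score of the
        // cell containing the action's last pose state position.
        public float WeighAction(int lastPoseCellX, int lastPoseCellY) =>
            Score[lastPoseCellX, lastPoseCellY];
    }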

Additional details

The length of an action could be chosen, which would split up the recorded data into actions of the given length. States in an action were recorded in sequence after each other, so while executing an action, the agent moves like the human who recorded themselves did. Choosing a big action length results in long actions, and thus longer continuous segments of the agent behaving human-like. The downside of long actions is that they might not be able to get the agent out of certain situations without hitting an obstacle. They may also take the agent to worse locations: if there is no recorded data similar to the agent's current state, the returned action probably does not suit the situation well. A longer action then results in a bigger bad investment, whereas a shorter action allows the state to be re-classified sooner, hopefully getting a better-suited action. Short actions, however, result in shorter continuous segments of the agent behaving human-like. They also require the state to be classified more often, which has an impact on performance; on the other hand, classifying often increases the chance of choosing a correct action for the situation. An action length somewhere between long and short was chosen at first. Later, support was implemented for splitting up the data into several action lengths at the same time. This helps by making long actions available for areas without obstacles and short actions available for trickier situations.

In practice, for an AI to be useful in a game, it should be possible to define different types of behaviour and to switch between them depending on the situation. The implementation was structured to allow for several types of actions and models, resulting in the loop described in Algorithm 4. Data was recorded separately for each behaviour.

Algorithm 4 The agent loop

function Update
    if recording then
        // Recording
        features ← featureExtractor.extractFeatures(agent)
        recorder.record(agent, features)
    else
        // Playback
        if action is done executing or was aborted then
            features ← featureExtractor.extractFeatures(agent)
            action ← model.classify(agent, features)
        else
            action.execute(agent, destination)

The agent used a controller for deciding which feature extractor to use. When a dynamic obstacle came within a certain distance, the agent would switch to the feature extractor for dynamic obstacle avoidance, with the corresponding recorded actions. Otherwise it would use the static obstacle avoidance model.

Storing data

The recorded data was stored as raw binary data. A file containing data for 1000 recorded actions of 50 states per action, corresponding to about 25 minutes of recording, has a size of approximately 3 MB. The data stored per state (pose state + sensor state) is described in Table 3.1.

Table 3.1: The data stored for one state.

Pose state:
    Vector3 position: float pos_x, pos_y, pos_z
    Quaternion rotation: float rot_x, rot_y, rot_z, rot_w
    Vector3 direction: float dir_x, dir_y, dir_z
    Delta time (the time between the previous state and this state): float time

Sensor state:
    Feature vector: float n_0, ..., n_numFeatures
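
A sketch of writing one such state with .NET's BinaryWriter follows (not the thesis code; the field order follows Table 3.1):

    using System;
    using System.IO;

    class StateWriter
    {
        // One state in the layout of Table 3.1: 11 pose floats followed by the
        // feature vector. With 6 features that is 17 floats = 68 bytes per
        // state, so 1000 actions of 50 states each come to roughly 3.4 MB,
        // consistent with the approximately 3 MB reported above.
        public static void WriteState(BinaryWriter w,
                                      float[] position,   // pos_x, pos_y, pos_z
                                      float[] rotation,   // rot_x, rot_y, rot_z, rot_w
                                      float[] direction,  // dir_x, dir_y, dir_z
                                      float deltaTime,
                                      float[] features)
        {
            foreach (float f in position) w.Write(f);
            foreach (float f in rotation) w.Write(f);
            foreach (float f in direction) w.Write(f);
            w.Write(deltaTime);
            foreach (float f in features) w.Write(f);
        }

        static void Main()
        {
            // Append to the existing file, as described above for newly recorded data.
            using var stream = new FileStream("demonstrations.bin", FileMode.Append);
            using var writer = new BinaryWriter(stream);
            WriteState(writer,
                       new float[] { 1f, 0f, 2f },
                       new float[] { 0f, 0f, 0f, 1f },
                       new float[] { 0f, 0f, 1f },
                       0.016f,
                       new float[] { 5f, 3.2f, 8f, 20f, 20f, 11f });
        }
    }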

Optimization and measuring performance

For usage in a proper game, the computational time of the AI should be as low as possible. The bottleneck was applying an action to the agent's current state in the NN algorithm, since each traversed action was checked for whether it would pass through an obstacle if applied to the agent's current state. This was improved by approximating the check: instead of checking for collision between every pair of consecutive states in an action, collision was only checked between the first state and the middle state, and between the middle state and the last state (a sketch of this check is given below). To ensure the agent did not get stuck by picking an invalid action, it was forced to update its current action at a certain time interval. The performance of the imitation agent was measured as the average computational time per game frame for different amounts of data: 100, 200, 500 and 1000 recorded actions, with an action length of 50 states; 1000 recorded actions correspond to about 25 minutes of recording.
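The following is a minimal C# sketch of the approximated collision check above, assuming Unity's Physics.Linecast; the class and method names are hypothetical. Only the first-to-middle and middle-to-last segments are tested instead of every consecutive pair of states.

// Minimal sketch of the first->middle, middle->last approximation;
// Physics.Linecast is Unity's segment-collision test, the rest is
// hypothetical.
using UnityEngine;

public static class ActionValidator
{
    // Approximate check of whether an action, applied from the agent's
    // current state, would pass through an obstacle. `statePositions` are
    // the world positions the action's states would visit.
    public static bool IsClear(Vector3[] statePositions, LayerMask obstacleMask)
    {
        Vector3 first  = statePositions[0];
        Vector3 middle = statePositions[statePositions.Length / 2];
        Vector3 last   = statePositions[statePositions.Length - 1];

        // Linecast returns true when the segment hits a collider, so the
        // action is considered clear when both segments hit nothing.
        return !Physics.Linecast(first, middle, obstacleMask)
            && !Physics.Linecast(middle, last, obstacleMask);
    }
}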

3.4 Overall implementation

The framework allows a user to create an agent which imitates demonstrated movement behaviour. To create a behaviour, a user creates a feature extractor which defines what environmental features should be classified. The user then chooses when the behaviour should be activated. The user collects data for the behaviour by recording themselves. Finally, the behaviour can be played back. An agent can possess several behaviours at once, and it is up to the user to define which behaviour should be activated when. This chapter described how IL can be used to create an agent that imitates human demonstrations using a direct imitation approach and limited amounts of data. In the next chapter, the evaluation of the imitation agent is described.

Chapter 4

Evaluation

This chapter presents the user study that was conducted in order to answer the project's stated questions. The results of the study are presented thereafter, along with a performance measure of the imitation agent. Following that is a discussion section which presents and discusses what was done in the project, what the study found to be important for looking human-like, and the performance of the imitation agent in relation to games. Finally, some ethical aspects are discussed.

4.1 User study

Recall that the objective of the project (see Section 1.2) is to answer the following:

Q1.1: How to create an agent that imitates demonstrated behaviour, using IL with limited amounts of data?

Q1.2: What determines if a character is human-like, when observed through the character's first-person perspective?

A user study was conducted in order to answer Q1.2, and to contribute to the answer to Q1.1 by asking humans how well the imitation agent imitates demonstrations. The method chapter describes how IL can be used to create behaviour by imitating recorded human behaviour, but provides no evaluation of whether the behaviour is human-like or not. The user study aimed to evaluate the human-likeness of the agent, and to evaluate in a qualitative manner how well the agent imitates the recorded human. As a reminder, an agent is said to be human-like if it looks like it is being controlled by a human. The layout of the study was inspired by [27], which, as presented in the background chapter, describes a simplification of a Turing test-approach. It was also inspired by [9], which gave users statements to agree or disagree with.

The set-up

The study consisted of videos of three different character controllers: the imitation agent, a human, and Unity's built-in NavMeshAgent. These controllers will be labeled Imitation Controller (IC), Human Controller (HC) and NavMesh Controller (NC), respectively. The human provided the demonstrations for the imitation agent to imitate. The NC was intended to act as a sanity check: a person with a lot of gaming experience would easily be able to tell that the NC was not being controlled by a human, as it moves very statically, makes no unexpected movements, and turns with a set speed. Three different settings were set up:

Setting 1. A simple environment like the one used during development (Figure 4.1). When the character reaches the goal, the goal gets randomly positioned somewhere on the map.

Figure 4.1: Setting 1.

Setting 2. An even simpler environment, but with a single moving obstacle (Figure 4.2).

Figure 4.2: Setting 2 with a moving obstacle (blue) and the goal (white).

Setting 3. Same concept as Setting 1, but a different map (Figure 4.3). Here, the goal positions were deterministic, meaning that when the character reaches the goal, the goal gets positioned at the next index in the list of goal positions. This means that all characters take the same path.

Figure 4.3: Setting 3 from a top-down view (a) with the corresponding first-person perspective (b).

One video was recorded for each of the settings and for each character controller, resulting in a total of nine videos. The videos were recordings of the controllers moving around in the three different settings, from a first-person perspective (Figure 4.3b). In most games, a player would observe an NPC from a third-person perspective. Using a third-person perspective requires the observed character to be modeled and potentially animated. Whether a user wants it to or not, these things will most likely affect the user's thoughts on how the character should behave. It is also more difficult to spot detailed movement and rotation from a third-person perspective. In a first-person perspective, however, a user does not need to know or
