Creating Challengeable and Satisfactory Game Opponent by the Use of CI Approaches


Suoju He*, Yuan Gao, Jiajian Yang, Yiwen Fu, Xiao Liu
International School, School of Software Engineering*
Beijing University of Posts and Telecommunications, Beijing, China
doi: /ijact.vol2.issue1.4

Abstract

The goal of video game AI (artificial intelligence) is to generate AI that is both challengeable and satisfactory. Most existing game AI is implemented with FSMs (finite state machines), which have drawbacks in three respects: they require the designer's intensive participation; they offer no meta-programming; and they cannot plan and look forward. The contribution of this paper is to propose CI (computational intelligence) as an approach that creates challengeable and satisfactory game opponents better than FSM in the above three respects. In this research, two prey-and-predator games, Dead-End and Pac-Man, are used as test-beds to prove the proposed theory. To create a challengeable game opponent, we propose two CI approaches: CI-controlled-NPC and knowledge-based-CI-controlled-NPC. As the latter is knowledge-based and more computationally efficient than the former, it is applicable to multi-player online games, while the former is applicable only to standalone PC games. To create a satisfactory game opponent is to optimize the player's experience by creating an even game. To better satisfy the player, we propose two CI-based DDAs (Dynamic Difficulty Adjustment): DDA by time-constrained-CI and DDA by knowledge-based-time-constrained-CI, both of which outperform the existing DDA. As the latter is knowledge-based and more computationally efficient than the former, it is applicable to multi-player online games, while the former is applicable only to standalone PC games.

Keywords: Challengeable and satisfactory game AI, CI, MCTS, UCT, Dead-End, Pac-Man

1.
Introduction

The goal of video game AI is to generate AI that is both challengeable and satisfactory; this is quite different from classic computer games like Go, whose only goal is to create the most challengeable game AI. The term game refers to video games in this paper to avoid confusion. Most existing game AI is implemented with FSMs. There is no doubt that an FSM can easily produce a challengeable game opponent with good intelligence; however, it does not always provide the best solution [1]. In this paper, CI is presented as an approach to create game opponents that are both challengeable and satisfactory; two prey-and-predator games, Dead-End and Pac-Man, are used as test-beds to prove the proposed theory. Details about the discretized game Dead-End can be found in [2]; since the CAT moves at twice the speed of the DOG, the DOG's AI must be much better than the CAT's. Details about the game Pac-Man can be found in [3]; however, this Pac-Man test-bed is changed as follows: the map consists of 3x3 cubes, with each cube consisting of 3 steps and each intersection consisting of one step. In the remainder of this paper, Sections 2 and 3 discuss generating challengeable game opponents by the CI approach and the knowledge-based-CI approach respectively; Sections 4 and 5 discuss generating satisfactory game opponents by DDA (Dynamic Difficulty Adjustment) from time-constrained-CI and from knowledge-based-time-constrained-CI respectively.

2. Generating challengeable game opponents by the approach of CI-controlled-NPC

In previous studies, CI approaches such as MCTS (Monte Carlo Tree Search) [4] and UCT (Upper Confidence Bound for Trees) [5] have been used in computer Go with good results. However, only a few studies exist on their use in video games, especially for NPC control. In this research, CI refers to either MCTS or UCT.

2.1. Existing game AI generation with FSM

The game AI, or NPC control, can be considered an expert system for a particular game. An expert system is composed of rules and knowledge, together called the knowledge base. The knowledge base of such an expert system for a particular game is mainly encoded with an FSM. There is no doubt that an FSM can easily produce intelligence; however, it does not always provide the best solution. The following contributes to its unsuitability for game AI in some respects:

Game AI encoded with an FSM requires the participation of both game developers and experts in the field of a particular game, and it costs a lot of time. The developers and experts are often constrained by limited domain knowledge and inadequate development time, which can produce analysis and coding errors and can yield a knowledge base that does not achieve the most challengeable game AI.

An FSM normally encodes only low-level behavior; there is no meta-programming [1], so an FSM cannot control an NPC from a high level, only from the details. As a result, building game AI with an FSM becomes tedious, labor-intensive work.

An FSM works in passive mode: only an event trigger can generate a state transition. An FSM-controlled NPC cannot plan and look forward [1].
Therefore the game strategy must be explicitly generated by the game developers rather than formed automatically.

2.2. The pros and cons of the proposed CI-controlled-NPC approach

The pros are:

CI rarely requires human participation, as it depends little on domain knowledge and does not require domain knowledge to be fully hand-coded; complex domain knowledge and strategies can be acquired automatically through comprehensive computation. CI can therefore also be considered an automatic game design and development technique.

CI uses meta-programming, which only requires the definition of a number of meta-rules, without low-level coding of details; therefore the workload of the game developers is greatly reduced.

CI can plan and look forward, so game strategies can be formed automatically, rather than explicitly created from domain knowledge by the game developer.

The cons are:

CI creates intelligence from computation running in real time (online), so it is more system-resource intensive and inefficient. Since the absence of high-performance computing resources usually degrades the performance of the game AI, CI does not apply to multi-player online games, whose AI runs on the server side; however, it works for standalone PC games. As a more efficient CI approach for multi-player online games, knowledge-based-CI, which is trained offline and runs online stably and efficiently, comes into play. Knowledge-based-CI is detailed in the next section.

Since CI uses meta-programming, game developers must define a number of meta-rules (domain knowledge), which requires a good understanding of a specific game. Meta-rules

for the game Dead-End might include:

1) The DOG tries to catch the CAT, while the CAT tries to reach the EXIT without being caught. The CAT is forbidden to go south, to keep the simulation time short.

2) Within the turn limit of 20, if a DOG and the CAT occupy the same cell during the gameplay (the DOG caught the CAT), the DOG/opponent wins and 1 point is added to the opponent's score; if the CAT/player reaches the EXIT without being caught, the DOG loses and 0 points are added to the opponent's score.

3) For a DOG, if the CAT appears in one of its 8 closest cells, it catches the CAT directly without any simulation. For the CAT, if no DOG appears in its closest cells, it runs for the EXIT. Once a DOG shows up in its closest cells, it tries to escape from the DOG rather than run for the EXIT.

Meta-rules for the game Pac-Man might include:

1) The GHOSTs and Pac-Man move at the same speed, one step per turn.

2) Both the GHOSTs and Pac-Man choose a direction only at intersections.

3) Within the turn limit of 50, if a GHOST catches Pac-Man (both occupy the same cell), the GHOST/opponent wins and Pac-Man/the player loses (when we say the game is won, we always mean the opponent wins); when Pac-Man eats more than 40 pellets, Pac-Man wins and the GHOST loses.

Game developers are also required to have certain expertise in applying MCTS or UCT to control the NPC for a specific game genre.

2.3. CI control of NPC in the game Dead-End

Figure 1. First alternative of MCTS on the game Dead-End. 1: DOGs' (opponents') current state. 2: DOG1's (opponent 1's) choices. 3: DOG2's (opponent 2's) choices. 4: CAT's (player's) first-step choices. 5: CAT's (player's) second-step choices. 6: one turn. 7: full simulation. 8: simulation result. 9: simulation result is returned to the chosen branch. Letters denote directions: E (East), W (West), N (North), S (South), St (Static).
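As an illustration, the Dead-End meta-rules above might be encoded as follows. This is a hedged sketch: the coordinate scheme, the treatment of the turn limit as an opponent loss, and all function names are assumptions for illustration, not the paper's code.

```python
# Hedged sketch of the Dead-End meta-rules; grid coordinates are assumed (x, y) tuples.
TURN_LIMIT = 20

def terminal_result(turn, dog_cells, cat_cell, exit_cell):
    """Return 1 (opponent win), 0 (opponent loss), or None (game continues)."""
    if cat_cell in dog_cells:      # a DOG caught the CAT: opponent scores 1
        return 1
    if cat_cell == exit_cell:      # CAT escaped without being caught: opponent scores 0
        return 0
    if turn >= TURN_LIMIT:         # turn limit reached without a catch (assumed a loss)
        return 0
    return None

def neighbours8(cell):
    """The 8 cells around a DOG, used by the 'catch directly' meta-rule."""
    x, y = cell
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)} - {cell}

def dog_policy(dog_cell, cat_cell):
    """Meta-rule 3: if the CAT is adjacent, catch it without any simulation."""
    if cat_cell in neighbours8(dog_cell):
        return cat_cell            # move straight onto the CAT
    return None                    # otherwise fall back to MCTS/UCT simulation
```

Everything below the "catch directly" shortcut is left to the simulation-based search described in the next subsections.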

Figure 2. Second alternative of MCTS on the game Dead-End. 1: DOGs' current state. 2: combinational choices of the two DOGs. 3: CAT's first-step choices. 4: CAT's second-step choices. 5: one turn. 6: simulation result. 7: simulation result is returned to the chosen branch. 8: full simulation. Numbers stand for the DOG number: 1 (the first DOG), 2 (the second DOG). Letters denote directions: E (East), W (West), N (North), S (South).

2.3.1. MCTS control of NPC

MCTS control of NPC for the game Dead-End follows the MCTS procedure, generalized to this specific game genre. Within the simulation time limit, a large number of simulations are run; in one case, the number of simulations reaches almost one thousand with a simulation time limit of 300 ms. The procedure taken in each simulation is presented in Figure 1 and explained below.

Selection

During each turn, the choices for the DOGs are among West, East, North, and South; the choices for the CAT are among West, East, North, and Static (retreating south is prohibited for the CAT to obtain a quick simulation result, but Static is kept as a choice instead). In the first step of the first turn, DOG1 randomly selects a branch, West, from the trunk with probability 1/4 (0.25), which is the pure Monte Carlo method.

Expansion

Still in the first turn, DOG2 expands with 4 choices and randomly selects North with probability 1/4; the CAT expands and randomly makes a first move of North with probability 1/4, then a second move of East with probability 1/4.

Backpropagation

In the next turn, the sub-branch is expanded according to the above procedure until the end of the game is reached (win or lose) or the turn limit for simulations is reached (in this case the limit is set to 20). The selected trunk branch is credited 1 for a win and 0 otherwise.

Simulation

Within the time limit of MCTS simulations, the above procedure is repeated until the time limit is reached.

Two alternatives for implementing MCTS on the game Dead-End are presented in Figure 1 and Figure 2 respectively; the major differences between them are:

In Figure 1, during each turn, each DOG makes an individual choice without regard to the other. In Figure 2, the two DOGs make a combinational choice from 16 alternatives.

In Figure 1, the alternatives for the CAT are among West, East, North, and Static; in Figure 2, the alternatives for the CAT are among West, East, and North.

Figure 2 expands the width of the search space, while Figure 1 expands its depth.

The result of MCTS control of NPC can be inferred from Figure 3: the performance of the MCTS-controlled NPC increases with the length of simulation time; a peak performance of about 70% win-rate is reached around a simulation time of 300 ms against the Strategy-A player (Figure 8 in 3.3.1).

2.3.2. UCT control of NPC

UCT is a mechanism for doing MCTS. Control of NPC using UCT is similar to using MCTS; the only difference is in the way a branch is selected from the trunk: MCTS follows pure Monte Carlo, while UCT follows UCB1 (Upper Confidence Bound 1) [2].

The result of UCT control of NPC can be inferred from Figure 3: the performance of the UCT-controlled NPC increases with the length of simulation time; a peak performance of about 72% win-rate is reached around a simulation time of 300 ms against the Strategy-A player (Figure 8 in 3.3.1).

Figure 3.
Performance of the NPC controlled by MCTS, UCT, ANN from MCTS-data, or ANN from UCT-data, as a function of simulation time, against the Strategy-A player (Figure 8 in 3.3.1).

2.4. CI control of NPC in the game Pac-Man

The mechanism of both UCT and MCTS control of NPC in the game Pac-Man is similar to that of the game Dead-End, with GHOSTs replacing the DOGs and Pac-Man replacing the CAT in both Figure 1 and Figure 2.
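The two selection rules just described (pure Monte Carlo for MCTS, UCB1 for UCT) and the backpropagation step can be sketched as below. The node representation and the exploration constant C are illustrative assumptions, not the paper's implementation.

```python
import math
import random

C = math.sqrt(2)  # common UCB1 exploration constant (an assumption here)

def select_monte_carlo(children, rng=random):
    """MCTS selection: pick a child uniformly (probability 1/len(children))."""
    return rng.choice(children)

def select_ucb1(children, parent_visits):
    """UCT selection: pick the child maximizing the UCB1 score."""
    def ucb1(node):
        if node["visits"] == 0:            # try unvisited branches first
            return float("inf")
        exploit = node["wins"] / node["visits"]
        explore = C * math.sqrt(math.log(parent_visits) / node["visits"])
        return exploit + explore
    return max(children, key=ucb1)

def backpropagate(path, won):
    """As described for Dead-End: credit 1 to the chosen branch on a win, 0 otherwise."""
    for node in path:
        node["visits"] += 1
        node["wins"] += 1 if won else 0
```

Under the hood, both controllers repeat select / expand / simulate / backpropagate until the simulation time budget runs out; only the selection rule differs.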

The performance of the NPC controlled by MCTS and UCT as a function of simulation time against the Strategy-X player/Pac-Man (see 1) in 3.3.2) is presented in Figure 4, which shows a similar trend to Figure 3. The peak performance of the NPC controlled by both MCTS and UCT reaches about 72% win-rate around a simulation time of 1400 ms against the Strategy-X player.

2.5. Result

Figure 4. Performance of the NPC controlled by MCTS and UCT as a function of simulation time against the Strategy-X player/Pac-Man (see 1) in 3.3.2).

It can be inferred from Figure 3 and Figure 4 that, for the two prey-and-predator games Dead-End and Pac-Man, the performance of the NPC controlled by both MCTS and UCT increases with the length of simulation time, and a peak performance of about 70% win-rate can be reached in both; however, the required simulation time can differ significantly between the two test-beds.

3. Generating challengeable game opponents by the approach of knowledge-based-CI-controlled-NPC

The CI method usually consumes a large amount of system resources, such as CPU and RAM, so the approach of CI-controlled-NPC is not applicable to multi-player online games. To make up for this drawback, we propose the approach of knowledge-based-CI-controlled-NPC, which uses a trained ANN (artificial neural network) to control the NPC, while the training data for the ANN are collected from the approach of CI-controlled-NPC during gameplay. It can be inferred from Figure 3 that the performance of knowledge-based-CI (in this case, CI refers to MCTS) is more stable and does not degrade with a lack of system resources, so it is more applicable to multi-player online games.

3.1. The pros and cons of knowledge-based-CI-controlled-NPC

Figure 5.
Performance of MCTS-controlled-NPC compared with that of knowledge-based-MCTS-controlled-NPC against the Strategy-A player (Figure 8 in 3.3.1) with an increasing number of threads (players) simulated on a standalone computer.

Figure 6. The number of simulations decreases with the increase of threads (Strategy-A players) for MCTS-controlled-NPC at the simulation time limit of 300 ms.

Figure 7. Performance of UCT-controlled-NPC compared with that of knowledge-based-UCT-controlled-NPC against the Strategy-A player (Figure 8 in 3.3.1) with an increasing number of threads (players) simulated on a standalone computer.

The pros are:

It uses an ANN to control the NPC, and the ANN represents the knowledge. It therefore has no dependency on computational resources and can generate more stable adaptive game intelligence, which makes it ideal for multi-player online games. Its advantage can also be demonstrated through Figure 5, Figure 6, and Figure 7. From Figure 5, we can infer that the performance of MCTS-controlled-NPC (pink line) degrades with an increasing number of running threads (one thread per player), whereas the performance of knowledge-based-MCTS-controlled-NPC (green line) remains steady as the number of players increases. The result in Figure 5 also helps to explain Figure 6: with MCTS-controlled-NPC at the simulation time limit of 300 ms, the number of simulations decreases as the number of players increases, and as a result the performance of MCTS degrades; we already know from Figure 3 that the performance of MCTS is enhanced as the number of simulations increases. Figure 7, which is on UCT, tells the same story that Figure 5 tells for MCTS.

The cons are:

Strategy-based player modeling (SBPM) is required in combination with game AI implemented by knowledge-based-CI in order to generate challengeable game opponents. The reason is that knowledge-based-CI uses an ANN trained from data of the CI-controlled NPC playing against a player with a specific strategy.
SBPM requires game developers to have a good mastery of data mining and machine learning techniques; game AI from knowledge-based-CI requires game developers to have certain skills for a specific game genre: generating an ANN from the data of the CI-controlled NPC and possibly

optimizing the ANN in order to achieve better game AI performance.

3.2. Implementation of ANN

Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples; an ANN consists of single-layer or multi-layer perceptrons. Algorithms such as BACKPROPAGATION (BP), which use gradient descent to tune the network parameters to best fit a training set of input-output pairs, are normally used [9]. ANNs can be used for classification as well as control. In this paper, a multilayer perceptron with the BP learning algorithm is used for NPC control.

3.2.1. ANN in the game Dead-End

In the following, while training the ANNs, only the instances belonging to the winning opponent are kept.

1) ANN from MCTS-created data (the simulation time limit for this MCTS is 200 ms). There are 28 nodes in the input layer, 8 nodes in the output layer, and 18 nodes in the hidden layer [8]. Tricks while generating the ANN: the size of the training set affects the performance of the ANN. First we set a desired win-rate and started with a large training set of 5000 entries, then built the ANN from those data. If the outcome is unsatisfactory, the size of the training set is gradually reduced until the required ANN performance is reached. The data used for training are collected as follows: for each strategy, choose the data from 200 gameplays won by the opponents (the DOGs). Each group of data comes from a single gameplay with a turn limit of 20 and an average of approximately 15 steps per gameplay (some strategies take 12 steps, others 15 or 16), so a maximum of 3*200*15 = 9000 entries of data are used for training the ANN.

Data used for training the ANN in the supervised case (data from each step are used to train the ANNs):
Strategy-A: 1147 groups (*average steps)
Strategy-B: 982 groups
Strategy-C: 1492 groups

Data used for training the ANN in the unsupervised case (data from each step are used to train the ANNs):

Cluster-1: 1523 groups
Cluster-2: 882 groups
Cluster-3: 1216 groups

2) ANN from UCT-created data (the simulation time limit for this UCT is 300 ms). There are two dissimilar ANNs for the two different DOGs. Both DOGs' ANNs have 12 nodes in the hidden layer and 4 nodes in the output layer. There are 20 nodes in the input layer of DOG1's ANN and 21 nodes in DOG2's (the output of DOG1's ANN is added as the 21st node). The data used for training are 428 entries. Epochs are 500 [6].

3.2.2. ANN in the game Pac-Man

1) ANN from MCTS-created data. This is not implemented yet; however, generating an ANN from MCTS-created data for Pac-Man would be quite similar to generating an ANN from UCT-created data for Pac-Man, which is explained next.

2) ANN from UCT-created data (the simulation time limit for this UCT is 200 ms). There are 22 nodes in the input layer, 4 nodes in the output layer, and 11 nodes in the hidden layer [7]. There are 500 epochs. The data used for training are collected as follows: for each strategy, choose the data from 50 gameplays won by the opponents (the GHOSTs). Each group of data comes from a single gameplay with a turn limit of 50 and an average of approximately 35 steps per gameplay, so approximately a maximum of 3*50*35 = 5250 entries of data are used for training the ANN.

Data used for training the ANN in the supervised case: a total of 54 groups (opponent wins); data from each step are used to train the ANNs.

Strategy-X: 9 groups (*average steps)
Strategy-Y: 14 groups
Strategy-Z: 31 groups

Data used for training the ANN in the unsupervised case: still a total of 54 groups; data from each step are used to train the ANNs.

Cluster-1: 19 groups
Cluster-2: 13 groups
Cluster-3: 22 groups

3.3. Strategy-Based Player Modeling (SBPM)

SBPM is to recognize the player's strategy pattern during the gameplay itself. SBPM can be implemented by both supervised and unsupervised learning; however, unsupervised learning is more feasible for real gameplay. Details about supervised SBPM can be found in [3]. The unsupervised learning approach in this research is x-means.

3.3.1. SBPM in the game Dead-End

Figure 8. The player's (CAT's) three strategies as planned (left) and in real gameplay (right).

1) Player's (CAT's) three strategies. The CAT's three strategies (Strategy A: Triangle, Strategy B: Square, Strategy C: Zigzag) are shown in Figure 8.

2) Attributes used for SBPM.

The attributes used for SBPM in Dead-End are the same as those used for the input nodes of the ANNs in both 1) and 2) of 3.2.1. For each strategy, collect the data from 200 gameplays. Supervised SBPM on the game Dead-End is covered in [2], with good performance of around 85%-90% using different supervised learning algorithms.

3) Evaluating the performance of the opponent AI (supervised and unsupervised SBPM with knowledge-based CI) by cross-validation (CV). In this research, we propose two dissimilar measures for evaluating the performance of the opponent AI in the CV used for supervised and unsupervised SBPM with knowledge-based CI: Average Win-Rate and Epsilon. We assume that, in both supervised and unsupervised SBPM, the Strategy(i) opponent should perform best against the Strategy(i) player.

A. Average Win-Rate.

Average Win-Rate = (W1 + W2 + W3) / 3, where Wi is the win-rate of the Strategy(i) opponent against the Strategy(i) player.

B. Epsilon.

Epsilon evaluates the performance of the opponent game AI from the perspective of how well the matching is done in the CV. For cluster x, we define

Epsilon_x = w_xx - (1/n) * sum over y != x of w_xy,

where w_xx is the net win-rate of cluster x fighting itself, the w_xy (y != x) are the non-matching items, i.e. the net win-rates of cluster x fighting other clusters, and the denominator n is the number of non-matching clusters. The reported Epsilon is the average of Epsilon_x over all clusters.

4) CV of supervised and unsupervised SBPM with knowledge-based-MCTS in Dead-End. The data collected from MCTS are used for training the ANN, with a simulation time for MCTS of 200 ms (milliseconds; 1 s = 1000 ms).

A. CV of supervised SBPM with knowledge-based-MCTS.

Table 1.
Win-rate of DOG from MCTS-data against FSM-controlled CAT

                               Strategy-A DOG   Strategy-B DOG   Strategy-C DOG
FSM-controlled Strategy-A CAT       59%              24%              20%
FSM-controlled Strategy-B CAT       31%              57%              70%
FSM-controlled Strategy-C CAT        -                -                -

(Table 1 demonstrates that supervised SBPM is absolutely necessary.)
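The Average Win-Rate and Epsilon measures defined above can be computed directly from a win-rate matrix; applied to the Table 2 matrix below, they reproduce the supervised entries of Table 6. This reimplementation is for illustration and is not the authors' code.

```python
def average_win_rate(w):
    """Mean of the diagonal: Strategy(i) opponent vs Strategy(i) player."""
    k = len(w)
    return sum(w[i][i] for i in range(k)) / k

def epsilon(w):
    """Mean over clusters of (matching win-rate minus mean non-matching win-rate)."""
    k = len(w)
    n = k - 1                                      # number of non-matching clusters
    per_cluster = [
        w[i][i] - sum(w[i][j] for j in range(k) if j != i) / n
        for i in range(k)
    ]
    return sum(per_cluster) / k

table2 = [[81, 1, 13],                             # Strategy-A CAT row
          [12, 62, 23],                            # Strategy-B CAT row
          [53, 0, 94]]                             # Strategy-C CAT row
print(average_win_rate(table2))                    # 79.0, matching Table 6 (supervised)
print(epsilon(table2))                             # 62.0, matching Table 6 (supervised)
```

Running the same two functions on the Table 4 matrix reproduces the unsupervised row of Table 6 (77% and 59%), which is a useful consistency check on both measures.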

Table 2. Win-rate of DOG from MCTS-data against CAT (supervised SBPM)

                 Strategy-A DOG   Strategy-B DOG   Strategy-C DOG
Strategy-A CAT        81%               1%              13%
Strategy-B CAT        12%              62%              23%
Strategy-C CAT        53%               0%              94%

(Table 2 is the basis of the CV from unsupervised SBPM (Table 4).)

Table 3. Win-rate of a single DOG from all the MCTS-data against CATs of different strategies (supervised SBPM)

Strategy-A CAT   Strategy-B CAT   Strategy-C CAT
     11%              13%               7%

(Table 3 demonstrates that supervised SBPM is absolutely necessary: if the player's strategy model is not identified, we have to use one ANN built from all the MCTS-data against all kinds of players with dissimilar strategies, which results in low game AI performance.)

B. CV of unsupervised SBPM with knowledge-based-MCTS.

Table 4. Win-rate of DOG against CAT (unsupervised SBPM)

               cluster-1 DOG   cluster-2 DOG   cluster-3 DOG
cluster-1 CAT       76%             21%             12%
cluster-2 CAT        8%             71%              8%
cluster-3 CAT       20%             39%             84%

(Table 4 demonstrates that the idea of CV from supervised SBPM (Table 2) carries over successfully to CV from unsupervised SBPM.)

Table 5. Win-rate of a single DOG from all the MCTS-data against CATs of different strategies (unsupervised SBPM)

cluster-1   cluster-2   cluster-3
    3%          1%          6%

(Table 5 demonstrates that unsupervised SBPM is absolutely necessary, for the same reason as in Table 3.)

C. Evaluating the performance of the opponent game AI by supervised and unsupervised SBPM.

Table 6. Evaluation of the performance of the opponent game AI by supervised and unsupervised SBPM

               Average Win-Rate   Epsilon
Supervised           79%            62%
Unsupervised         77%            59%

(Table 6 demonstrates that the performance of the opponent AI with unsupervised SBPM is similar to that with supervised SBPM; however, unsupervised SBPM is more feasible in reality.)

5) CV of supervised and unsupervised SBPM with knowledge-based-UCT in Dead-End. The data collected from UCT are used for training the ANN, with a simulation time for UCT of 300 ms.

A. CV of supervised SBPM with knowledge-based-UCT.

Table 7. Win-rate of DOG against FSM-controlled CAT

                               Strategy-A DOG   Strategy-B DOG   Strategy-C DOG
FSM-controlled Strategy-A CAT       62%              45%               0%
FSM-controlled Strategy-B CAT       22%              66%              24%
FSM-controlled Strategy-C CAT       10%              30%              60%

(Read similarly to Table 1.)

Table 8. Win-rate of DOG against CAT (supervised SBPM)

                 Strategy-A DOG   Strategy-B DOG   Strategy-C DOG
Strategy-A CAT        92%              44%               4%
Strategy-B CAT        12%              68%              22%
Strategy-C CAT        30%              12%              94%

(Read similarly to Table 2.)

Table 9. Win-rate of a single DOG from all the UCT-data against CATs of different strategies (supervised SBPM)

Strategy-A CAT   Strategy-B CAT   Strategy-C CAT
     24%              33%               0%

(Read similarly to Table 3.)

B. CV of unsupervised SBPM with knowledge-based-UCT.

Table 10. Win-rate of DOG against CAT (unsupervised SBPM)

               cluster-1 DOG   cluster-2 DOG   cluster-3 DOG
cluster-1 CAT       62%             20%              6%
cluster-2 CAT       16%             95%             30%
cluster-3 CAT        8%              8%             66%

(Read similarly to Table 4.)
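The unsupervised SBPM step behind the cluster tables groups per-gameplay feature vectors so that each cluster stands in for one player strategy. The paper uses x-means; the sketch below uses plain k-means with a fixed k as a simpler stand-in, with a deterministic first-k initialization, and the feature vectors are made up for illustration.

```python
def kmeans(points, k, iters=20):
    # Naive k-means with deterministic "first k points" initialization;
    # a stand-in for the x-means used in the paper (x-means also chooses k).
    centers = list(points[:k])
    dim = len(points[0])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each gameplay's feature vector to the nearest center
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for c in range(k):
            if clusters[c]:  # recompute each center as its cluster's mean
                centers[c] = tuple(sum(p[d] for p in clusters[c]) / len(clusters[c])
                                   for d in range(dim))
    return centers, clusters

# Six made-up gameplays, two per underlying strategy:
games = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (21, 0)]
centers, clusters = kmeans(games, 3)
print([len(c) for c in clusters])      # [2, 2, 2]: one cluster per strategy
```

Once the clusters are fixed, one ANN is trained per cluster, exactly as one ANN is trained per strategy in the supervised case.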

Table 11. Win-rate of a single DOG from all the UCT-data against CATs of different strategies (unsupervised SBPM)

cluster-1   cluster-2   cluster-3
   27%         77%         21%

(Read similarly to Table 5.)

C. Evaluating the performance of the opponent game AI by supervised and unsupervised SBPM.

Table 12. Evaluation of the performance of the opponent game AI by supervised and unsupervised SBPM

               Average Win-Rate   Epsilon
Supervised         84.67%           64%
Unsupervised       74.34%          59.67%

(Read similarly to Table 6.)

3.3.2. SBPM in the game Pac-Man

1) Player's (Pac-Man's) three strategies. The player's three strategies are exhibited only when choosing a direction at an intersection.

Strategy-X: at an intersection, the player either moves straight ahead or turns right, each with probability 50%; however, if it would meet a GHOST in the next step, it has to backtrack.

Strategy-Y: at an intersection, the player either moves straight ahead or turns left, each with probability 50%; however, if it would meet a GHOST in the next step, it has to backtrack.

Strategy-Z: at an intersection, the player turns right; however, if it would meet a GHOST in the next step, it has to backtrack.

2) Attributes used for SBPM. There are 22 attributes used for SBPM in Pac-Man, the same as the 22 nodes in the input layer of the ANN used to control the GHOST in the game Pac-Man [7]. For each strategy, collect the data from 50 gameplays. Supervised SBPM on the game Pac-Man can be found in [3], with good performance of around 85%-90% using different supervised learning algorithms.

3) CV of supervised and unsupervised SBPM with knowledge-based-MCTS in Pac-Man. This part is not implemented; however, the expected result should be highly similar to that of 4) in 3.3.2.

4) CV of supervised and unsupervised SBPM with knowledge-based-UCT in Pac-Man.

A. CV of supervised SBPM with knowledge-based-UCT. The data collected from UCT are used for training the ANN, with a simulation time limit for UCT of 200 ms.
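The three player strategies described in 1) above are simple reactive policies; they might be sketched as follows. The direction encoding and helper names are assumptions for illustration, not the paper's code.

```python
import random

DIRS = ["N", "E", "S", "W"]                       # clockwise order (assumed encoding)

def right_of(d):
    return DIRS[(DIRS.index(d) + 1) % 4]

def left_of(d):
    return DIRS[(DIRS.index(d) - 1) % 4]

def reverse(d):
    return DIRS[(DIRS.index(d) + 2) % 4]

def choose(strategy, heading, ghost_ahead, rng=random):
    """Direction chosen at an intersection under Strategy-X/Y/Z."""
    if ghost_ahead:                                # all three strategies backtrack
        return reverse(heading)
    if strategy == "X":                            # straight or right, 50% each
        return rng.choice([heading, right_of(heading)])
    if strategy == "Y":                            # straight or left, 50% each
        return rng.choice([heading, left_of(heading)])
    return right_of(heading)                       # Strategy-Z: always turn right
```

These policies are what the FSM-controlled Pac-Man uses in the CV tables below, and what the SBPM attributes are meant to distinguish.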

Table 13. Win-rate of GHOST against FSM-controlled PAC-MAN

                                  ANN-controlled     ANN-controlled     ANN-controlled
                                  Strategy-X GHOST   Strategy-Y GHOST   Strategy-Z GHOST
FSM-controlled Strategy-X PAC-MAN       82%                44%                36%
FSM-controlled Strategy-Y PAC-MAN       48%                91%                32%
FSM-controlled Strategy-Z PAC-MAN       52%                22%               100%

(Read similarly to Table 1.)

Table 14. Win-rate of GHOST against PAC-MAN (supervised SBPM)

                      Strategy-X GHOST   Strategy-Y GHOST   Strategy-Z GHOST
Strategy-X PAC-MAN          97%                51%                 8%
Strategy-Y PAC-MAN          49%                90%                31%
Strategy-Z PAC-MAN           0%                 0%                98%

(Read similarly to Table 2.)

Table 15. Win-rate of a single GHOST from all the UCT-data against PAC-MANs of different strategies (supervised SBPM)

Strategy-X PAC-MAN   Strategy-Y PAC-MAN   Strategy-Z PAC-MAN
       23%                  43%                   5%

(Read similarly to Table 3.)

B. CV of unsupervised SBPM with knowledge-based-UCT.

Table 16. Win-rate of GHOST against PAC-MAN (unsupervised SBPM)

                     cluster-1 GHOST   cluster-2 GHOST   cluster-3 GHOST
cluster-1 PAC-MAN         76%               43%               16%
cluster-2 PAC-MAN         48%               94%               40%
cluster-3 PAC-MAN          0%                0%               96%

(Read similarly to Table 4.)

Table 17. Win-rate of a single GHOST from all the UCT-data against PAC-MANs of different strategies (unsupervised SBPM)

cluster-1 PAC-MAN   cluster-2 PAC-MAN   cluster-3 PAC-MAN
       42%                  35%                 22%

(Read similarly to Table 5.)

C. Evaluating the performance of the opponent game AI by supervised and unsupervised SBPM.

Table 18. Evaluation of the performance of the opponent game AI by supervised and unsupervised SBPM

               Average Win-Rate   Epsilon
Supervised           95%            72%
Unsupervised         89%            64%

(Read similarly to Table 6.)

3.4. Result

Knowledge-based CI is more efficient than straight CI; however, it must be combined with SBPM, which requires the game AI developer to have specific expertise, and a lot of work has to be done.

4. Generating satisfactory game opponents by DDA from time-constrained-CI

Opponent adaptation can be achieved from two different aspects: the first is to generate more challenging opponent AI; the second is to generate satisfactory opponent AI that meets the player's skill level, which is really a matter of creating an even game. This section and the next discuss how to adapt the opponent from the second aspect, that is, how to generate satisfactory opponent AI. DDA from time-constrained-CI adjusts the opponent's challenge level by adjusting the simulation time limit of time-constrained-CI.

4.1. Flow and Gameflow vs. player's satisfaction

Gameflow applies Flow [10] to game development. According to Gameflow, an important precursor to a Flow experience, or optimized satisfaction during gameplay, is a match between the player's skills and the challenges associated with the task, with both being above a certain level [11].
Flow theory is further explained in Figure 9, which indicates that different players may have different Flow Zones [12]: a hardcore player's Flow happens when the player's skill level is well below the opponent's challenge level; for the normal player, Flow happens when the player's skill level matches the opponent's challenge level; for the novice player, Flow happens when the player's skill level is well above the opponent's challenge level. For the normal player, it is hard to measure both the player's skill level and the game's challenge level quantitatively so as to create a match between them; however, by using the CI approach of MCTS (Figure 10), we can successfully create an even game in which the opponent's win-rate over all gameplays is around 50%.

4.2. Existing DDA

Figure 9. Different players have different Flow Zones.

The existing approach for generating a satisfactory game opponent is DDA (Dynamic Difficulty Adjustment) [13], which is relatively simple. It usually increases or decreases the game difficulty by correspondingly increasing or decreasing the number of opponents [14]. Theoretically speaking, this DDA does not adjust game difficulty by adjusting the opponent's intelligence. As a result, players usually feel they are not treated fairly, or feel cheated [14], when they are beaten by a growing number of opponents, since the opponents never seem to become any smarter.

4.3. The pros and cons of DDA from time-constrained-CI

The pros: From Figure 10 we can infer that the resulting performance of the opponent AI is determined by the simulation-time limit of the CI method (CI here refers to MCTS): a longer simulation-time limit usually generates a more intelligent game opponent with better performance. In this way, we can adjust the opponent's intelligence, or challenge level, by adjusting the simulation-time limit of the CI-controlled-NPC. In this research, we propose two DDA approaches based on adjusting the CI simulation-time limit: "time-constrained-CI" and "knowledge-based-time-constrained-CI". The former is presented in detail in the remainder of this section; the latter is presented in the next section. As both approaches adjust the game challenge level through the opponent's actual intelligence, players feel more satisfied when they are used, without feeling cheated.
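Both approaches turn the same knob: the wall-clock budget given to the Monte-Carlo simulations. As a minimal illustrative sketch of that idea, the Python fragment below implements a flat, time-budgeted Monte-Carlo move chooser. It is not the paper's Dead-End implementation; `noisy_rollout` is a hypothetical stand-in for a real game playout.

```python
import random
import time

random.seed(0)

def time_constrained_choice(moves, simulate, time_limit_ms):
    """Flat Monte-Carlo move selection under a wall-clock budget: run
    random playouts round-robin over the candidate moves until the time
    limit expires, then return the move with the best mean reward."""
    deadline = time.perf_counter() + time_limit_ms / 1000.0
    totals = {m: 0.0 for m in moves}
    counts = {m: 0 for m in moves}
    while time.perf_counter() < deadline:
        for m in moves:                      # one playout per move per round
            totals[m] += simulate(m)
            counts[m] += 1
    for m in moves:                          # guard against a zero budget
        if counts[m] == 0:
            totals[m] += simulate(m)
            counts[m] = 1
    return max(moves, key=lambda m: totals[m] / counts[m])

# Hypothetical stand-in for a game rollout: from the current state, move
# "a" wins 70% of random playouts and move "b" only 30%.
def noisy_rollout(move):
    return 1.0 if random.random() < (0.7 if move == "a" else 0.3) else 0.0

best = time_constrained_choice(["a", "b"], noisy_rollout, time_limit_ms=50)
```

Raising `time_limit_ms` buys more playouts and thus more reliable move estimates, which is exactly why a longer simulation budget yields a stronger opponent.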

Figure 10. Performance of MCTS-controlled NPCs against Strategy-A (Figure 8 in 3.3.1) player with time-constrained simulations for Dead-End.

The cons: Implementing it requires game developers to have know-how in using the CI approaches of MCTS or UCT to control NPCs for a specific genre of game. In addition, basic mathematical skill in regression is required. It is based on CI, which usually consumes a large amount of system resources such as CPU and RAM, so it works for standalone PC games but is not applicable to multi-player online games.

4.4. Implementation of DDA by time-constrained-MCTS in the game Dead-End

From Subsection 4.1, we already know that normal players experience optimized satisfaction when their skill level matches the game's challenge level; in other words, Gameflow happens when an even game (50% win-rate of NPCs) is created. In the following, we demonstrate how to use DDA from time-constrained-CI to satisfy normal players, hardcore players, and novice players. The experiment is run in the following computational environment: Intel Pentium M processor 1.3 GHz, 1 GB RAM, 443 MB page file, CPU occupancy rate 1%-5%, 35 running threads. The procedure is:
1) Plot the dots in Figure 10 from the data collected in the experiments.
2) Observe from Figure 10 that the curve follows a polynomial pattern.
3) Use MATLAB 7.0 to obtain the regression function from the plotted dots: y = a·x² + b·x, where x is the time limit of simulation in milliseconds, y is the win-rate in percent, and a and b are the fitted coefficients.
4) From the regression function, find the simulation-time limit that yields a 50% win-rate (46 ms in this case, Table 19).
5) Reapply this simulation-time limit (46 ms, from Table 19) to the MCTS-controlled NPCs (dogs) and play them against the Strategy-A player (Figure 8 in 3.3.1). The resulting win-rate for the opponent game AI is 49%, close to the expected 50%.
So we can apply 46 ms as the simulation-time limit for the MCTS-controlled NPCs to create an almost even game, with a win-rate around 50%. Following the above procedure, an almost even game that works for normal players can be created. For hardcore players, who prefer more challenge, the target win-rate can be set to 70% to best optimize their satisfaction, giving 282 ms as the simulation-time limit from the regression function; the resulting MCTS win-rate is 68% (in Table 19). For novice players, who prefer less
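The regression step, fitting a polynomial to the (time limit, win-rate) dots and then inverting it for a target win-rate, can be sketched in Python. The dots below are synthesized from an assumed quadratic purely for illustration; they are not the measurements behind Figure 10.

```python
import numpy as np

# Synthetic (time-limit ms, win-rate %) dots standing in for the measured
# data of Figure 10; generated from an assumed quadratic so the example is
# self-checking. These are NOT the paper's measurements.
true_a, true_b, true_c = -0.0005, 0.3, 30.0
t = np.array([10.0, 50.0, 100.0, 150.0, 200.0, 250.0, 300.0])
w = true_a * t**2 + true_b * t + true_c

# Step corresponding to the MATLAB regression: fit y = a*x^2 + b*x + c.
a, b, c = np.polyfit(t, w, deg=2)

def time_limit_for(target_winrate):
    """Invert the fitted quadratic: smallest positive root of
    a*x^2 + b*x + (c - target) = 0."""
    roots = np.roots([a, b, c - target_winrate])
    real = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0]
    return min(real) if real else None

budget_even = time_limit_for(50.0)   # simulation budget for an even game
```

The returned budget is then fed back into the time-constrained simulations, and the realized win-rate is checked against the target, mirroring steps 4) and 5) of the procedure.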

challenge, the target win-rate can be set to 30% to best optimize their satisfaction, with 6 ms as the simulation-time limit; the resulting MCTS win-rate is 31% (in Table 19).

Table 19. Win-rate of both MCTS and ANN control of NPCs

Target win-rate   Time limit of MCTS   Resulting MCTS win-rate   ANN-from-MCTS-data win-rate
     30%                  6 ms                  31%                         30%
     50%                 46 ms                  49%                         48%
     70%                282 ms                  70%                         68%

4.5. Implementation of DDA by time-constrained-MCTS in the game Pac-Man

The procedure for DDA by time-constrained-MCTS in the game Pac-Man is quite similar to that for Dead-End (4.4); however, the regression function can be quite different, as indicated in Figure 11.

Figure 11. Performance of MCTS-controlled NPCs against Strategy-X (3.3.2) player with time-constrained simulations for Pac-Man.

4.6. Implementation of DDA by time-constrained-UCT in the game Pac-Man

Figure 12. Performance of UCT-controlled NPCs against Strategy-X (3.3.2) player with time-constrained simulations for Pac-Man.

The procedure for DDA by time-constrained-UCT in the game Pac-Man is quite similar to that for Dead-End (4.4); however, the regression function can be quite different, as indicated in Figure 12.

4.7. Result

From the above experiments, since the performance of the opponent AI is determined by the simulation-time limit of the CI (CI here refers to MCTS), DDA can be generated from time-constrained-CI in the different test-beds of Dead-End and Pac-Man. However, time-constrained-CI, which is computationally intensive, works only for standalone PC games and is therefore not applicable to multi-player online games. The next section presents DDA from knowledge-based-time-constrained-CI, which is more computationally efficient and more feasible for multi-player online games.

5. Generating satisfactory game opponents by DDA from knowledge-based-time-constrained-CI

The DDA-from-time-constrained-CI method is based on CI, which usually consumes a large amount of system resources such as CPU and RAM, and is therefore not applicable to DDA for multi-player online games. To make up for this drawback, we propose DDA by knowledge-based-time-constrained-CI, which adjusts the opponent's challenge level by switching among NPCs controlled by different ANNs, where the ANNs are trained from data generated by time-constrained-CI.

5.1. The pros and cons of DDA by knowledge-based-time-constrained-CI

The pros: it is based on knowledge-based CI, which saves resources while keeping high performance. A detailed discussion of the potential of knowledge-based CI can be found in Subsection 3.1.

The cons: It requires correlating with knowledge-based CI in order to generate satisfactory game opponents; in other words, the player's strategy model must be identified before DDA can start.
Implementing it requires game developers to have expertise in using the CI approaches of MCTS or UCT to control NPCs for a specific genre of game. In addition, game developers are required to

be capable of creating an ANN from CI-controlled-NPC-generated data and of optimizing the ANN if required.

5.2. Implementation of DDA by knowledge-based-time-constrained-MCTS in the game Dead-End

The procedure for creating an even game (50% win-rate of NPCs) by DDA from knowledge-based-time-constrained-CI can be thought of as additional work on top of Subsection 4.4. In addition to the work done in Subsection 4.4, reapply the simulation-time limit (46 ms in this case, Table 19) to the MCTS-controlled NPCs (dogs) and play them against the Strategy-A player (cat); the resulting win-rate for the opponent game AI is 49%. During this process, record the data of every step of the NPCs' movement, then use the data to train the ANN for the opponents (dogs). The resulting win-rate for the opponent game AI controlled by knowledge-based-time-constrained-MCTS (an ANN trained from MCTS-created data) against the Strategy-A player is 48% (in Table 19), which is acceptable as it is around the expected value of 50%; however, if the resulting win-rate were much lower than the expected target of 50%, an optimization of the resulting ANN might be required. In this way, we can create different ANNs from time-constrained-MCTS for different opponent win-rates, in other words, different challenge levels of ANN-controlled opponents. For the novice player, the target win-rate can be adjusted to below 50% (for example 30% in Table 19); the resulting win-rate for the ANN-controlled opponent game AI is 30%, which is quite acceptable. For the hardcore player, the target win-rate can be adjusted to above 50% (for example 70% in Table 19); the resulting win-rate for the ANN-controlled opponent game AI is 68%, which is quite acceptable.
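The knowledge-extraction step, training an ANN on the states and moves logged from the time-constrained MCTS, can be sketched roughly as below. Everything here is an illustrative assumption: the "recorded MCTS decisions" are simulated by a simple chase rule, and the features, network size, and training schedule are not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the recorded play data: each sample pairs a state feature
# vector (relative cat position, a hypothetical choice of features) with
# the move the MCTS controller chose there. The real data would come from
# logging every NPC step, as described in the text; here the "MCTS policy"
# is imitated by a simple chase rule purely so the example runs.
X = rng.uniform(-1.0, 1.0, size=(500, 2))   # (dx, dy) toward the cat
y = (X[:, 0] > 0).astype(float)             # 1 = move right, 0 = move left

# One-hidden-layer ANN trained by plain full-batch gradient descent on the
# logistic loss; the trained net then replaces the costly MCTS at runtime.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)                # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()        # P(move right | state)
    g = (p - y) / len(y)                    # dLoss/dlogit
    W2 -= 0.5 * h.T @ g[:, None];  b2 -= 0.5 * g.sum()
    gh = (g[:, None] @ W2.T) * (1 - h**2)   # backprop through tanh
    W1 -= 0.5 * X.T @ gh;          b1 -= 0.5 * gh.sum(axis=0)

pred = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel() > 0.5).astype(float)
accuracy = (pred == y).mean()               # imitation accuracy on the log
```

Once trained, evaluating the network is a handful of matrix products per decision, which is what makes the knowledge-based variant cheap enough for multi-player online games.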
So there are two ways to create DDA by knowledge-based-time-constrained-MCTS, that is, DDA by ANNs trained from time-constrained-MCTS data: 1) Optimize the dots that make up the curve of knowledge-based-time-constrained-MCTS over the simulation-time limit (the different ANNs) so that it approximates the curve of time-constrained-MCTS; however, during the experiments, the number of data entries needed to train a well-performing ANN was found to differ from case to case, so the curve of knowledge-based-time-constrained-MCTS is hard for a computer to create automatically, and this approach is not feasible (explained in Figure 13). 2) Create the curve of knowledge-based-time-constrained-MCTS, which consists of the different ANNs, derive a regression function from it (explained in Figure 14), and then follow the procedure of DDA by time-constrained-MCTS presented in Subsection 4.4.

Figure 13. Performance of NPCs controlled by MCTS / ANN-from-MCTS-data against Strategy-A (Figure 8 in 3.3.1) player with time-constrained simulations.

Figure 14. Performance of NPCs controlled by ANN-from-MCTS-data against Strategy-X (Figure 8 in 3.3.1) player with time-constrained simulations.

5.3. Implementation of DDA by knowledge-based-time-constrained-MCTS in the game Pac-Man

Figure 15. Performance of NPCs controlled by ANN-from-time-constrained-MCTS-data against Strategy-X (3.3.2) player with time-constrained simulations.

The procedure for implementing DDA by knowledge-based-time-constrained-MCTS in the game Pac-Man is quite similar to that in Dead-End (5.2); however, the regression function can be quite different, as indicated in Figure 15.

5.4. Implementation of DDA by knowledge-based-time-constrained-UCT in the game Pac-Man

Figure 16. Performance of NPCs controlled by ANN-from-time-constrained-UCT-data against Strategy-X (3.3.2) player with time-constrained simulations.

The procedure for implementing DDA by knowledge-based-time-constrained-UCT in the game Pac-Man is quite similar to that in Dead-End (5.2); however, the regression function can be quite different, as indicated in Figure 16.

5.5. Result

DDA from knowledge-based-time-constrained-CI, which is more computationally efficient and more feasible for multi-player online games, can be generated in the same way as DDA from time-constrained-CI in the different test-beds of Dead-End and Pac-Man.

6. Conclusion

From the above discussion, we can conclude that straight CI can indeed generate challengeable and satisfactory game opponents; however, it is system-resource intensive and inefficient, so this approach is appropriate only for standalone PC games. Knowledge-based CI is appropriate not only for standalone PC games but also for multi-player online games; however, considerable work is required, involving supervised and unsupervised learning and ANN generation.

7. References

[1] Alex J. Champandard, "10 Reasons the Age of Finite State Machines is Over", retrieved on 08/18/09.
[2] Suoju He, et al., "Game Player Strategy Pattern Recognition and How UCT Algorithms Apply Pre-Knowledge of Player's Strategy to Improve Opponent AI", ISE'2008.
[3] Suoju He, et al., "Strategy-Based Player Modeling During Interactive Entertainment Sessions by Using Bayesian Classification", ICNC'08.
[4] Guillaume Chaslot, et al., "Monte-Carlo Tree Search: A New Framework for Game AI", BNAIC 2008.
[5] Levente Kocsis and Csaba Szepesvári, "Bandit Based Monte-Carlo Planning", 15th European Conference on Machine Learning (ECML'06).
[6] Yiwen Fu, Si Yang, Suoju He, et al.,
"To Create Intelligent Adaptive Neuro-Controller of Game Opponent from UCT-Created Data", ICNC-FSKD'09.

[7] Xiao Liu, Yao Li, Suoju He, Yiwen Fu, Jiajian Yang, Donglin Ji, Yang Chen, "To Create Intelligent Adaptive Game Opponent by Using Monte-Carlo for the Game of Pac-Man", ICNC-FSKD'09.
[8] Jiajian Yang, Yuan Gao, Suoju He, et al., "To Create Intelligent Adaptive Game Opponent by Using Monte-Carlo Tree Search", ICNC-FSKD'09.
[9] T. M. Mitchell, Machine Learning, McGraw Hill, New York.
[10] M. Csikszentmihalyi, Flow: The Psychology of Optimal Experience, Harper Perennial, New York.
[11] P. Sweetser and P. Wyeth, "GameFlow: A Model for Evaluating Player Enjoyment in Games", ACM Computers in Entertainment 3, 3.
[12] J. Chen, "Flow in Games (and Everything Else)", Communications of the ACM, vol. 50, no. 4, April.
[13] Dynamic Difficulty Adjustment (DDA), retrieved on 08/18/09.
[14] Auto-dynamic difficulty, retrieved on 08/18/09.


More information

2048: An Autonomous Solver

2048: An Autonomous Solver 2048: An Autonomous Solver Final Project in Introduction to Artificial Intelligence ABSTRACT. Our goal in this project was to create an automatic solver for the wellknown game 2048 and to analyze how different

More information

Artificial Intelligence. Cameron Jett, William Kentris, Arthur Mo, Juan Roman

Artificial Intelligence. Cameron Jett, William Kentris, Arthur Mo, Juan Roman Artificial Intelligence Cameron Jett, William Kentris, Arthur Mo, Juan Roman AI Outline Handicap for AI Machine Learning Monte Carlo Methods Group Intelligence Incorporating stupidity into game AI overview

More information

Playout Search for Monte-Carlo Tree Search in Multi-Player Games

Playout Search for Monte-Carlo Tree Search in Multi-Player Games Playout Search for Monte-Carlo Tree Search in Multi-Player Games J. (Pim) A.M. Nijssen and Mark H.M. Winands Games and AI Group, Department of Knowledge Engineering, Faculty of Humanities and Sciences,

More information

MONTE-CARLO TWIXT. Janik Steinhauer. Master Thesis 10-08

MONTE-CARLO TWIXT. Janik Steinhauer. Master Thesis 10-08 MONTE-CARLO TWIXT Janik Steinhauer Master Thesis 10-08 Thesis submitted in partial fulfilment of the requirements for the degree of Master of Science of Artificial Intelligence at the Faculty of Humanities

More information

Computer Science Faculty Publications

Computer Science Faculty Publications Computer Science Faculty Publications Computer Science 2-4-2017 Playful AI Education Todd W. Neller Gettysburg College Follow this and additional works at: https://cupola.gettysburg.edu/csfac Part of the

More information

CS221 Final Project Report Learn to Play Texas hold em

CS221 Final Project Report Learn to Play Texas hold em CS221 Final Project Report Learn to Play Texas hold em Yixin Tang(yixint), Ruoyu Wang(rwang28), Chang Yue(changyue) 1 Introduction Texas hold em, one of the most popular poker games in casinos, is a variation

More information

Exploitability and Game Theory Optimal Play in Poker

Exploitability and Game Theory Optimal Play in Poker Boletín de Matemáticas 0(0) 1 11 (2018) 1 Exploitability and Game Theory Optimal Play in Poker Jen (Jingyu) Li 1,a Abstract. When first learning to play poker, players are told to avoid betting outside

More information

Advanced Game AI. Level 6 Search in Games. Prof Alexiei Dingli

Advanced Game AI. Level 6 Search in Games. Prof Alexiei Dingli Advanced Game AI Level 6 Search in Games Prof Alexiei Dingli MCTS? MCTS Based upon Selec=on Expansion Simula=on Back propaga=on Enhancements The Mul=- Armed Bandit Problem At each step pull one arm Noisy/random

More information

Enhancements for Monte-Carlo Tree Search in Ms Pac-Man

Enhancements for Monte-Carlo Tree Search in Ms Pac-Man Enhancements for Monte-Carlo Tree Search in Ms Pac-Man Tom Pepels Mark H.M. Winands Abstract In this paper enhancements for the Monte-Carlo Tree Search (MCTS) framework are investigated to play Ms Pac-Man.

More information

Monte Carlo based battleship agent

Monte Carlo based battleship agent Monte Carlo based battleship agent Written by: Omer Haber, 313302010; Dror Sharf, 315357319 Introduction The game of battleship is a guessing game for two players which has been around for almost a century.

More information

Fault Detection in Double Circuit Transmission Lines Using ANN

Fault Detection in Double Circuit Transmission Lines Using ANN International Journal of Research in Advent Technology, Vol.3, No.8, August 25 E-ISSN: 232-9637 Fault Detection in Double Circuit Transmission Lines Using ANN Chhavi Gupta, Chetan Bhardwaj 2 U.T.U Dehradun,

More information

Playing CHIP-8 Games with Reinforcement Learning

Playing CHIP-8 Games with Reinforcement Learning Playing CHIP-8 Games with Reinforcement Learning Niven Achenjang, Patrick DeMichele, Sam Rogers Stanford University Abstract We begin with some background in the history of CHIP-8 games and the use of

More information

Available online at ScienceDirect. Procedia Computer Science 62 (2015 ) 31 38

Available online at  ScienceDirect. Procedia Computer Science 62 (2015 ) 31 38 Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 62 (2015 ) 31 38 The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015) Analysis of a

More information

DeepStack: Expert-Level AI in Heads-Up No-Limit Poker. Surya Prakash Chembrolu

DeepStack: Expert-Level AI in Heads-Up No-Limit Poker. Surya Prakash Chembrolu DeepStack: Expert-Level AI in Heads-Up No-Limit Poker Surya Prakash Chembrolu AI and Games AlphaGo Go Watson Jeopardy! DeepBlue -Chess Chinook -Checkers TD-Gammon -Backgammon Perfect Information Games

More information

IMAGE TYPE WATER METER CHARACTER RECOGNITION BASED ON EMBEDDED DSP

IMAGE TYPE WATER METER CHARACTER RECOGNITION BASED ON EMBEDDED DSP IMAGE TYPE WATER METER CHARACTER RECOGNITION BASED ON EMBEDDED DSP LIU Ying 1,HAN Yan-bin 2 and ZHANG Yu-lin 3 1 School of Information Science and Engineering, University of Jinan, Jinan 250022, PR China

More information

MLP for Adaptive Postprocessing Block-Coded Images

MLP for Adaptive Postprocessing Block-Coded Images 1450 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 8, DECEMBER 2000 MLP for Adaptive Postprocessing Block-Coded Images Guoping Qiu, Member, IEEE Abstract A new technique

More information

SEARCHING is both a method of solving problems and

SEARCHING is both a method of solving problems and 100 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 3, NO. 2, JUNE 2011 Two-Stage Monte Carlo Tree Search for Connect6 Shi-Jim Yen, Member, IEEE, and Jung-Kuei Yang Abstract Recently,

More information

Gameplay as On-Line Mediation Search

Gameplay as On-Line Mediation Search Gameplay as On-Line Mediation Search Justus Robertson and R. Michael Young Liquid Narrative Group Department of Computer Science North Carolina State University Raleigh, NC 27695 jjrobert@ncsu.edu, young@csc.ncsu.edu

More information

Learning from Hints: AI for Playing Threes

Learning from Hints: AI for Playing Threes Learning from Hints: AI for Playing Threes Hao Sheng (haosheng), Chen Guo (cguo2) December 17, 2016 1 Introduction The highly addictive stochastic puzzle game Threes by Sirvo LLC. is Apple Game of the

More information

Project 2: Searching and Learning in Pac-Man

Project 2: Searching and Learning in Pac-Man Project 2: Searching and Learning in Pac-Man December 3, 2009 1 Quick Facts In this project you have to code A* and Q-learning in the game of Pac-Man and answer some questions about your implementation.

More information

Probability of Potential Model Pruning in Monte-Carlo Go

Probability of Potential Model Pruning in Monte-Carlo Go Available online at www.sciencedirect.com Procedia Computer Science 6 (211) 237 242 Complex Adaptive Systems, Volume 1 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science

More information

The Automatic Classification Problem. Perceptrons, SVMs, and Friends: Some Discriminative Models for Classification

The Automatic Classification Problem. Perceptrons, SVMs, and Friends: Some Discriminative Models for Classification Perceptrons, SVMs, and Friends: Some Discriminative Models for Classification Parallel to AIMA 8., 8., 8.6.3, 8.9 The Automatic Classification Problem Assign object/event or sequence of objects/events

More information

Teaching a Neural Network to Play Konane

Teaching a Neural Network to Play Konane Teaching a Neural Network to Play Konane Darby Thompson Spring 5 Abstract A common approach to game playing in Artificial Intelligence involves the use of the Minimax algorithm and a static evaluation

More information

Game Specific Approaches to Monte Carlo Tree Search for Dots and Boxes

Game Specific Approaches to Monte Carlo Tree Search for Dots and Boxes Western Kentucky University TopSCHOLAR Honors College Capstone Experience/Thesis Projects Honors College at WKU 6-28-2017 Game Specific Approaches to Monte Carlo Tree Search for Dots and Boxes Jared Prince

More information

CS-E4800 Artificial Intelligence

CS-E4800 Artificial Intelligence CS-E4800 Artificial Intelligence Jussi Rintanen Department of Computer Science Aalto University March 9, 2017 Difficulties in Rational Collective Behavior Individual utility in conflict with collective

More information

Artificial Intelligence. Minimax and alpha-beta pruning

Artificial Intelligence. Minimax and alpha-beta pruning Artificial Intelligence Minimax and alpha-beta pruning In which we examine the problems that arise when we try to plan ahead to get the best result in a world that includes a hostile agent (other agent

More information

Game-playing: DeepBlue and AlphaGo

Game-playing: DeepBlue and AlphaGo Game-playing: DeepBlue and AlphaGo Brief history of gameplaying frontiers 1990s: Othello world champions refuse to play computers 1994: Chinook defeats Checkers world champion 1997: DeepBlue defeats world

More information

Comp 3211 Final Project - Poker AI

Comp 3211 Final Project - Poker AI Comp 3211 Final Project - Poker AI Introduction Poker is a game played with a standard 52 card deck, usually with 4 to 8 players per game. During each hand of poker, players are dealt two cards and must

More information

Fast Statistical Timing Analysis By Probabilistic Event Propagation

Fast Statistical Timing Analysis By Probabilistic Event Propagation Fast Statistical Timing Analysis By Probabilistic Event Propagation Jing-Jia Liou, Kwang-Ting Cheng, Sandip Kundu, and Angela Krstić Electrical and Computer Engineering Department, University of California,

More information

Optimal Yahtzee performance in multi-player games

Optimal Yahtzee performance in multi-player games Optimal Yahtzee performance in multi-player games Andreas Serra aserra@kth.se Kai Widell Niigata kaiwn@kth.se April 12, 2013 Abstract Yahtzee is a game with a moderately large search space, dependent on

More information

Game Mechanics Minesweeper is a game in which the player must correctly deduce the positions of

Game Mechanics Minesweeper is a game in which the player must correctly deduce the positions of Table of Contents Game Mechanics...2 Game Play...3 Game Strategy...4 Truth...4 Contrapositive... 5 Exhaustion...6 Burnout...8 Game Difficulty... 10 Experiment One... 12 Experiment Two...14 Experiment Three...16

More information

Opponent Modelling In World Of Warcraft

Opponent Modelling In World Of Warcraft Opponent Modelling In World Of Warcraft A.J.J. Valkenberg 19th June 2007 Abstract In tactical commercial games, knowledge of an opponent s location is advantageous when designing a tactic. This paper proposes

More information

CandyCrush.ai: An AI Agent for Candy Crush

CandyCrush.ai: An AI Agent for Candy Crush CandyCrush.ai: An AI Agent for Candy Crush Jiwoo Lee, Niranjan Balachandar, Karan Singhal December 16, 2016 1 Introduction Candy Crush, a mobile puzzle game, has become very popular in the past few years.

More information

CS7032: AI & Agents: Ms Pac-Man vs Ghost League - AI controller project

CS7032: AI & Agents: Ms Pac-Man vs Ghost League - AI controller project CS7032: AI & Agents: Ms Pac-Man vs Ghost League - AI controller project TIMOTHY COSTIGAN 12263056 Trinity College Dublin This report discusses various approaches to implementing an AI for the Ms Pac-Man

More information

Playful AI Education. Todd W. Neller Gettysburg College

Playful AI Education. Todd W. Neller Gettysburg College Playful AI Education Todd W. Neller Gettysburg College Introduction Teachers teach best when sharing from the core of their enjoyment of the material. E.g. Those with enthusiasm for graphics should use

More information

Analysis of Learning Paradigms and Prediction Accuracy using Artificial Neural Network Models

Analysis of Learning Paradigms and Prediction Accuracy using Artificial Neural Network Models Analysis of Learning Paradigms and Prediction Accuracy using Artificial Neural Network Models Poornashankar 1 and V.P. Pawar 2 Abstract: The proposed work is related to prediction of tumor growth through

More information

Inaction breeds doubt and fear. Action breeds confidence and courage. If you want to conquer fear, do not sit home and think about it.

Inaction breeds doubt and fear. Action breeds confidence and courage. If you want to conquer fear, do not sit home and think about it. Inaction breeds doubt and fear. Action breeds confidence and courage. If you want to conquer fear, do not sit home and think about it. Go out and get busy. -- Dale Carnegie Announcements AIIDE 2015 https://youtu.be/ziamorsu3z0?list=plxgbbc3oumgg7ouylfv

More information

Systematic Viewpoint for Integrating Computational Resources by Using the Technique of PC Cluster

Systematic Viewpoint for Integrating Computational Resources by Using the Technique of PC Cluster Systematic Viewpoint for Integrating Computational Resources by Using the Technique of PC Cluster Kun-Lin Hsieh* Department of Information Management & Research Group in Systemic and Theoretical Sciences

More information

Move Evaluation Tree System

Move Evaluation Tree System Move Evaluation Tree System Hiroto Yoshii hiroto-yoshii@mrj.biglobe.ne.jp Abstract This paper discloses a system that evaluates moves in Go. The system Move Evaluation Tree System (METS) introduces a tree

More information

Rating and Generating Sudoku Puzzles Based On Constraint Satisfaction Problems

Rating and Generating Sudoku Puzzles Based On Constraint Satisfaction Problems Rating and Generating Sudoku Puzzles Based On Constraint Satisfaction Problems Bahare Fatemi, Seyed Mehran Kazemi, Nazanin Mehrasa International Science Index, Computer and Information Engineering waset.org/publication/9999524

More information

A Move Generating Algorithm for Hex Solvers

A Move Generating Algorithm for Hex Solvers A Move Generating Algorithm for Hex Solvers Rune Rasmussen, Frederic Maire, and Ross Hayward Faculty of Information Technology, Queensland University of Technology, Gardens Point Campus, GPO Box 2434,

More information