Evolving Adaptive Play for the Game of Spoof. Mark Wittkamp


Evolving Adaptive Play for the Game of Spoof

Mark Wittkamp

This report is submitted as partial fulfilment of the requirements for the Honours Programme of the School of Computer Science and Software Engineering, The University of Western Australia, 2006

Abstract

For game playing in general, it is important for players to be adaptive; this is particularly true for games where no optimal fixed strategy is known to exist. Adaptive artificial opponents capable of learning and opponent modelling are highly desirable in computer games. Typically, a great deal of a game's ability to maintain the interest of human players is provided by multiplayer functionality, due to the unpredictable and changing game environment that this entails. It is reasonable to expect that artificial opponents mimicking the observable characteristics of human players through adaptive play would significantly benefit many games' lastability. Spoof is a multiple player game of imperfect information for which the success of a player is largely dictated by its ability to build models of its opponent(s) so that their weaknesses may be identified and exploited. We present our approach to opponent modelling in the game of Spoof through the use of evolutionary algorithms, more specifically genetic programming. Genetic programming involves a guided random search of the solution space of a given problem by evolving a population of candidate solutions which take the form of program trees. Genetic programming shows potential for games of imperfect information and other games where tree-searching algorithms are often infeasible due to the game's intractability. The suitability of genetic programming for opponent modelling is substantiated by comparison with a simple look-up table approach to learning. We demonstrate that specialisation and opponent modelling are required for optimal play in the game of Spoof by contrasting evolved playing strategies with a number of fixed strategies comparable to those employed by most human players.

Keywords: Games of Imperfect Information, Spoof, Genetic Programming, Opponent Modelling, Noise

CR Categories: A.2, I.7.2

Acknowledgements

The author wishes to thank Dr. Luigi Barone of the University of Western Australia for his continued guidance and support in supervising this project, in particular for his work on the CIG 2006 submission. Thanks are also extended to Dr. Lyndon While for his suggestions during the preliminary stages of this work.

Contents

Abstract
Acknowledgements

1 Introduction
2 Learning in Games
   2.1 Imperfect Information Games
   2.2 Need for Opponent Modelling and Adaptation
   2.3 Previous Approaches
       2.3.1 Reinforcement Learning by Look-up Tables
       2.3.2 Evolutionary Algorithms for Opponent Modelling
3 Evolutionary Algorithms
   3.1 Genetic Algorithms
   3.2 Evolutionary Programming
   3.3 Evolution Strategies
   3.4 Genetic Programming
       3.4.1 Representation
       3.4.2 Population Initialisation
       3.4.3 Fitness
       3.4.4 Selection Schemes
       3.4.5 Parsimony
       3.4.6 Genetic Operators
       3.4.7 GP System Parameters
4 The Game of Spoof
   4.1 Rules of Spoof
   4.2 Spoof Strategy
5 Building Adaptive Spoof Players
   The Learning Environment
   Genetic Program Players
   Learning
6 Experimental Results
   Static Opponents
   Look-up Table Learning
   Three Player Spoof, guessing 3rd
   Optimality
   Performance Results
   Strategy Analysis
   Learning Against Adapting Opponents
   Performance Results
   Number of Fitness Cases
   Direct Success Measure
7 Conclusion
   Future Work
A Original Honours Proposal
B Availability of total GP

List of Tables

5.1 Game specific terminals used for three player Spoof, guessing 3rd
Deterministic, non-adaptive Spoof opponents used in this study. c = the selected number of coins held by the player, n = the number of players in the game
Non-deterministic, non-adaptive Spoof opponents used in this study. c = the selected number of coins held by the player, n = the number of players in the game
Strategy GP5 3 learns an optimal strategy
Maximum attainable performance at each deterministic table
Performance of strategies at each table (guessing 3rd)
Performance of strategies against adapting opponents (guessing 3rd)
Play level achieved using direct versus pseudo-success measures
Optimality strategy for GP5 3 (direct success evaluation)
B.1 Total coin guess availability for GP

List of Figures

6.1 Fitness Profile for GP
Fitness Profile for T
Strategy GP
Visual representation of strategy GP
Visual representation of strategy T
Fitness Profiles for GP4 3 with varying fitness measures
Fitness Profile for the evolution of G2 3 (direct success evaluation)
Comparable profile for the evolution of G

CHAPTER 1

Introduction

The video game industry is an area of high and increasing profitability, with over US$6 billion spent on console video game software in 2005 [2]. In order to attract buyers, there is increasing demand to design artificial computer players capable of entertaining humans. In general, attempts to create such players try to simulate human behaviour by encoding good features (strategies) employed by strong human competitors in an artificial opponent. This often results in overly simplistic and predictable opponents whose flaws are easily exploitable, because they miss one crucial part of human play: the ability to learn about the game and adapt to their opponents. Desirable is the creation of an artificial opponent indistinguishable from a human player; one that is able to adopt various game-playing strategies depending on the strategies employed by its opponent.

In certain types of games, like Bridge, Poker, and Scrabble, players do not have complete knowledge about the state of the game and must make value decisions about their relative strength using only the public information available to them. Such games are called games of imperfect information on account of the unknown information regarding the state of play (e.g. hidden opponent cards in Poker). The success of a player depends on their ability to handle this incomplete information and, indeed, correctly dealing with this incomplete information is essential for optimal performance. Due to the non-deterministic nature of these games, the task of determining satisfactory artificial opponents is extremely broad and difficult to program for in advance; attempts often exploit only a small subset of the possible functionality conceived of by the designer, rather than its full potential. Typically, the large branching factors of these games render standard search techniques (e.g. minimax) less useful.

Spoof is a game of imperfect information played by two or more players.
It is a simple guessing game requiring players to determine an unknown number using only the partial knowledge received from the publicly announced guesses of the number made by other players (more information about the game of Spoof is available in Chapter 4). Like the games of Roshambo (rock-paper-scissors) and IPD, opponent modelling (the construction of a model of an opponent's playing

style, typically in order to exploit inherent weaknesses in their play) in the game of Spoof is crucial. Given a model of an opponent's strategy, the model can be analysed to discover weaknesses and predictabilities in the opponent's strategy and a counter-strategy determined.

A recently popularised method for solving combinatorial optimisation problems is evolutionary computation. Evolutionary computation is the term used to describe the different methods in computer science that employ the principle of Darwinian natural selection as a tool to solve problems with computers. A population of candidate solutions evolves towards satisfactory solutions to a given problem by simulated evolution. Natural selection is modelled by a function that is used to assess the quality of these solutions (the fitness function). By rewarding those solutions that are more fit, Darwinian selection pressure drives individuals towards better solutions until the population evolves to solve the problem in question. Research in the field of evolutionary computation has witnessed successes in numerous application areas, including engineering, the natural sciences, business, and economics [5]. By utilising the inherent learning capabilities of natural selection, programs capable of learning and adapting in noisy and dynamic environments are possible. In particular, the ability of these techniques to adapt to a changing environment seems well-suited to the task of developing game-playing strategies against different and possibly adapting opponents. Indeed, opponent modelling through the use of evolutionary computation techniques has led to some notable successes in games of imperfect information [4, 6, 7, 10, 11]. These published successes provide encouragement toward the use of evolutionary computation techniques in general, as well as their particular application to opponent modelling in games of imperfect information.
Genetic programming is one form of evolutionary computation, introduced by Koza [15], which defines genetic operators that directly manipulate tree-structured computer programs. Genetic programming has been used extensively for a myriad of problems [16, 18, 19], including opponent modelling and strategy development in games [8, 14, 17]. In this paper, we examine the use of genetic programming techniques to create Spoof players capable of exploiting weaknesses in different opponent playing styles in order to develop successful strategies for play, and compare this with a simple look-up table based approach. The members of our evolving population are program trees, each representing a guessing strategy with which to play a particular game. Candidate solutions are subjected to evolutionary pressure, driving the discovery of successful strategies while less successful strategies are discarded. We analyse numerous game situations against opponents of varying playing

styles. We show that our approach achieves strategic optimality in almost all cases, with near-optimal strategies resulting for the others. Our results confirm that specialisation is essential for optimal play. We test our approach against dynamic as well as static game scenarios (i.e. adaptive opponents); the strength of our genetic programming approach compared to the look-up table approach is most evident here. We further investigate the effects of noise on the performance of the resulting strategies. We compare a direct success evaluation technique with a more indirect, but intuitive, evaluation mechanism. We also experiment with the level of noise introduced in the evaluation process by varying the number of fitness cases used to evaluate individual strategies.

CHAPTER 2

Learning in Games

A great deal of AI research is conducted around the topic of games. Games are a suitable testbed in which to pursue artificial intelligence and machine learning research because they involve problems similar to those encountered in real life. The difference is that games are much simpler and more clearly defined: games have a finite number of rules and actions for players to take, and some well-understood goal. Successful approaches in games can often be applied to similar real-life problems. Games can also be used as a benchmark with which to test new theoretical concepts and to compare their performance with other strategies.

2.1 Imperfect Information Games

Games such as Bridge, Poker, and Scrabble are games of imperfect information: games in which not all the information about the state of play is known (e.g. hidden opponent cards in Poker). Due to the non-deterministic nature of such games, the task of determining satisfactory strategies is extremely broad and difficult to program in advance.

2.2 Need for Opponent Modelling and Adaptation

Once a game has been completed, a great deal of its replay value is afforded by the game's multiplayer functionality. Beyond the appeal of playing with friends in a virtual world, a key reason for the lastability of such games is the variation and interactive experience that they offer. Having players capable of learning and countering game-play strategies will help create a gaming experience capable of maintaining human players' interest for longer.

The more knowledge a player has concerning its environment, the better the strategies it will be able to develop. In multiple player games, this knowledge

includes information about the other players. Opponent modelling is required in many games in order to maximise winnings against a variety of different opponents, where no general game-playing strategy can compete (e.g. Roshambo and IPD). Apart from this, opponent modelling in games is desirable even in situations where general game-playing strategies are effective. For example, a computer opponent for the game of Pong could be programmed to be "perfect": to always return the ball, thus becoming unbeatable. Although a contrived example, this illustrates a case where a perfect player is not desirable and a more adaptive, albeit less optimal, player may add to the game's entertainment value.

Related and equally important is the need for a player not only to exploit a (possibly implicit) model of its opponents, but also to continuously update this model (and thereby its playing strategy). A learning opponent may have learnt how to exploit a certain type of human player, but as this player varies their strategy (or a new opponent comes along), it is important to be able to redirect the player's evolution toward the new optima that now exist. The ability of evolutionary algorithms to handle such environmental changes makes them a promising option for this sort of learning.

2.3 Previous Approaches

Learning in games has been attempted in various different ways; for brevity, only a few will be mentioned here. Decision trees are often used to show the transition of game states given the available actions. Often the types of machine learning mechanisms that can be utilised for a game will depend on the branching factor and depth of the decision tree.

2.3.1 Reinforcement Learning by Look-up Tables

Named after its parallels with animal learning, reinforcement learning involves learning actions based on experience. A reinforcement learning agent gains information about its environment by exploring the effect of different actions given particular states.
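The explore-and-remember scheme just described can be sketched as a state-action value table. The update rule, reward values, and all names below are illustrative assumptions, not the learning scheme used later in this thesis:

```python
import random

# Illustrative state-action look-up table: state -> {action: value}.
# The agent explores actions and remembers how each (state, action)
# pair fared, nudging stored values toward observed results.
table = {}

def choose(state, actions, explore=0.1, rng=random):
    """Pick the best-known action for a state, exploring occasionally."""
    values = table.setdefault(state, {})
    if rng.random() < explore:
        return rng.choice(actions)
    return max(actions, key=lambda a: values.get(a, 0.0))

def update(history, result, step=0.1):
    """Nudge every (state, action) pair visited in a game toward its result."""
    for state, action in history:
        values = table.setdefault(state, {})
        old = values.get(action, 0.0)
        values[action] = old + step * (result - old)
```

After a won game, `update(history, 1.0)` raises the value of every visited pair, so `choose` favours those actions the next time the same states arise.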
This information can then be exploited by the agent to achieve its goal. When a certain state-action pairing is found to be beneficial, i.e. the agent's goal has been achieved, the agent remembers this. Usually, a reward value is associated with each state-action pair based on the success or failure of executing it. A simple form of reinforcement learning is through look-up tables that allow an artificial player to learn the pay-off for each action from each given state. The

look-up table approach is so called because of the process such a player goes through when exploiting its data: it looks up the current game state and selects the action with the greatest reward.

Consider the deterministic game of tic-tac-toe. If an action leads to a victory from a particular state, then that state-action pair will have its weighting altered so that the next time the player encounters this same state, the agent will have learnt which move to make (or not to make). Another approach may be to additionally allocate a positive weighting to every action made during the game because of the win that ultimately resulted. In this example, only the end result of the game is used to alter the table; the learning player learns solely from its ultimate objective, winning the game. Occasionally, especially when playing against novice opponents, a learning player may win a game despite having made some bad move. This noise often drives agents further away from the optimal action, a problem that becomes more prevalent in non-deterministic games, which normally introduce considerable noise into the evaluation process. For tic-tac-toe, a simple table-based learning mechanism is capable of producing optimal strategies due to the small number of game states that exist. Because look-up tables take into account all possible states and actions, such a strategy could not be utilised for a game such as Chess due to its intractability; it is currently impossible to store every possible game state for the game of Chess.

2.3.2 Evolutionary Algorithms for Opponent Modelling

Evolutionary algorithms (EA) is a term used to describe the different methods in computer science that employ the principle of Darwinian natural selection as an optimisation tool to solve problems using computers.
Using a population of candidate solutions and a means of assessing these solutions (the objective function), evolutionary computation techniques search through the space of possible solutions in an attempt to find one that is satisfactory for the problem to be solved. The objective function provides selection pressure, which drives individuals towards more optimal solutions for the problem at hand (evolutionary algorithms are explained in detail in Chapter 3). The evolutionary algorithm paradigm has witnessed successes in numerous application areas, including engineering, the natural sciences, business, and economics [5]. Using the inherent learning capabilities of natural selection, it is possible for learning to take place in noisy and dynamic environments. These techniques seem well-suited to the task of developing game-playing strategies against a wide range of varied, potentially adapting opponents. Indeed, the application of evolutionary algorithms to the task of opponent modelling in games of imperfect information has led to some notable successes. For example, Azaria and Sipper have produced a very strong player (in human terms) for the game of Backgammon purely through playing against itself [4]. Evolutionary approaches have also been applied to Poker by Barone and While [6, 7]. Their approach shows the importance of specialisation and adaptation in order to maximise winnings. Using evolutionary techniques to update learned models of opponents, their approach has produced an evolving computer poker player capable of out-performing a simple, but competent, static player.

Evolutionary approaches have also been applied to the traditional game of the Iterated Prisoner's Dilemma (IPD) [10, 13]. The Iterated Prisoner's Dilemma is often used as a model of emergent behaviour between self-interested individuals. Axelrod's work [3] involved evolving game-playing strategies for the IPD. Although some general, well-known methods for playing the IPD exist (Tit-for-Tat and Grim, for example), Axelrod showed that there exists no best strategy for playing the IPD in an evolving population of opponents, because a strategy's success depends on the other strategies in the population.

Genetic programming is a specific instance of an evolutionary algorithm in which computer programs play the role of individuals (genetic programming is explained in detail in Section 3.4). Genetic programming has been used extensively for a myriad of problems [16, 18, 19], including opponent modelling and strategy development in games [8, 14, 17]. These published successes, among others, provide encouragement toward the use of evolutionary computation techniques and their application to opponent modelling in games of imperfect information.

CHAPTER 3

Evolutionary Algorithms

Charles Darwin's theory of evolution [9] explains how complex organisms evolve from simpler organisms over time through a process known as natural selection. An individual organism's genetic structure (its genotype) leads to certain observable characteristics of that individual (its phenotype), which contribute to how well it is able to survive. An individual of a population is affected by other members of the population (e.g., being attacked by predators, competing for food, and mating). An individual is also affected by its environment (e.g., the climate, access to fresh water, and the availability of food). The better an individual performs in the conditions imposed by the environment and the other members of the population, the greater is its chance to live longer and create offspring (thus passing on its genetic information).

The term evolutionary algorithm (EA) [5] refers to a number of computer techniques inspired by natural selection theory that converge upon satisfactory solutions within a solution set. EAs are applied to solving combinatorial optimisation problems and so can be viewed as a kind of searching algorithm: a guided random search sifts through the potential solution set for optimal solutions to a problem. EAs are unbiased search algorithms in that they do not make any assumptions concerning the fitness landscape. Individuals survive and reproduce based on how well they fare according to some quality criterion, often referred to as the objective function [5, 15]. The objective function evaluates individuals and gives them a fitness score. The fitness measure provides the basis for the competition which drives the evolution, by guiding survival and reproduction within the population. Those individuals with a better fitness score will have a greater probability of being selected for reproduction.
Offspring are generated by means of variation operators analogous to their biological equivalents, such as recombination and/or mutation.

The general evolutionary algorithm begins by creating an initial (typically random) population of sample solutions, termed generation zero. The entire population is evaluated by the objective function. While the termination criterion

has not been met, an offspring population P′(t) is created by applying genetic operators to members of the current generation, selected via the objective function [15]. The offspring population is evaluated, and the next generation P(t + 1) is then selected from P′(t) and some (possibly empty) subset of P(t). Many texts fail to mention that two rounds of selection typically occur per generation: one to decide which individuals reproduce, and one to decide which individuals are included in the next generation. This process continues until some termination criterion has been met. The genetic operators which bring about variation in offspring often also draw their influence from nature, for example recombination and mutation. Implementations of these operators are heavily dictated by both the problem domain and the chosen representation scheme of individuals.

A number of fairly separate approaches to the field of evolutionary algorithms exist today: genetic algorithms, evolutionary programming, evolution strategies, and genetic programming. Many variations of each of these approaches have been derived, with the major differences being the representation of individuals, the design and application of the genetic operators, and the method of selection. The remainder of this chapter covers the basics of these approaches, as well as a detailed discussion of genetic programming.

3.1 Genetic Algorithms

Genetic algorithms (GAs) [5] maintain a population of abstract representations of candidate solutions (called chromosomes). Generally, GA chromosomes are fixed-length binary strings, although variable-length strings and other representations are possible. Because chromosomes are representations of individual candidate solutions, the two terms are often used interchangeably. Recombination is normally considered the driving force of the evolution process in GAs.
The most common types of recombination are one-point crossover, two-point crossover, and uniform crossover. All of these forms of recombination involve two parents; however, uniform crossover produces only one offspring, whereas one-point and two-point crossover both produce two. One-point crossover randomly determines a crossover point at which to split the two parents and recombines the resultant substrings to form two children. For example, with a crossover point chosen after the 4th bit, the two parents 110010 and 001101 will produce the children 110001 and 001110. Two-point crossover works in the same way except that two crossover points are selected, so that the same parents, given crossover points of 1 and 4, will produce the children 101110 and 010001.
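These two operators can be sketched directly on bit strings (an illustrative implementation; a crossover point k splits a string between its k-th and (k+1)-th bits):

```python
# One- and two-point crossover on bit strings.
def one_point(a, b, k):
    """Swap the tails of two parents at point k, producing two children."""
    return a[:k] + b[k:], b[:k] + a[k:]

def two_point(a, b, k1, k2):
    """Swap the middle segments between points k1 and k2."""
    return (a[:k1] + b[k1:k2] + a[k2:],
            b[:k1] + a[k1:k2] + b[k2:])

# one_point("110010", "001101", 4)  -> ("110001", "001110")
# two_point("110010", "001101", 1, 4) -> ("101110", "010001")
```

In a full GA the crossover points would be chosen at random for each mating; they are parameters here only to keep the sketch deterministic.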

The final type of crossover we will discuss is uniform crossover. In uniform crossover, a new offspring is built one bit at a time, with each bit stochastically selected from one of its two parents at the corresponding position.

Assuming a fixed-length binary string representation, the mutation operator usually allows some probability for each bit in an individual's representation to be flipped (i.e. from 0 to 1, or from 1 to 0). Mutation is a necessary requirement for maintaining diversity throughout the population; however, it is usually not the driving force for change. A typical probability for mutation would be about 1/n, where n is the string length (i.e. such that on average one bit gets flipped). To see why mutation is necessary, consider a situation where the globally optimal solution is the binary string 1111. If every single member of our population has a 0 in its first position, it will be impossible to achieve the global optimum via crossover alone.

3.2 Evolutionary Programming

Evolutionary programming (EP) [5] works by observing the world and evolving Finite State Machines (FSMs) able to form predictions based on those observations. A FSM, or finite automaton, is an abstract machine that has memory of which state it is in. Given an input, a FSM can change its state and/or return output. The FSM consists of a finite set of states and rules governing the transitions between these states.

Consider an environment where a sequence of integers is classified by whether each is a square number: either false (0) or true (1). Thus, the binary sequence 100100001 describes the location of square numbers for the integers 1, 2, 3, 4, 5, 6, 7, 8, 9, respectively. The aim is to produce a FSM that will correctly predict the next symbol in the sequence given a sequence of known symbols; e.g. given the sequence 100, a correct FSM would return 1 as its next output. The objective function for such a task could assign a fitness of 1 for a correct prediction and 0 for an incorrect one.

Usually, mutation is the only variation operator used in Evolutionary Programming. Each generation, each individual chosen for reproduction is mutated to create an offspring. A number of possible mutations may be applied at this stage, including: adding or removing a state, changing a state's output symbol, changing a transition, or changing the starting state. Once the offspring have been produced, they are evaluated and some selection scheme dictates which individuals will make up the new generation.
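As a sketch of the FSM machinery EP operates on, a machine can be encoded as a transition table; the encoding and the trivial machine below are illustrative assumptions, and an evolved predictor would be more elaborate:

```python
# A finite state machine as a transition table:
#   (state, input symbol) -> (next state, output symbol).
# EP would evolve such tables by mutating states, outputs, and
# transitions; this particular machine is illustrative only.
def run_fsm(transitions, start, inputs):
    """Feed a symbol sequence through the FSM, collecting the outputs."""
    state, outputs = start, []
    for symbol in inputs:
        state, out = transitions[(state, symbol)]
        outputs.append(out)
    return outputs

# A one-state machine whose prediction is "the next symbol repeats the last".
echo = {("s", 0): ("s", 0), ("s", 1): ("s", 1)}
```

Evaluating such a machine against a known sequence (score 1 per correct output) gives exactly the kind of objective function described above.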
Usually, mutation is the only variation operator that is used in Evolutionary Programming. Each generation, each individual chosen for reproduction is mutated to create an offspring. A number of possible mutations may be applied at this stage, these include: adding or removing a state, changing a state s output symbol, changing a transition, or changing the starting state. Once the offspring have been produced, they are evaluated and some selection scheme dictates which individuals will make up the new generation. 10

3.3 Evolution Strategies

Evolution strategies (ES) were initially devised to solve engineering design problems. The representation of individuals is typically a fixed-length real-valued vector, although variable-length approaches exist [5]. Evolution strategies commonly use Gaussian mutation as the primary genetic operator for evolution. Gaussian mutation generates an offspring from a single individual by adding a random value drawn from a Gaussian distribution to each element of the individual's vector. Another operator often used in ESs is intermediate recombination, in which two or more parents produce one new offspring created by taking the parents' mean value for each vector element. Where ESs differ from the other methods is that the genetic operators act upon the phenotype directly: the real-valued vector representation of candidate solutions allows for less rigid mutation and for interpolation between individuals.

3.4 Genetic Programming

Genetic programming (GP) [5, 15] is an evolutionary algorithm approach to solving combinatorial optimisation problems in which the individuals undergoing evolution are themselves computer programs.

3.4.1 Representation

The usual representation scheme for an individual is a tree structure, called a LISP expression, which is comprised of functions and terminals. LISP expressions can represent complex program trees that can be made to handle multiple types, conditional statements, and iteration.

Consider a simple integer arithmetic program, (3 + 6)/2. Here the function nodes being used are addition (+) and division (/), both of which accept two arguments as input. The terminal nodes of this program tree are 3, 6, and 2; terminals, by definition, take no arguments. The root of the tree is /, with both its arguments, (3 + 6) and 2, branching from it. The left argument of the root function is itself a function (+) with terminal arguments 3 and 6.
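The tree just described can be sketched with nested tuples standing in for LISP expressions; this encoding is illustrative, not the representation used later for Spoof strategies:

```python
# A program tree as nested tuples: (function, arg, arg) or a bare
# terminal. Evaluation recurses from the root down to the terminals.
def evaluate(node):
    if not isinstance(node, tuple):   # terminal: a constant
        return node
    op, left, right = node
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l / r

tree = ("/", ("+", 3, 6), 2)          # the tree for (3 + 6) / 2
```

Here `evaluate(tree)` returns 4.5: the + node reduces to 9, which the root / node divides by the terminal 2.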
Koza's closure requirement states that the input of every function should be able to handle every terminal and the output of every function [15]. The reason for this is covered in Section

3.4.2 Population Initialisation

Koza describes three ways in which the random population can be initialised prior to commencing evolution: full, grow, and ramped-half-and-half. Each of these methods is typically controlled so that no duplicate individuals are created in the starting generation.

The full method creates a random population of individuals, each being of the same predetermined depth. Starting from the root node, a random function is chosen, and this process continues recursively for each of the branches of that function (i.e. its arguments) until the maximum depth has been reached. Upon reaching the maximum depth, random terminals are chosen rather than functions.

The grow method creates a population of randomly composed individuals up to a specified depth. Starting from the root node, a node is randomly selected from all available functions and terminals. If it is a function, then this process continues recursively for each of the function's branches up to the specified depth. If the maximum depth has been reached, then a random terminal is selected for that node. If the node is a terminal, then that branch finishes (possibly short of the maximum depth) and no further action is required. This method provides a range of structures throughout the population, up to the specified depth.

The ramped-half-and-half method specifies a maximum depth, and the population is divided equally into that many sections. Each depth level produces half of its individuals using the grow method and half using the full method. This generates a population with a diverse range of randomly sized and randomly structured individuals.

3.4.3 Fitness

In order to drive the population towards optima, we require a way of comparing an individual's strength (or fitness) with respect to the other individuals in the population. The raw fitness is the unaltered measure of how well (or badly) an individual fares with respect to the objective function.
For example, if our programs describe a strategy for game playing, then the number of games won could serve as a fitness function, in which case higher values are desirable. Another option would be to measure how far an individual deviates from some (perhaps unobtainable) ideal, in which case lower values are desirable. Koza [15] discusses a number of adjustments that can be made to fitness values; for brevity, these will not be detailed here.

3.4.4 Selection Schemes

Koza [15] uses a number of selection schemes to decide which individuals will reproduce, as well as which individuals will be included in the next generation. The most common methods are fitness proportionate selection, greedy overselection, and tournament selection. Selection methods that are applied to all members of the previous population plus all new offspring are known as elitist selection schemes. When only new offspring are considered by the selection scheme for the new generation, it is called a generational selection scheme.

In fitness proportionate selection, fitter individuals have a greater probability of being selected, but their selection is not guaranteed. Greedy overselection involves skewing selection towards elite members of the population in the hope of lowering the number of generations required for the algorithm to terminate. In tournament selection, a number of individuals are randomly chosen from the population to be included in a tournament; the fittest individual amongst those who entered the tournament is regarded as the winner and is then selected.

3.4.5 Parsimony

Often when evolving solutions it is desirable to have not only a correct solution, but a parsimonious one as well. For example, consider the case where we wish to evolve an expression returning the value 1. One solution may simply be to return the constant 1. Another functionally perfect solution, although lacking parsimony, could be (2 − 1) × (3 − 2). A fitness based on external behaviour alone (i.e. based on phenotypic traits) would not provide any guidance for parsimony to evolve.

A common approach to encouraging parsimonious design is to add a less influential component to the objective function that rewards shorter solutions. For example, assume that raw fitness is measured as the deviation from the correct solution 1, so that a fitness of 0 indicates a functionally correct solution.
We may also include the length of our solution (indicated by the number of terminals and functions used) when evaluating an individual. Parsimony is of secondary importance, so we add to the raw fitness only a small fraction of the solution length (say, one hundredth), so that functionality will not be compromised in favour of simplicity, but selection will favour the simpler of functionally equivalent solutions.
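As a sketch, the combined objective might look like this (the one-hundredth weighting follows the text; the function and variable names are illustrative, not from any particular GP system):

```python
def parsimony_fitness(error, length, penalty=0.01):
    """Raw fitness (deviation from the target) plus a small length penalty.

    Lower values are better; the penalty is kept small so that functional
    correctness is not traded away for modest size differences.
    """
    return error + penalty * length

# Two functionally correct solutions (error 0) of different sizes:
short = parsimony_fitness(error=0.0, length=1)    # e.g. the constant 1
longer = parsimony_fitness(error=0.0, length=7)   # a larger equivalent expression

assert short < longer                         # selection favours the simpler one
assert parsimony_fitness(0.5, 1) > longer     # but correctness still dominates
```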

3.4.6 Genetic Operators A number of genetic operators are used in genetic programming to enable program trees to evolve. Recall that the representation of an individual in genetic programming is a tree structure, and that the closure requirement ensures that all functions are able to handle as input any terminal and the result of any function. Although closure is not necessary for a GP to be successful, many of the operators (as described in this section) assume closure as a prerequisite. The driving force of evolution in genetic programming is commonly provided by asexual reproduction and cross-over, a form of recombination analogous to sexual reproduction in organisms. Asexual reproduction simply allows an individual to pass completely unchanged into the next generation. Cross-over requires two parent individuals to combine their genotypes, resulting in the creation of two new child individuals. After selecting two parents to take part in cross-over, the first step in producing children is to make a copy of each parent. The cross-over operator then selects a random sub-tree from each copy and swaps them with each other, resulting in two new children derived entirely from the two parents involved. A secondary genetic operator in genetic programming is mutation; although other approaches exist, its purpose is primarily to introduce variation and diversity within the population, rather than to drive it towards optimality. Mutation involves one parent producing one offspring. One usual method of mutation in genetic programming is as follows: a copy of the parent is made and a single node of the copied program tree is selected at random; this node is then replaced by a randomly generated tree. Another form of mutation, which does not alter the structure of the program, is to randomly select a node for replacement.
If this node is a terminal, it is replaced with some other randomly selected terminal; if a function node is chosen, it is replaced by some other randomly selected function with the same number of arguments. As with cross-over, it is common for there to be restrictions concerning which nodes can be replaced and how large the generated sub-tree can be. Other, less commonly used operators are editing, encapsulation, permutation, and decimation. When these operators are implemented, they are typically applied less frequently than cross-over and mutation (i.e. not every generation). Earlier work in genetic programming has largely ignored such operators, but current research is giving them more consideration.
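A sketch of sub-tree cross-over and structure-preserving point mutation, using nested lists as a stand-in tree representation (the representation and helper names are ours, not those of any particular GP system):

```python
import copy
import random

# Program trees as nested lists: [function, child, ...] with bare terminals.
def nodes(tree, path=()):
    """Enumerate (path, subtree) pairs for every node in the tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace_at(tree, path, subtree):
    """Return a copy of tree with the node at `path` replaced by `subtree`."""
    if not path:
        return copy.deepcopy(subtree)
    new = copy.deepcopy(tree)
    node = new
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(subtree)
    return new

def crossover(parent_a, parent_b, rng=random):
    """Swap one randomly chosen sub-tree between copies of the two parents,
    producing two children; the parents themselves are left untouched."""
    path_a, sub_a = rng.choice(list(nodes(parent_a)))
    path_b, sub_b = rng.choice(list(nodes(parent_b)))
    return replace_at(parent_a, path_a, sub_b), replace_at(parent_b, path_b, sub_a)

def point_mutate(tree, terminals, rng=random):
    """Structure-preserving mutation: swap one terminal for another."""
    paths = [p for p, sub in nodes(tree) if not isinstance(sub, list)]
    return replace_at(tree, rng.choice(paths), rng.choice(terminals))

a = ['add', 'x', ['mul', 'y', 2]]
b = ['sub', 3, 'x']
child1, child2 = crossover(a, b)
```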

3.4.7 GP System Parameters Once the terminal and function sets have been decided, there remain a number of parameters that must be chosen before running a GP system. These decisions are very important, as they greatly affect the quality of the resulting solution as well as the time taken to achieve it. Unfortunately, there are no hard and fast rules for determining these parameters. The population size must be decided. A larger population allows greater exploration per generation and increases the chance of evolving a solution, but too large a population is wasteful and will slow down the GP system. Generally speaking, the greater the complexity of the problem at hand, the greater the population size required to solve it. Although a fixed population size is usually used, experiments have been conducted with a changing population size; for example, an initially very large population that drops after a number of generations. Termination criteria must also be selected. Evolutionary algorithms do not have defined end points, so the GP system must have some way of knowing when it should stop. A common choice is to run the GP system until some satisfactory level of fitness has been achieved. Another is to stop when the population appears to have stopped improving. Often the algorithm is simply run for a fixed number of generations; this is especially the case for research applications, or when the user is performing trial runs to determine more suitable parameters. Finally, assuming that a selection scheme and the genetic operators have been decided, the probabilities with which the genetic operators are applied must be chosen: what will be the probability of cross-over, of asexual reproduction, of mutation? Deciding the many variables in genetic programming is no simple task.
In fact, some have suggested using a second evolutionary algorithm to optimise the application of a first, a concept known as meta-evolutionary optimisation [12].
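An illustrative parameter set and stopping rule for the decisions above (the names and values are our own examples, not prescriptions; fitness here is treated as lower-is-better):

```python
# Illustrative GP run parameters; operator probabilities must sum to 1.
PARAMS = {
    "population_size": 500,
    "max_generations": 50,
    "p_crossover": 0.90,       # operator application probabilities
    "p_reproduction": 0.09,
    "p_mutation": 0.01,
}

def should_stop(generation, best_fitness, history, params, target=0.0, stall=10):
    """Stop when fitness is satisfactory, the generation budget is spent,
    or the best fitness (lower is better) has stopped improving."""
    if best_fitness <= target:                    # satisfactory fitness reached
        return True
    if generation >= params["max_generations"]:   # generation budget exhausted
        return True
    if len(history) > stall and min(history[-stall:]) >= min(history[:-stall]):
        return True   # nothing in the last `stall` generations beat earlier bests
    return False
```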

CHAPTER 4 The Game of Spoof Spoof is a multi-player game of imperfect information. This seemingly simple game has an extremely broad scope for potential strategy development. Minimising the information made available to opponents, bluffing, probability analysis, and opponent modelling are all elements which can be used to formulate playing strategies. 4.1 Rules of Spoof Spoof is played by two or more players. The game begins with each player selecting a number of tokens (typically coins) from 0 to 3 (called the player's selection), which remain hidden from all other players. In turn, each player attempts to guess the total number of coins held by all players (called the player's guess), with the constraints that no player may repeat a previous player's guess, nor guess a negative amount, nor a value greater than 3 times the number of players. The winner of the game is the player who correctly guesses the total number of coins. The initial guessing order is generally determined by randomly selecting a player to guess first and working clockwise from that player. In the event that no player guesses the correct total, the game is deemed a draw and is typically repeated. The repeated game is usually altered so that the guessing order is shifted in some preconceived direction; for our experiments, however, the game is repeated with the original guessing order unchanged. The simplified game of Spoof we consider for our analysis ensures that our learning players need only develop one particular guessing strategy at a time, and that this strategy need not deal with subsequent rounds of play.
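A minimal round simulator under the simplified rules above (the function names and the naive example strategy are ours, for illustration only):

```python
import random

def play_spoof_round(strategies, rng=random):
    """Play one round of Spoof. Each strategy maps
    (own_coins, guesses_so_far, num_players) -> guess.
    Returns the winner's index, or None if the round is drawn."""
    n = len(strategies)
    selections = [rng.randint(0, 3) for _ in range(n)]   # hidden 0..3 coins each
    total = sum(selections)
    guesses = []
    for i, strategy in enumerate(strategies):
        g = strategy(selections[i], list(guesses), n)
        # No repeats, no negatives, nothing above 3 coins per player.
        assert 0 <= g <= 3 * n and g not in guesses, "invalid guess"
        guesses.append(g)
    for i, g in enumerate(guesses):
        if g == total:
            return i
    return None   # no one guessed the total: the round is a draw

def naive(own_coins, guesses_so_far, n):
    """Guess own coins plus the most likely total of the others' coins,
    bumping upward if that guess has already been taken."""
    g = own_coins + 3 * (n - 1) // 2
    while g in guesses_so_far:
        g += 1
    return g

winner = play_spoof_round([naive, naive, naive], random.Random(7))
```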

4.2 Spoof Strategy At first thought, it may seem that the game is purely random and that little can be done other than to guess the maximum of the probability distribution of possible totals. However, as players announce their guesses, they may well be providing information about the number of coins they have selected. Recall also that guesses may not be repeated, so the position in which a player is forced to act (announce a guess) induces a trade-off between the information available and the opportunity to guess a total. Guessing first means all possible totals are available to be guessed, but no information about the opponents' selections is yet available. Guessing last provides maximal information about the selections of the other players (and, assuming rational play, may well mean the total can be determined with a high degree of certainty), but the correct total is likely to have already been announced by another player. A clear trade-off arises: acting first provides minimal information but maximal opportunity; acting last provides maximal information but minimal opportunity to guess the correct total. Consider a two player game where the first player guesses a total of 5. Assuming rational play, this player must have selected either 2 or 3 coins, otherwise a total of 5 would be impossible. The second player can now use this information in making their guess, and should announce a total of 2 or 3 plus their own selection. Using this approach, the second player improves their chances of immediately winning the game (without replay) from 25% (with no information about the first player's selection) to 50% (knowing that the first player's selection is one of two possibilities). Similar analysis is possible for other game states in Spoof [1], but the analysis becomes increasingly complex as the number of players rises. Opponent modelling in the game of Spoof is crucial for optimal performance.
For example, consider the problem of acting first in three player Spoof. A general strategy for this position is to guess the number of coins one is holding plus 3 (as 3 is probabilistically the most likely total of the remaining players' coins). However, this strategy is only sound if both opponents choose their hidden coins uniformly at random. Consider instead two opponents who tend never to hold more than one coin. The previous strategy now performs poorly, and a better opponent-specific strategy, such as guessing 1 more than one's own selection, should be used instead. Indeed, experience shows that human players often do not select their coins randomly (preferring certain coin choices or patterns over others), and more typically, provide information about their selection in the way they guess. Our experience has shown that human players especially tend to use the same guessing algorithm time and time again.
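The two-player analysis above can be checked by enumeration. The sketch below (helper name and interface are ours) assumes the second player infers a candidate set for the first player's selection and guesses the first implied total that is still available:

```python
from fractions import Fraction

def second_player_win_chance(possible_first, own_coins, taken=None):
    """Immediate win chance for the second player in two-player Spoof,
    assuming the first player's hidden selection is uniform over
    `possible_first` and an already-taken guess cannot be repeated."""
    # Guess the first candidate's implied total that is still available.
    for c in possible_first:
        guess = c + own_coins
        if guess != taken:
            break
    wins = sum(1 for actual in possible_first if actual + own_coins == guess)
    return Fraction(wins, len(possible_first))

# With no information, the first player could hold 0-3 coins: a 25% chance.
no_info = second_player_win_chance(range(4), own_coins=2)
# A first guess of 5 reveals a selection of 2 or 3: the chance rises to 50%.
informed = second_player_win_chance([2, 3], own_coins=2, taken=5)

assert no_info == Fraction(1, 4)
assert informed == Fraction(1, 2)
```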

It is also possible to play the game of Spoof with the specific aim of minimising the information we provide to our opponents. Consider a simple two player game in which we are the first guessing player. If we guess 3, this provides no information about our selection to the opponent, because, no matter what our selection may be, a total of 3 is always possible. This idea of giving up minimal information can easily be combined with opponent modelling strategies. For example, if we have learnt that our opponent usually holds either 2 or 3 coins, then one strategy would be to select either 0 or 1 coin and guess a total of 3 as before.
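The uninformative-guess idea can be verified directly: in a two-player game, a guess g leaves our selection s plausible to the opponent exactly when some opponent selection t satisfies s + t = g (the helper name is ours):

```python
def consistent_selections(guess, coins=range(4)):
    """Own selections that a two-player opponent cannot rule out after
    hearing the announced guess: selection s remains possible if some
    opponent selection t gives s + t == guess."""
    return [s for s in coins if any(s + t == guess for t in coins)]

# Guessing 3 is consistent with every possible selection: it reveals nothing.
assert consistent_selections(3) == [0, 1, 2, 3]
# Guessing 6 is only rational when holding 3 coins: it reveals everything.
assert consistent_selections(6) == [3]
```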

CHAPTER 5 Building Adaptive Spoof Players We use genetic programming to build models of opponents' strategies in order to create a strong artificial Spoof player. Our work does not follow a traditional opponent modelling approach, where a direct model of the opponent's strategy is built from experience and then analysed for weaknesses. Instead, we use a more indirect approach in which evolution implicitly builds the model by evolving the best countering strategy over time (i.e. a model of the game environment, including all opponents therein). The aim, however, remains the same: to exploit weaknesses in an individual's strategy in order to maximise the performance of our automated player. We also experiment with a table-based approach so that we may compare both the learning performance and the playing ability of the resulting Spoof players. 5.1 The Learning Environment Opponent-specific information for the game of Spoof can be exploited in two ways. Information made available during a game (i.e. previous players' guesses) can be used to determine which guess to make for that game. Information made available after a game is over (i.e. all players' selections and guesses) can be used to determine both the selection and the guess to make in later games against the same opponent(s). Information learnt can be applied across games because players are not anonymous; we can learn how particular opponents play and hope to take advantage of this in future games. Our adaptive players learn by observation alone; that is, they learn an implicit model of the opponents by formulating which guess to make. For all experiments, we do not allow our adaptive players to choose a coin selection; this has been set as uniformly random. One reason for this is to ensure that our evolved players are inherently less predictable than if they were to select coins non-randomly (note that predictability would only be of concern against adaptive opponents; against fixed opponents it would not be an issue).
Also, learning opponent strategies for the game of Spoof potentially involves exploring the game's state space. By fixing a random selection function (in conjunction with the pseudo-success measure to be described later), there is no need for separate exploration and exploitation strategies: all the required knowledge is made available regardless of what guess was made (i.e. the correct total is always revealed to the players upon the game's conclusion). Genetic Program Players Learning We employ a generational evolution system, meaning that each generation is made up entirely of the offspring of the previous one; no members of one generation pass into the next, so the fittest individual may get worse from one generation to the next. While not strictly elitist, our approach keeps an external copy (or clone, in keeping with biological terminology) of the best individual seen thus far, which is returned as the solution for a run. This gives us the benefit of the best-seen individual without allowing it to dominate the search for other solutions. For coin selection, we force our player to always choose randomly. This may of course be a poor choice (being able to skew the probability distribution of totals may well be advantageous), but it prevents our player from being predictable. It also simplifies the problem to be solved, allowing evolutionary intelligence to focus on learning guessing strategies that exploit opponents and maximise performance. For guess determination, we use the genetic programming paradigm to evolve an algorithm that makes the guess. We use a population of candidate genetic programs that are evaluated to determine how well they play the game; over time, evolutionary selection pressure drives the population towards good solutions. We use version 2b of GPsys [20]. Each candidate solution in the population consists of a program tree that determines the guess for the player.
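The generational scheme with an external best-seen clone can be sketched as follows (a toy numeric problem stands in for evolving program trees; all names are ours):

```python
import copy
import random

def evolve(initial_population, fitness, breed, generations, rng=random):
    """Generational evolution: every generation is replaced wholesale by
    offspring, so the in-population best can get worse; an external clone
    of the best individual ever seen is kept and returned as the solution."""
    population = list(initial_population)
    best_clone = copy.deepcopy(max(population, key=fitness))
    for _ in range(generations):
        population = breed(population, rng)            # offspring only
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best_clone):
            best_clone = copy.deepcopy(generation_best)
    return best_clone                                  # the run's solution

# Toy usage: drift a population of numbers towards 5 by random perturbation.
def breed(population, rng):
    return [x + rng.uniform(-1, 1) for x in population]

solution = evolve([0.0] * 20, lambda x: -abs(x - 5), breed, 200, random.Random(1))
```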
Program trees are mixed-type, using float and boolean types, with the root node constrained to evaluate to a float. This float value is cast down to an integer, forming the guess made by the player. When this integer value is invalid (the guess may already have been made by an earlier player), we automatically adjust it to the next closest valid integer, checking above and below the desired value by an incrementally increasing amount (one more is tried before one less). This allows for less complex program trees, as they need not be burdened with the additional task of ensuring unique guesses. To enable our evolving player to make an informed guess, we equip the genetic programming system with a number of game-specific terminals that can be used in a candidate solution's program tree: the number of players in the game, the number of coins the player has selected to hold, and the announced guesses of each player who guesses before this player. The terminals for a three-player game of Spoof when guessing third are detailed in Table 5.1.

Table 5.1: Game-specific terminals used for three-player Spoof, guessing third

Variable     Explanation
p1guess      The first player's publicly announced guess.
p2guess      The second player's publicly announced guess.
CoinsHeld    The number of coins selected by the player.
NumPlayers   The number of players in the game. For the majority of experiments in this study, this terminal is constant (3).

We also allow the genetic programming system the use of four numerical constants (0, 1, 2, and 3, representing the four possible coins-held values), standard arithmetic operators (addition, subtraction, multiplication, and division), standard comparison operators (greater-than, less-than, and equal-to), and boolean operators (negation, conjunction, and disjunction). A conditional selection mechanism (the if function) is also included to select between sub-programs. The if function expects three arguments: a boolean condition and two sub-programs, both of which must evaluate to floats; the second argument is evaluated if the condition evaluates to true, otherwise the third is evaluated. It should be noted that our approach restricts the genetic operators rather than adhering to Koza's closure requirement [15]; for example, only compatible sub-trees are considered for cross-over.
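A minimal interpreter for such trees (terminal names follow Table 5.1; the operator spellings, eager boolean evaluation, and protected division are our own assumptions, since GPsys's internals are not reproduced here):

```python
def evaluate(node, state):
    """Evaluate a mixed-type program tree against a game state:
    a dict holding the Table 5.1 terminals."""
    if isinstance(node, (int, float)):
        return float(node)                    # the constants 0, 1, 2, 3
    if isinstance(node, str):
        return float(state[node])             # game-specific terminals
    op, *args = node
    if op == "if":                            # (condition, then, else)
        branch = args[1] if evaluate(args[0], state) else args[2]
        return evaluate(branch, state)
    a = evaluate(args[0], state)
    b = evaluate(args[1], state) if len(args) > 1 else None
    if op == "add": return a + b
    if op == "sub": return a - b
    if op == "mul": return a * b
    if op == "div": return a / b if b else 1.0   # protected division (our choice)
    if op == "gt":  return a > b
    if op == "lt":  return a < b
    if op == "eq":  return a == b
    if op == "and": return bool(a) and bool(b)
    if op == "or":  return bool(a) or bool(b)
    if op == "not": return not a
    raise ValueError(f"unknown operator: {op!r}")

# A toy strategy: if the earlier guesses agree, guess p1guess + CoinsHeld;
# otherwise fall back to CoinsHeld + 3.
tree = ("if", ("eq", "p1guess", "p2guess"),
              ("add", "p1guess", "CoinsHeld"),
              ("add", "CoinsHeld", 3))
state = {"p1guess": 4, "p2guess": 4, "CoinsHeld": 2, "NumPlayers": 3}
guess = int(evaluate(tree, state))   # cast down to an integer guess
```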
In all experiments, the depth of the candidate program trees was limited to 10, and the initial population was created with the ramped half-and-half initialisation method over all depths from 1 to 10. A population of size 50 is maintained throughout the evolution, which is limited to a span of 5000 generations.
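The guess-repair rule described earlier, trying one above before one below and widening outward until a valid guess is found, can be sketched in isolation (the function name is ours):

```python
def adjust_guess(desired, taken, max_guess):
    """Return the closest valid guess to `desired`: within [0, max_guess]
    and not already announced. Ties are broken by trying one more before
    one less, widening outward until a valid guess is found."""
    def valid(g):
        return 0 <= g <= max_guess and g not in taken
    if valid(desired):
        return desired
    for delta in range(1, abs(desired) + max_guess + 1):
        if valid(desired + delta):     # one more is tried before one less
            return desired + delta
        if valid(desired - delta):
            return desired - delta
    raise ValueError("every possible guess is already taken")

assert adjust_guess(5, {5}, 9) == 6       # 5 taken, so try 6 first
assert adjust_guess(5, {5, 6}, 9) == 4    # 6 taken too, fall back to 4
assert adjust_guess(9, {9}, 9) == 8       # cannot go above the maximum
```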


More information

An intelligent Othello player combining machine learning and game specific heuristics

An intelligent Othello player combining machine learning and game specific heuristics Louisiana State University LSU Digital Commons LSU Master's Theses Graduate School 2011 An intelligent Othello player combining machine learning and game specific heuristics Kevin Anthony Cherry Louisiana

More information

A Review on Genetic Algorithm and Its Applications

A Review on Genetic Algorithm and Its Applications 2017 IJSRST Volume 3 Issue 8 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology A Review on Genetic Algorithm and Its Applications Anju Bala Research Scholar, Department

More information

An Adaptive Learning Model for Simplified Poker Using Evolutionary Algorithms

An Adaptive Learning Model for Simplified Poker Using Evolutionary Algorithms An Adaptive Learning Model for Simplified Poker Using Evolutionary Algorithms Luigi Barone Department of Computer Science, The University of Western Australia, Western Australia, 697 luigi@cs.uwa.edu.au

More information

COMP219: COMP219: Artificial Intelligence Artificial Intelligence Dr. Annabel Latham Lecture 12: Game Playing Overview Games and Search

COMP219: COMP219: Artificial Intelligence Artificial Intelligence Dr. Annabel Latham Lecture 12: Game Playing Overview Games and Search COMP19: Artificial Intelligence COMP19: Artificial Intelligence Dr. Annabel Latham Room.05 Ashton Building Department of Computer Science University of Liverpool Lecture 1: Game Playing 1 Overview Last

More information

FreeCiv Learner: A Machine Learning Project Utilizing Genetic Algorithms

FreeCiv Learner: A Machine Learning Project Utilizing Genetic Algorithms FreeCiv Learner: A Machine Learning Project Utilizing Genetic Algorithms Felix Arnold, Bryan Horvat, Albert Sacks Department of Computer Science Georgia Institute of Technology Atlanta, GA 30318 farnold3@gatech.edu

More information

Programming Project 1: Pacman (Due )

Programming Project 1: Pacman (Due ) Programming Project 1: Pacman (Due 8.2.18) Registration to the exams 521495A: Artificial Intelligence Adversarial Search (Min-Max) Lectured by Abdenour Hadid Adjunct Professor, CMVS, University of Oulu

More information

Free Cell Solver. Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001

Free Cell Solver. Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001 Free Cell Solver Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001 Abstract We created an agent that plays the Free Cell version of Solitaire by searching through the space of possible sequences

More information

CS 1571 Introduction to AI Lecture 12. Adversarial search. CS 1571 Intro to AI. Announcements

CS 1571 Introduction to AI Lecture 12. Adversarial search. CS 1571 Intro to AI. Announcements CS 171 Introduction to AI Lecture 1 Adversarial search Milos Hauskrecht milos@cs.pitt.edu 39 Sennott Square Announcements Homework assignment is out Programming and experiments Simulated annealing + Genetic

More information

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur Module 3 Problem Solving using Search- (Two agent) 3.1 Instructional Objective The students should understand the formulation of multi-agent search and in detail two-agent search. Students should b familiar

More information

Cooperative Behavior Acquisition in A Multiple Mobile Robot Environment by Co-evolution

Cooperative Behavior Acquisition in A Multiple Mobile Robot Environment by Co-evolution Cooperative Behavior Acquisition in A Multiple Mobile Robot Environment by Co-evolution Eiji Uchibe, Masateru Nakamura, Minoru Asada Dept. of Adaptive Machine Systems, Graduate School of Eng., Osaka University,

More information

2 person perfect information

2 person perfect information Why Study Games? Games offer: Intellectual Engagement Abstraction Representability Performance Measure Not all games are suitable for AI research. We will restrict ourselves to 2 person perfect information

More information

Fault Location Using Sparse Wide Area Measurements

Fault Location Using Sparse Wide Area Measurements 319 Study Committee B5 Colloquium October 19-24, 2009 Jeju Island, Korea Fault Location Using Sparse Wide Area Measurements KEZUNOVIC, M., DUTTA, P. (Texas A & M University, USA) Summary Transmission line

More information

ADVERSARIAL SEARCH. Chapter 5

ADVERSARIAL SEARCH. Chapter 5 ADVERSARIAL SEARCH Chapter 5... every game of skill is susceptible of being played by an automaton. from Charles Babbage, The Life of a Philosopher, 1832. Outline Games Perfect play minimax decisions α

More information

Introduction to Genetic Algorithms

Introduction to Genetic Algorithms Introduction to Genetic Algorithms Peter G. Anderson, Computer Science Department Rochester Institute of Technology, Rochester, New York anderson@cs.rit.edu http://www.cs.rit.edu/ February 2004 pg. 1 Abstract

More information

A Genetic Algorithm for Solving Beehive Hidato Puzzles

A Genetic Algorithm for Solving Beehive Hidato Puzzles A Genetic Algorithm for Solving Beehive Hidato Puzzles Matheus Müller Pereira da Silva and Camila Silva de Magalhães Universidade Federal do Rio de Janeiro - UFRJ, Campus Xerém, Duque de Caxias, RJ 25245-390,

More information

Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax

Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax Tang, Marco Kwan Ho (20306981) Tse, Wai Ho (20355528) Zhao, Vincent Ruidong (20233835) Yap, Alistair Yun Hee (20306450) Introduction

More information

An Evolutionary Approach to the Synthesis of Combinational Circuits

An Evolutionary Approach to the Synthesis of Combinational Circuits An Evolutionary Approach to the Synthesis of Combinational Circuits Cecília Reis Institute of Engineering of Porto Polytechnic Institute of Porto Rua Dr. António Bernardino de Almeida, 4200-072 Porto Portugal

More information

Evolutions of communication

Evolutions of communication Evolutions of communication Alex Bell, Andrew Pace, and Raul Santos May 12, 2009 Abstract In this paper a experiment is presented in which two simulated robots evolved a form of communication to allow

More information

ARTIFICIAL INTELLIGENCE (CS 370D)

ARTIFICIAL INTELLIGENCE (CS 370D) Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-5) ADVERSARIAL SEARCH ADVERSARIAL SEARCH Optimal decisions Min algorithm α-β pruning Imperfect,

More information

Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham

Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham Towards the Automatic Design of More Efficient Digital Circuits Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham

More information

Solving and Analyzing Sudokus with Cultural Algorithms 5/30/2008. Timo Mantere & Janne Koljonen

Solving and Analyzing Sudokus with Cultural Algorithms 5/30/2008. Timo Mantere & Janne Koljonen with Cultural Algorithms Timo Mantere & Janne Koljonen University of Vaasa Department of Electrical Engineering and Automation P.O. Box, FIN- Vaasa, Finland timan@uwasa.fi & jako@uwasa.fi www.uwasa.fi/~timan/sudoku

More information

CMU-Q Lecture 20:

CMU-Q Lecture 20: CMU-Q 15-381 Lecture 20: Game Theory I Teacher: Gianni A. Di Caro ICE-CREAM WARS http://youtu.be/jilgxenbk_8 2 GAME THEORY Game theory is the formal study of conflict and cooperation in (rational) multi-agent

More information

2. The Extensive Form of a Game

2. The Extensive Form of a Game 2. The Extensive Form of a Game In the extensive form, games are sequential, interactive processes which moves from one position to another in response to the wills of the players or the whims of chance.

More information

Evolutionary Programming Optimization Technique for Solving Reactive Power Planning in Power System

Evolutionary Programming Optimization Technique for Solving Reactive Power Planning in Power System Evolutionary Programg Optimization Technique for Solving Reactive Power Planning in Power System ISMAIL MUSIRIN, TITIK KHAWA ABDUL RAHMAN Faculty of Electrical Engineering MARA University of Technology

More information

LECTURE 26: GAME THEORY 1

LECTURE 26: GAME THEORY 1 15-382 COLLECTIVE INTELLIGENCE S18 LECTURE 26: GAME THEORY 1 INSTRUCTOR: GIANNI A. DI CARO ICE-CREAM WARS http://youtu.be/jilgxenbk_8 2 GAME THEORY Game theory is the formal study of conflict and cooperation

More information

Principles of Computer Game Design and Implementation. Lecture 20

Principles of Computer Game Design and Implementation. Lecture 20 Principles of Computer Game Design and Implementation Lecture 20 utline for today Sense-Think-Act Cycle: Thinking Acting 2 Agents and Virtual Player Agents, no virtual player Shooters, racing, Virtual

More information

COMP219: Artificial Intelligence. Lecture 13: Game Playing

COMP219: Artificial Intelligence. Lecture 13: Game Playing CMP219: Artificial Intelligence Lecture 13: Game Playing 1 verview Last time Search with partial/no observations Belief states Incremental belief state search Determinism vs non-determinism Today We will

More information

Heads-up Limit Texas Hold em Poker Agent

Heads-up Limit Texas Hold em Poker Agent Heads-up Limit Texas Hold em Poker Agent Nattapoom Asavareongchai and Pin Pin Tea-mangkornpan CS221 Final Project Report Abstract Our project aims to create an agent that is able to play heads-up limit

More information

The Co-Evolvability of Games in Coevolutionary Genetic Algorithms

The Co-Evolvability of Games in Coevolutionary Genetic Algorithms The Co-Evolvability of Games in Coevolutionary Genetic Algorithms Wei-Kai Lin Tian-Li Yu TEIL Technical Report No. 2009002 January, 2009 Taiwan Evolutionary Intelligence Laboratory (TEIL) Department of

More information

Genetic Algorithms with Heuristic Knight s Tour Problem

Genetic Algorithms with Heuristic Knight s Tour Problem Genetic Algorithms with Heuristic Knight s Tour Problem Jafar Al-Gharaibeh Computer Department University of Idaho Moscow, Idaho, USA Zakariya Qawagneh Computer Department Jordan University for Science

More information

Local Search: Hill Climbing. When A* doesn t work AIMA 4.1. Review: Hill climbing on a surface of states. Review: Local search and optimization

Local Search: Hill Climbing. When A* doesn t work AIMA 4.1. Review: Hill climbing on a surface of states. Review: Local search and optimization Outline When A* doesn t work AIMA 4.1 Local Search: Hill Climbing Escaping Local Maxima: Simulated Annealing Genetic Algorithms A few slides adapted from CS 471, UBMC and Eric Eaton (in turn, adapted from

More information

COMP SCI 5401 FS2015 A Genetic Programming Approach for Ms. Pac-Man

COMP SCI 5401 FS2015 A Genetic Programming Approach for Ms. Pac-Man COMP SCI 5401 FS2015 A Genetic Programming Approach for Ms. Pac-Man Daniel Tauritz, Ph.D. November 17, 2015 Synopsis The goal of this assignment set is for you to become familiarized with (I) unambiguously

More information

CS221 Project Final Report Gomoku Game Agent

CS221 Project Final Report Gomoku Game Agent CS221 Project Final Report Gomoku Game Agent Qiao Tan qtan@stanford.edu Xiaoti Hu xiaotihu@stanford.edu 1 Introduction Gomoku, also know as five-in-a-row, is a strategy board game which is traditionally

More information

Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles?

Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles? Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles? Andrew C. Thomas December 7, 2017 arxiv:1107.2456v1 [stat.ap] 13 Jul 2011 Abstract In the game of Scrabble, letter tiles

More information

Evolving Behaviour Trees for the Commercial Game DEFCON

Evolving Behaviour Trees for the Commercial Game DEFCON Evolving Behaviour Trees for the Commercial Game DEFCON Chong-U Lim, Robin Baumgarten and Simon Colton Computational Creativity Group Department of Computing, Imperial College, London www.doc.ic.ac.uk/ccg

More information

INTERACTIVE DYNAMIC PRODUCTION BY GENETIC ALGORITHMS

INTERACTIVE DYNAMIC PRODUCTION BY GENETIC ALGORITHMS INTERACTIVE DYNAMIC PRODUCTION BY GENETIC ALGORITHMS M.Baioletti, A.Milani, V.Poggioni and S.Suriani Mathematics and Computer Science Department University of Perugia Via Vanvitelli 1, 06123 Perugia, Italy

More information

UMBC 671 Midterm Exam 19 October 2009

UMBC 671 Midterm Exam 19 October 2009 Name: 0 1 2 3 4 5 6 total 0 20 25 30 30 25 20 150 UMBC 671 Midterm Exam 19 October 2009 Write all of your answers on this exam, which is closed book and consists of six problems, summing to 160 points.

More information

Game Theory: The Basics. Theory of Games and Economics Behavior John Von Neumann and Oskar Morgenstern (1943)

Game Theory: The Basics. Theory of Games and Economics Behavior John Von Neumann and Oskar Morgenstern (1943) Game Theory: The Basics The following is based on Games of Strategy, Dixit and Skeath, 1999. Topic 8 Game Theory Page 1 Theory of Games and Economics Behavior John Von Neumann and Oskar Morgenstern (1943)

More information

Real-time Grid Computing : Monte-Carlo Methods in Parallel Tree Searching

Real-time Grid Computing : Monte-Carlo Methods in Parallel Tree Searching 1 Real-time Grid Computing : Monte-Carlo Methods in Parallel Tree Searching Hermann Heßling 6. 2. 2012 2 Outline 1 Real-time Computing 2 GriScha: Chess in the Grid - by Throwing the Dice 3 Parallel Tree

More information

CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS

CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS GARY B. PARKER, CONNECTICUT COLLEGE, USA, parker@conncoll.edu IVO I. PARASHKEVOV, CONNECTICUT COLLEGE, USA, iipar@conncoll.edu H. JOSEPH

More information

Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms

Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms Benjamin Rhew December 1, 2005 1 Introduction Heuristics are used in many applications today, from speech recognition

More information

Reinforcement Learning in Games Autonomous Learning Systems Seminar

Reinforcement Learning in Games Autonomous Learning Systems Seminar Reinforcement Learning in Games Autonomous Learning Systems Seminar Matthias Zöllner Intelligent Autonomous Systems TU-Darmstadt zoellner@rbg.informatik.tu-darmstadt.de Betreuer: Gerhard Neumann Abstract

More information

The next several lectures will be concerned with probability theory. We will aim to make sense of statements such as the following:

The next several lectures will be concerned with probability theory. We will aim to make sense of statements such as the following: CS 70 Discrete Mathematics for CS Fall 2004 Rao Lecture 14 Introduction to Probability The next several lectures will be concerned with probability theory. We will aim to make sense of statements such

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

Yale University Department of Computer Science

Yale University Department of Computer Science LUX ETVERITAS Yale University Department of Computer Science Secret Bit Transmission Using a Random Deal of Cards Michael J. Fischer Michael S. Paterson Charles Rackoff YALEU/DCS/TR-792 May 1990 This work

More information

Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs

Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs T. C. Fogarty 1, J. F. Miller 1, P. Thomson 1 1 Department of Computer Studies Napier University, 219 Colinton Road, Edinburgh t.fogarty@dcs.napier.ac.uk

More information

Evolutionary Computation and Machine Intelligence

Evolutionary Computation and Machine Intelligence Evolutionary Computation and Machine Intelligence Prabhas Chongstitvatana Chulalongkorn University necsec 2005 1 What is Evolutionary Computation What is Machine Intelligence How EC works Learning Robotics

More information

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel Foundations of AI 6. Adversarial Search Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard & Bernhard Nebel Contents Game Theory Board Games Minimax Search Alpha-Beta Search

More information

TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS

TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS Thong B. Trinh, Anwer S. Bashi, Nikhil Deshpande Department of Electrical Engineering University of New Orleans New Orleans, LA 70148 Tel: (504) 280-7383 Fax:

More information

Game Theory: From Zero-Sum to Non-Zero-Sum. CSCI 3202, Fall 2010

Game Theory: From Zero-Sum to Non-Zero-Sum. CSCI 3202, Fall 2010 Game Theory: From Zero-Sum to Non-Zero-Sum CSCI 3202, Fall 2010 Assignments Reading (should be done by now): Axelrod (at website) Problem Set 3 due Thursday next week Two-Person Zero Sum Games The notion

More information

Artificial Intelligence Search III

Artificial Intelligence Search III Artificial Intelligence Search III Lecture 5 Content: Search III Quick Review on Lecture 4 Why Study Games? Game Playing as Search Special Characteristics of Game Playing Search Ingredients of 2-Person

More information

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play NOTE Communicated by Richard Sutton TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play Gerald Tesauro IBM Thomas 1. Watson Research Center, I? 0. Box 704, Yorktozon Heights, NY 10598

More information

Reinforcement Learning Applied to a Game of Deceit

Reinforcement Learning Applied to a Game of Deceit Reinforcement Learning Applied to a Game of Deceit Theory and Reinforcement Learning Hana Lee leehana@stanford.edu December 15, 2017 Figure 1: Skull and flower tiles from the game of Skull. 1 Introduction

More information