Evolving Adaptive Play for the Game of Spoof. Mark Wittkamp


Evolving Adaptive Play for the Game of Spoof

Mark Wittkamp

This report is submitted as partial fulfilment of the requirements for the Honours Programme of the School of Computer Science and Software Engineering, The University of Western Australia, 2006

Abstract

For game playing in general, it is important for players to be adaptive; this is particularly true for games where no optimal fixed strategy is known to exist. Adaptive artificial opponents capable of learning and opponent modelling are highly desirable in computer games. Typically, a great deal of a game's ability to maintain the interest of human players is provided by multiplayer functionality, due to the unpredictable and changing game environment that this entails. It is reasonable to expect that artificial opponents mimicking the observable characteristics of human players through adaptive play would significantly benefit many games' lastability. Spoof is a multiple player game of imperfect information for which the success of a player is largely dictated by its ability to build models of its opponent(s) so that their weaknesses may be identified and exploited. We present our approach to opponent modelling in the game of Spoof through the use of evolutionary algorithms, more specifically genetic programming. Genetic programming involves a guided random search of the solution space of a given problem by evolving a population of candidate solutions which take the form of program trees. Genetic programming shows potential for games of imperfect information and other games where tree-searching algorithms are often infeasible due to the game's intractability. The suitability of genetic programming for opponent modelling is substantiated by comparison with a simple look-up table approach to learning. We demonstrate that specialisation and opponent modelling are required for optimal play in the game of Spoof by contrasting evolved playing strategies with a number of fixed strategies comparable to those employed by most human players.

Keywords: Games of Imperfect Information, Spoof, Genetic Programming, Opponent Modelling, Noise

CR Categories: A.2, I.7.2

Acknowledgements

The author wishes to thank Dr. Luigi Barone of the University of Western Australia for his continued guidance and support in supervising this project, in particular for his work on the CIG 2006 submission. Thanks are also extended to Dr. Lyndon While for his suggestions during the preliminary stages of this work.

Contents

Abstract
Acknowledgements

1 Introduction
2 Learning in Games
   2.1 Imperfect Information Games
   2.2 Need for Opponent Modelling and Adaptation
   2.3 Previous Approaches
       2.3.1 Reinforcement Learning by Look-up Tables
       2.3.2 Evolutionary Algorithms for Opponent Modelling
3 Evolutionary Algorithms
   3.1 Genetic Algorithms
   3.2 Evolutionary Programming
   3.3 Evolution Strategies
   3.4 Genetic Programming
       3.4.1 Representation
       3.4.2 Population Initialisation
       3.4.3 Fitness
       3.4.4 Selection Schemes
       3.4.5 Parsimony
       3.4.6 Genetic Operators
       3.4.7 GP System Parameters
4 The Game of Spoof
   4.1 Rules of Spoof
   4.2 Spoof Strategy
5 Building Adaptive Spoof Players
   The Learning Environment
   Genetic Program Players
   Learning
6 Experimental Results
   Static Opponents
   Look-up Table Learning
   Three Player Spoof, guessing 3rd
   Optimality
   Performance Results
   Strategy Analysis
   Learning Against Adapting Opponents
   Performance Results
   Number of Fitness Cases
   Direct Success Measure
7 Conclusion
   Future Work
A Original Honours Proposal
B Availability of total GP

List of Tables

5.1 Game specific terminals used for three player Spoof, guessing 3rd
Deterministic, non-adaptive Spoof opponents used in this study. c = the selected number of coins held by the player, n = the number of players in the game
Non-deterministic, non-adaptive Spoof opponents used in this study. c = the selected number of coins held by the player, n = the number of players in the game
Strategy GP5 3 learns an optimal strategy
Maximum attainable performance at each deterministic table
Performance of strategies at each table (guessing 3rd)
Performance of strategies against adapting opponents (guessing 3rd)
Play level achieved using direct versus pseudo-success measures
Optimality strategy for GP5 3 (direct success evaluation)
B.1 Total coin guess availability for GP

List of Figures

6.1 Fitness Profile for GP
Fitness Profile for T
Strategy GP
Visual representation of strategy GP
Visual representation of strategy T
Fitness Profiles for GP4 3 with varying fitness measures
Fitness Profile for the evolution of G2 3 (direct success evaluation)
Comparable profile for the evolution of G

CHAPTER 1

Introduction

The video game industry is an area of high and increasing profitability, with over US$6 billion spent on console video game software in 2005 [2]. In order to attract buyers, there is increasing demand to design artificial computer players capable of entertaining humans. In general, attempts to create such players try to simulate human behaviour by encoding good features (strategies) employed by strong human competitors in an artificial opponent. This often results in overly simplistic and predictable opponents whose flaws are easily exploitable, because they miss one crucial part of human play: the ability to learn about the game and adapt to their opponents. Desirable is the creation of an artificial opponent indistinguishable from a human player; one that is able to adopt various game-playing strategies depending on the strategies employed by its opponent.

In certain types of games, like Bridge, Poker, and Scrabble, players do not have complete knowledge about the state of the game and must make value decisions about their relative strength using only the public information available to them. Such games are called games of imperfect information on account of the unknown information regarding the state of play (e.g. hidden opponent cards in Poker). The success of a player depends on their ability to handle this incomplete information and, indeed, correctly dealing with this incomplete information is essential for optimal performance. Due to the non-deterministic nature of these games, the task of determining satisfactory artificial opponents is extremely broad and difficult to program for in advance; attempts often exploit only a small subset of the possible functionality conceived of by the designer, rather than its full potential. Typically, the large branching factors of these games render standard search techniques (e.g. minimax) less useful.

Spoof is a game of imperfect information played by two or more players.
It is a simple guessing game requiring players to determine an unknown number using only the partial knowledge received from the publicly announced guesses of the number made by other players (more information about the game of Spoof is available in Chapter 4). Like the games of Roshambo (rock-paper-scissors) and IPD, opponent modelling (the construction of a model of an opponent's playing

style, typically in order to exploit inherent weaknesses in their play) in the game of Spoof is crucial. Given a model of an opponent's strategy, the model can be analysed to discover weaknesses and predictabilities in the opponent's strategy and a counter-strategy determined.

A recently popularised method for solving combinatorial optimisation problems is evolutionary computation. Evolutionary computation is the term used to describe the different methods in computer science that employ the principle of Darwinian natural selection as a tool to solve problems with computers. A population of candidate solutions evolves towards satisfactory solutions to a given problem by simulated evolution. Natural selection is modelled by a function that is used to assess the quality of these solutions (the fitness function). By rewarding those solutions that are more fit, Darwinian selection pressure drives individuals towards better solutions until the population evolves to solve the problem in question. Research in the field of evolutionary computation has witnessed successes in numerous application areas, including engineering, the natural sciences, business, and economics [5]. By utilising the inherent learning capabilities of natural selection, programs capable of learning and adapting in noisy and dynamic environments are possible. In particular, the ability of these techniques to adapt to a changing environment seems well-suited to the task of developing game-playing strategies against different and possibly adapting opponents. Indeed, opponent modelling through the use of evolutionary computation techniques has led to some notable successes in games of imperfect information [4, 6, 7, 10, 11]. These published successes provide encouragement toward the use of evolutionary computation techniques in general, as well as their particular application to opponent modelling in games of imperfect information.
Genetic programming is one form of evolutionary computation, introduced by Koza [15], which defines genetic operators that directly manipulate tree-structured computer programs. Genetic programming has been used extensively for a myriad of problems [16, 18, 19], including opponent modelling and strategy development in games [8, 14, 17]. In this paper, we examine the use of genetic programming techniques to create Spoof players capable of exploiting weaknesses in different opponent playing styles in order to develop successful strategies for play, and compare this with a simple look-up table based approach. The members of our evolving population are program trees, each representing a guessing strategy with which to play a particular game. Candidate solutions are subjected to evolutionary pressure, driving the discovery of successful strategies while less successful strategies are discarded. We analyse numerous game situations against opponents of varying playing

styles. We show that our approach achieves strategic optimality in almost all cases, with near-optimal strategies resulting for the others. Our results confirm that specialisation is essential for optimal play. We test our approach against dynamic as well as static game scenarios (i.e. adaptive opponents); the strength of our genetic programming approach compared to the look-up table approach is most evident here. We further investigate the effects of noise on the performance of the resulting strategies. We compare a direct success evaluation technique with a more indirect, but intuitive, evaluation mechanism. We also experiment with the level of noise introduced in the evaluation process by varying the number of fitness cases used to evaluate individual strategies.

CHAPTER 2

Learning in Games

A great deal of AI research is conducted around the topic of games. Games are a suitable testbed in which to pursue artificial intelligence and machine learning research because they involve problems similar to those encountered in real life. The difference is that games are much simpler and more clearly defined: games have a finite number of rules and actions for players to take, and some well-understood goal. Successful approaches in games can often be applied to similar real-life problems. Games can also be used as a benchmark with which to test new theoretical concepts and to compare their performance with other strategies.

2.1 Imperfect Information Games

Games such as Bridge, Poker, and Scrabble are games of imperfect information: games in which not all the information about the state of play is known (e.g. hidden opponent cards in Poker). Due to the non-deterministic nature of such games, the task of determining satisfactory strategies is extremely broad and difficult to program in advance.

2.2 Need for Opponent Modelling and Adaptation

Once a game has been completed, a great deal of its replay value is afforded by the game's multiplayer functionality. Beyond the appeal of playing with friends in a virtual world, a key reason for the lastability of such games is the variation and interactive experience that they offer. Having players capable of learning and countering game-play strategies will help create a gaming experience capable of maintaining human players' interest for longer.

The more knowledge a player has concerning its environment, the better the strategies it will be able to develop. In multiple player games, this knowledge

includes information about the other players. Opponent modelling is required in many games in order to maximise winnings against a variety of different opponents, where no general game-playing strategy can compete (e.g. Roshambo and IPD). Apart from this, opponent modelling in games is desirable even in situations where general game-playing strategies are effective. For example, a computer opponent for the game of Pong could be programmed to be "perfect": to always return the ball, thus becoming unbeatable. Although a contrived example, this illustrates a case where a perfect player is not desirable and a more adaptive, albeit less optimal, player may add to the game's entertainment value.

Related and equally important is the need for a player not only to exploit a (possibly implicit) model of its opponents, but also to continuously update this model (and thereby its playing strategy). A learning opponent may have learnt how to exploit a certain type of human player, but as this player varies their strategy (or a new opponent comes along), it is important to be able to redirect the player's evolution toward the new optima that now exist. The ability of evolutionary algorithms to handle such environmental changes makes them a promising option for this sort of learning.

2.3 Previous Approaches

Learning in games has been attempted in various different ways; for brevity, only a few will be mentioned here. Decision trees are often used to show the transition of game states given the available actions. Often the types of machine learning mechanisms that can be utilised for a game will depend on the branching factor and depth of the decision tree.

2.3.1 Reinforcement Learning by Look-up Tables

Named after its parallels with animal learning, reinforcement learning involves learning actions based on experience. A reinforcement learning agent gains information about its environment by exploring the effect of different actions given particular states.
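The explore-and-remember scheme just described can be sketched as a state-action value table. The update rule, reward values, and all names below are illustrative assumptions, not the learning scheme used later in this thesis:

```python
import random

# Illustrative state-action look-up table: state -> {action: value}.
# The agent explores actions and remembers how each (state, action)
# pair fared, nudging stored values toward observed results.
table = {}

def choose(state, actions, explore=0.1, rng=random):
    """Pick the best-known action for a state, exploring occasionally."""
    values = table.setdefault(state, {})
    if rng.random() < explore:
        return rng.choice(actions)
    return max(actions, key=lambda a: values.get(a, 0.0))

def update(history, result, step=0.1):
    """Nudge every (state, action) pair visited in a game toward its result."""
    for state, action in history:
        values = table.setdefault(state, {})
        old = values.get(action, 0.0)
        values[action] = old + step * (result - old)
```

After a won game, `update(history, 1.0)` raises the value of every visited pair, so `choose` favours those actions the next time the same states arise.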
This information can then be exploited by the agent to achieve its goal. When a certain state-action pairing is found to be beneficial, i.e. the agent's goal has been achieved, the agent remembers this. Usually, a reward value is associated with each state-action pair based on the success or failure of executing it. A simple form of reinforcement learning is through look-up tables that allow an artificial player to learn the pay-off for each action from each given state. The

look-up table approach is so called because of the process such a player goes through when exploiting its data: it looks up the current game state and selects the action with the greatest reward.

Consider the deterministic game of tic-tac-toe. If an action leads to a victory from a particular state, then that state-action pair will have its weighting altered so that the next time the player encounters this same state, the agent will have learnt which move to make (or not to make). Another approach may be to additionally allocate a positive weighting to every action made during the game because of the win that ultimately resulted. In this example, only the end result of the game is used to alter the table; the learning player learns solely from its ultimate objective, winning the game. Occasionally, especially when playing against novice opponents, a learning player may win a game despite having made some bad move. This noise often drives agents further away from the optimal action, a problem that becomes more prevalent in non-deterministic games, which normally introduce considerable noise into the evaluation process. For tic-tac-toe, a simple table-based learning mechanism is capable of producing optimal strategies due to the small number of game states that exist. Because look-up tables take into account all possible states and actions, such a strategy could not be utilised for a game such as Chess due to its intractability; it is currently impossible to store every possible game state for the game of Chess.

2.3.2 Evolutionary Algorithms for Opponent Modelling

Evolutionary algorithms (EA) is a term used to describe the different methods in computer science that employ the principle of Darwinian natural selection as an optimisation tool to solve problems using computers.
Using a population of candidate solutions and a means of assessing these solutions (the objective function), evolutionary computation techniques search through the space of possible solutions in an attempt to find one that is satisfactory for the problem to be solved. The objective function provides selection pressure, which drives individuals towards more optimal solutions for the problem at hand (evolutionary algorithms are explained in detail in Chapter 3). The evolutionary algorithm paradigm has witnessed successes in numerous application areas, including engineering, the natural sciences, business, and economics [5]. Using the inherent learning capabilities of natural selection, it is possible for learning to take place in noisy and dynamic environments. These techniques seem well-suited to the task of developing game-playing strategies against a wide range of varied, potentially adapting opponents. Indeed, the application of evolutionary algorithms to the task of opponent modelling in games of imperfect information has led to some notable successes. For example, Azaria and Sipper have produced a very strong player (in human terms) for the game of Backgammon purely through playing against itself [4]. Evolutionary approaches have also been applied to Poker by Barone and While [6, 7]. Their approach shows the importance of specialisation and adaptation in order to maximise winnings. Using evolutionary techniques to update learned models of opponents, their approach has produced an evolving computer poker player capable of out-performing a simple, but competent, static player.

Evolutionary approaches have also been applied to the traditional game of the Iterated Prisoner's Dilemma (IPD) [10, 13]. The Iterated Prisoner's Dilemma is often used as a model of emergent behaviour between self-interested individuals. Axelrod's work [3] involved evolving game-playing strategies for the IPD. Although some general, well-known methods for playing the IPD exist (Tit-for-Tat and Grim, for example), Axelrod showed that there exists no best strategy for playing the IPD in an evolving population of opponents, because a strategy's success depends on the other strategies in the population.

Genetic programming is a specific instance of an evolutionary algorithm in which computer programs play the role of individuals (genetic programming is explained in detail in Section 3.4). Genetic programming has been used extensively for a myriad of problems [16, 18, 19], including opponent modelling and strategy development in games [8, 14, 17]. These published successes, among others, provide encouragement toward the use of evolutionary computation techniques and their application to opponent modelling in games of imperfect information.

CHAPTER 3

Evolutionary Algorithms

Charles Darwin's theory of evolution [9] explains how complex organisms evolve from simpler organisms over time through a process known as natural selection. An individual organism's genetic structure (its genotype) leads to certain observable characteristics of that individual (its phenotype), which contribute to how well it is able to survive. An individual of a population is affected by other members of the population (e.g., being attacked by predators, competing for food, and mating). An individual is also affected by its environment (e.g., the climate, access to fresh water, and the availability of food). The better an individual performs in the conditions imposed by the environment and the other members of the population, the greater is its chance to live longer and create offspring (thus passing on its genetic information).

The term evolutionary algorithm (EA) [5] refers to a number of computer techniques inspired by natural selection theory that converge upon satisfactory solutions within a solution set. EAs are applied to solving combinatorial optimisation problems and so can be viewed as a kind of searching algorithm: a guided random search sifts through the potential solution set for optimal solutions to a problem. EAs are unbiased search algorithms in that they do not make any assumptions concerning the fitness landscape. Individuals survive and reproduce based on how well they fare according to some quality criterion, often referred to as the objective function [5, 15]. The objective function evaluates individuals and gives them a fitness score. The fitness measure provides the basis for the competition which drives the evolution, by guiding survival and reproduction within the population. Those individuals with a better fitness score will have a greater probability of being selected for reproduction.
Offspring are generated by means of variation operators analogous to their biological equivalents, such as recombination and/or mutation.

The general evolutionary algorithm begins by creating an initial (typically random) population of sample solutions, termed generation zero. The entire population is evaluated by the objective function. While the termination criterion

has not been met, an offspring population P′(t) is created by applying genetic operators to members of the current generation, selected via the objective function [15]. The offspring population is evaluated, and the next generation P(t + 1) is then selected from P′(t) and some (possibly empty) subset of P(t). Many texts fail to mention that two rounds of selection typically occur per generation: one to decide which individuals reproduce, and one to decide which individuals are included in the next generation. This process continues until some termination criterion has been met. The genetic operators which bring about variation in offspring often also draw their influence from nature, for example recombination and mutation. Implementations of these operators are heavily dictated by both the problem domain and the chosen representation scheme of individuals.

A number of fairly separate approaches to the field of evolutionary algorithms exist today: genetic algorithms, evolutionary programming, evolution strategies, and genetic programming. Many variations of each of these approaches have been derived, with the major differences being the representation of individuals, the design and application of the genetic operators, and the method of selection. The remainder of this chapter covers the basics of these approaches, as well as a detailed discussion of genetic programming.

3.1 Genetic Algorithms

Genetic algorithms (GAs) [5] maintain a population of abstract representations of candidate solutions (called chromosomes). Generally, GA chromosomes are fixed-length binary strings, although variable-length strings and other representations are possible. Because chromosomes are representations of individual candidate solutions, the two terms are often used interchangeably. Recombination is normally considered the driving force of the evolution process in GAs.
The most common types of recombination are one-point crossover, two-point crossover, and uniform crossover. All of these forms of recombination involve two parents; however, uniform crossover produces only one offspring, whereas one-point and two-point crossover both produce two. One-point crossover randomly determines a crossover point at which to split the two parents and recombines the resultant substrings to form two children. For example, with a crossover point chosen after the 4th bit, the two parents 110010 and 001101 will produce the children 110001 and 001110. Two-point crossover works in the same way except that two crossover points are selected, so that the same parents, given crossover points of 1 and 4, will produce the children 101110 and 010001.
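These two operators can be sketched directly on bit strings (an illustrative implementation; a crossover point k splits a string between its k-th and (k+1)-th bits):

```python
# One- and two-point crossover on bit strings.
def one_point(a, b, k):
    """Swap the tails of two parents at point k, producing two children."""
    return a[:k] + b[k:], b[:k] + a[k:]

def two_point(a, b, k1, k2):
    """Swap the middle segments between points k1 and k2."""
    return (a[:k1] + b[k1:k2] + a[k2:],
            b[:k1] + a[k1:k2] + b[k2:])

# one_point("110010", "001101", 4)  -> ("110001", "001110")
# two_point("110010", "001101", 1, 4) -> ("101110", "010001")
```

In a full GA the crossover points would be chosen at random for each mating; they are parameters here only to keep the sketch deterministic.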

The final type of crossover we will discuss is uniform crossover. In uniform crossover, a new offspring is built one bit at a time, with each bit stochastically selected from one of its two parents at the corresponding position.

Assuming a fixed-length binary string representation, the mutation operator usually allows some probability for each bit in an individual's representation to be flipped (i.e. from 0 to 1, or from 1 to 0). Mutation is a necessary requirement for maintaining diversity throughout the population; however, it is usually not the driving force for change. A typical probability for mutation would be about 1/n, where n is the string length (i.e. such that on average one bit gets flipped). To see why mutation is necessary, consider a situation where the globally optimal solution is the binary string 1111. If every single member of our population has a 0 in its first position, it will be impossible to achieve the global optimum via crossover alone.

3.2 Evolutionary Programming

Evolutionary programming (EP) [5] works by observing the world and evolving Finite State Machines (FSMs) able to form predictions based on those observations. A FSM, or finite automaton, is an abstract machine that has memory of which state it is in. Given an input, a FSM can change its state and/or return output. The FSM consists of a finite set of states and rules governing the transitions between these states.

Consider an environment where a sequence of integers is classified by whether each is a square number: either false (0) or true (1). Thus, the binary sequence 100100001 describes the location of square numbers for the integers 1, 2, 3, 4, 5, 6, 7, 8, 9, respectively. The aim is to produce a FSM that will correctly predict the next symbol in the sequence given a sequence of known symbols; e.g. given the sequence 100, a correct FSM would return 1 as its next output. The objective function for such a task could assign a fitness of 1 for a correct prediction and 0 for an incorrect one.

Usually, mutation is the only variation operator used in Evolutionary Programming. Each generation, each individual chosen for reproduction is mutated to create an offspring. A number of possible mutations may be applied at this stage, including: adding or removing a state, changing a state's output symbol, changing a transition, or changing the starting state. Once the offspring have been produced, they are evaluated and some selection scheme dictates which individuals will make up the new generation.
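As a sketch of the FSM machinery EP operates on, a machine can be encoded as a transition table; the encoding and the trivial machine below are illustrative assumptions, and an evolved predictor would be more elaborate:

```python
# A finite state machine as a transition table:
#   (state, input symbol) -> (next state, output symbol).
# EP would evolve such tables by mutating states, outputs, and
# transitions; this particular machine is illustrative only.
def run_fsm(transitions, start, inputs):
    """Feed a symbol sequence through the FSM, collecting the outputs."""
    state, outputs = start, []
    for symbol in inputs:
        state, out = transitions[(state, symbol)]
        outputs.append(out)
    return outputs

# A one-state machine whose prediction is "the next symbol repeats the last".
echo = {("s", 0): ("s", 0), ("s", 1): ("s", 1)}
```

Evaluating such a machine against a known sequence (score 1 per correct output) gives exactly the kind of objective function described above.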
Usually, mutation is the only variation operator that is used in Evolutionary Programming. Each generation, each individual chosen for reproduction is mutated to create an offspring. A number of possible mutations may be applied at this stage, these include: adding or removing a state, changing a state s output symbol, changing a transition, or changing the starting state. Once the offspring have been produced, they are evaluated and some selection scheme dictates which individuals will make up the new generation. 10

3.3 Evolution Strategies

Evolution strategies (ES) were initially devised to solve engineering design problems. The representation of individuals is typically a fixed-length real-valued vector, although variable-length approaches exist [5]. Evolution strategies commonly use Gaussian mutation as the primary genetic operator for evolution. Gaussian mutation generates an offspring from a single individual by adding a random value drawn from a Gaussian distribution to each element of the individual's vector. Another operator often used in ESs is intermediate recombination, in which two or more parents produce one new offspring created by taking the parents' mean value for each vector element. Where ESs differ from the other methods is that the genetic operators act upon the phenotype directly: the real-valued vector representation of candidate solutions allows for less rigid mutation and for interpolation between individuals.

3.4 Genetic Programming

Genetic programming (GP) [5, 15] is an evolutionary algorithm approach to solving combinatorial optimisation problems in which the individuals undergoing evolution are themselves computer programs.

3.4.1 Representation

The usual representation scheme for an individual is a tree structure, called a LISP expression, which is comprised of functions and terminals. LISP expressions can represent complex program trees that can be made to handle multiple types, conditional statements, and iteration.

Consider a simple integer arithmetic program, (3 + 6)/2. Here the function nodes being used are addition (+) and division (/), both of which accept two arguments as input. The terminal nodes of this program tree are 3, 6, and 2; terminals, by definition, take no arguments. The root of the tree is /, with both its arguments, (3 + 6) and 2, branching from it. The left argument of the root function is itself a function (+) with terminal arguments 3 and 6.
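The tree just described can be sketched with nested tuples standing in for LISP expressions; this encoding is illustrative, not the representation used later for Spoof strategies:

```python
# A program tree as nested tuples: (function, arg, arg) or a bare
# terminal. Evaluation recurses from the root down to the terminals.
def evaluate(node):
    if not isinstance(node, tuple):   # terminal: a constant
        return node
    op, left, right = node
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l / r

tree = ("/", ("+", 3, 6), 2)          # the tree for (3 + 6) / 2
```

Here `evaluate(tree)` returns 4.5: the + node reduces to 9, which the root / node divides by the terminal 2.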
Koza's closure requirement states that the input of every function should be able to handle every terminal and the output of every function [15]. The reason for this is covered in Section

3.4.2 Population Initialisation

Koza describes three ways in which the random population can be initialised prior to commencing evolution: full, grow, and ramped-half-and-half. Each of these methods is typically controlled so that no duplicate individuals are created in the starting generation.

The full method creates a random population of individuals, each being of the same predetermined depth. Starting from the root node, a random function is chosen, and this process continues recursively for each of the branches of that function (i.e. its arguments) until the maximum depth has been reached. Upon reaching the maximum depth, random terminals are chosen rather than functions.

The grow method creates a population of randomly composed individuals up to a specified depth. Starting from the root node, a node is randomly selected from all available functions and terminals. If it is a function, then this process continues recursively for each of the function's branches up to the specified depth. If the maximum depth has been reached, then a random terminal is selected for that node. If the node is a terminal, then that branch finishes (possibly short of the maximum depth) and no further action is required. This method provides a range of structures throughout the population, up to the specified depth.

The ramped-half-and-half method specifies a maximum depth, and the population is divided equally into that many sections. Each depth level produces half of its individuals using the grow method and half using the full method. This generates a population with a diverse range of randomly sized and randomly structured individuals.

3.4.3 Fitness

In order to drive the population towards optima, we require a way of comparing an individual's strength (or fitness) with respect to the other individuals in the population. The raw fitness is the unaltered measure of how well (or badly) an individual fares with respect to the objective function.
For example, if our programs describe a strategy for game playing, then the number of games won could serve as a fitness function, in which case higher values are desirable. Another option would be to measure how far an individual deviates from some (perhaps unobtainable) ideal, in which case lower values are desirable. Koza [15] discusses a number of adjustments that can be made to fitness values; for brevity, these will not be detailed here.

3.4.4 Selection Schemes

Koza [15] uses a number of selection schemes to decide which individuals will reproduce, as well as which individuals will be included in the next generation. The most common methods are fitness proportionate selection, greedy overselection, and tournament selection. Selection methods that are applied to all members of the previous population plus all new offspring are known as elitist selection schemes. When only new offspring are considered by the selection scheme for the new generation, it is called a generational selection scheme.

In fitness proportionate selection, fitter individuals have a greater probability of being selected, but their selection is not guaranteed. Greedy overselection involves skewing selection towards elite members of the population in the hope of lowering the number of generations required for the algorithm to terminate. In tournament selection, a number of individuals are randomly chosen from the population to be included in a tournament; the fittest individual amongst those who entered the tournament is regarded as the winner and is then selected.

3.4.5 Parsimony

Often when evolving solutions it is desirable to have not only a correct solution, but a parsimonious one as well. For example, consider the case where we wish to evolve an expression returning the value 1. One solution may simply be to return the constant 1. Another functionally perfect solution, although lacking parsimony, could be (2 − 1) × (3 − 2). A fitness based on external behaviour alone (i.e. based on phenotypic traits) would not provide any guidance for parsimony to evolve.

A common approach to encouraging parsimonious design is to add a less influential component to the objective function that rewards shorter solutions. For example, assume that raw fitness is measured as the deviation from the correct solution 1, so that a fitness of 0 indicates a functionally correct solution.
We may also include the length of our solution (indicated by the number of terminals and functions used) when evaluating an individual. Parsimony is of secondary importance, so we add to the raw fitness only a small fraction of the solution length (say, one hundredth), so that functionality will not be compromised in favour of simplicity, but selection will favour the simpler of functionally equivalent solutions.
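As a sketch, the combined objective might look like this (the one-hundredth weighting follows the text; the function and variable names are illustrative, not from any particular GP system):

```python
def parsimony_fitness(error, length, penalty=0.01):
    """Raw fitness (deviation from the target) plus a small length penalty.

    Lower values are better; the penalty is kept small so that functional
    correctness is not traded away for modest size differences.
    """
    return error + penalty * length

# Two functionally correct solutions (error 0) of different sizes:
short = parsimony_fitness(error=0.0, length=1)    # e.g. the constant 1
longer = parsimony_fitness(error=0.0, length=7)   # a larger equivalent expression

assert short < longer                         # selection favours the simpler one
assert parsimony_fitness(0.5, 1) > longer     # but correctness still dominates
```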

3.4.6 Genetic Operators A number of genetic operators are used in genetic programming to enable program trees to evolve. Recall that the representation of an individual in genetic programming is a tree structure, and that the closure requirement ensures that all functions are able to handle as input any terminal and the result of any function. Although closure is not necessary for a GP to be successful, many of the operators (as described in this section) assume closure as a prerequisite. The driving force of evolution in genetic programming is commonly provided by asexual reproduction and cross-over, a form of recombination analogous to sexual reproduction in organisms. Asexual reproduction simply allows an individual to pass completely unchanged into the next generation. Cross-over requires two parent individuals to combine their genotypes, resulting in the creation of two new child individuals. After selecting two parents to take part in cross-over, the first step in producing children is to make a copy of each parent. The cross-over operator then selects a random sub-tree from each copy and swaps them with each other, resulting in two new children derived entirely from the two parents involved. A secondary genetic operator in genetic programming is mutation; although other approaches exist, its purpose is primarily to introduce variation and diversity within the population, rather than to drive it towards optimality. Mutation involves one parent producing one offspring. One usual method of mutation in genetic programming is as follows: a copy of the parent is made and a single node of the copied program tree is selected at random; this node is then replaced by a randomly generated tree. Another form of mutation, which does not alter the structure of the program, is to randomly select a node for replacement.
If this node is a terminal, it is replaced with some other randomly selected terminal; if a function node is chosen, it is replaced by some other randomly selected function with the same number of arguments. As with cross-over, it is common for there to be restrictions concerning which nodes can be replaced and how large the generated sub-tree can be. Other, less commonly used operators are editing, encapsulation, permutation, and decimation. When these operators are implemented, they are typically applied less frequently than cross-over and mutation (i.e. not every generation). Earlier work in genetic programming has largely ignored such operators, but current research is giving them more consideration.
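A sketch of sub-tree cross-over and structure-preserving point mutation, using nested lists as a stand-in tree representation (the representation and helper names are ours, not those of any particular GP system):

```python
import copy
import random

# Program trees as nested lists: [function, child, ...] with bare terminals.
def nodes(tree, path=()):
    """Enumerate (path, subtree) pairs for every node in the tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace_at(tree, path, subtree):
    """Return a copy of tree with the node at `path` replaced by `subtree`."""
    if not path:
        return copy.deepcopy(subtree)
    new = copy.deepcopy(tree)
    node = new
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(subtree)
    return new

def crossover(parent_a, parent_b, rng=random):
    """Swap one randomly chosen sub-tree between copies of the two parents,
    producing two children; the parents themselves are left untouched."""
    path_a, sub_a = rng.choice(list(nodes(parent_a)))
    path_b, sub_b = rng.choice(list(nodes(parent_b)))
    return replace_at(parent_a, path_a, sub_b), replace_at(parent_b, path_b, sub_a)

def point_mutate(tree, terminals, rng=random):
    """Structure-preserving mutation: swap one terminal for another."""
    paths = [p for p, sub in nodes(tree) if not isinstance(sub, list)]
    return replace_at(tree, rng.choice(paths), rng.choice(terminals))

a = ['add', 'x', ['mul', 'y', 2]]
b = ['sub', 3, 'x']
child1, child2 = crossover(a, b)
```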

3.4.7 GP System Parameters Once the terminal and function sets have been decided, there remain a number of parameters that must be chosen before running a GP system. These decisions are very important, as they greatly affect the quality of the resulting solution as well as the time taken to achieve it. Unfortunately, there are no hard and fast rules for determining these parameters. The population size must be decided. A larger population allows greater exploration per generation and increases the chance of evolving a solution, but too large a population is wasteful and will slow down the GP system. Generally speaking, the greater the complexity of the problem at hand, the greater the population size required to solve it. Although a fixed population size is usually used, experiments have been conducted with a changing population size; for example, an initially very large population that drops after a number of generations. Termination criteria must also be selected. Evolutionary algorithms do not have defined end points, so the GP system must have some way of knowing when it should stop. A common choice is to run the GP system until some satisfactory level of fitness has been achieved. Another is to stop when the population appears to have stopped improving. Often the algorithm is simply run for a fixed number of generations; this is especially the case for research applications, or when the user is performing trial runs to determine more suitable parameters. Finally, assuming that a selection scheme and the genetic operators have been decided, the probabilities with which the genetic operators are applied must be chosen: what will be the probability of cross-over, of asexual reproduction, of mutation? Deciding the many variables in genetic programming is no simple task.
In fact, some have suggested using a second evolutionary algorithm to optimise the application of a first, a concept known as meta-evolutionary optimisation [12].
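An illustrative parameter set and stopping rule for the decisions above (the names and values are our own examples, not prescriptions; fitness here is treated as lower-is-better):

```python
# Illustrative GP run parameters; operator probabilities must sum to 1.
PARAMS = {
    "population_size": 500,
    "max_generations": 50,
    "p_crossover": 0.90,       # operator application probabilities
    "p_reproduction": 0.09,
    "p_mutation": 0.01,
}

def should_stop(generation, best_fitness, history, params, target=0.0, stall=10):
    """Stop when fitness is satisfactory, the generation budget is spent,
    or the best fitness (lower is better) has stopped improving."""
    if best_fitness <= target:                    # satisfactory fitness reached
        return True
    if generation >= params["max_generations"]:   # generation budget exhausted
        return True
    if len(history) > stall and min(history[-stall:]) >= min(history[:-stall]):
        return True   # nothing in the last `stall` generations beat earlier bests
    return False
```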

CHAPTER 4 The Game of Spoof Spoof is a multi-player game of imperfect information. This seemingly simple game has an extremely broad scope for potential strategy development. Minimising the information made available to opponents, bluffing, probability analysis, and opponent modelling are all elements which can be used to formulate playing strategies. 4.1 Rules of Spoof Spoof is played by two or more players. The game begins with each player selecting a number of tokens (typically coins) from 0 to 3 (called the player's selection), which remain hidden from all other players. In turn, each player attempts to guess the total number of coins held by all players (called the player's guess), with the constraints that no player may repeat a previous player's guess, nor guess a negative amount, nor a value greater than 3 times the number of players. The winner of the game is the player who correctly guesses the total number of coins. The initial guessing order is generally determined by randomly selecting a player to guess first and working clockwise from that player. In the event that no player guesses the correct total, the game is deemed a draw and is typically repeated. The repeated game is usually altered so that the guessing order is shifted in some preconceived direction; for our experiments, however, the game is repeated with the original guessing order unchanged. The simplified game of Spoof we consider for our analysis ensures that our learning players need only develop one particular guessing strategy at a time, and that this strategy need not deal with subsequent rounds of play.
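A minimal round simulator under the simplified rules above (the function names and the naive example strategy are ours, for illustration only):

```python
import random

def play_spoof_round(strategies, rng=random):
    """Play one round of Spoof. Each strategy maps
    (own_coins, guesses_so_far, num_players) -> guess.
    Returns the winner's index, or None if the round is drawn."""
    n = len(strategies)
    selections = [rng.randint(0, 3) for _ in range(n)]   # hidden 0..3 coins each
    total = sum(selections)
    guesses = []
    for i, strategy in enumerate(strategies):
        g = strategy(selections[i], list(guesses), n)
        # No repeats, no negatives, nothing above 3 coins per player.
        assert 0 <= g <= 3 * n and g not in guesses, "invalid guess"
        guesses.append(g)
    for i, g in enumerate(guesses):
        if g == total:
            return i
    return None   # no one guessed the total: the round is a draw

def naive(own_coins, guesses_so_far, n):
    """Guess own coins plus the most likely total of the others' coins,
    bumping upward if that guess has already been taken."""
    g = own_coins + 3 * (n - 1) // 2
    while g in guesses_so_far:
        g += 1
    return g

winner = play_spoof_round([naive, naive, naive], random.Random(7))
```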

4.2 Spoof Strategy At first thought, it may seem that the game is purely random and that little can be done other than to guess the maximum of the probability distribution of possible totals. However, as players announce their guesses, they may well be providing information about the number of coins they have selected. Recall also that guesses may not be repeated, so the position in which a player is forced to act (announce a guess) induces a trade-off between the information available and the opportunity to guess a total. Guessing first means all possible totals are available to be guessed, but no information about the opponents' selections is yet available. Guessing last provides maximal information about the selections of the other players (and, assuming rational play, may well mean the total can be determined with a high degree of certainty), but the correct total is likely to have already been announced by another player. A clear trade-off arises: acting first provides minimal information but maximal opportunity; acting last provides maximal information but minimal opportunity to guess the correct total. Consider a two player game where the first player guesses a total of 5. Assuming rational play, this player must have selected either 2 or 3 coins, otherwise a total of 5 would be impossible. The second player can now use this information in making their guess, and should announce a total of 2 or 3 plus their own selection. Using this approach, the second player improves their chances of immediately winning the game (without replay) from 25% (with no information about the first player's selection) to 50% (knowing that the first player's selection is one of two possibilities). Similar analysis is possible for other game states in Spoof [1], but the analysis becomes increasingly complex as the number of players rises. Opponent modelling in the game of Spoof is crucial for optimal performance.
For example, consider the problem of acting first in three player Spoof. A general strategy for this position is to guess the number of coins one is holding plus 3 (as 3 is probabilistically the most likely total of the remaining players' coins). However, this strategy is only sound if both opponents choose their hidden coins uniformly at random. Consider instead two opponents who tend never to hold more than one coin. The previous strategy now performs poorly, and a better opponent-specific strategy, such as guessing 1 more than one's own selection, should be used instead. Indeed, experience shows that human players often do not select their coins randomly (preferring certain coin choices or patterns over others), and more typically, provide information about their selection in the way they guess. Our experience has shown that human players especially tend to use the same guessing algorithm time and time again.
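The two-player analysis above can be checked by enumeration. The sketch below (helper name and interface are ours) assumes the second player infers a candidate set for the first player's selection and guesses the first implied total that is still available:

```python
from fractions import Fraction

def second_player_win_chance(possible_first, own_coins, taken=None):
    """Immediate win chance for the second player in two-player Spoof,
    assuming the first player's hidden selection is uniform over
    `possible_first` and an already-taken guess cannot be repeated."""
    # Guess the first candidate's implied total that is still available.
    for c in possible_first:
        guess = c + own_coins
        if guess != taken:
            break
    wins = sum(1 for actual in possible_first if actual + own_coins == guess)
    return Fraction(wins, len(possible_first))

# With no information, the first player could hold 0-3 coins: a 25% chance.
no_info = second_player_win_chance(range(4), own_coins=2)
# A first guess of 5 reveals a selection of 2 or 3: the chance rises to 50%.
informed = second_player_win_chance([2, 3], own_coins=2, taken=5)

assert no_info == Fraction(1, 4)
assert informed == Fraction(1, 2)
```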

It is also possible to play the game of Spoof with the specific aim of minimising the information we provide to our opponents. Consider a simple two player game in which we are the first guessing player. If we guess 3, this provides no information about our selection to the opponent, because, no matter what our selection may be, a total of 3 is always possible. This idea of giving up minimal information can easily be combined with opponent modelling strategies. For example, if we have learnt that our opponent usually holds either 2 or 3 coins, then one strategy would be to select either 0 or 1 coin and guess a total of 3 as before.
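The uninformative-guess idea can be verified directly: in a two-player game, a guess g leaves our selection s plausible to the opponent exactly when some opponent selection t satisfies s + t = g (the helper name is ours):

```python
def consistent_selections(guess, coins=range(4)):
    """Own selections that a two-player opponent cannot rule out after
    hearing the announced guess: selection s remains possible if some
    opponent selection t gives s + t == guess."""
    return [s for s in coins if any(s + t == guess for t in coins)]

# Guessing 3 is consistent with every possible selection: it reveals nothing.
assert consistent_selections(3) == [0, 1, 2, 3]
# Guessing 6 is only rational when holding 3 coins: it reveals everything.
assert consistent_selections(6) == [3]
```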

CHAPTER 5 Building Adaptive Spoof Players We use genetic programming to build models of opponents' strategies in order to create a strong artificial Spoof player. Our work does not follow a traditional opponent modelling approach, where a direct model of the opponent's strategy is built from experience and then analysed for weaknesses. Instead, we use a more indirect approach in which evolution implicitly builds the model by evolving the best countering strategy over time (i.e. a model of the game environment, including all opponents therein). The aim, however, remains the same: to exploit weaknesses in an individual's strategy in order to maximise the performance of our automated player. We also experiment with a table-based approach so that we may compare both the learning performance and the playing ability of the resulting Spoof players. 5.1 The Learning Environment Opponent-specific information for the game of Spoof can be exploited in two ways. Information made available during a game (i.e. previous players' guesses) can be used to determine which guess to make for that game. Information made available after a game is over (i.e. all players' selections and guesses) can be used to determine both the selection and the guess to make in later games against the same opponent(s). Information learnt can be applied across games because players are not anonymous; we can learn how particular opponents play and hope to take advantage of this in future games. Our adaptive players learn by observation alone; that is, they learn an implicit model of the opponents by formulating which guess to make. For all experiments, we do not allow our adaptive players to choose a coin selection; this has been set as uniformly random. One reason for this is to ensure that our evolved players are inherently less predictable than if they were to select coins non-randomly (note that predictability would only be of concern against adaptive opponents; against fixed opponents it would not be an issue).
Also, learning opponent strategies for the game of Spoof potentially involves exploring the game's state space. By fixing a random selection function (in conjunction with the pseudo-success measure to be described later), there is no need for separate exploration and exploitation strategies: all the required knowledge is made available regardless of what guess was made (i.e. the correct total is always revealed to the players upon the game's conclusion). Genetic Program Players Learning We employ a generational evolution system, meaning that each generation is made up entirely of the offspring of the previous one; no members of one generation pass into the next, so the fittest individual may get worse from one generation to the next. While not strictly elitist, our approach keeps an external copy (or clone, in keeping with biological terminology) of the best individual seen thus far, which is returned as the solution for a run. This gives us the benefit of the best-seen individual without allowing it to dominate the search for other solutions. For coin selection, we force our player to always choose randomly. This may of course be a poor choice (being able to skew the probability distribution of totals may well be advantageous), but it prevents our player from being predictable. It also simplifies the problem to be solved, allowing evolutionary intelligence to focus on learning guessing strategies that exploit opponents and maximise performance. For guess determination, we use the genetic programming paradigm to evolve an algorithm that makes the guess. We use a population of candidate genetic programs that are evaluated to determine how well they play the game; over time, evolutionary selection pressure drives the population towards good solutions. We use version 2b of GPsys [20]. Each candidate solution in the population consists of a program tree that determines the guess for the player.
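The generational scheme with an external best-seen clone can be sketched as follows (a toy numeric problem stands in for evolving program trees; all names are ours):

```python
import copy
import random

def evolve(initial_population, fitness, breed, generations, rng=random):
    """Generational evolution: every generation is replaced wholesale by
    offspring, so the in-population best can get worse; an external clone
    of the best individual ever seen is kept and returned as the solution."""
    population = list(initial_population)
    best_clone = copy.deepcopy(max(population, key=fitness))
    for _ in range(generations):
        population = breed(population, rng)            # offspring only
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best_clone):
            best_clone = copy.deepcopy(generation_best)
    return best_clone                                  # the run's solution

# Toy usage: drift a population of numbers towards 5 by random perturbation.
def breed(population, rng):
    return [x + rng.uniform(-1, 1) for x in population]

solution = evolve([0.0] * 20, lambda x: -abs(x - 5), breed, 200, random.Random(1))
```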
Program trees are mixed-type, using float and boolean types, with the root node constrained to evaluate to a float. This float value is cast down to an integer, forming the guess made by the player. When this integer value is invalid (the guess may already have been made by an earlier player), we automatically adjust it to the next closest valid integer, checking above and below the desired value by an incrementally increasing amount (one more is tried before one less). This allows for less complex program trees, as they need not be burdened with the additional task of ensuring unique guesses. To enable our evolving player to make an informed guess, we equip the genetic programming system with a number of game-specific terminals that can be used in a candidate solution's program tree: the number of players in the game, the number of coins the player has selected to hold, and the announced guesses of each player who guesses before this player. The terminals for a three-player game of Spoof when guessing third are detailed in Table 5.1.

Table 5.1: Game-specific terminals used for three-player Spoof, guessing third

Variable     Explanation
p1guess      The first player's publicly announced guess.
p2guess      The second player's publicly announced guess.
CoinsHeld    The number of coins selected by the player.
NumPlayers   The number of players in the game. For the majority of experiments in this study, this terminal is constant (3).

We also allow the genetic programming system the use of four numerical constants (0, 1, 2, and 3, representing the four possible coins-held values), standard arithmetic operators (addition, subtraction, multiplication, and division), standard comparison operators (greater-than, less-than, and equal-to), and boolean operators (negation, conjunction, and disjunction). A conditional selection mechanism (the if function) is also included to select between sub-programs. The if function expects three arguments: a boolean condition and two sub-programs, both of which must evaluate to floats; the second argument is evaluated if the condition evaluates to true, otherwise the third is evaluated. It should be noted that our approach restricts the genetic operators rather than adhering to Koza's closure requirement [15]; for example, only compatible sub-trees are considered for cross-over.
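A minimal interpreter for such trees (terminal names follow Table 5.1; the operator spellings, eager boolean evaluation, and protected division are our own assumptions, since GPsys's internals are not reproduced here):

```python
def evaluate(node, state):
    """Evaluate a mixed-type program tree against a game state:
    a dict holding the Table 5.1 terminals."""
    if isinstance(node, (int, float)):
        return float(node)                    # the constants 0, 1, 2, 3
    if isinstance(node, str):
        return float(state[node])             # game-specific terminals
    op, *args = node
    if op == "if":                            # (condition, then, else)
        branch = args[1] if evaluate(args[0], state) else args[2]
        return evaluate(branch, state)
    a = evaluate(args[0], state)
    b = evaluate(args[1], state) if len(args) > 1 else None
    if op == "add": return a + b
    if op == "sub": return a - b
    if op == "mul": return a * b
    if op == "div": return a / b if b else 1.0   # protected division (our choice)
    if op == "gt":  return a > b
    if op == "lt":  return a < b
    if op == "eq":  return a == b
    if op == "and": return bool(a) and bool(b)
    if op == "or":  return bool(a) or bool(b)
    if op == "not": return not a
    raise ValueError(f"unknown operator: {op!r}")

# A toy strategy: if the earlier guesses agree, guess p1guess + CoinsHeld;
# otherwise fall back to CoinsHeld + 3.
tree = ("if", ("eq", "p1guess", "p2guess"),
              ("add", "p1guess", "CoinsHeld"),
              ("add", "CoinsHeld", 3))
state = {"p1guess": 4, "p2guess": 4, "CoinsHeld": 2, "NumPlayers": 3}
guess = int(evaluate(tree, state))   # cast down to an integer guess
```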
In all experiments, the depth of the candidate program trees was limited to 10, and the initial population was created with the ramped half-and-half initialisation method over all depths from 1 to 10. A population of size 50 is maintained throughout the evolution, which is limited to a span of 5000 generations.
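The guess-repair rule described earlier, trying one above before one below and widening outward until a valid guess is found, can be sketched in isolation (the function name is ours):

```python
def adjust_guess(desired, taken, max_guess):
    """Return the closest valid guess to `desired`: within [0, max_guess]
    and not already announced. Ties are broken by trying one more before
    one less, widening outward until a valid guess is found."""
    def valid(g):
        return 0 <= g <= max_guess and g not in taken
    if valid(desired):
        return desired
    for delta in range(1, abs(desired) + max_guess + 1):
        if valid(desired + delta):     # one more is tried before one less
            return desired + delta
        if valid(desired - delta):
            return desired - delta
    raise ValueError("every possible guess is already taken")

assert adjust_guess(5, {5}, 9) == 6       # 5 taken, so try 6 first
assert adjust_guess(5, {5, 6}, 9) == 4    # 6 taken too, fall back to 4
assert adjust_guess(9, {9}, 9) == 8       # cannot go above the maximum
```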


More information

An intelligent Othello player combining machine learning and game specific heuristics

An intelligent Othello player combining machine learning and game specific heuristics Louisiana State University LSU Digital Commons LSU Master's Theses Graduate School 2011 An intelligent Othello player combining machine learning and game specific heuristics Kevin Anthony Cherry Louisiana

More information

A Review on Genetic Algorithm and Its Applications

A Review on Genetic Algorithm and Its Applications 2017 IJSRST Volume 3 Issue 8 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology A Review on Genetic Algorithm and Its Applications Anju Bala Research Scholar, Department

More information

An Adaptive Learning Model for Simplified Poker Using Evolutionary Algorithms

An Adaptive Learning Model for Simplified Poker Using Evolutionary Algorithms An Adaptive Learning Model for Simplified Poker Using Evolutionary Algorithms Luigi Barone Department of Computer Science, The University of Western Australia, Western Australia, 697 luigi@cs.uwa.edu.au

More information

COMP219: COMP219: Artificial Intelligence Artificial Intelligence Dr. Annabel Latham Lecture 12: Game Playing Overview Games and Search

COMP219: COMP219: Artificial Intelligence Artificial Intelligence Dr. Annabel Latham Lecture 12: Game Playing Overview Games and Search COMP19: Artificial Intelligence COMP19: Artificial Intelligence Dr. Annabel Latham Room.05 Ashton Building Department of Computer Science University of Liverpool Lecture 1: Game Playing 1 Overview Last

More information

FreeCiv Learner: A Machine Learning Project Utilizing Genetic Algorithms

FreeCiv Learner: A Machine Learning Project Utilizing Genetic Algorithms FreeCiv Learner: A Machine Learning Project Utilizing Genetic Algorithms Felix Arnold, Bryan Horvat, Albert Sacks Department of Computer Science Georgia Institute of Technology Atlanta, GA 30318 farnold3@gatech.edu

More information

Programming Project 1: Pacman (Due )

Programming Project 1: Pacman (Due ) Programming Project 1: Pacman (Due 8.2.18) Registration to the exams 521495A: Artificial Intelligence Adversarial Search (Min-Max) Lectured by Abdenour Hadid Adjunct Professor, CMVS, University of Oulu

More information

Free Cell Solver. Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001

Free Cell Solver. Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001 Free Cell Solver Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001 Abstract We created an agent that plays the Free Cell version of Solitaire by searching through the space of possible sequences

More information

CS 1571 Introduction to AI Lecture 12. Adversarial search. CS 1571 Intro to AI. Announcements

CS 1571 Introduction to AI Lecture 12. Adversarial search. CS 1571 Intro to AI. Announcements CS 171 Introduction to AI Lecture 1 Adversarial search Milos Hauskrecht milos@cs.pitt.edu 39 Sennott Square Announcements Homework assignment is out Programming and experiments Simulated annealing + Genetic

More information

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur Module 3 Problem Solving using Search- (Two agent) 3.1 Instructional Objective The students should understand the formulation of multi-agent search and in detail two-agent search. Students should b familiar

More information

Cooperative Behavior Acquisition in A Multiple Mobile Robot Environment by Co-evolution

Cooperative Behavior Acquisition in A Multiple Mobile Robot Environment by Co-evolution Cooperative Behavior Acquisition in A Multiple Mobile Robot Environment by Co-evolution Eiji Uchibe, Masateru Nakamura, Minoru Asada Dept. of Adaptive Machine Systems, Graduate School of Eng., Osaka University,

More information

2 person perfect information

2 person perfect information Why Study Games? Games offer: Intellectual Engagement Abstraction Representability Performance Measure Not all games are suitable for AI research. We will restrict ourselves to 2 person perfect information

More information

Fault Location Using Sparse Wide Area Measurements

Fault Location Using Sparse Wide Area Measurements 319 Study Committee B5 Colloquium October 19-24, 2009 Jeju Island, Korea Fault Location Using Sparse Wide Area Measurements KEZUNOVIC, M., DUTTA, P. (Texas A & M University, USA) Summary Transmission line

More information

ADVERSARIAL SEARCH. Chapter 5

ADVERSARIAL SEARCH. Chapter 5 ADVERSARIAL SEARCH Chapter 5... every game of skill is susceptible of being played by an automaton. from Charles Babbage, The Life of a Philosopher, 1832. Outline Games Perfect play minimax decisions α

More information

Introduction to Genetic Algorithms

Introduction to Genetic Algorithms Introduction to Genetic Algorithms Peter G. Anderson, Computer Science Department Rochester Institute of Technology, Rochester, New York anderson@cs.rit.edu http://www.cs.rit.edu/ February 2004 pg. 1 Abstract

More information

A Genetic Algorithm for Solving Beehive Hidato Puzzles

A Genetic Algorithm for Solving Beehive Hidato Puzzles A Genetic Algorithm for Solving Beehive Hidato Puzzles Matheus Müller Pereira da Silva and Camila Silva de Magalhães Universidade Federal do Rio de Janeiro - UFRJ, Campus Xerém, Duque de Caxias, RJ 25245-390,

More information

Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax

Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax Approaching The Royal Game of Ur with Genetic Algorithms and ExpectiMax Tang, Marco Kwan Ho (20306981) Tse, Wai Ho (20355528) Zhao, Vincent Ruidong (20233835) Yap, Alistair Yun Hee (20306450) Introduction

More information

An Evolutionary Approach to the Synthesis of Combinational Circuits

An Evolutionary Approach to the Synthesis of Combinational Circuits An Evolutionary Approach to the Synthesis of Combinational Circuits Cecília Reis Institute of Engineering of Porto Polytechnic Institute of Porto Rua Dr. António Bernardino de Almeida, 4200-072 Porto Portugal

More information

Evolutions of communication

Evolutions of communication Evolutions of communication Alex Bell, Andrew Pace, and Raul Santos May 12, 2009 Abstract In this paper a experiment is presented in which two simulated robots evolved a form of communication to allow

More information

ARTIFICIAL INTELLIGENCE (CS 370D)

ARTIFICIAL INTELLIGENCE (CS 370D) Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-5) ADVERSARIAL SEARCH ADVERSARIAL SEARCH Optimal decisions Min algorithm α-β pruning Imperfect,

More information

Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham

Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham Towards the Automatic Design of More Efficient Digital Circuits Vesselin K. Vassilev South Bank University London Dominic Job Napier University Edinburgh Julian F. Miller The University of Birmingham Birmingham

More information

Solving and Analyzing Sudokus with Cultural Algorithms 5/30/2008. Timo Mantere & Janne Koljonen

Solving and Analyzing Sudokus with Cultural Algorithms 5/30/2008. Timo Mantere & Janne Koljonen with Cultural Algorithms Timo Mantere & Janne Koljonen University of Vaasa Department of Electrical Engineering and Automation P.O. Box, FIN- Vaasa, Finland timan@uwasa.fi & jako@uwasa.fi www.uwasa.fi/~timan/sudoku

More information

CMU-Q Lecture 20:

CMU-Q Lecture 20: CMU-Q 15-381 Lecture 20: Game Theory I Teacher: Gianni A. Di Caro ICE-CREAM WARS http://youtu.be/jilgxenbk_8 2 GAME THEORY Game theory is the formal study of conflict and cooperation in (rational) multi-agent

More information

2. The Extensive Form of a Game

2. The Extensive Form of a Game 2. The Extensive Form of a Game In the extensive form, games are sequential, interactive processes which moves from one position to another in response to the wills of the players or the whims of chance.

More information

Evolutionary Programming Optimization Technique for Solving Reactive Power Planning in Power System

Evolutionary Programming Optimization Technique for Solving Reactive Power Planning in Power System Evolutionary Programg Optimization Technique for Solving Reactive Power Planning in Power System ISMAIL MUSIRIN, TITIK KHAWA ABDUL RAHMAN Faculty of Electrical Engineering MARA University of Technology

More information

LECTURE 26: GAME THEORY 1

LECTURE 26: GAME THEORY 1 15-382 COLLECTIVE INTELLIGENCE S18 LECTURE 26: GAME THEORY 1 INSTRUCTOR: GIANNI A. DI CARO ICE-CREAM WARS http://youtu.be/jilgxenbk_8 2 GAME THEORY Game theory is the formal study of conflict and cooperation

More information

Principles of Computer Game Design and Implementation. Lecture 20

Principles of Computer Game Design and Implementation. Lecture 20 Principles of Computer Game Design and Implementation Lecture 20 utline for today Sense-Think-Act Cycle: Thinking Acting 2 Agents and Virtual Player Agents, no virtual player Shooters, racing, Virtual

More information

COMP219: Artificial Intelligence. Lecture 13: Game Playing

COMP219: Artificial Intelligence. Lecture 13: Game Playing CMP219: Artificial Intelligence Lecture 13: Game Playing 1 verview Last time Search with partial/no observations Belief states Incremental belief state search Determinism vs non-determinism Today We will

More information

Heads-up Limit Texas Hold em Poker Agent

Heads-up Limit Texas Hold em Poker Agent Heads-up Limit Texas Hold em Poker Agent Nattapoom Asavareongchai and Pin Pin Tea-mangkornpan CS221 Final Project Report Abstract Our project aims to create an agent that is able to play heads-up limit

More information

The Co-Evolvability of Games in Coevolutionary Genetic Algorithms

The Co-Evolvability of Games in Coevolutionary Genetic Algorithms The Co-Evolvability of Games in Coevolutionary Genetic Algorithms Wei-Kai Lin Tian-Li Yu TEIL Technical Report No. 2009002 January, 2009 Taiwan Evolutionary Intelligence Laboratory (TEIL) Department of

More information

Genetic Algorithms with Heuristic Knight s Tour Problem

Genetic Algorithms with Heuristic Knight s Tour Problem Genetic Algorithms with Heuristic Knight s Tour Problem Jafar Al-Gharaibeh Computer Department University of Idaho Moscow, Idaho, USA Zakariya Qawagneh Computer Department Jordan University for Science

More information

Local Search: Hill Climbing. When A* doesn t work AIMA 4.1. Review: Hill climbing on a surface of states. Review: Local search and optimization

Local Search: Hill Climbing. When A* doesn t work AIMA 4.1. Review: Hill climbing on a surface of states. Review: Local search and optimization Outline When A* doesn t work AIMA 4.1 Local Search: Hill Climbing Escaping Local Maxima: Simulated Annealing Genetic Algorithms A few slides adapted from CS 471, UBMC and Eric Eaton (in turn, adapted from

More information

COMP SCI 5401 FS2015 A Genetic Programming Approach for Ms. Pac-Man

COMP SCI 5401 FS2015 A Genetic Programming Approach for Ms. Pac-Man COMP SCI 5401 FS2015 A Genetic Programming Approach for Ms. Pac-Man Daniel Tauritz, Ph.D. November 17, 2015 Synopsis The goal of this assignment set is for you to become familiarized with (I) unambiguously

More information

CS221 Project Final Report Gomoku Game Agent

CS221 Project Final Report Gomoku Game Agent CS221 Project Final Report Gomoku Game Agent Qiao Tan qtan@stanford.edu Xiaoti Hu xiaotihu@stanford.edu 1 Introduction Gomoku, also know as five-in-a-row, is a strategy board game which is traditionally

More information

Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles?

Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles? Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles? Andrew C. Thomas December 7, 2017 arxiv:1107.2456v1 [stat.ap] 13 Jul 2011 Abstract In the game of Scrabble, letter tiles

More information

Evolving Behaviour Trees for the Commercial Game DEFCON

Evolving Behaviour Trees for the Commercial Game DEFCON Evolving Behaviour Trees for the Commercial Game DEFCON Chong-U Lim, Robin Baumgarten and Simon Colton Computational Creativity Group Department of Computing, Imperial College, London www.doc.ic.ac.uk/ccg

More information

INTERACTIVE DYNAMIC PRODUCTION BY GENETIC ALGORITHMS

INTERACTIVE DYNAMIC PRODUCTION BY GENETIC ALGORITHMS INTERACTIVE DYNAMIC PRODUCTION BY GENETIC ALGORITHMS M.Baioletti, A.Milani, V.Poggioni and S.Suriani Mathematics and Computer Science Department University of Perugia Via Vanvitelli 1, 06123 Perugia, Italy

More information

UMBC 671 Midterm Exam 19 October 2009

UMBC 671 Midterm Exam 19 October 2009 Name: 0 1 2 3 4 5 6 total 0 20 25 30 30 25 20 150 UMBC 671 Midterm Exam 19 October 2009 Write all of your answers on this exam, which is closed book and consists of six problems, summing to 160 points.

More information

Game Theory: The Basics. Theory of Games and Economics Behavior John Von Neumann and Oskar Morgenstern (1943)

Game Theory: The Basics. Theory of Games and Economics Behavior John Von Neumann and Oskar Morgenstern (1943) Game Theory: The Basics The following is based on Games of Strategy, Dixit and Skeath, 1999. Topic 8 Game Theory Page 1 Theory of Games and Economics Behavior John Von Neumann and Oskar Morgenstern (1943)

More information

Real-time Grid Computing : Monte-Carlo Methods in Parallel Tree Searching

Real-time Grid Computing : Monte-Carlo Methods in Parallel Tree Searching 1 Real-time Grid Computing : Monte-Carlo Methods in Parallel Tree Searching Hermann Heßling 6. 2. 2012 2 Outline 1 Real-time Computing 2 GriScha: Chess in the Grid - by Throwing the Dice 3 Parallel Tree

More information

CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS

CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS CYCLIC GENETIC ALGORITHMS FOR EVOLVING MULTI-LOOP CONTROL PROGRAMS GARY B. PARKER, CONNECTICUT COLLEGE, USA, parker@conncoll.edu IVO I. PARASHKEVOV, CONNECTICUT COLLEGE, USA, iipar@conncoll.edu H. JOSEPH

More information

Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms

Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms Optimizing the State Evaluation Heuristic of Abalone using Evolutionary Algorithms Benjamin Rhew December 1, 2005 1 Introduction Heuristics are used in many applications today, from speech recognition

More information

Reinforcement Learning in Games Autonomous Learning Systems Seminar

Reinforcement Learning in Games Autonomous Learning Systems Seminar Reinforcement Learning in Games Autonomous Learning Systems Seminar Matthias Zöllner Intelligent Autonomous Systems TU-Darmstadt zoellner@rbg.informatik.tu-darmstadt.de Betreuer: Gerhard Neumann Abstract

More information

The next several lectures will be concerned with probability theory. We will aim to make sense of statements such as the following:

The next several lectures will be concerned with probability theory. We will aim to make sense of statements such as the following: CS 70 Discrete Mathematics for CS Fall 2004 Rao Lecture 14 Introduction to Probability The next several lectures will be concerned with probability theory. We will aim to make sense of statements such

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

Yale University Department of Computer Science

Yale University Department of Computer Science LUX ETVERITAS Yale University Department of Computer Science Secret Bit Transmission Using a Random Deal of Cards Michael J. Fischer Michael S. Paterson Charles Rackoff YALEU/DCS/TR-792 May 1990 This work

More information

Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs

Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs T. C. Fogarty 1, J. F. Miller 1, P. Thomson 1 1 Department of Computer Studies Napier University, 219 Colinton Road, Edinburgh t.fogarty@dcs.napier.ac.uk

More information

Evolutionary Computation and Machine Intelligence

Evolutionary Computation and Machine Intelligence Evolutionary Computation and Machine Intelligence Prabhas Chongstitvatana Chulalongkorn University necsec 2005 1 What is Evolutionary Computation What is Machine Intelligence How EC works Learning Robotics

More information

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel Foundations of AI 6. Adversarial Search Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard & Bernhard Nebel Contents Game Theory Board Games Minimax Search Alpha-Beta Search

More information

TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS

TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS TEMPORAL DIFFERENCE LEARNING IN CHINESE CHESS Thong B. Trinh, Anwer S. Bashi, Nikhil Deshpande Department of Electrical Engineering University of New Orleans New Orleans, LA 70148 Tel: (504) 280-7383 Fax:

More information

Game Theory: From Zero-Sum to Non-Zero-Sum. CSCI 3202, Fall 2010

Game Theory: From Zero-Sum to Non-Zero-Sum. CSCI 3202, Fall 2010 Game Theory: From Zero-Sum to Non-Zero-Sum CSCI 3202, Fall 2010 Assignments Reading (should be done by now): Axelrod (at website) Problem Set 3 due Thursday next week Two-Person Zero Sum Games The notion

More information

Artificial Intelligence Search III

Artificial Intelligence Search III Artificial Intelligence Search III Lecture 5 Content: Search III Quick Review on Lecture 4 Why Study Games? Game Playing as Search Special Characteristics of Game Playing Search Ingredients of 2-Person

More information

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play NOTE Communicated by Richard Sutton TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play Gerald Tesauro IBM Thomas 1. Watson Research Center, I? 0. Box 704, Yorktozon Heights, NY 10598

More information

Reinforcement Learning Applied to a Game of Deceit

Reinforcement Learning Applied to a Game of Deceit Reinforcement Learning Applied to a Game of Deceit Theory and Reinforcement Learning Hana Lee leehana@stanford.edu December 15, 2017 Figure 1: Skull and flower tiles from the game of Skull. 1 Introduction

More information