The Co-Evolvability of Games in Coevolutionary Genetic Algorithms


The Co-Evolvability of Games in Coevolutionary Genetic Algorithms

Wei-Kai Lin, Tian-Li Yu

TEIL Technical Report (January 2009)
Taiwan Evolutionary Intelligence Laboratory (TEIL)
Department of Electrical Engineering, National Taiwan University
No. 1, Sec. 4, Roosevelt Rd., Taipei, Taiwan

The Co-Evolvability of Games in Coevolutionary Genetic Algorithms

Wei-Kai Lin, Tian-Li Yu
National Taiwan University

Abstract

This paper investigates the ability of coevolutionary genetic algorithms to solve games. Specifically, it focuses on two-player, zero-sum, symmetric games with both pure and mixed strategies. Games with mixed strategies are challenging for coevolution, since the Nash strategy does not yield a higher payoff than other strategies. Games with pure strategies, on the other hand, are more co-evolvable, especially when diversity is maintained in the population; empirically, adopting niching techniques such as restricted tournament selection helps coevolution. Finally, this paper demonstrates the existence of games that require a population size exponential in the size of the game. These issues are presented empirically using simple artificial games.

1 Introduction

Real-world applications of coevolution to game playing have grown popular recently (Angeline & Pollack, 1993; Azaria & Sipper, 2005a; Azaria & Sipper, 2005b; Chellapilla & Fogel, 1999; Ong, Quek, Tan, & Tay, 2007; Pollack & Blair, 1998; Pollack, Blair, & Land, 1996; Reynolds, 1994; Tesauro, 1992; Tesauro, 1995). Instead of evaluating a solution with an absolute fitness function, as evolutionary algorithms do, coevolutionary algorithms evaluate a solution through its interactions with others in the population. To evolve game players, a coevolutionary algorithm lets the players play against each other instead of against fixed opponents serving as trainers. The coevolutionary concept, combined with machine learning methods such as temporal-difference learning (Tesauro, 1992; Tesauro, 1995), hill-climbing (Pollack & Blair, 1998; Pollack, Blair, & Land, 1996), and genetic programming (Azaria & Sipper, 2005a; Azaria & Sipper, 2005b), yields strong backgammon players that beat those trained with fixed opponents.
For computer checkers, using pairwise competition within a population of neural networks, the best evolved player was able to defeat expert-level players (Chellapilla & Fogel, 1999). Coevolution has also been applied to learn Chinese chess strategies (Ong, Quek, Tan, & Tay, 2007), where good traits were discovered, and simple games such as tic-tac-toe (Angeline & Pollack, 1993) and tag (Reynolds, 1994) have been simulated. The coevolutionary approach to evolving game players is more natural than using fixed evaluation functions, since a player is considered strong if it beats many others. Evolving a player against fixed training opponents is more likely to exploit the weaknesses of the trainers; in general, the resulting player may not show the same competence against other players if the trainers are not properly chosen. Although many coevolutionary applications have been developed, the theoretical aspects of coevolutionary algorithms remain vague (Bucci & Pollack, 2002; de Jong & Pollack, 2004; Ficici, Melnik, & Pollack, 2005). Bucci and Pollack (2002) discussed coevolutionary issues such as collusion, where a pair of competing players have apparently increasing scores but make no real progress. Ficici, Melnik, and Pollack (2005) showed that, given a two-player symmetric game with only two pure strategies and a Nash equilibrium as a point attractor, a coevolutionary algorithm with only a selection operator does not always converge. This paper discusses the co-evolvability of games. Evolvability is an important theoretical topic for evolutionary algorithms (Feldman, 2008; Valiant, 2006; Valiant, 2007), but there

are only a few studies of the evolvability of coevolutionary algorithms (Hammami, Kuroda, Zhao, & Saito, 2000). Given a game, the question becomes whether coevolution works on it and to what extent it can succeed. More precisely, we focus on what kinds of games are more or less co-evolvable. This paper is composed of five parts: (1) the background knowledge of game theory; (2) the co-evolvability of mixed-strategy games, discussed on game-theoretic grounds; (3) the co-evolvability of pure-strategy games, with some techniques that help coevolution; (4) an artificial game that is less co-evolvable; and (5) conclusions and future work.

2 Background

The theory of games is well studied in mathematics and economics (Dutta, 1999; Nash, 1951; von Neumann & Morgenstern, 1944). In this section, we formally define the games we are interested in. We review the concepts of pure and mixed strategies, introduce the Nash equilibrium as an optimal solution, and describe the assumptions made throughout this paper.

2.1 The Normal-Form Game

Normal form is a way to describe a game in game theory; it is represented by a matrix (von Neumann & Morgenstern, 1944). A normal-form game has a set of players, a set of strategies for each player, and a payoff function denoting the reward for each player.

Definition 1. A normal-form game is a triple (I, S, P), where

    I = {1, 2, 3, ..., n}    (1)

is the set of players,

    S = {S_i | S_i = {s_i^1, s_i^2, ..., s_i^{k_i}}, 1 <= i <= n}    (2)

is the set containing the strategy set S_i of each player i, and

    P = {p_1, p_2, p_3, ..., p_n}    (3)

is the set of payoff functions, where the payoff function of player i,

    p_i : S_1 x S_2 x S_3 x ... x S_n -> R,    (4)

gives the payoff of player i given the strategies of all players.

2.2 Pure Strategies and Mixed Strategies

Each strategy in S_i is a pure strategy, which is deterministic: player i plays it with certainty.
For the sake of simplicity, "strategy" stands for a pure strategy in this paper unless explicitly stated otherwise. A player i with pure strategy set S_i can also play probabilistically; a strategy played with probabilities is called a mixed strategy.

Definition 2. A mixed strategy on a pure strategy set S_i = {s_i^1, s_i^2, ..., s_i^{k_i}} is a probability vector

    x_i = (x_i^1, x_i^2, x_i^3, ..., x_i^{k_i})    (5)

such that the pure strategy s_i^j is played with probability x_i^j. Clearly, the total probability satisfies sum_{j=1}^{k_i} x_i^j = 1. A pure strategy can also be viewed as a mixed strategy that always plays a single strategy. Since a mixed strategy is a random variable, the payoff of mixed strategies is defined as the expectation of the original payoff.

Definition 3. Let x_1, x_2, x_3, ..., x_n be the mixed strategies of all n players. The expected payoff pi_i of any given player i is

    pi_i(x_1, ..., x_n) = sum_{j_1, j_2, ..., j_n} x_1^{j_1} x_2^{j_2} ... x_n^{j_n} p_i(s_1^{j_1}, s_2^{j_2}, ..., s_n^{j_n}).    (6)
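As a concrete illustration (ours, not from the report), for two players the expectation in equation (6) reduces to a bilinear form: if P is player 1's payoff matrix, then pi_1(x_1, x_2) = x_1^T P x_2. A minimal sketch for matching pennies:

```python
import numpy as np

# Player 1's payoff matrix for matching pennies: P[j1, j2] = p_1(s_1^{j1}, s_2^{j2}).
P = np.array([[ 1, -1],
              [-1,  1]])

def expected_payoff(P, x1, x2):
    """Equation (6) for n = 2 players: sum over j1, j2 of x1[j1] * x2[j2] * P[j1, j2]."""
    return float(np.asarray(x1) @ P @ np.asarray(x2))

print(expected_payoff(P, [0.5, 0.5], [1.0, 0.0]))  # 0.0: the uniform mix ties any opponent
print(expected_payoff(P, [1.0, 0.0], [1.0, 0.0]))  # 1.0: matching on heads wins for player 1
```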

2.3 Nash Equilibrium

Nash (1951) proved the existence of a mixed-strategy Nash equilibrium in any finite game. A Nash equilibrium is a mixed-strategy profile in which no player has an incentive to change its mixed strategy.

Definition 4. Let (x_1, x_2, x_3, ..., x_n) be a mixed-strategy profile, where x_i is the mixed strategy of player i. The profile is a Nash equilibrium if, for every player i and every mixed strategy x_i' != x_i,

    pi_i(x_1, ..., x_i, ..., x_n) >= pi_i(x_1, ..., x_i', ..., x_n).    (7)

All players are assumed to be rational and selfish: each player only maximizes its own payoff. In a Nash equilibrium, no player has an incentive to change its mixed strategy, since changing does not yield a higher payoff. Furthermore, a strategy in a Nash equilibrium is called a Nash strategy; from the perspective of optimization, the terms optimal or perfect strategy are also used.

2.4 Two-Player, Symmetric, Zero-Sum and Turn-Based Games

We focus on the coevolution of strategies for two-player, zero-sum, symmetric, turn-based games. Many natural games, like chess or Go, exhibit these properties. A two-player game involves only two players, so the player set is I = {1, 2}. Using two-player instead of multi-player games is simply for ease of discussion. Given the two-player property, the zero-sum condition states that one player's winnings are the other's losses, and a symmetric game requires that the two players be equivalent. Formally:

Definition 5. A two-player game such that, for all x_1, x_2,

    pi_1(x_1, x_2) = -pi_2(x_1, x_2)    (8)

is zero-sum.

Definition 6. A two-player game such that, for all x_1, x_2,

    pi_1(x_1, x_2) = pi_2(x_2, x_1)    (9)

is symmetric.

Finally, turn-based games proceed by the two players making decisions in turn.
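Definitions 5 and 6 have a compact matrix reading, sketched below as our own illustration: writing P1[i, j] = pi_1 and P2[i, j] = pi_2 when player 1 plays pure strategy i and player 2 plays j, zero-sum means P2 = -P1 and symmetric means P2 = P1 transposed, so a game with both properties has an antisymmetric P1.

```python
import numpy as np

def is_zero_sum(P1, P2):
    """Definition 5 in matrix form: player 2's payoff is the negation of player 1's."""
    return np.array_equal(P2, -P1)

def is_symmetric(P1, P2):
    """Definition 6 in matrix form: swapping the players transposes the payoff matrix."""
    return np.array_equal(P2, P1.T)

# Rock-paper-scissors is both, so its payoff matrix is antisymmetric (P1 == -P1.T).
rps = np.array([[ 0, -1,  1],
                [ 1,  0, -1],
                [-1,  1,  0]])
print(is_zero_sum(rps, -rps), is_symmetric(rps, -rps))  # True True
```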
Such games have a set of rules defining whose turn it is to move and which moves are valid. Using the concept of state transitions, a turn-based game can be modeled as a tuple (Omega, s, T, A, f), where Omega is the set of all possible game states, s in Omega is the starting state, T : Omega -> 2^Omega is a one-to-many mapping that defines the valid transitions from each state, A, a subset of Omega, is the set of final states that terminate the game, and f : A -> R^n is the function that decides the winner or the scores of all n players. Note that a state in a board game usually consists of the board configuration together with an indication of whose turn it is. A strategy is then a function h : Omega -> Omega such that h(a) is in T(a) for all a in Omega. In a turn-based two-player game, one player must move first and the other second; the strategies for playing first and second are different and asymmetric. Therefore, as in many real-world games, we flip a fair coin to decide who plays first. One further issue with this state-transition model is that the transition graph may contain cycles, which would allow infinite play. To ensure that the game has finitely many strategies and terminates in finitely many steps, we further assume that the transition graph is acyclic.
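The tuple (Omega, s, T, A, f) can be made concrete on a toy game of our own invention (a hedged sketch; the game itself is not from the report): a counter starts at 0, the players alternately add 1 or 2, and the player who pushes the counter to 3 or beyond wins. The transition graph is acyclic because the counter only increases.

```python
# States are (counter, player-to-move); the game ends once the counter reaches 3.
OMEGA = [(c, p) for c in range(6) for p in (1, 2)]
s = (0, 1)                                   # starting state: player 1 to move

def T(state):
    """Valid transitions: add 1 or 2 to the counter and pass the turn."""
    c, p = state
    return [] if c >= 3 else [(c + d, 3 - p) for d in (1, 2)]

A = [a for a in OMEGA if not T(a)]           # final (terminating) states

def f(state):
    """Payoffs (r1, r2): the player who just moved pushed the counter home and wins."""
    _, p = state
    return (1, -1) if p == 2 else (-1, 1)

def h(a):
    """One strategy: always add 2 (note h(a) lies in T(a), as required)."""
    return T(a)[-1]

# Play h against itself from s until a final state is reached.
state = s
while state not in A:
    state = h(state)
print(state, f(state))   # (4, 1) (-1, 1): player 2 made the last move and wins
```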

3 Mixed Strategy Games, Less Co-Evolvable

This section describes a difficulty in co-evolving mixed strategies for two-player, zero-sum, symmetric games. The difficulty comes from the fact that some non-Nash strategies have the same expected payoff as a Nash strategy when playing against it. We now show this formally. Given a two-player, zero-sum, symmetric game with mixed strategies, let (x*, y*) be a Nash equilibrium. For a zero-sum, symmetric game, every mixed strategy x satisfies

    pi_1(x, x) = pi_2(x, x) = -pi_1(x, x),    (10)

which implies

    pi_1(x, x) = pi_2(x, x) = 0.    (11)

It follows that pi_1(x*, y*) = 0, for otherwise (x*, y*) would not be a Nash equilibrium; that is, all Nash equilibria are equivalent. We therefore assume, without loss of generality, that the Nash equilibrium is unique, and show that there exists another mixed strategy that ties with the Nash strategy.

Lemma 1. Let (x*, x*) be a Nash equilibrium of a two-player, zero-sum, symmetric game. If x* is not a pure strategy, then there exists u != x* such that

    pi_1(x*, u) = pi_1(x*, x*) = 0.    (12)

Proof. If x* = (x_1*, x_2*, ..., x_n*) is not a pure strategy, then x_i* < 1 for all i. Take any i with x_i* > 0 and let a be the mixed strategy with a_i = 1. Consider the expected payoff pi_1(x*, a). If pi_1(x*, a) < 0, then

    pi_2(x*, a) > 0 = pi_2(x*, x*),    (13)

and x* is not a Nash strategy, since a yields player 2 a higher payoff than x*. If pi_1(x*, a) > 0, then there must exist a mixed strategy b with b_j = 1 and x_j* > 0 such that pi_1(x*, b) < 0; if not, the expected payoff pi_1(x*, x*), being the weighted sum of the payoffs pi_1(x*, a), would be greater than 0. The existence of such a b again violates the assumption of Nash equilibrium, since

    pi_2(x*, b) > 0 = pi_2(x*, x*).    (14)

Therefore pi_1(x*, a) = 0 for every such a, and every mixed strategy u composed from such strategies a satisfies pi_1(x*, u) = 0.
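Lemma 1 can be checked numerically (our own sketch) on rock-paper-scissors: the uniform mix is the Nash strategy, and it scores exactly 0 against every opponent mix, so selection sees no gradient toward it.

```python
import numpy as np

# Player 1's rock-paper-scissors payoff matrix (win = 1, loss = -1, tie = 0).
P = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])
nash = np.full(3, 1 / 3)                   # the unique Nash strategy

# Lemma 1 in action: against ANY mixed strategy u, the Nash mix scores 0,
# so coevolution cannot distinguish it from its opponents.
rng = np.random.default_rng(0)
for _ in range(3):
    u = rng.dirichlet(np.ones(3))          # a random opponent mix
    print(float(nash @ P @ u))             # 0.0 every time
```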
In other words, a Nash strategy does not earn a higher expected payoff while co-evolving, and therefore has no greater probability of reproducing: the mixed Nash strategy does not beat its non-Nash opponent strategies. For example, the Nash strategy of rock-paper-scissors plays all three gestures with equal probability 1/3, but this strategy wins, loses, and ties each with probability 1/3 regardless of its opponent. Since the Nash strategy does not yield a higher payoff, we believe this is a fundamental challenge for coevolutionary algorithms on games with mixed strategies.

4 Pure Strategy Games, More Co-Evolvable

In this section, we consider a game that

1. is two-player, zero-sum, symmetric and turn-based, and
2. has a finite number of strategies and terminates in a finite number of steps.

With enough computational power, such a game can be solved by backward induction (von Neumann & Morgenstern, 1944): express all states of the game as a game tree whose leaf nodes are terminating states, apply minimax search (Russell & Norvig, 2003) to the whole tree, and the best action to take in each state can be found. The table listing which action to

Figure 1: The simple state game with 2 turns.

do in each state is then the perfect strategy, or Nash strategy. Through this minimax algorithm, every state a in Omega has a score: the worst-case payoff if the game starts from state a. Although these games are simple, solving them with a coevolutionary algorithm is not always trivial. In this section, we first propose a needle-in-a-haystack game, which should be hard for all black-box optimization algorithms. After that, an artificial turn-based game is proposed and experiments are conducted.

4.1 A Needle-in-a-Haystack Game

Imagine a two-player, zero-sum, symmetric game with strategies {s_1, s_2, ..., s_n, s*}, where the payoff function of player 1 is

    p_1(x, y) = 1 if x = s* and x != y;
                -1 if y = s* and x != y;
                0 otherwise,    (15)

and the payoff function of player 2 is the negation of p_1. In this game, s* is the Nash strategy, and all other strategies are mutually equal and worse than s*. For such a game, a coevolutionary algorithm has no information about the Nash equilibrium before s* is reached, and the problem requires exhaustive search. This problem is to coevolution what the needle-in-a-haystack problem is to black-box optimization: the optimum cannot be found efficiently. In general, if the average probability of one strategy beating another is exponentially small in the size of the game, then the game is difficult to solve and uninteresting, since almost every match is a tie. We therefore do not consider such games.

4.2 A Simple State Game

The example game is defined as follows: the starting state is a flag placed at position 0 on a straight line. In each state, both players have two actions: moving the flag one unit right (increasing) or left (decreasing). The two players take turns, and the game terminates after a given number of turns t.
The score is decided by the final position of the flag: the former player wins if the position is positive, the latter wins if it is negative, and otherwise the game is a draw. Figure 1 shows the game with t = 2; each state (p, i) consists of the position p of the flag and the number of moves i made. In a game with t turns, the flag has 2t + 1 possible terminal positions, and the number of states is 2t^2 + 3t + 1, since a state includes both the position and the number of moves made. Since every non-terminal state offers two actions, there are 2^{2t^2 + t} strategies in total. The perfect strategy always moves right as the former player and always moves left as the latter. In fact, every strategy that chooses the correct action in the middle 4t - 1 states is perfect. Although this game is simple, it resembles real-world games in some respects; here we focus on non-transitivity and don't-care states.

1. Non-transitivity. There exists a set of strategies s_1, s_2, ..., s_k such that

    s_1 < s_2 < ... < s_k < s_1,    (16)

Coevolutionary GA with RTS

1. Sample a random population P uniformly.
2. Evaluate the fitness of each individual x in P by playing games against a set of opponents chosen at random from the same population P.
3. Generate a new population P' by applying a crossover operator to the original population P.
4. Evaluate the fitness of each individual x' in P' by playing games against a set of opponents chosen at random from the same population P'.
5. For each offspring x' in P', perform restricted tournament selection:
   (a) Find the y in W that minimizes the distance between x' and y, where W, a subset of P with |W| = w, is a random sample with window size w.
   (b) Replace y with x' if x' has the higher fitness; otherwise discard x'.
6. Stop if the terminating condition is satisfied; otherwise go to step 2.

Figure 2: Steps of the coevolutionary genetic algorithm with restricted tournament selection (RTS).

where s < s' denotes that s' never loses to s and has at least one way to win.

2. The existence of don't-care states in the perfect strategy. Two types of states are don't-care in this game: non-visited states and no-difference states. Once the actions in the middle states are fixed, the states on the left are never reached; the states on the right side are don't-care because both actions lead to a win.

The non-transitivity may lead a coevolutionary algorithm to cycle among strategies and fail to converge, and the don't-care states can also confuse the algorithm while co-evolving. Therefore, experiments are conducted to investigate the ability of coevolution.

4.3 The Coevolutionary Genetic Algorithm with Niching

A coevolutionary genetic algorithm (GA) is applied to this simple game. Each strategy is encoded as a chromosome whose length is the number of non-terminal states in the game; the chromosome is a binary string in which 1 means the correct action in the corresponding state and 0 the incorrect one.
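The steps in Figure 2 can be sketched in code. The sketch below is our own, self-contained illustration: the game-playing fitness is replaced by a hypothetical stand-in (`play`, which simply rewards 1-bits, so the population should drift toward all-ones); in the actual experiments the scores come from games of the simple state game.

```python
import random

def play(a, b):
    """Hypothetical stand-in for one game: player a's score against b."""
    return sum(a) - sum(b)

def fitness(x, pop, n_opponents):
    """Steps 2/4: score x by playing a random sample of opponents from pop."""
    return sum(play(x, y) for y in random.sample(pop, n_opponents))

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def coevolve_rts(chrom_len=20, pop_size=30, window=8, n_opp=3, gens=40, seed=0):
    random.seed(seed)
    # Step 1: sample a random population uniformly.
    pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]
    for _ in range(gens):
        # Step 3: uniform crossover on random parent pairs.
        offspring = []
        for _ in range(pop_size):
            p1, p2 = random.sample(pop, 2)
            offspring.append([random.choice(genes) for genes in zip(p1, p2)])
        # Step 5: restricted tournament selection.
        for x in offspring:
            wnd = random.sample(range(pop_size), window)
            i = min(wnd, key=lambda j: hamming(x, pop[j]))   # nearest in window
            if fitness(x, pop, n_opp) > fitness(pop[i], pop, n_opp):
                pop[i] = x                                    # replace the loser
    return pop

best = max(coevolve_rts(), key=sum)
print(sum(best))
```

Replacing the nearest member of a random window (rather than a random member) is what preserves niches: an offspring competes only against similar individuals, so distinct strategy clusters survive side by side.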
In genetic algorithms, the importance of niching techniques has long been recognized, especially for multimodal problems (Goldberg & Richardson, 1987). For coevolutionary algorithms, the need to keep diversity, also discussed as the loss-of-gradient issue, has received less emphasis (de Jong & Pollack, 2004; Watson & Pollack, 2001). To address this issue, we propose a coevolutionary genetic algorithm with restricted tournament selection (RTS) (Harik, 1995). The algorithm is based on the simple GA, but fitness is evaluated by pairwise competition between individuals in the same population: a pair of individuals play a game, and each player's score is added to its fitness value. Using this fitness, restricted tournament selection is applied (Figure 2).

4.4 Investigating the Niching Technique with the Simple Game

The algorithms are run on a three-turn game with population size 500. Coevolution with and without RTS is repeated 1000 times independently, and the probability of having a perfect strategy in the population is recorded for each generation (Figure 3). Without niching, the algorithm reaches a perfect strategy in a shorter time but fails to keep it in the long run. This is because the convergence times of different genes differ greatly, and the bits with longer convergence times diverge once the corresponding states are no longer visited.

Figure 3: The three-turn simple state game with population size 500. Each action in a state is encoded as one bit of the chromosome. The probability of having a perfect strategy in the population at each generation is averaged over 1000 independent runs. Compared to the run with RTS, the no-niching algorithm improves faster in the beginning but starts to diverge in the long run.

Without niching, some states become unreachable after the corresponding bits converge; those unreachable states no longer make progress and simply random-walk.

4.5 The Optimal Number of Opponents to Score an Individual

When scoring a chromosome, using a random sample of opponents instead of a single opponent gives a more accurate estimate of whether the chromosome is good. For an optimization algorithm, performance matters: a more accurate estimate can improve performance, but using more opponents to evaluate a chromosome also increases the number of function evaluations. A series of experiments was conducted. Figure 4a shows the number of evaluations required for different numbers of opponents, with the population size chosen properly for each number of opponents. The population sizes were obtained by bisection, which finds the minimum population size that allows coevolution to succeed in 30 of 30 independent runs (Figure 4b). With a properly set population size, the number of function evaluations grows linearly with the number of opponents (Figure 4a), which is consistent with the optimal sampling size for ordinary GAs (Yu & Lin, 2008). In other words, using only a single opponent is better if the population size is large enough. However, using more than one opponent can be more efficient if the population size is fixed (Figure 4c).
5 A Less Co-Evolvable Pure Strategy Game

In the real world, even a small game has a large number of states. For example, tic-tac-toe has a board of only 9 squares but thousands of possible states. The number of strategies is on the order of an exponential function of the number of states, which is usually intractable. Heuristic functions are used to help search algorithms, such as minimax search, make decisions, and coevolutionary algorithms likewise work on and optimize these heuristics. Another challenge for coevolution is then: does it find the optimal heuristic? According to our results on state-based games, a correct action in a state is learned only when some opponents in the population help reach that state. Consider a game with board size l, and assume the number of states is O(c^l) for some constant c. To co-evolve a perfect strategy for an unknown problem, we need to drive the algorithm through each state while evolving the heuristics; the required number of opponents is thus O(c^l) in the worst case, which means the required population size is exponentially large (or the running time exponentially long). Consider a board game with l positions: we represent a strategy using heuristics that are

Figure 4: (a) The number of function evaluations used to find the perfect strategy in the 6-turn simple state game for different sampling sizes, with experimental points and regression lines for the 4-turn and 6-turn games. (b) For each sampling size, the required population size found by bisection. (c) The performance with a fixed population size of 200 and the number of function evaluations restricted to 500,000. Note that in region I the performance decreases because the population size is not large enough, and in region II it decreases because the number of function evaluations is insufficient.

Figure 5: The bit-flipping game with board size 6. Note that the same state may appear as more than one node in this game tree.

encoded with l bits. This encoding does not cover all possible strategies, since there are 2^{c^l} strategies if the game has c^l states with two actions each. Even so, it may still take an exponential number of opponents to reach a perfect strategy in the worst case.

5.1 A Bit-Flipping Game

To demonstrate the above issue, we propose a game with a weight-based heuristic that requires an exponentially large population size to find the optimum. The initial state is a linear string of interleaved 1s and 0s. The valid actions of the former player flip consecutive 0s to 1s, and the valid actions of the latter player flip consecutive 1s to 0s. The player who turns the string to all 0s or all 1s wins (Figure 5). Since these actions always reduce the number of consecutive runs of 0s or 1s, the game must end within 2l - 1 steps. Each state can be represented as (b, p), where b is a board position and p in {A, B} denotes whose turn it is. This game has O(2^l) states, and the perfect strategy is always to flip a side segment when possible. We create a heuristic for playing this game: one weight bit for each position, where the score of a state is the sum of the weights over the l positions; between two actions with the same score, the leftmost one is always chosen. The perfect strategy can be described with this weight heuristic: the weight vector (0, 0, ..., 0, 1) is the optimal heuristic and realizes the perfect strategy. A coevolutionary GA with RTS was applied to this problem, and the required population size for different problem sizes l was computed by bisection (Figure 6).
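The bit-flipping game and the weight heuristic can be sketched as follows (our own illustration, with one simplifying assumption labeled in the code: a move flips one maximal run of the mover's target bit). States are scored as the summed weight of the positions holding a 1, and ties are broken leftmost. With the size-6 board of Figure 5 and the optimal weights (0, ..., 0, 1), the former player wins.

```python
from itertools import groupby

def segments(board):
    """Maximal runs of equal bits as (bit, start, length)."""
    out, i = [], 0
    for bit, run in groupby(board):
        n = len(list(run))
        out.append((bit, i, n))
        i += n
    return out

def moves(board, bit):
    """Boards reachable by flipping one maximal run of `bit` (our simplification)."""
    return [board[:s] + [1 - bit] * n + board[s + n:]
            for b, s, n in segments(board) if b == bit]

def score(board, w):
    """Weight heuristic: sum of weights at positions holding a 1."""
    return sum(wi for wi, b in zip(w, board) if b)

def play(board, w):
    """Both players follow the heuristic; returns (winner, number of plies).
    The former player flips 0-runs and prefers a high score; the latter flips
    1-runs and prefers a low score.  max/min pick the leftmost move on ties."""
    bit, plies = 0, 0
    while len(set(board)) > 1:
        pick = max if bit == 0 else min
        board = pick(moves(board, bit), key=lambda b: score(b, w))
        bit, plies = 1 - bit, plies + 1
    return ("former" if bit == 1 else "latter"), plies

# Board of size 6 with the optimal weight vector (0, 0, 0, 0, 0, 1).
print(play([1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 1]))  # ('former', 5)
```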
The experimental results show that the population size grows exponentially, as expected.

Figure 6: The required population size for the bit-flipping game of different sizes, obtained through bisection, together with an exponential fit. As expected, the required population size grows exponentially as the number of states increases.

6 Conclusions

In summary, this paper investigates the problems and challenges facing coevolutionary game players. Under the state-transition view, a game is easier to solve when the action in each state is encoded directly. Even in this easier case, keeping diversity in the population is important, because the action for a state does not converge if the state is not visited. Building on this observation, the population size needed to co-evolve the perfect strategy is studied: for a game with an exponentially large number of states, an exponentially large population size may be required to find the optimal strategy. The same holds when players are co-evolved through heuristics; a simple game with a naive weight heuristic demonstrates this, and the result is consistent with the argument above. Although the results in this paper are based on artificial games, we believe they should apply to other two-player, zero-sum, symmetric games. To conclude:

1. It is challenging for coevolutionary algorithms to find the mixed-strategy equilibrium; we suggest that mixed-strategy games are not co-evolvable.

2. Pure-strategy game players are more co-evolvable, and niching is a helpful technique.

3. An optimal number of opponents for evaluating a strategy exists. More specifically, a smaller number of opponents is better if the population size is properly set, while a larger number is better if the population is smaller, since a more accurate estimate is then needed. This is consistent with the sampling-size results for GAs (Yu & Lin, 2008).

4. Some pure-strategy games are less co-evolvable and require exponentially large population sizes.

There is still plenty of room for work on coevolutionary issues. For example, the population size and convergence time lack a rigorous analysis even for simpler, more co-evolvable games.
It is important to model the time and space complexity in terms of the required population size and the convergence time; co-evolvable games could then be classified formally by these complexities. It would also be valuable to determine the optimal number of opponents theoretically, or automatically during the coevolutionary process.

References

Angeline, P. J., & Pollack, J. B. (1993). Competitive environments evolve better solutions for complex tasks. In Proceedings of the 5th International Conference on Genetic Algorithms. San Francisco, CA, USA: Morgan Kaufmann.

Azaria, Y., & Sipper, M. (2005a). GP-Gammon: Genetically programming backgammon players. Genetic Programming and Evolvable Machines, 6(3).

Azaria, Y., & Sipper, M. (2005b). GP-Gammon: Using genetic programming to evolve backgammon players. In Proceedings of the 8th European Conference on Genetic Programming, Lecture Notes in Computer Science, Vol. 3447. Lausanne, Switzerland: Springer.

Bucci, A., & Pollack, J. B. (2002). Order-theoretic analysis of coevolution problems: Coevolutionary statics. In Proceedings of the GECCO-2002 Workshop on Coevolution: Understanding Coevolution.

Chellapilla, K., & Fogel, D. (1999). Evolving neural networks to play checkers without relying on expert knowledge. IEEE Transactions on Neural Networks, 10(6).

de Jong, E. D., & Pollack, J. B. (2004). Ideal evaluation from coevolution. Evolutionary Computation, 12(2).

Dutta, P. K. (1999). Strategies and games: Theory and practice. MIT Press.

Feldman, V. (2008). Evolvability from learning algorithms. In STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing. New York, NY, USA: ACM.

Ficici, S., Melnik, O., & Pollack, J. (2005). A game-theoretic and dynamical-systems analysis of selection methods in coevolution. IEEE Transactions on Evolutionary Computation, 9(6).

Goldberg, D. E., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second International Conference on Genetic Algorithms. Hillsdale, NJ, USA: L. Erlbaum Associates.

Hammami, O., Kuroda, K., Zhao, Q., & Saito, K. (2000). Coevolvable hardware platform for automatic hardware design of neural networks. In Proceedings of the IEEE International Conference on Industrial Technology, Vol. 1.

Harik, G. R. (1995). Finding multimodal solutions using restricted tournament selection. In Proceedings of the 6th International Conference on Genetic Algorithms. San Francisco, CA, USA: Morgan Kaufmann.

Nash, J. (1951). Non-cooperative games. The Annals of Mathematics, 54(2).

Ong, C., Quek, H., Tan, K., & Tay, A. (2007). Discovering Chinese chess strategies through coevolutionary approaches. In IEEE Symposium on Computational Intelligence and Games (CIG).

Pollack, J. B., & Blair, A. D. (1998). Co-evolution in the successful learning of backgammon strategy. Machine Learning, 32(3).

Pollack, J. B., Blair, A. D., & Land, M. (1996). Coevolution of a backgammon player. In Proceedings of Artificial Life V. MIT Press.

Reynolds, C. W. (1994). Competition, coevolution and the game of tag. In Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems. Cambridge, MA, USA: MIT Press.

Russell, S., & Norvig, P. (2003). Artificial intelligence: A modern approach (2nd ed.). Prentice Hall.

Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3-4).

Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3).

Valiant, L. G. (2006). Evolvability. Electronic Colloquium on Computational Complexity (ECCC), 6(120).

Valiant, L. G. (2007). Evolvability. In Proceedings of the 32nd International Symposium on Mathematical Foundations of Computer Science.

von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton University Press.

Watson, R. A., & Pollack, J. B. (2001). Coevolutionary dynamics in a minimal substrate. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001). Morgan Kaufmann.

Yu, T.-L., & Lin, W.-K. (2008). Optimal sampling of genetic algorithms on polynomial regression. In GECCO '08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation. New York, NY, USA: ACM.

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search 1 By the end, you will know Why we use Monte Carlo Search Trees The pros and cons of MCTS How it is applied to Super Mario Brothers and Alpha Go 2 Outline I. Pre-MCTS Algorithms

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

International Journal of Modern Trends in Engineering and Research. Optimizing Search Space of Othello Using Hybrid Approach

International Journal of Modern Trends in Engineering and Research. Optimizing Search Space of Othello Using Hybrid Approach International Journal of Modern Trends in Engineering and Research www.ijmter.com Optimizing Search Space of Othello Using Hybrid Approach Chetan Chudasama 1, Pramod Tripathi 2, keyur Prajapati 3 1 Computer

More information

Evolving Adaptive Play for the Game of Spoof. Mark Wittkamp

Evolving Adaptive Play for the Game of Spoof. Mark Wittkamp Evolving Adaptive Play for the Game of Spoof Mark Wittkamp This report is submitted as partial fulfilment of the requirements for the Honours Programme of the School of Computer Science and Software Engineering,

More information

Bootstrapping from Game Tree Search

Bootstrapping from Game Tree Search Joel Veness David Silver Will Uther Alan Blair University of New South Wales NICTA University of Alberta December 9, 2009 Presentation Overview Introduction Overview Game Tree Search Evaluation Functions

More information