Opponent Models and Knowledge Symmetry in Game-Tree Search

Jeroen Donkers
Institute for Knowledge and Agent Technology
Universiteit Maastricht, The Netherlands
donkers@cs.unimaas.nl

Abstract

In this paper we investigate the role of symmetry in knowledge of the opponent during game-playing by computers. The games under consideration are board games like Chess: large-scale two-player zero-sum games of perfect information. The solution of such games is easily defined, but since these board games are large and complex, it is not feasible to obtain an actual solution. Research in Artificial Intelligence has resulted in efficient heuristic search methods that approximate solutions. The standard search methods assume purely competitive preferences with inherent knowledge symmetry. With the development of Opponent-Model search, a knowledge asymmetry is introduced: only one of the players is assumed to have a precise model of the opponent. We propose to add knowledge symmetry to opponent modelling and to allow both players to have a model of each other and commonly know this. This leads to a switch from zero-sum to nonzero-sum games. New search methods will have to be defined and investigated. We provide some suggestions for future research.

1 Introduction

Within Artificial Intelligence, playing board games like Chess by computers has always been a topic of interest. Although some games with imperfect information have been studied (e.g., Backgammon and Poker), most of the board games studied in AI are two-player zero-sum games of perfect information in extensive form. From a game-theoretic point of view, these games are trivial, since their solution is easily defined [11, 12]. Moreover, backward induction produces a subgame-perfect equilibrium (SPE) in these games [8]. Unfortunately, the Chess game tree is too large to perform backward induction within a reasonable amount of time. In the more than fifty years of research on computer game-playing, the emphasis has been on developing heuristic methods that estimate the solution as well as possible within the time allowed by tournament rules.

The normal practice of playing Chess and other board games by computers is to solve a series of reduced games during a game instead of one large game: one for every turn in the game [4]. (This procedure is called search in computer game-playing.) The reduced games are constructed using three mechanisms: (1) Histories are truncated at a certain length (depending on the available search time); the true (but unknown) game-values of the truncated histories are replaced by static heuristic evaluation values. (The length of the resulting histories is called the search depth in computer game-playing.) (2) Some histories are removed from the reduced game; this is called forward pruning in computer game-playing. (3) All histories in the reduced games start with the moves that have already been played in the game (the game history).
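
To make mechanism (1) concrete, the sketch below shows depth-truncated backward induction in the style used by game-playing programs. It is a minimal sketch under assumed names: the Game interface (moves, apply, is_terminal) and the evaluate function are illustrative, not taken from the paper.

```python
# Minimal sketch of solving a reduced game by backward induction: histories
# are truncated at `depth`, and truncated histories receive a static
# heuristic value (mechanism 1). All interface names are assumptions.

def minimax(game, state, depth, maximizing, evaluate):
    if depth == 0 or game.is_terminal(state):
        return evaluate(state)  # heuristic value replaces the true game-value
    values = [minimax(game, game.apply(state, m), depth - 1,
                      not maximizing, evaluate)
              for m in game.moves(state)]
    return max(values) if maximizing else min(values)
```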

Figure 1 illustrates the mechanism. The original game (in extensive form) is represented by the dashed triangle, and the subsequent reduced games are represented by the grey triangles A, B, and C. The history of actual moves played is indicated by the bold zigzag line. At the first move, player 1 solves reduced game A, then player 2 solves reduced game B, and so on.

Figure 1: Playing a game (dashed triangle) by solving a series of reduced games (triangles A, B, C).

The resulting reduced games can be solved by backward induction (called Minimax search in computer game-playing). To find the best move to play next, in most cases it is not necessary to obtain a complete solution, since only the game-theoretic value of the root node of the reduced game needs to be determined. Often, large portions of the game tree do not influence the game-theoretic value of the root node and can be disregarded (which is called pruning). Pruning is the basis of the well-known α-β search procedure [7]. Many enhancements and improvements have been developed for α-β search, resulting in more pruning and therefore in an increase in the size of the reduced games that can be handled in the available search time. Although this approach has proven to be successful in Chess (e.g., Deep Blue), in other games (e.g., Go) it is less effective. Furthermore, it is not clear whether solving a sequence of these reduced games is the best approach to winning a game.

In Section 2 we indicate and discuss the knowledge symmetry in Minimax search. Section 3 introduces Opponent-Model search and Section 4 describes a recursive extension of this search method. The combination of knowledge symmetry and opponent modelling in nonzero-sum games is proposed in Section 5. Finding solutions for large nonzero-sum games is the subject of Section 6. Using opponent models introduces risk; this is briefly discussed in Section 7. The paper ends with conclusions in Section 8.

2 Knowledge symmetry in Minimax search

Board games like Chess are modelled as zero-sum games, since they are mostly considered completely competitive. In a tournament setting, however, it can sometimes be advantageous not to win a certain game in order to obtain better chances in a later round. Some computer programs have a provision (called the contempt factor) that adjusts the value of a draw, but the game is still treated as a zero-sum game. In the normal practice of Minimax search, the reduced games are assumed to be completely competitive too. A single static heuristic evaluation function is used to produce a pay-off value for all histories for both players at once. As we will see below, it is not necessary for the reduced games to be completely competitive.
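
For reference, a minimal sketch of the α-β procedure [7] mentioned in the Introduction, using the same assumed Game interface as before. It returns the same worst-case value as plain Minimax but skips subtrees that cannot influence the root value.

```python
def alphabeta(game, state, depth, alpha, beta, maximizing, evaluate):
    """α-β sketch: equal in value to minimax above, with pruning."""
    if depth == 0 or game.is_terminal(state):
        return evaluate(state)
    if maximizing:
        v = float('-inf')
        for m in game.moves(state):
            v = max(v, alphabeta(game, game.apply(state, m), depth - 1,
                                 alpha, beta, False, evaluate))
            alpha = max(alpha, v)
            if alpha >= beta:  # the minimizer will never allow this line
                break          # prune the remaining moves
        return v
    v = float('inf')
    for m in game.moves(state):
        v = min(v, alphabeta(game, game.apply(state, m), depth - 1,
                             alpha, beta, True, evaluate))
        beta = min(beta, v)
        if alpha >= beta:      # the maximizer already has a better option
            break
    return v
```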

The competitive approach to reduced games assumes knowledge symmetry: both players use the same evaluation function, and both players commonly know that the other uses this evaluation function in a rational way too [9]. In a real tournament game, it is highly unlikely that the two players share the same heuristic evaluation function. Still, most programs use the Minimax approach. An explanation is that Minimax acts as a kind of worst-case defence: a program is defending against the worst possible opponent it can imagine.

3 Opponent-Model search

In 1993, Opponent-Model search was proposed [1, 6]. The motivation for this approach was that differences in evaluation functions should be taken into consideration. When the opponent is using a weaker evaluation function (whatever that means), and a player knows the opponent's function, then this knowledge can be exploited. OM search is based on knowledge asymmetry. Assume that player 1 uses function v1 to evaluate histories, and that player 2 uses another function v2. OM search assumes that player 2 is unaware of player 1 using v1 and that player 2 thinks that player 1 uses Minimax search with the same evaluation function, v2. Player 1 (using OM search) knows both its own evaluation function v1 and the evaluation function v2 of player 2, and knows the opponent's state of knowledge.

The OM-search procedure is a variant of backward induction and works as follows (a more detailed description can be found in [4]). For histories h where player 2 is to move, first standard backward induction (e.g., α-β search) is used to solve the subgame at h with evaluation function v2. The value v(h) for player 1 of the subgame at h is then taken as the value v(h + m*) of the subgame at h + m*, where m* is the move selected by player 2. (We use the notation h + m to indicate the extension of history h with move m.) For histories h where player 1 is to move, the value v(h) of the subgame is defined as the maximum of v(h + m_i) over all moves m_i available at h.

Figure 2 gives a small example of Opponent-Model search. The squares indicate player 1, the circles player 2. Inside the squares and circles are the payoffs and subgame values in the case that no opponent model is used: the Minimax approach. The values V1 and V2 indicate the payoffs for player 1 and the payoffs that player 1 assumes that player 2 uses. The values v2 give the subgame values for player 2, who is using Minimax. The values v1 are the values that player 1 gets when using the opponent model. In the Minimax approach, player 1 would select the left branch, leading to a value of 7. When the opponent model is used, player 1 selects the right branch. This happens because player 2 selects the left branch at the right node on the basis of payoffs V2, leading to a value of 8 for player 1. The solution is indicated by the dashed line.

Figure 2: An example comparing Opponent-Model search and Minimax search.

The procedure means that the player using OM search is in fact aware of the opponent's strategy. Obviously, OM search is risky when the opponent model described above is not correct.
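
The procedure can be sketched in a few lines, reusing the minimax helper from the Introduction. This is a sketch under assumed names, not the reference implementation; it assumes both evaluation functions are stated from player 1's perspective, so player 1 maximizes v1 while player 2 is predicted to minimize v2.

```python
# OM-search sketch (see [4] for the full procedure; interface names assumed).

def om_search(game, state, depth, to_move, v1, v2):
    if depth == 0 or game.is_terminal(state):
        return v1(state)                 # player 1's value of the history
    moves = game.moves(state)
    if to_move == 1:                     # player 1 maximizes the own value
        return max(om_search(game, game.apply(state, m), depth - 1, 2, v1, v2)
                   for m in moves)
    # Predict player 2's move m* by solving the subgame with v2 (Minimax),
    # then score the predicted continuation with player 1's own values.
    m_star = min(moves, key=lambda m: minimax(game, game.apply(state, m),
                                              depth - 1, True, v2))
    return om_search(game, game.apply(state, m_star), depth - 1, 1, v1, v2)
```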

4 Recursive Opponent-Model search

In OM search, one of the players has an explicit opponent model of the other player. M* search, also developed in 1993 [1, 2], is a recursive extension of OM search. In this approach, both players can have an opponent model of each other, but the knowledge is still asymmetric. M* assumes that player 1 can have a model of player 2, as in OM search. However, unlike in OM search, player 2 can be aware of this fact and can have a model of player 1 that includes the model that player 1 has of player 2. Moreover, player 1 can even have a more general opponent model of player 2 that includes player 2's complete model. The opponent models can be nested to any depth, but one of the players is always one step ahead of the other.
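
One way to render this nesting in code is sketched below. It is our own hedged rendering, not the authors' formulation of M*: a model pairs an evaluation function (stated from its owner's perspective) with the model it ascribes to the opponent, and an absent nested model falls back to a worst-case Minimax assumption.

```python
# Sketch of nested opponent models in the spirit of M* (assumed interface).

class Model:
    def __init__(self, evaluate, opponent_model=None):
        self.evaluate = evaluate              # owner's evaluation function
        self.opponent_model = opponent_model  # what the owner thinks of the other

def own_value(game, state, depth, model):
    """Value for the model's owner, who is to move at `state`."""
    if depth == 0 or game.is_terminal(state):
        return model.evaluate(state)
    return max(opp_value(game, game.apply(state, m), depth - 1, model)
               for m in game.moves(state))

def opp_value(game, state, depth, model):
    """Owner's value at a node where the opponent moves."""
    if depth == 0 or game.is_terminal(state):
        return model.evaluate(state)
    moves = game.moves(state)
    if model.opponent_model is None:      # no model: assume the worst case
        return min(own_value(game, game.apply(state, m), depth - 1, model)
                   for m in moves)
    nested = model.opponent_model         # predict the opponent's choice
    predicted = max(moves, key=lambda m: opp_value(
        game, game.apply(state, m), depth - 1, nested))
    return own_value(game, game.apply(state, predicted), depth - 1, model)
```

With Model(v1, Model(v2)) this degenerates to OM search; each extra nesting level puts one player one step ahead of the other, as described above.
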
5 Knowledge Symmetry and Opponent Modelling

Both OM search and M* search distribute the knowledge asymmetrically among the players, since player 1 knows more about player 2 than vice versa. Therefore, a natural next step after OM search and M* search is to add knowledge symmetry to opponent models. We propose an approach to opponent modelling in which both players are aware of their opponent's evaluation function, which may differ from their own. As in Minimax search, both players are aware of each other's knowledge and rational behaviour. This means that we assume the reduced games to be nonzero-sum games of perfect information.

Why is it justified to assume that two players have different evaluation functions and at the same time know each other's function? Shouldn't both players select the stronger function of the two and use Minimax? One answer lies in the common interest of the players during certain stages of the game. This can lead to a nonzero-sum game, even if both players interpret the positions similarly. A Chess example may clarify this [3, 4]. It is well known that computer Chess programs play relatively better in open positions (i.e., positions having many open lines) than in closed positions; see figure 3. If two computer programs are matched, it is in their common interest to reach an open position.

Figure 3: An open Chess position (left) and a closed Chess position (right).

Suppose that the evaluation function v1 for player 1 assigns a value X + Y to a given Chess position P, where X represents the openness of the position and Y represents all other factors, including material count, king safety, et cetera. The higher the value of X, the more player 1 prefers the position. In the Minimax approach, the evaluation function v2 for player 2 is the opposite of v1, so it assigns the opposite value to the same position: -(X + Y). This means that the higher the value of X for a position, the less player 2 prefers it. In other words, Minimax assumes that player 2 tries to avoid an open position. To model the true preference of player 2, it seems more natural that evaluation function v2 assigns the value X - Y to the position. Then player 2, like player 1, prefers positions with high values of X. As a consequence, the sum of the two payoffs, (X + Y) + (X - Y) = 2X, is no longer zero, hence the game is nonzero-sum. Both players interpret the position in the same way, since they assign an equal value to X for the openness of the position, which is their common interest, and an equal value to Y for the competitive part of the evaluation.

When players interpret and evaluate positions in different ways, the nonzero-sum property of the game is obvious. However, even in that case, a common interest can be defined implicitly. Suppose that player 1 assigns value A to a position and player 2 assigns value B to the same position. The common-interest part of the evaluation (C) is then equal to C = (A + B)/2 and the competitive part (S) is equal to S = (A - B)/2. The value for player 1 is C + S = A and the value for player 2 is C - S = B.
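
The decomposition is easy to state in code. A minimal sketch; the numbers are illustrative only:

```python
def decompose(a, b):
    """Split the two players' evaluations A and B of one position into a
    common-interest part C and a competitive part S."""
    c = (a + b) / 2.0  # common interest
    s = (a - b) / 2.0  # pure competition
    return c, s

# The openness example from the text: v1 = X + Y and v2 = X - Y.
x, y = 3.0, 1.5                          # illustrative values
c, s = decompose(x + y, x - y)
assert (c, s) == (x, y)                  # C recovers X, S recovers Y
assert (c + s, c - s) == (x + y, x - y)  # and A, B are reconstructed
```
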
6 Solutions for large nonzero-sum games

Switching from zero-sum reduced games to nonzero-sum reduced games means that a solution becomes more difficult to define and to obtain. It is known from game theory that every finite game of perfect information has at least one subgame-perfect equilibrium (SPE) and that backward induction can be used to find the SPEs [8]. In contrast with zero-sum games, in which all SPEs have the same value, in nonzero-sum games SPEs can co-exist with different pairs of values for the players. This occurs when one player can select between two or more optimal moves that each have a different value for the other player. Which move a player will select in such a case depends on the willingness of the player to co-operate.

Figure 4 shows an example of how two SPEs co-exist in one game, asking for a method to select between them.

Figure 4: An example comparing a nonzero-sum approach and Minimax search.

The meaning of the squares and circles is the same as in figure 2. The numbers inside the squares and circles indicate the Minimax approach. Below the leaf nodes are the payoffs, expressed as a competitive part (S) and a common-interest part (C). The values of V1 and V2 are C + S and C - S, respectively. Note that player 2 is maximizing payoffs in this approach, like player 1. The values next to the other nodes give the subgame values for the SPE in which player 2 selects the right branch at the right node. This SPE is a co-operative equilibrium: player 1 receives a higher payoff (8) than in the Minimax approach (7). Would player 2 select the left branch, then player 1 would select the left branch at the top node, leading to a payoff of 7 for player 1 and a payoff of 1 for player 2. So, although player 2 is locally indifferent between the left and the right branch at the right node, globally player 2 is better off selecting the right branch.

For small nonzero-sum games it is not difficult to obtain all SPEs and select one of them, but for reduced games of the size that, for instance, Chess programs solve during play, the task is non-trivial. Many techniques that programs use to enhance search efficiency depend on the zero-sum property of the game (e.g., α-β pruning). Similar techniques, applied to nonzero-sum games, are expected to be less effective for two reasons: (1) subgames have two values instead of one, and the relation between the two values is in general not known; (2) in the case of multiple SPEs, information on all SPEs has to be taken into account.

With an additional demand on the payoff functions, however, matters may improve. We expect that if the sum of the payoffs for the two players is bounded (i.e., value C is bounded), the values of different SPEs will be bounded too. Such a bound could be used in an α-β-like method to prune the search tree. We expect that the smaller the bound on the payoff sums, the more pruning can be applied. If the bound is zero, the game is a zero-sum game. Bounds on the payoff sum have been applied to pruning in n-player games in the Max^n procedure [10], but it is likely that in the case of two players, higher efficiency can be achieved.

Selection between multiple SPEs during game-tree search is a new and challenging issue. Some SPEs require co-operation between the players, others are more competitive. It is an open question how the willingness of an opponent to co-operate could be measured and applied in a search method.
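
A sketch of backward induction with value pairs shows where the difficulty arises (assumed names again; here v1 and v2 are each player's own payoff, and both players maximize). A tie in the mover's own component is exactly the point where SPEs with different values for the other player branch off.

```python
# Backward induction on a nonzero-sum reduced game (sketch, names assumed).
# Each node carries a pair (value for player 1, value for player 2).

def spe_value(game, state, depth, to_move, v1, v2):
    if depth == 0 or game.is_terminal(state):
        return (v1(state), v2(state))
    values = [spe_value(game, game.apply(state, m), depth - 1,
                        3 - to_move, v1, v2)
              for m in game.moves(state)]
    i = to_move - 1                  # index of the mover's own component
    best = max(v[i] for v in values)
    optimal = [v for v in values if v[i] == best]
    # len(optimal) > 1 marks co-existing SPEs: which one is returned decides
    # the other player's payoff (the selection problem discussed above).
    # Here we arbitrarily take the first.
    return optimal[0]
```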

7 Risk in opponent modelling

Previous research has indicated that Opponent-Model search gives rise to two sorts of risk [4, 5, 6]. The first sort of risk is caused by an incorrect model of the opponent. Such an incorrect model will possibly lead to an incorrect prediction of the opponent's strategy and cause suboptimal behaviour. This can produce results that are worse than the results reached without an opponent model. The second sort of risk is caused by an estimation error in the player's own payoff function. If a player estimates a position as too advantageous, and the opponent has a correct estimation, OM search will be attracted to such a position, again leading to a suboptimal result. The larger the error, the stronger the attraction.

The two sorts of risk are also expected to occur when nonzero-sum games are used for opponent modelling. The first sort is obvious, since an erroneous opponent model will lead to wrong SPEs. The second sort is more delicate. When the common-interest part (value C in Section 5) of the payoff functions is estimated too high for a certain position, the SPEs will be attracted to this position. Here too, the larger the error, the stronger the attraction. Estimation errors in the competitive part (value S in Section 5) will not lead to this kind of attraction, because they are filtered out by a Minimax-like mechanism.

8 Conclusion

Introducing opponent models together with knowledge symmetry enhances game-playing by computers, since common interest between players can be modelled. It requires, however, the development of new search techniques and the adaptation of existing search enhancements. Future research must indicate whether the advantage of the enhanced model outweighs the expected loss in search efficiency and the effects of the expected risks.

References

[1] D. Carmel and S. Markovitch. Learning models of opponent's strategies in game playing. In Proceedings of the AAAI Fall Symposium on Games: Planning and Learning, pages 140-147, Raleigh, NC, 1993.

[2] D. Carmel and S. Markovitch. The M* algorithm: Incorporating opponent models in adversary search. Technical Report CIS9402, Technion, Haifa, Israel, 1994.

[3] D. Carmel and S. Markovitch. Pruning algorithms for multi-model adversary search. Artificial Intelligence, 99(2):325-355, 1998.

[4] H.H.L.M. Donkers. Nosce Hostem: Searching with Opponent Models. PhD thesis, Universiteit Maastricht, Maastricht, The Netherlands, 2003.

[5] H.H.L.M. Donkers, J.W.H.M. Uiterwijk, and H.J. van den Herik. Admissibility in opponent-model search. Information Sciences, 154(3-4):119-140, 2003.

[6] H. Iida, J.W.H.M. Uiterwijk, H.J. van den Herik, and I.S. Herschberg. Potential applications of opponent-model search. Part 1: the domain of applicability. ICCA Journal, 16(4):201-208, 1993.

[7] D.E. Knuth and R.W. Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293-326, 1975.

[8] H.W. Kuhn. Extensive games and the problem of information. In H.W. Kuhn and A.W. Tucker, editors, Contributions to the Theory of Games, Volume II, Annals of Mathematics Studies 28, pages 193-216. Princeton University Press, Princeton, NJ, 1953.

[9] A. Rubinstein. Comments on the interpretation of game theory. Econometrica, 59:909-924, 1991.

[10] N. Sturtevant and R.E. Korf. On pruning methods for multi-player games. In Proceedings of AAAI-00, pages 201-207, Austin, TX, 2000.

[11] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1944.

[12] E. Zermelo. Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels. In Proceedings of the Fifth International Congress of Mathematicians (Cambridge 1912), pages 501-504. Cambridge University Press, 1913.