Computational aspects of two-player zero-sum games Course notes for Computational Game Theory Section 3 Fall 2010


Peter Bro Miltersen, November 1, 2010. Version 1.3

3 Extensive form games (Game Trees, Kuhn Trees)

The strategic form of a game is a very general and clean way of representing a game, but it is not a very convenient one. Suppose, for instance, that we want to represent the game tic-tac-toe in strategic form. We need a matrix with a row for every possible way of playing tic-tac-toe as X and a column for every possible way of playing tic-tac-toe as O. It may not be very obvious how to enumerate the possible ways of playing tic-tac-toe. We shall define an alternative representation of a game, the extensive form or game tree or Kuhn tree representation, which makes it more explicit what those possible ways of playing a game are. The extensive form of a game thereby gives a better visualization of the game. Also, as we shall see, it provides a much more compact representation of games such as, say, tic-tac-toe, than the strategic form. This compactness is clearly important for computational purposes.

A game in extensive form is a rooted tree. Each node of the tree is also called a position. Each position belongs to exactly one player, or to nature. If a position belongs to nature, a fixed probability distribution is associated with its outgoing arcs. For each player, a partition (i.e., an equivalence relation) is given on the set of positions belonging to that player. The equivalence classes are called information sets. Intuitively, the player cannot distinguish between the nodes in an information set. In each position belonging to a player, the outgoing arcs are also known as the actions of that position. Intuitively, a player must choose one of these actions whenever he finds himself in that position. Each action has a name. If two positions are in the same information set, the sets of action names in those positions should coincide.
On the other hand, if two positions are not in the same information set, we require that the sets of action names in those positions are disjoint (this will be convenient later on).

Example: Basic Endgame in Poker. The rules of the game are: (1) Both players put $1 into the pot. (2) Player 1 is dealt a card (he inspects it, but keeps it hidden from Player 2). A heart is a winning card for Player 1. So he holds a winning card with probability 1/4 and a losing card with probability 3/4.

(3) Player 1 either bets or checks. If he bets, he puts $2 more into the pot. If Player 1 bets: (4) Player 2 either calls or folds. If he folds, he loses his $1, no matter what card Player 1 has. If he calls, he adds $2 to the pot. (5) If Player 1 has hearts, he wins the pot. He also wins if he bets and Player 2 folds. Otherwise Player 2 wins.

We draw a Kuhn tree for this game (Figure 1). To each vertex of the game tree, we attach a label indicating which player is to move from that position. The random card that is dealt in the beginning we generally refer to as a move by nature, and we use the label N. At each terminal vertex, we write the numerical value of Player 1's winnings (= Player 2's losses, because we are in a zero-sum game). Player 2 does not know Player 1's card, that is, when it is his turn to move, he does not know at which of his two possible positions he is. To indicate this on the diagram, we encircle the two positions by a closed curve, indicating that these two vertices constitute an information set. The two vertices at which Player 1 is to move constitute two separate information sets, since he has inspected the card and knows at which position he is.

In general, information sets could describe situations in which a player has forgotten a move he made earlier in the game or some information he once knew (see Player 1's information set in Figure 2). However, we do not allow this in our course. We only deal with games of perfect recall, which are games in which players remember all past information they once knew and all past moves they made. Formally, a game tree satisfies the perfect recall condition if, for all nodes x and y belonging to the same information set h belonging to Player j, the following is true: The sequence of action names performed by Player j on the path from the root of the tree to x is identical to the sequence of action names performed by Player j on the path from the root of the tree to y.
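The perfect recall condition can be checked mechanically by one walk over the tree. Below is a minimal Python sketch; the Node class and the information-set labels are an ad hoc encoding assumed for illustration, not notation from these notes.

```python
# A minimal sketch of checking the perfect recall condition: every
# information set must see a single sequence of its owner's past actions.

from collections import defaultdict

class Node:
    def __init__(self, player=None, info=None, children=None):
        self.player = player            # owning player (1, 2, ...); None for
                                        # leaves and chance nodes
        self.info = info                # information set id, e.g. (1, "h")
        self.children = children or {}  # action name -> child node

def has_perfect_recall(root):
    """True iff all nodes of each information set are reached by the same
    sequence of the owning player's own actions."""
    seen = {}  # info set id -> the owner's action sequence leading to it
    ok = True

    def walk(node, paths):  # paths[j] = actions player j has taken so far
        nonlocal ok
        if node.info is not None:
            seq = tuple(paths[node.player])
            if seen.setdefault(node.info, seq) != seq:
                ok = False
        for action, child in node.children.items():
            if node.player is not None:
                paths[node.player].append(action)
            walk(child, paths)
            if node.player is not None:
                paths[node.player].pop()

    walk(root, defaultdict(list))
    return ok
```

On a forgetful game in the style of Figure 2, two nodes of the same information set are reached by different own actions of the player, so the check fails; on trees like Figure 1 it succeeds.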
Note that this also implies that the sequences of information sets belonging to that player encountered on those two paths coincide, as actions in different information sets are required to have different names. There is a conceptual reason for demanding the perfect recall condition: Arguably, if a player forgets information, this should not be part of the model of the game; it should be part of the model of the player. There is also a computational

Figure 1: Game tree

reason: Most computational problems associated with games without perfect recall, such as value computation, turn out to be NP-hard.

3.1 Converting from the extensive form to the strategic form

We can convert a game from extensive form to strategic form. This conversion procedure can be regarded as the definition of the semantics of the notion of an extensive form game.

Definition 1 The strategic form game corresponding to an extensive form game is the following. Let K_i be the set of information sets h belonging to player i in the extensive form game. Then we let the strategy space for player i in the strategic form game be the set

S_i = ∏_{h ∈ K_i} (set of action names in h)

We also need to define the payoff functions. Note that a strategy profile (x_1, x_2, ..., x_l) with x_i ∈ S_i can be viewed as a set of selected actions, one for each information set of the game. Given a strategy profile, we now consider the following random process: We put a pebble in the root of the game tree. If the

Figure 2: Game tree, not perfect recall

pebble is in a position belonging to nature, we take a random sample from the probability distribution on the outgoing arcs indicated at the position, and move the pebble along the randomly chosen arc. If the pebble is in a position belonging to a player, we take the outgoing arc corresponding to the action chosen by the strategy profile. The payoff u_i(x_1, x_2, ..., x_l) for Player i is defined to be the expected value of the payoff of Player i found in the leaf of the tree where the pebble ends up.

We can convert our Basic Endgame in Poker from extensive form to strategic form. Player 1 has two information sets; in each set he must make a choice from among two options. He therefore has 2 · 2 = 4 pure strategies. We may denote them by

(b,b): bet with a winning card or a losing card.
(b,c): bet with a winning card, check with a losing card.
(c,b): check with a winning card, bet with a losing card.
(c,c): check with a winning card or a losing card.

Therefore, S_1 = {(b,b), (b,c), (c,b), (c,c)}. Player 2 has only one information set.

C: if Player 1 bets, call.
F: if Player 1 bets, fold.
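To make the conversion concrete, here is a small Python sketch that runs the pebble process for every pure strategy profile of the basic endgame and tabulates the expected payoffs. The leaf table is a hand-made, hypothetical encoding of the Kuhn tree, not notation from the notes.

```python
# A minimal sketch of the extensive-to-strategic conversion for the
# basic endgame of poker.

from fractions import Fraction as F
from itertools import product

# Each leaf: (Player 1's card, his action there, Player 2's action or None
# if Player 2 never moves, chance probability of the card, payoff to P1).
LEAVES = [
    ("win",  "b", "C",  F(1, 4),  3), ("win",  "b", "F",  F(1, 4), 1),
    ("win",  "c", None, F(1, 4),  1),
    ("lose", "b", "C",  F(3, 4), -3), ("lose", "b", "F",  F(3, 4), 1),
    ("lose", "c", None, F(3, 4), -1),
]

def payoff(s1, s2):
    """Expected payoff to Player 1 when the pure strategies s1 = (action on
    a winning card, action on a losing card) and s2 are played."""
    total = F(0)
    for card, a1, a2, prob, pay in LEAVES:
        chosen = s1[0] if card == "win" else s1[1]
        if chosen == a1 and (a2 is None or a2 == s2):
            total += prob * pay
    return total

S1 = list(product("bc", repeat=2))  # (b,b), (b,c), (c,b), (c,c)
S2 = ["C", "F"]
matrix = [[payoff(s1, s2) for s2 in S2] for s1 in S1]
```

The resulting 4 × 2 matrix matches the one computed by hand.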

Therefore, S_2 = {C, F}. The payoff function on two strategies is the expected payoff when the strategies are played against each other in the tree, as explained formally in the definition. Suppose Player 1 uses (b,b) and Player 2 uses C. Then the expected payoff is

u_1((b,b), C) = (1/4)(3) + (3/4)(−3) = −3/2

This gives the upper left entry in the following matrix. The other entries may be computed similarly.

          C      F
(b,b)   −3/2     1
(b,c)     0    −1/2
(c,b)    −2      1
(c,c)   −1/2   −1/2

In this example the payoff matrix is manageable. But in general, the blowup in size when going from extensive form to strategic form is exponential. Say, suppose Player 1 has 100 information sets, each with a choice between two actions. Then the number of rows in the matrix of the corresponding matrix game is 2^100. So, we often like to represent a game in extensive form.

3.2 Converting Extensive Form Games into Strategic Form

As an example of such a conversion, we consider an example from the last lecture, namely the basic endgame of poker. Figure 3 shows the game tree constructed last lecture. If we want to solve this game (finding the value and a maximin strategy for both players), one way to do so is to convert the game into strategic form and then solve it using linear programming. The corresponding strategic form (constructed last lecture) is given by the matrix

          C      F
(b,b)   −3/2     1
(b,c)     0    −1/2
(c,b)    −2      1
(c,c)   −1/2   −1/2

When solving this game, one might first want to reduce the matrix by using the notion of dominance. We say that one row r_1 weakly dominates another

Figure 3: Extensive form of basic endgame of poker. non-identical row r 2 if each entry in r 1 is larger than or equal to the corresponding entry in r 2. ntuitively any probability mass put into r 2 by a strategy can be moved to r 1 instead since each entry gives at least the same payoff. t is therefore safe to remove the dominated row, since an optimal strategy not using the dominated row exists. For our matrix game we see that row 3 is weakly dominated by row 1 (a payoff of 3 is always better than 2 2 while the payoff of 1 does not change anything). We therefore remove row 3. Similarly, row 4 is weakly dominated by row 2. We end up with C F b b 3 1 2 b c 1 2 This game is easily solved using linear programming, and gives us ( 1, 5,, ) 6 6 (matching the four rows in the original matrix) as the optimal mixed strategy for Player 1 and ( 1, 1 ) as the optimal mixed strategy for Player 2. The value 2 2 of the game is 1. ntuitively 4 b c seems like the best strategy (bet when having a heart, check elsewise), and not surprisingly we therefore use this strategy 5 out of 6 times. However, it would not make sense to use this every time, since Player 2 would then change his strategy to always fold when Player 1 bets, causing Player 1 to lose more money. Player 1 therefore needs to bluff occasionally not very surprising to poker players. Since this example did not very well illustrate the fact that such a conversion gives an exponential blowup in the number of nodes in the tree, we consider another example. This time two players roll a die and Player 2 tries to get 7

Figure 4: A dice game in extensive form.

a higher number than Player 1, who starts. The extensive form (or at least some of it) of the game is seen in Figure 4. Each state has six outgoing actions corresponding to each possible roll. Player 1 tells a number to Player 2 after studying his die (possibly lying), and then Player 2 decides what to do. A corresponding matrix game would have a row for each possible pure strategy, thus giving 6^6 = 46656 rows in the matrix, as seen below.

1(1) 1(2) 1(3) 1(4) 1(5) 1(6)
1(1) 1(2) 1(3) 1(4) 1(5) 2(6)
⋮
6(1) 6(2) 6(3) 6(4) 6(5) 6(6)

Here a(r) denotes announcing a after rolling r, so each row specifies Player 1's announcement for each of the six possible rolls.

3.3 Representing and Finding Solutions

As seen in the previous subsection, converting from extensive form to strategic form gives an exponential blowup, thus possibly resulting in an LP that is practically infeasible to solve. Another, related problem is the representation of the result. For an n × m game matrix, the optimal solution is given as an n-tuple with probabilities for each pure strategy (summing to 1) specifying the mixed strategy. This again gives the exponential blowup. We therefore seek other ways both to represent solutions and to find them.

First we address the problem of giving the solution in a more compact way.

Definition 2 A behavior strategy is a map from information sets of a player to probability distributions on actions of those information sets, or stated differently it is

an assignment of probabilities to the actions belonging to a player (where they sum to 1 for each information set). This strategy corresponds to delaying the decision of which action to take until the involved information set is reached when traversing the game tree. See the red numbers in Figure 3 for a specific behavior strategy. Mixed strategies force us to consider all options from the beginning, giving us quite a few more possibilities. Playing the game according to a behavior strategy is done by traversing the game tree and letting each player take an action when reaching one of his information sets, according to the probability distribution on the actions belonging to that information set.

The following theorem by Kuhn tells us that for games of perfect recall (no forgetful players), mixed and behavior strategies can express precisely the same strategies.

Theorem 3 (Kuhn 1953) For an extensive form game of perfect recall with an arbitrary number of players, mixed strategies and behavior strategies are behaviorally equivalent.

Here behaviorally equivalent means that playing a mixed or a behavior strategy cannot be distinguished by somebody viewing the play from the outside; they simulate each other perfectly. Since the size of a behavior strategy is bounded by the number of edges in the tree, such strategies are preferred when dealing with games in extensive form.

We have now represented the solution in a more compact way and move on to consider the following problem.

Algorithmic problem: Given a two-player, zero-sum game in extensive form, compute the value and maximin/minimax behavior strategies.

We present here three possible algorithms, of which only the last is a polynomial time algorithm.

Algorithm 1:

1. Convert to strategic form (exponential time).

2. Compute maximin/minimax mixed strategies (exponential time, since the size of the matrix is already exponential).

3. Convert to behaviorally equivalent maximin/minimax behavior strategies (as given in the constructive proof of Theorem 3).

Algorithm 1 uses the theory already known, but has exponential running time in the size of the game tree.

Algorithm 2:

1. Write the Nash equation conditions (for each information set) as a mathematical program, roughly the size of the tree.

2. Solve the program.

This algorithm is somewhat better than Algorithm 1, since the program does not suffer from the exponential blowup. However, solving it can be hard, since the resulting program is not linear: Variables of the program (the probabilities used for the behavior strategy) are often multiplied by each other, as seen in the toy example of Figure 5, where Player 1 has more than one choice along the path to γ and β. The Nash equations will therefore involve p_D · p_d, where p_D is the behavior probability of D and p_d is the behavior probability of d. Such terms can consist of an arbitrary number of multiplications, corresponding to the number of choices along the path.

3.4 A Polynomial Time Algorithm

The last algorithm is due to Koller, Megiddo and von Stengel. In order for this algorithm to work, we need to define two new helpful constructions, the sequence form and realisation plans.

Definition 4 The sequence form of a two-player, zero-sum extensive game is given by the following two items.

Sets S_i of sequences for each player, i = 1, 2. Formally, the set of sequences for Player i is the set of all paths from the root to all other nodes, with each path restricted to the actions of Player i.

Figure 5: A toy example.

A payoff matrix with a row for every σ ∈ S_1 and a column for every τ ∈ S_2. The entries a_{στ} of the matrix are given by

a_{στ} = Σ_{leaves l consistent with σ and τ} weight(l),   where   weight(l) = payoff(l) · ∏_{e chance edge on path from root to l} p_e.

Here p_e is the probability of the chance edge e.

This definition is best viewed through an example or two. Let us again consider the basic endgame of poker (Figure 3) and the toy game from Figure 5. For the poker game, we have

S_1 = {ε, b, c, b', c'},   S_2 = {ε, C, F}

as the sets of sequences, where b, c denote the actions taken with a winning card and b', c' the corresponding actions taken with a losing card (recall that action names in different information sets must be distinct). For the game in Figure 5 we get

S_1 = {ε, D, U, Dd, Du},   S_2 = {ε, L, R}.
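The sequence form payoff matrix of the basic endgame can be tabulated directly from the six leaves of the Kuhn tree. The following Python sketch does so; "e" stands for the empty sequence and "b_", "c_" are ad hoc names for the losing-card actions.

```python
# A sketch building the sequence form payoff matrix for the basic endgame:
# a_{sigma,tau} = sum over consistent leaves of payoff * chance probability.

from fractions import Fraction as F

# (P1 sequence, P2 sequence, chance probability, payoff at the leaf)
LEAVES = [
    ("b",  "C", F(1, 4),  3), ("b",  "F", F(1, 4), 1), ("c",  "e", F(1, 4),  1),
    ("b_", "C", F(3, 4), -3), ("b_", "F", F(3, 4), 1), ("c_", "e", F(3, 4), -1),
]

S1 = ["e", "b", "c", "b_", "c_"]
S2 = ["e", "C", "F"]

# Each pair of sequences here reaches at most one leaf; pairs reaching
# no leaf get entry 0.
A = {(s, t): sum(p * pay for s1, s2, p, pay in LEAVES if (s1, s2) == (s, t))
     for s in S1 for t in S2}
```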

For the basic endgame of poker we get the following payoff matrix.

         ε      C      F
 ε       0      0      0
 b       0     3/4    1/4
 c      1/4     0      0
 b'      0    −9/4    3/4
 c'    −3/4     0      0

In this example, each pair of sequences leads to at most one leaf, so the sum consists of only one term for each entry. The pairs not leading to a leaf have the entry 0.

Definition 5 A realisation plan for Player i is an assignment of real numbers to his sequences, r_i : S_i → R, i = 1, 2. This number is called the realisation weight of the sequence. The realisation plan corresponding to a behavior strategy assigns to each sequence the product of the behavior probabilities along that sequence. One way to view realisation weights is that they simply correspond to a change of variables (from the behavior strategy probabilities) that makes the non-linear program of Algorithm 2 into a linear one!

As an example we again consult the game from Figure 5, and find the realisation weights of the two sequences Dd and D. The red numbers in the figure are the behavior probabilities.

r(Dd) = 0.2 · 0.9 = 0.18,   r(D) = 0.2,   p(d) = r(Dd) / r(D) = 0.18 / 0.2 = 0.9,

where p(d) denotes the probability given to the action d in the behavior strategy. Note that we can go back and forth between behavior strategies and realisation plans by simple multiplication and division (unless we divide by 0, but that will never be an issue, since the path containing this action will never be taken). The next lemma connects realisation plans and behavior strategies.

Lemma 6 For a two-player, zero-sum game in extensive form the following holds.

1. The set of realisation plans of Player 1 corresponding to behavior strategies is a bounded non-empty polytope X = {x | Ex = e, x ≥ 0}.

2. The set of realisation plans of Player 2 corresponding to behavior strategies is a bounded non-empty polytope Y = {y | Fy = f, y ≥ 0}.

3. The expected payoff to Player 1 when he plays by x and Player 2 plays by y is x^T A y, where A is the sequence form payoff matrix.

The matrices E and F and the vectors e and f are constructed using the fact that the probability mass entering a node must equal the probability mass leaving the node. For our game from Figure 5 we therefore have the following equations for Player 1.

x_ε = 1,
x_D + x_U = x_ε,
x_Dd + x_Du = x_D,
x_ε, x_D, x_U, x_Dd, x_Du ≥ 0.

The first three lines correspond to Ex = e. A formal proof of Lemma 6 is omitted, since all three items are straightforward. It is, however, a very good exercise to go through the details of the proof and also to verify a few examples! The next theorem follows naturally.

Theorem 7 For a two-player, zero-sum game in extensive form with payoff matrix A (from the sequence form), the maximin realisation plan r is given by

r = argmax_{x ∈ X} min_{y ∈ Y} x^T A y.

Finally, we are ready to give Algorithm 3.

Algorithm 3: (Koller, Megiddo, von Stengel 1996)

1. Convert the game to sequence form. In particular, compute the payoff matrix A and the matrices and vectors E, e, F, f defining the valid realisation plans.

2. Compute the maximin expression of Theorem 7 using linear programming (possible due to the proof of the generalised maximin theorem).

Since the number of sequences is linear in the number of nodes, we avoid the exponential blowup when constructing the payoff matrix. This gives an algorithm running in time polynomial in the number of nodes, which is useful for solving games in extensive form. The existence of such an algorithm was an open problem for quite a while.

Using the algorithm we find the linear programs for our two running examples, see Table 1.

Table 1: Using Algorithm 3 on two examples.

Basic endgame of poker:
Variables: x_ε, x_b, x_c, x_b', x_c' (the realisation weights); q (the value); q_h (the contribution to the value from plays through the information set h owned by Player 2).
Program: max q subject to
ε: q ≤ q_h + (1/4)x_c − (3/4)x_c'
C: q_h ≤ (3/4)x_b − (9/4)x_b'
F: q_h ≤ (1/4)x_b + (3/4)x_b'
x_ε = 1
x_b + x_c = x_ε
x_b' + x_c' = x_ε
x_ε, x_b, x_c, x_b', x_c' ≥ 0

Game from Figure 5:
Variables: x_ε, x_D, x_U, x_Dd, x_Du (the realisation weights); q (the value); q_h (the contribution to the value from plays through the information set h owned by Player 2).
Program: max q subject to
ε: q ≤ q_h + α·x_U
L: q_h ≤ γ·x_Dd + β·x_Du
R: q_h ≤ δ·x_D
x_ε = 1
x_D + x_U = x_ε
x_Dd + x_Du = x_D
x_ε, x_D, x_U, x_Dd, x_Du ≥ 0

The intuition behind the first constraint in the poker game is that the value is bounded by the contribution to the value through h plus the contribution from plays not going through h (Player 1 taking action c or c'). The same applies to the other example.
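The linear program for the poker game in Table 1 can be sanity-checked by plugging in the realisation weights of the maximin behavior strategy found earlier (bet with probability 1 on a winning card and 1/6 on a losing card). A small Python sketch, with "b_", "c_" again as ad hoc names for the losing-card sequences:

```python
# Plugging the maximin behavior strategy into the sequence form LP of the
# basic endgame, to confirm it attains the game value -1/4.

from fractions import Fraction as F

# Realisation weights: products of behavior probabilities along each sequence.
x = {"e": F(1), "b": F(1), "c": F(0), "b_": F(1, 6), "c_": F(5, 6)}

# The flow constraints (Ex = e) hold: mass entering a node leaves it.
assert x["e"] == 1 and x["b"] + x["c"] == x["e"] and x["b_"] + x["c_"] == x["e"]

# q_h: the tightest of Player 2's two constraints (call C, fold F).
q_h = min(F(3, 4) * x["b"] - F(9, 4) * x["b_"],   # C: (3/4)x_b - (9/4)x_b'
          F(1, 4) * x["b"] + F(3, 4) * x["b_"])   # F: (1/4)x_b + (3/4)x_b'

# The epsilon constraint then bounds the value q.
q = q_h + F(1, 4) * x["c"] - F(3, 4) * x["c_"]
```

Both of Player 2's constraints are tight at the same number, reflecting that he is indifferent between calling and folding at the optimum, and q comes out at the game value −1/4.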

In general, the linear programs arising are quite intuitive. The reader is invited to try some more examples, in particular examples with more information sets belonging to Player 2.

3.5 Finding pure minimax behavior strategies

Figure 6: Basic endgame in poker

Figure 6 shows the game we considered in the last section. For this game we found the unique maximin/minimax strategies for Player 1 and Player 2; these are shown in the figure as probabilities on the actions (Player 1 bets with probability 1 on a winning card and with probability 1/6 on a losing card; Player 2 calls with probability 1/2). In general we would like to find pure maximin/minimax strategies if they exist. As an example we now look at a game which is a slight modification of the above. In fact, it is a more detailed model of the same real-life game, where we assume that Player 1 gets a random card out of a 24-card deck (9 up to ace of each suit), and that any heart is good for Player 1. The tree is too big to draw, but the top of it is shown in Figure 7. This game has several maximin behavior strategies. Some of them are pure. For instance, a maximin strategy for Player 1 is:

if the card is a heart: bet
if the card is a non-heart ace: bet
otherwise: check

Figure 7: Modified poker game. A chance node deals one of the cards A, K, ..., 10, 9; thereafter the same rules as above apply.

In this way we let the card provide the randomness we needed before, where Player 1 in the last case should bet with probability 1/6 and check with probability 5/6. In general we have the following:

Computational Problem 1 Given a two-player zero-sum extensive form game with perfect recall, does it have a pure maximin behavior strategy?¹

We are going to show that the problem is NP-hard. In fact, it is strongly NP-hard. We recall what strongly NP-hard means:

Definition 8 (Strongly NP-hard) A problem is strongly NP-hard if it is NP-hard even when numbers in the instances are represented using unary notation rather than binary or decimal. (For example, unary(8) = 11111111.)

Proposition 9 Computational Problem 1 is strongly NP-hard, even if chance nodes are restricted to uniform distributions.

Proof The proof is by a reduction from Exact Binpacking. Recall the definition of this problem:

Exact Binpacking: Given positive integers a_1, ..., a_n and an integer K ≥ 2, can we partition {1, 2, ..., n} into K parts I_1, ..., I_K such that Σ_{i ∈ I_j} a_i is the same for all j?

¹ We note here that in this context it does not make a big difference whether we are talking about a plan or a strategy. The only difference is whether we specify behavior in every information set or not. In a plan we don't specify behavior in nodes that are not reachable with the given choices.

We know that this problem is NP-hard (in fact strongly NP-hard), so a reduction from it yields the result. In the reduction we use a gadget M(K), which is the K × K matrix game with payoff 0 everywhere, except on the diagonal where the payoff is −1. M(K) describes a game where Player 1 thinks of a number between 0 and K − 1 and Player 2 makes a guess about which number it is. If he guesses correctly, Player 1 gives him a dollar. This game has value −1/K, and the maximin as well as the minimax strategy is the uniform distribution on the strategy space. (Note, by the way, that this game is not symmetric: the matrix M is symmetric, but for a game to be symmetric its matrix must be skew-symmetric.) We can also make an extensive form game strategically equivalent to M(K), illustrated here for K = 3.

Figure 8: Extensive game equivalent to M(3)

The reduction goes as follows: Given an instance of Exact Binpacking in Three Bins, A = {a_1, ..., a_n, K}, we construct the following game G(A), illustrated in Figure 9, assuming K = 3.

Lemma 10 G(A) has a pure maximin strategy ⟺ A is a yes instance.

Proof Assuming A is a yes instance, i.e., it can indeed be divided into three equally large parts, we want to convince ourselves that G(A) has a pure maximin strategy. This part of the proof is the same as for the modified poker game example: Player 1 gets the randomness he needs from the bin corresponding to the item the chance node informs him of.

Now we assume that G(A) has a pure maximin strategy, and we have to construct a partition of A. We do this by putting items in bins matching Player 1's choice in the strategy. For example, if in the node reached by the edge of probability a_j / Σ_{i=1}^n a_i the choice is 0, we put item a_j into bin I_0, and so on. The claim is that this is a correct partition. Assume for

Figure 9: Game tree for G(A). A chance node selects item j with probability a_j / Σ_i a_i; below each outcome, a copy of the game of Figure 8 is played.

contradiction that it is not, that is, ¬(Σ_{i ∈ I_0} a_i = Σ_{i ∈ I_1} a_i = Σ_{i ∈ I_2} a_i). Then there is a j, wlog j = 0, such that Σ_{i ∈ I_0} a_i > (1/3) Σ_{i=1}^n a_i. But then the pure strategy of Player 2 that always chooses 0 makes Player 1's payoff < −1/3. But this means that it is not a pure maximin strategy, and we have the contradiction!

With the proof of Lemma 10 we have completed the reduction and thereby the proof of the proposition.

The following fact is not very difficult to show and is left as an exercise:

Fact 11 (Hansen, Miltersen, Sørensen, COCOON'07) For two-player zero-sum games of perfect recall without chance nodes, existence of pure maximin strategies can be determined in linear time by a tree traversal.
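As a quick sanity check on the gadget M(K) used in the reduction above, a few lines of Python confirm that the uniform distribution guarantees Player 1 exactly −1/K against every pure strategy of Player 2 (and, by the same counting argument, the uniform guess guarantees Player 2 at most −1/K), so the value is −1/K, i.e. −1/3 for K = 3.

```python
# The gadget M(K): the K x K matrix game with payoff -1 on the diagonal and
# 0 elsewhere. Mixing uniformly, Player 1 hits Player 2's guess with
# probability exactly 1/K, whatever the guess is.

from fractions import Fraction as F

def M(K):
    return [[-1 if i == j else 0 for j in range(K)] for i in range(K)]

def uniform_vs_column(K, j):
    """Expected payoff to Player 1 playing uniformly against pure column j."""
    A = M(K)
    return sum(F(A[i][j], K) for i in range(K))

vals = [uniform_vs_column(3, j) for j in range(3)]  # the K = 3 case
```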