Extensive Form Games: Backward Induction and Imperfect Information Games

Gavin Barnett

CPSC 532A Lecture 10

Lecture Overview
1. Recap
2. Backward Induction
3. Imperfect-Information Extensive-Form Games
4. Perfect Recall

I promised to revisit this

Question: is there a problem having $a_i, a'_i \in A_i$ in the constraint; that is, are we requiring that the constraint hold in both directions?

\[
\sum_{a \in A \mid a_i \in a} p(a)\,u_i(a) \;\ge\; \sum_{a \in A \mid a_i \in a} p(a)\,u_i(a'_i, a_{-i}) \qquad \forall i \in N,\ \forall a_i, a'_i \in A_i
\]
\[
p(a) \ge 0 \quad \forall a \in A, \qquad \sum_{a \in A} p(a) = 1
\]

Answer: yes, it was wrong. The version above fixes the problem, changing the second sum so that it's identical to the first. Note that the constraint can equivalently be written as
\[
\sum_{a \in A \mid a_i \in a} \big[u_i(a) - u_i(a'_i, a_{-i})\big]\,p(a) \;\ge\; 0.
\]
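The constraints above can be checked mechanically for a candidate distribution. The following is a minimal sketch, assuming that p is a distribution over joint action profiles and that u maps each profile to a tuple of utilities. The function name `satisfies_constraints`, the tolerance `tol`, and the two-player example game are illustrative assumptions, not part of the lecture.

```python
from itertools import product

def satisfies_constraints(p, u, actions, tol=1e-9):
    """Check p(a) >= 0, sum_a p(a) = 1, and, for every player i and every
    pair (a_i, a_i'), that the sum over profiles a containing a_i of
    [u_i(a) - u_i(a_i', a_-i)] * p(a) is non-negative."""
    profiles = list(product(*actions))
    if any(p[a] < -tol for a in profiles):
        return False
    if abs(sum(p[a] for a in profiles) - 1.0) > tol:
        return False
    for i, acts in enumerate(actions):
        for ai in acts:
            for ai_alt in acts:
                total = 0.0
                for a in profiles:
                    if a[i] != ai:          # only profiles in which i plays a_i
                        continue
                    deviated = a[:i] + (ai_alt,) + a[i + 1:]
                    total += (u[a][i] - u[deviated][i]) * p[a]
                if total < -tol:
                    return False
    return True

# Hypothetical 2x2 coordination game used only to exercise the check.
actions = [("B", "F"), ("B", "F")]
u = {("B", "B"): (2, 1), ("B", "F"): (0, 0),
     ("F", "B"): (0, 0), ("F", "F"): (1, 2)}
coordinated = {("B", "B"): 0.5, ("B", "F"): 0.0, ("F", "B"): 0.0, ("F", "F"): 0.5}
uniform = {a: 0.25 for a in u}

print(satisfies_constraints(coordinated, u, actions))  # True
print(satisfies_constraints(uniform, u, actions))      # False
```

Both sums range over the same set of profiles (those in which player i plays $a_i$), which is why the sketch can accumulate the difference form of the constraint in a single loop.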

Introduction
- The normal form game representation does not incorporate any notion of sequence, or time, of the actions of the players.
- The extensive form is an alternative representation that makes the temporal structure explicit.
- Two variants:
  - perfect information extensive-form games
    - a game tree consisting of choice nodes and terminal nodes
    - choice nodes labeled with players, and each outgoing edge labeled with an action for that player
    - terminal nodes labeled with utilities
  - imperfect-information extensive-form games
    - we'll get to this today

Pure Strategies
- Overall, a pure strategy for a player in a perfect-information game is a complete specification of which deterministic action to take at every node belonging to that player.

Definition. Let $G = (N, A, H, Z, \chi, \rho, \sigma, u)$ be a perfect-information extensive-form game. Then the pure strategies of player $i$ consist of the cross product
\[
\prod_{h \in H,\ \rho(h) = i} \chi(h).
\]

- Using this definition, we recover the old definitions of mixed strategies, best response, Nash equilibrium, ...

Induced Normal Form
- we can convert an extensive-form game into normal form

[Figure 5.2: A perfect-information game in extensive form. Player 1 chooses A or B at the root. After A, player 2 chooses C, giving (3,8), or D, giving (8,3). After B, player 2 chooses E, giving (5,5), or F, after which player 1 chooses G, giving (2,10), or H, giving (1,0).]

        CE      CF      DE      DF
AG      3, 8    3, 8    8, 3    8, 3
AH      3, 8    3, 8    8, 3    8, 3
BG      5, 5    2, 10   5, 5    2, 10
BH      5, 5    1, 0    5, 5    1, 0

In order to define a complete strategy for this game, each of the players must choose an action at each of his two choice nodes. Thus we can enumerate the pure strategies of the players as follows.
S_1 = {(A, G), (A, H), (B, G), (B, H)}
S_2 = {(C, E), (C, F), (D, E), (D, F)}
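The conversion to the induced normal form can be sketched in code. The block below is a minimal illustration, assuming the game of Figure 5.2 is encoded as nested dictionaries, with payoff tuples at the leaves. The encoding and the helper names `choice_nodes`, `pure_strategies`, `outcome`, and `induced_normal_form` are hypothetical choices made for this example. A pure strategy is one action per choice node of the player, including nodes that the strategy itself makes unreachable.

```python
from itertools import product

# Figure 5.2: a choice node is {"player", "children"}; a leaf is a payoff tuple.
GAME = {"player": 1, "children": {
    "A": {"player": 2, "children": {"C": (3, 8), "D": (8, 3)}},
    "B": {"player": 2, "children": {
        "E": (5, 5),
        "F": {"player": 1, "children": {"G": (2, 10), "H": (1, 0)}},
    }},
}}

def choice_nodes(node, path=()):
    """Yield (path, player, actions) for every choice node, depth first."""
    if isinstance(node, tuple):          # terminal node: nothing to choose
        return
    yield path, node["player"], tuple(node["children"])
    for action, child in node["children"].items():
        yield from choice_nodes(child, path + (action,))

def pure_strategies(game, player):
    """Cross product of the action sets at all of the player's choice nodes."""
    nodes = [(path, acts) for path, pl, acts in choice_nodes(game) if pl == player]
    paths = [path for path, _ in nodes]
    return [dict(zip(paths, combo)) for combo in product(*(acts for _, acts in nodes))]

def outcome(game, strategies):
    """Follow the chosen actions from the root down to a leaf."""
    node, path = game, ()
    while not isinstance(node, tuple):
        action = strategies[node["player"]][path]
        node, path = node["children"][action], path + (action,)
    return node

def induced_normal_form(game):
    """Map every pair of pure strategies (as labels like 'AG') to its payoff."""
    table = {}
    for s1 in pure_strategies(game, 1):
        for s2 in pure_strategies(game, 2):
            key = ("".join(s1.values()), "".join(s2.values()))
            table[key] = outcome(game, {1: s1, 2: s2})
    return table

table = induced_normal_form(GAME)
print(table[("BG", "CF")])  # (2, 10)
```

The table has 4 x 4 = 16 entries even though the tree has only five leaves, which illustrates why the induced normal form can be much larger than the extensive form.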

Subgame Perfection
- Define subgame of G rooted at h:
  - the restriction of G to the descendants of h.
- Define set of subgames of G:
  - subgames of G rooted at nodes in G
- s is a subgame perfect equilibrium of G iff for any subgame G' of G, the restriction of s to G' is a Nash equilibrium of G'
- Notes:
  - since G is its own subgame, every SPE is a NE.
  - this definition rules out "non-credible threats"


Centipede Game

[Figure 5.9: The centipede game. Players 1 and 2 alternate moves (1, 2, 1, 2, 1), each choosing A (across) or D (down). Going down at the successive nodes gives (1,0), (0,2), (3,1), (2,4), (4,3); if every move is A the payoff is (3,5).]

- Play this as a fun game...

Computing Subgame Perfect Equilibria

Idea: Identify the equilibria in the bottom-most trees, and adopt these as one moves up the tree.

function BackwardInduction(node h) returns u(h)
    if h ∈ Z then
        return u(h)                          // h is a terminal node
    best_util ← −∞
    forall a ∈ χ(h) do
        util_at_child ← BackwardInduction(σ(h, a))
        if util_at_child_ρ(h) > best_util_ρ(h) then
            best_util ← util_at_child
    return best_util

Figure 5.6: Procedure for finding the value of a sample (subgame-perfect) Nash equilibrium of a perfect-information extensive-form game.

- util_at_child is a vector denoting the utility for each player
- the procedure doesn't return an equilibrium strategy, but rather labels each node with a vector of real numbers.
  - This labeling can be seen as an extension of the game's utility function to the non-terminal nodes
- The equilibrium strategies: take the best action at each node.
- For zero-sum games, BackwardInduction has another name: the minimax algorithm.
  - Here it's enough to store one number per node.
  - It's possible to speed things up by pruning nodes that will never be reached in play: "alpha-beta pruning".

Good news: not only are we guaranteed to find a subgame-perfect equilibrium (rather than possibly finding a Nash equilibrium that involves non-credible threats) but also this procedure is computationally simple. In particular, it can be implemented as a single depth-first traversal of the game tree, and thus requires time linear in the size of the game representation. Recall in contrast that the best known methods for finding Nash equilibria of general games require time exponential in the size of the normal form; remember as well that the induced normal form of an extensive-form game is exponentially larger than the original representation.

The algorithm BackwardInduction is described in Figure 5.6. The variable util_at_child is a vector denoting the utility for each player at the child node; util_at_child_ρ(h) denotes the element of this vector corresponding to the utility for player ρ(h) (the player who gets to move at node h). Similarly best_util is a vector giving utilities for each player. Observe that this procedure does not return an equilibrium strategy for each of the n players, but rather describes how to label each node with a vector of n real numbers. This labeling can be seen as an extension of the game's utility function to the non-terminal nodes.
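The procedure of Figure 5.6 translates almost line for line into a recursive function. The sketch below is one possible rendering, not the textbook's code: the tree encoding (payoff tuples for terminal nodes, dictionaries for choice nodes), the node names `h1` to `h5`, and the helper `centipede` are assumptions made for illustration. It also goes one step beyond Figure 5.6 by recording the best action at each node, in the spirit of "take the best action at each node". Ties are broken in favour of the first action examined, so it returns a sample equilibrium.

```python
def backward_induction(node):
    """Return (utility vector, plan) for the subgame rooted at node.

    A terminal node is a tuple of utilities. A choice node is a dict with
    "name", "player" (0-indexed) and "children" (action -> subtree).
    """
    if isinstance(node, tuple):              # h is a terminal node
        return node, {}
    player = node["player"]
    best_util, best_action, plan = None, None, {}
    for action, child in node["children"].items():
        util_at_child, child_plan = backward_induction(child)
        plan.update(child_plan)              # keep choices made lower in the tree
        if best_util is None or util_at_child[player] > best_util[player]:
            best_util, best_action = util_at_child, action
    plan[node["name"]] = best_action
    return best_util, plan

def centipede():
    """Build the centipede game with the payoffs of Figure 5.9."""
    down = [(1, 0), (0, 2), (3, 1), (2, 4), (4, 3)]
    node = (3, 5)                            # payoff if every move is A
    for k in reversed(range(5)):
        node = {"name": f"h{k + 1}", "player": k % 2,
                "children": {"A": node, "D": down[k]}}
    return node

value, plan = backward_induction(centipede())
print(value)  # (1, 0)
print(plan)   # every node maps to "D"
```

Each node is visited exactly once, which matches the claim that the procedure is a single depth-first traversal and runs in time linear in the size of the tree.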

Backward Induction

[Figure 5.9: The centipede game.]

- What happens when we use this procedure on Centipede?
  - In the only equilibrium, player 1 goes down in the first move.
  - However, this outcome is Pareto-dominated by all but one other outcome.
- Two considerations:
  - practical: human subjects don't go down right away
  - theoretical: what should you do as player 2 if player 1 doesn't go down?
    - SPE analysis says to go down. However, that same analysis says that P1 would already have gone down. How do you update your beliefs upon observation of a measure zero event?
    - but if player 1 knows that you'll do something else, it is rational for him not to go down anymore... a paradox
    - there's a whole literature on this question

In other words, you have reached a state to which your analysis has given a probability of zero. How should you amend your beliefs and course of action based on this measure-zero event? It turns out this seemingly small inconvenience actually raises a fundamental problem in game theory. We will not develop the subject further here, but let us only mention that there exist different accounts of this situation, and they depend on the probabilistic assumptions made, on what is common knowledge (in particular, whether there is common knowledge of rationality), and on exactly how one revises one's beliefs in the face of measure zero events. The last question is intimately related to the subject of belief revision discussed in Chapter 2.

5.2 Imperfect-information extensive-form games

Up to this point, in our discussion of extensive-form games we have allowed players to specify the action that they would take at every choice node of the game. This implies that players know the node they are in, and (recalling that in such games we equate nodes with the histories that led to them) all the prior choices, including those of other agents. For this reason we have called these perfect-information games.

We might not always want to make such a strong assumption about our players and our environment. In many situations we may want to model agents needing to act with partial or no knowledge of the actions taken by others, or even agents with limited memory of their own past actions.


Intro
- Up to this point, in our discussion of extensive-form games we have allowed players to specify the action that they would take at every choice node of the game.
- This implies that players know the node they are in and all the prior choices, including those of other agents.
- We may want to model agents needing to act with partial or no knowledge of the actions taken by others, or even themselves.
- This is possible using imperfect information extensive-form games.
  - each player's choice nodes are partitioned into information sets
  - if two choice nodes are in the same information set then the agent cannot distinguish between them.

Formal definition

Definition. An imperfect-information game (in extensive form) is a tuple $(N, A, H, Z, \chi, \rho, \sigma, u, I)$, where
- $(N, A, H, Z, \chi, \rho, \sigma, u)$ is a perfect-information extensive-form game, and
- $I = (I_1, \ldots, I_n)$, where $I_i = (I_{i,1}, \ldots, I_{i,k_i})$ is an equivalence relation on (that is, a partition of) $\{h \in H : \rho(h) = i\}$ with the property that $\chi(h) = \chi(h')$ and $\rho(h) = \rho(h')$ whenever there exists a $j$ for which $h \in I_{i,j}$ and $h' \in I_{i,j}$.

Example

[Figure 5.10: An imperfect-information game. Player 1 chooses L or R at the root; L ends the game with payoff (1,1). After R, player 2 chooses A or B. Player 1 then chooses l or r without knowing which of A or B was played: after A, l gives (0,0) and r gives (2,4); after B, l gives (2,4) and r gives (0,0).]

- What are the equivalence classes for each player?
- What are the pure strategies for each player?
  - choice of an action in each equivalence class.
- Formally, the pure strategies of player $i$ consist of the cross product $\prod_{I_{i,j} \in I_i} \chi(I_{i,j})$.

Nodes in the same information set must have the same available actions (otherwise the player would be able to distinguish the nodes). Thus, if $I \in I_i$ is an equivalence class, we can unambiguously use the notation $\chi(I)$ to denote the set of actions available to player $i$ at any node in information set $I$.

Consider the imperfect-information extensive-form game shown in Figure 5.10. In this game, player 1 has two information sets: the set including the top choice node, and the set including the bottom choice nodes. Note that the two bottom choice nodes in the second information set have the same set of possible actions. We can regard player 1 as not knowing whether player 2 chose A or B when she makes her choice between l and r.
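The cross product over information sets can be enumerated directly. This is a minimal sketch for the game of Figure 5.10, assuming each player's information sets are written as a dictionary from a label to the actions available there. The labels `root`, `after_R`, and `after_AB` and the function name `pure_strategies` are hypothetical and chosen only for this example.

```python
from itertools import product

# Information sets of Figure 5.10, as {label: available actions}.
# Player 1's two bottom nodes share ONE information set ("after_AB").
INFO_SETS = {
    1: {"root": ("L", "R"), "after_AB": ("l", "r")},
    2: {"after_R": ("A", "B")},
}

def pure_strategies(info_sets):
    """One action per information set: the cross product of the chi(I)."""
    labels = list(info_sets)
    return [dict(zip(labels, choice))
            for choice in product(*(info_sets[label] for label in labels))]

S1 = pure_strategies(INFO_SETS[1])
S2 = pure_strategies(INFO_SETS[2])
print(len(S1), len(S2))  # 4 2
```

If player 1 could tell the two bottom nodes apart, they would be separate information sets and the same enumeration would give 2 x 2 x 2 = 8 pure strategies instead of 4.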

Normal-form games
- We can represent any normal form game.

[Figure 5.11: The Prisoner's Dilemma game in extensive form. Player 1 chooses C or D; player 2, in a single information set, chooses c or d. Payoffs: (C, c) gives (-1,-1), (C, d) gives (-4,0), (D, c) gives (0,-4), (D, d) gives (-3,-3).]

- Note that it would also be the same if we put player 2 at the root node.

Recall that perfect-information games were not expressive enough to capture the Prisoner's Dilemma game and many other ones.

Induced Normal Form
- Same as before: enumerate pure strategies for all agents
- Mixed strategies are just mixtures over the pure strategies as before.
- Nash equilibria are also preserved.
- Note that we've now defined both a mapping from NF games to IIEF and a mapping from IIEF to NF.
  - what happens if we apply each mapping in turn?
  - we might not end up with the same game, but we do get one with the same strategy spaces and equilibria.

Randomized Strategies
- It turns out there are two meaningfully different kinds of randomized strategies in imperfect information extensive form games
  - mixed strategies
  - behavioral strategies
- Mixed strategy: randomize over pure strategies
- Behavioral strategy: independent coin toss every time an information set is encountered

Randomized strategies example

[Figure 5.2: A perfect-information game in extensive form. Player 1 chooses A or B at the root. After A, player 2 chooses C, giving (3,8), or D, giving (8,3). After B, player 2 chooses E, giving (5,5), or F, after which player 1 chooses G, giving (2,10), or H, giving (1,0).]

- Give an example of a behavioral strategy:
  - A with probability .5 and G with probability .3
- Give an example of a mixed strategy that is not a behavioral strategy:
  - (.6(A, G), .4(B, H)) (why not?)
- In this game every behavioral strategy corresponds to a mixed strategy...

[Figure 5.1: The Sharing game.]

Notice that the definition contains a subtlety. An agent's strategy requires a decision at each choice node, regardless of whether or not it is possible to reach that node given the other choice nodes. In the Sharing game above the situation is straightforward: player 1 has three pure strategies, and player 2 has eight (why?). But now consider the game shown in Figure 5.2.

In order to define a complete strategy for this game, each of the players must choose an action at each of his two choice nodes. Thus we can enumerate the pure strategies of the players as follows.
S_1 = {(A, G), (A, H), (B, G), (B, H)}
S_2 = {(C, E), (C, F), (D, E), (D, F)}
It is important to note that we have to include the strategies (A, G) and (A, H), even though once A is chosen the G-versus-H choice is moot.

The definitions of best response and Nash equilibria in this game are exactly as they are for normal form games. Indeed, this example illustrates how every perfect-information game can be converted to an equivalent normal form game. For example, the perfect-information game of Figure 5.2 can be converted into the normal form image of the game, shown in Figure 5.3.

Multi Agent Systems, draft of September 19, 2006
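One way to approach the "why not?" is to note that a behavioral strategy randomizes independently at each choice node, so the distribution it induces over pairs (action at the root, action at the lower node) must factor into the product of its marginals. The sketch below checks this for the mixed strategy (.6(A, G), .4(B, H)); the helper `marginal` and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# The mixed strategy from the slide: a distribution over player 1's pure strategies.
mixed = {("A", "G"): Fraction(6, 10), ("B", "H"): Fraction(4, 10)}

def marginal(mixed, position, action):
    """Probability that the pure strategy drawn picks `action` at `position`."""
    return sum(prob for strategy, prob in mixed.items()
               if strategy[position] == action)

p_A = marginal(mixed, 0, "A")    # 3/5
p_G = marginal(mixed, 1, "G")    # 3/5

# Independent randomization with the same marginals would give this distribution.
product_form = {
    (first, second): pa * pg
    for first, pa in (("A", p_A), ("B", 1 - p_A))
    for second, pg in (("G", p_G), ("H", 1 - p_G))
}

is_product = all(product_form[s] == mixed.get(s, 0) for s in product_form)
print(product_form[("A", "G")], mixed[("A", "G")])  # 9/25 3/5
print(is_product)                                   # False
```

The mixed strategy correlates the two choices (G is chosen exactly when A is), which independent coin tosses at each node cannot reproduce as a distribution over pure strategies.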

Games of imperfect recall
- Imagine that player 1 sends two proxies to the game with the same strategies. When one arrives, he doesn't know if the other has arrived before him, or if he's the first one.

[Figure 5.12: A game with imperfect recall. Player 1 chooses L or R at the root. After L, player 1 moves again in the same information set: L gives (1,0) and R gives (100,100). After R, player 2 chooses U, giving (5,1), or D, giving (2,2).]

- What is the space of pure strategies in this game?
  - 1: (L, R); 2: (U, D)
- What is the mixed strategy equilibrium?
  - Observe that D is dominant for 2. R, D is better for 1 than L, D, so R, D is an equilibrium.

Games of imperfect recall

[Figure 5.12: A game with imperfect recall. Player 1 chooses L or R at the root. After L, player 1 moves again in the same information set: L gives (1,0) and R gives (100,100). After R, player 2 chooses U, giving (5,1), or D, giving (2,2).]

- What is an equilibrium in behavioral strategies?
  - again, D weakly dominant for 2
  - if 1 uses the behavioural strategy $(p, 1-p)$, his expected utility is $1 \cdot p^2 + 100 \cdot p(1-p) + 2 \cdot (1-p)$
  - simplifies to $-99p^2 + 98p + 2$
  - maximum at $p = 98/198$
  - thus equilibrium is ((98/198, 100/198), (0, 1))
- Thus, we can have behavioral strategies that are different from mixed strategies.

Note in particular that in a mixed strategy, agent 1 decides probabilistically whether to play L or R in his information set, but once he decides he plays that pure strategy consistently. Thus the payoff of 100 is irrelevant in the context of mixed strategies. On the other hand, with behavioral strategies agent 1 gets to randomize afresh each time he finds himself in the information set. Noting that the pure strategy D is weakly dominant for agent 2 (and in fact is the unique best response to all strategies of agent 1 other than the pure strategy L), agent 1 computes the best response to D as follows. If he uses the behavioral strategy $(p, 1-p)$ (that is, choosing L with probability $p$ each time he finds himself in the information set), his expected payoff is
\[
1 \cdot p^2 + 100 \cdot p(1-p) + 2 \cdot (1-p).
\]
The expression simplifies to $-99p^2 + 98p + 2$, whose maximum is obtained at $p = 98/198$. Thus (R, D) = ((0, 1), (0, 1)) is no longer an equilibrium in behavioral strategies, and instead we get the equilibrium ((98/198, 100/198), (0, 1)).

There is, however, a broad class of imperfect-information games in which the expressive power of mixed and behavioral strategies coincides. This is the class of games of perfect recall. Intuitively speaking, in these games no player forgets any information he knew about moves made so far; in particular, he remembers precisely all his own moves.
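The arithmetic behind the behavioral-strategy equilibrium can be verified with exact rational numbers. The sketch below assumes the payoffs of Figure 5.12 and that agent 2 plays D; the names `expected_payoff` and `p_star` are illustrative. The quadratic $-99p^2 + 98p + 2$ is concave, so its maximum is where the derivative $-198p + 98$ vanishes.

```python
from fractions import Fraction

def expected_payoff(p):
    """Agent 1's expected payoff against D when he plays L with probability p
    at every visit to his information set (payoffs from Figure 5.12)."""
    return 1 * p * p + 100 * p * (1 - p) + 2 * (1 - p)

# Stationary point of -99 p^2 + 98 p + 2: solve -198 p + 98 = 0.
p_star = Fraction(98, 198)

print(p_star, 1 - p_star)        # 49/99 50/99
print(expected_payoff(p_star))   # 2599/99
print(expected_payoff(0), expected_payoff(1))  # 2 1
```

At $p = 98/198$ agent 1 earns 2599/99 (roughly 26.25), well above the 2 he gets from the pure strategy R and the 1 he gets from the pure strategy L, which is why (R, D) stops being an equilibrium once behavioral strategies are allowed.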

32 Lecture Overview
1. Recap
2. Backward Induction
3. Imperfect-Information Extensive-Form Games
4. Perfect Recall

33 Perfect Recall: mixed and behavioral strategies coincide

No player forgets anything he knew about moves made so far.

Definition. Player $i$ has perfect recall in an imperfect-information game $G$ if for any two nodes $h, h'$ that are in the same information set for player $i$, for any path $h_0, a_0, h_1, a_1, h_2, \ldots, h_n, a_n, h$ from the root of the game to $h$ (where the $h_j$ are decision nodes and the $a_j$ are actions) and any path $h'_0, a'_0, h'_1, a'_1, h'_2, \ldots, h'_m, a'_m, h'$ from the root to $h'$, it must be the case that:
1. $n = m$.
2. For all $0 \le j \le n$, $h_j$ and $h'_j$ are in the same equivalence class for player $i$.
3. For all $0 \le j \le n$, if $\rho(h_j) = i$ (that is, $h_j$ is a decision node of player $i$), then $a_j = a'_j$.

$G$ is a game of perfect recall if every player has perfect recall in it.
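As an illustration, the sketch below tests a player for perfect recall on the game of Figure 5.12. It does not transcribe conditions 1 to 3 literally; instead it uses the commonly stated "same own history" check, which requires that all nodes in one information set be reached by the same sequence of that player's own (information set, action) pairs. The tree encoding, the node names, and the function `has_perfect_recall` are hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical encoding of Figure 5.12: each decision node maps to
# (player, information_set, {action: child}); names not in TREE are leaves.
TREE = {
    "root": (1, "I1", {"L": "n1", "R": "n2"}),
    "n1":   (1, "I1", {"L": "leaf_1_0", "R": "leaf_100_100"}),
    "n2":   (2, "I2", {"U": "leaf_5_1", "D": "leaf_2_2"}),
}

def has_perfect_recall(tree, root, player):
    """Return True if, within each of `player`'s information sets, every node
    is reached by the same sequence of the player's own
    (information set, action) pairs."""
    histories = defaultdict(set)   # information set -> set of own histories
    stack = [(root, ())]
    while stack:
        node, own_history = stack.pop()
        if node not in tree:       # leaf: nothing to record
            continue
        owner, info_set, children = tree[node]
        if owner == player:
            histories[info_set].add(own_history)
        for action, child in children.items():
            # Only the player's own moves extend his remembered history.
            step = ((info_set, action),) if owner == player else ()
            stack.append((child, own_history + step))
    return all(len(h) == 1 for h in histories.values())

print(has_perfect_recall(TREE, "root", 1))  # False: agent 1 forgets his first move
print(has_perfect_recall(TREE, "root", 2))  # True
```

Agent 1 fails the check because his single information set contains both the root (empty own history) and the node reached after he has already played L.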

34 Perfect Recall

Clearly, every perfect-information game is a game of perfect recall.

Theorem (Kuhn, 1953). In a game of perfect recall, any mixed strategy of a given agent can be replaced by an equivalent behavioral strategy, and any behavioral strategy can be replaced by an equivalent mixed strategy. Here two strategies are equivalent in the sense that they induce the same probabilities on outcomes, for any fixed strategy profile (mixed or behavioral) of the remaining agents.

Corollary. In games of perfect recall the set of Nash equilibria does not change if we restrict ourselves to behavioral strategies.
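One direction of the theorem can be illustrated on a toy case. The sketch below assumes a hypothetical perfect-recall fragment in which a single agent moves at information set A (actions a, b) and, after a, moves again at information set B (actions c, d). It converts a mixed strategy into a behavioral one by conditioning, then checks that both induce the same probabilities over the agent's own action paths. The example game, the probabilities, and all function names are assumptions for illustration, not taken from the lecture.

```python
from fractions import Fraction

# A pure strategy is a pair (choice at A, choice at B); the mixed strategy
# below is an arbitrary illustrative distribution over the four pairs.
mixed = {
    ("a", "c"): Fraction(1, 2),
    ("a", "d"): Fraction(1, 6),
    ("b", "c"): Fraction(1, 4),
    ("b", "d"): Fraction(1, 12),
}

def to_behavioral(mixed):
    """Behavioral strategy: the marginal at A, and at B the choice
    distribution conditioned on having played a (assumes P(a) > 0)."""
    prob_a = sum(p for (at_a, _), p in mixed.items() if at_a == "a")
    prob_c_given_a = mixed[("a", "c")] / prob_a
    return {
        "A": {"a": prob_a, "b": 1 - prob_a},
        "B": {"c": prob_c_given_a, "d": 1 - prob_c_given_a},
    }

def paths_from_mixed(mixed):
    """Probability of each realized action path under the mixed strategy."""
    return {
        "a,c": mixed[("a", "c")],
        "a,d": mixed[("a", "d")],
        "b": mixed[("b", "c")] + mixed[("b", "d")],  # B is never reached
    }

def paths_from_behavioral(beh):
    """Probability of each realized action path under independent draws."""
    return {
        "a,c": beh["A"]["a"] * beh["B"]["c"],
        "a,d": beh["A"]["a"] * beh["B"]["d"],
        "b": beh["A"]["b"],
    }

behavioral = to_behavioral(mixed)
print(behavioral["A"]["a"], behavioral["B"]["c"])   # 2/3 3/4
print(paths_from_mixed(mixed) == paths_from_behavioral(behavioral))  # True
```

The conditioning step is well defined here because, with perfect recall, reaching B tells the agent exactly which of his own earlier moves he made.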

35 Computing Equilibria of Games of Perfect Recall

How can we find an equilibrium of an imperfect-information extensive-form game?

One idea: convert to normal form, and use the techniques described earlier.
- Problem: exponential blowup in game size.

Alternative (at least for perfect recall): sequence form
- for zero-sum games, computing an equilibrium is polynomial in the size of the extensive-form game, which is exponentially faster than the LP formulation we saw before
- for general-sum games, we can compute an equilibrium in time exponential in the size of the extensive-form game, which is again exponentially faster than converting to normal form
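The size gap behind the blowup can be made concrete with a small count. The sketch below compares the number of pure strategies a player contributes to the induced normal form (one action chosen at every information set) with an upper bound on the number of that player's sequences. The bound assumes the standard definition of a sequence as the player's own actions along a path from the root, which the slides do not spell out; the function names are hypothetical.

```python
from math import prod

def num_pure_strategies(actions_per_info_set):
    """Pure strategies in the induced normal form: one action is picked at
    each information set, so the counts multiply."""
    return prod(actions_per_info_set)

def max_num_sequences(actions_per_info_set):
    """Upper bound on the player's sequences under perfect recall (assumed
    standard definition): the empty sequence plus one per action."""
    return 1 + sum(actions_per_info_set)

# A player with k binary information sets.
for k in (5, 10, 20):
    sizes = [2] * k
    print(k, num_pure_strategies(sizes), max_num_sequences(sizes))
# 5 32 11
# 10 1024 21
# 20 1048576 41
```

The first count grows exponentially in the number of information sets while the second grows linearly, which is the intuition for why working on the extensive form directly can be exponentially cheaper.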

Extensive Form Games: Backward Induction and Imperfect Information Games, CPSC 532A Lecture 10, October 12, 2006