Game Theory: Normal Form Games CPSC 322 Lecture 34 April 3, 2006 Reading: excerpt from Multiagent Systems, chapter 3. Game Theory: Normal Form Games CPSC 322 Lecture 34, Slide 1

Lecture Overview
- Recap
- Game Theory
- Example
- Matrix Games

Rewards and Values
Suppose the agent receives the sequence of rewards $r_1, r_2, r_3, r_4, \ldots$. What value should be assigned?
- total reward: $V = \sum_{i=1}^{\infty} r_i$
- average reward: $V = \lim_{n \to \infty} \frac{r_1 + \cdots + r_n}{n}$
- discounted reward: $V = \sum_{i=1}^{\infty} \gamma^{i-1} r_i$
  $\gamma$ is the discount factor, $0 \le \gamma \le 1$
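
The three definitions can be compared on a finite prefix of a reward sequence. The following is a minimal sketch, assuming the sequence is truncated to a Python list (the sums on the slide run to infinity); the function names are illustrative and not part of the course material.

```python
from typing import Sequence


def total_reward(rewards: Sequence[float]) -> float:
    """r_1 + r_2 + ... over the given finite prefix."""
    return sum(rewards)


def average_reward(rewards: Sequence[float]) -> float:
    """(r_1 + ... + r_n) / n for a prefix of length n."""
    return sum(rewards) / len(rewards)


def discounted_reward(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma^(i-1) * r_i; enumerate starts at 0, which matches the exponent i-1."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


rewards = [1.0, 1.0, 1.0]
print(total_reward(rewards))            # 3.0
print(average_reward(rewards))          # 1.0
print(discounted_reward(rewards, 0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With a discount factor below 1, later rewards count for less, which is what keeps the discounted sum finite on an infinite sequence of bounded rewards.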

Policies
- A stationary policy is a function $\pi : S \to A$. Given a state $s$, $\pi(s)$ specifies what action the agent who is following $\pi$ will do.
- An optimal policy is one with maximum expected value.
  - We'll focus on the case where value is defined as discounted reward.
- For an MDP with stationary dynamics and rewards with infinite or indefinite horizon, there is always an optimal stationary policy in this case.

Value of a Policy
- $Q^\pi(s, a)$, where $a$ is an action and $s$ is a state, is the expected value of doing $a$ in state $s$, then following policy $\pi$.
- $V^\pi(s)$, where $s$ is a state, is the expected value of following policy $\pi$ in state $s$.
- $Q^\pi$ and $V^\pi$ can be defined mutually recursively:
  $V^\pi(s) = Q^\pi(s, \pi(s))$
  $Q^\pi(s, a) = \sum_{s'} P(s' \mid a, s) \left( r(s, a, s') + \gamma V^\pi(s') \right)$

Value of the Optimal Policy
- $Q^*(s, a)$, where $a$ is an action and $s$ is a state, is the expected value of doing $a$ in state $s$, then following the optimal policy.
- $V^*(s)$, where $s$ is a state, is the expected value of following the optimal policy in state $s$.
- $Q^*$ and $V^*$ can be defined mutually recursively:
  $Q^*(s, a) = \sum_{s'} P(s' \mid a, s) \left( r(s, a, s') + \gamma V^*(s') \right)$
  $V^*(s) = \max_a Q^*(s, a)$
  $\pi^*(s) = \arg\max_a Q^*(s, a)$

Value Iteration
- Idea: given an estimate of the k-step lookahead value function, determine the (k+1)-step lookahead value function.
- Set $V_0$ arbitrarily, e.g., to zeros.
- Compute $Q_{i+1}$ and $V_{i+1}$ from $V_i$:
  $Q_{i+1}(s, a) = \sum_{s'} P(s' \mid a, s) \left( r(s, a, s') + \gamma V_i(s') \right)$
  $V_{i+1}(s) = \max_a Q_{i+1}(s, a)$
- Substituting the first equation into the second eliminates $Q_{i+1}$ and gives an update equation for $V$:
  $V_{i+1}(s) = \max_a \sum_{s'} P(s' \mid a, s) \left( r(s, a, s') + \gamma V_i(s') \right)$
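
The update equation for V can be sketched in a few lines. This is a minimal illustration, not the course's reference implementation: the two-state MDP, its transition table `P`, and the reward function `r` are hypothetical and chosen only so the fixed point is easy to check by hand.

```python
def value_iteration(states, actions, P, r, gamma, iterations):
    """Apply the synchronous update `iterations` times, starting from V_0 = 0."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        # The comprehension reads the old V throughout; V is rebound afterwards.
        V = {
            s: max(
                sum(p * (r(s, a, s2) + gamma * V[s2]) for s2, p in P[(s, a)].items())
                for a in actions
            )
            for s in states
        }
    return V


# Hypothetical MDP: "stay" keeps the state, "go" switches it,
# and the agent receives reward 1 whenever it lands in state "y".
states = ["x", "y"]
actions = ["stay", "go"]
P = {
    ("x", "stay"): {"x": 1.0},
    ("x", "go"): {"y": 1.0},
    ("y", "stay"): {"y": 1.0},
    ("y", "go"): {"x": 1.0},
}


def r(s, a, s2):
    return 1.0 if s2 == "y" else 0.0


V = value_iteration(states, actions, P, r, gamma=0.5, iterations=50)
print(V)  # both values approach 1 / (1 - 0.5) = 2
```

In this toy problem the best plan from either state is to reach "y" and stay there, collecting reward 1 at every step, so both values converge to 2 when the discount factor is 0.5.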

Asynchronous VI: storing Q[s, a]
Repeat forever:
  Select state $s$, action $a$;
  $Q[s, a] \leftarrow \sum_{s'} P(s' \mid s, a) \left( R(s, a, s') + \gamma \max_{a'} Q[s', a'] \right)$;
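
A runnable version of this loop might look like the sketch below. Several things here are assumptions made for illustration: "repeat forever" is replaced by a fixed number of steps, the state-action pair is selected uniformly at random with a fixed seed, and the two-state MDP is hypothetical.

```python
import random


def async_q_iteration(states, actions, P, R, gamma, steps, seed=0):
    """Update one randomly selected Q[s, a] entry per step, in place."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(steps):
        s = rng.choice(states)
        a = rng.choice(actions)
        Q[(s, a)] = sum(
            p * (R(s, a, s2) + gamma * max(Q[(s2, a2)] for a2 in actions))
            for s2, p in P[(s, a)].items()
        )
    return Q


# Hypothetical MDP: "stay" keeps the state, "go" switches it,
# and the reward is 1 whenever the agent lands in state "y".
states = ["x", "y"]
actions = ["stay", "go"]
P = {
    ("x", "stay"): {"x": 1.0},
    ("x", "go"): {"y": 1.0},
    ("y", "stay"): {"y": 1.0},
    ("y", "go"): {"x": 1.0},
}


def R(s, a, s2):
    return 1.0 if s2 == "y" else 0.0


Q = async_q_iteration(states, actions, P, R, gamma=0.5, steps=2000)
for key in sorted(Q):
    print(key, round(Q[key], 6))
```

Unlike the synchronous version, each update immediately uses whatever values are currently stored, so no second copy of the table is needed.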

Non-Cooperative Game Theory
What is it?
- the mathematical study of interaction between rational, self-interested agents
Why is it called non-cooperative?
- while it's most interested in situations where agents' interests conflict, it's not restricted to these settings
- the key is that the individual is the basic modeling unit, and that individuals pursue their own interests
- cooperative/coalitional game theory has teams as the central unit, rather than agents
You can think of a non-cooperative game as a decision diagram where different agents control different decision nodes, and where each agent has his own utility node.

TCP Backoff Game
Should you send your packets using correctly-implemented TCP (which has a backoff mechanism) or using a defective implementation (which doesn't)?
Consider this situation as a two-player game:
- both use a correct implementation: both get a 1 ms delay
- one correct, one defective: 4 ms delay for correct, 0 ms for defective
- both defective: both get a 3 ms delay

TCP Backoff Game
Questions:
- What action should a player of the game take?
- Would all users behave the same in this scenario?
- What global patterns of behaviour should the system designer expect?
- Under what changes to the delay numbers would behaviour be the same?
- What effect would communication have?
- Repetitions? (finite? infinite?)
- Does it matter if I believe that my opponent is rational?

Defining Games
Finite, n-person game: $\langle N, A, u \rangle$:
- $N$ is a finite set of $n$ players, indexed by $i$
- $A = \langle A_1, \ldots, A_n \rangle$, where $A_i$ is the set of actions for player $i$
  - an action profile $a = (a_1, \ldots, a_n)$ picks one action $a_i \in A_i$ for each player; we write $a \in A$
- $u = \{u_1, \ldots, u_n\}$, a utility function for each player, where $u_i : A \to \mathbb{R}$ maps each action profile to a real number
Writing a 2-player game as a matrix:
- row player is player 1, column player is player 2
- rows are actions $a \in A_1$, columns are actions $a' \in A_2$
- cells are outcomes, written as a tuple of utility values for each player
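
One way to hold such a game in code is a dictionary from action profiles to utility tuples. The sketch below encodes the TCP Backoff Game, assuming each player's utility is minus their delay in milliseconds; the names `PAYOFFS`, `u`, `action_profiles`, and `as_matrix` are illustrative only.

```python
from itertools import product

# N: the players; A[i]: the actions available to player i.
N = (1, 2)
A = {1: ("C", "D"), 2: ("C", "D")}  # C = correct TCP, D = defective

# Assumption: utility = minus the delay in ms. Keys are action profiles (a_1, a_2);
# values are (u_1, u_2).
PAYOFFS = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),
    ("D", "D"): (-3, -3),
}


def u(i, a):
    """Utility u_i(a) of player i (1-indexed) for action profile a."""
    return PAYOFFS[a][i - 1]


def action_profiles():
    """All action profiles: one action per player."""
    return list(product(*(A[i] for i in N)))


def as_matrix():
    """Rows are player 1's actions, columns are player 2's actions."""
    return [[PAYOFFS[(a1, a2)] for a2 in A[2]] for a1 in A[1]]


print(action_profiles())
for row in as_matrix():
    print(row)
```

The dictionary form generalizes to more than two players, while the matrix view only makes sense for the two-player case described on the slide.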

Games in Matrix Form
Here's the TCP Backoff Game written as a matrix ("normal form") and as a decision network.

         C          D
C     -1, -1     -4, 0
D      0, -4     -3, -3

Figure 3.1: The TCP user's (aka the Prisoner's) Dilemma.
In each cell, the first number is player 1's utility (minus their delay in ms) and the second number is player 2's.
[Decision network: decision nodes "Action by Player 1" and "Action by Player 2"; utility nodes "P1 Utility" and "P2 Utility".]
Play this game with someone near you, repeating five times.
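
To explore the question of what a player should do, one can compute each player's utility-maximizing reply to every action of the other player. This is a small sketch, assuming utilities are minus the delays in milliseconds; the helper names are hypothetical.

```python
ACTIONS = ("C", "D")

# (u_1, u_2) for each action profile (a_1, a_2); utility = minus delay in ms.
PAYOFFS = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),
    ("D", "D"): (-3, -3),
}


def best_reply_player1(a2):
    """Player 1's utility-maximizing action when player 2 plays a2."""
    return max(ACTIONS, key=lambda a1: PAYOFFS[(a1, a2)][0])


def best_reply_player2(a1):
    """Player 2's utility-maximizing action when player 1 plays a1."""
    return max(ACTIONS, key=lambda a2: PAYOFFS[(a1, a2)][1])


replies_1 = {a2: best_reply_player1(a2) for a2 in ACTIONS}
replies_2 = {a1: best_reply_player2(a1) for a1 in ACTIONS}
print(replies_1)  # {'C': 'D', 'D': 'D'}
print(replies_2)  # {'C': 'D', 'D': 'D'}
```

Each player does better with D whatever the other does, yet if both play D each ends up with -3, which is worse than the -1 each would get if both played C. That tension is what makes the game a dilemma.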

More General Form
Prisoner's dilemma is any game

        C       D
C     a, a    b, c
D     c, b    d, d

with c > a > d > b.
Figure 3.3: Any c > a > d > b define an instance of Prisoner's Dilemma.
To fully understand the role of the payoff numbers we would need to enter into a discussion of utility theory. Here, let us just mention that for most purposes, the analysis of any game is unchanged if the payoff numbers undergo any positive affine ...
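
The ordering condition is easy to check mechanically. The sketch below tests it on the TCP Backoff payoffs (taken as minus the delays) and on the same payoffs after a positive affine transformation x -> alpha * x + beta with alpha > 0; the particular values alpha = 2 and beta = 10 are arbitrary choices for illustration.

```python
def is_prisoners_dilemma(a, b, c, d):
    """True when the payoffs satisfy c > a > d > b."""
    return c > a > d > b


def affine(x, alpha, beta):
    """Positive affine transformation of a payoff; requires alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * x + beta


# TCP Backoff Game: a = both correct, b = correct against defective,
# c = defective against correct, d = both defective.
a, b, c, d = -1, -4, 0, -3
print(is_prisoners_dilemma(a, b, c, d))  # True

# The same game after x -> 2x + 10: payoffs become 8, 2, 10, 4.
ta, tb, tc, td = (affine(x, 2, 10) for x in (a, b, c, d))
print(is_prisoners_dilemma(ta, tb, tc, td))  # True
```

Because a positive affine transformation preserves the order of the payoffs, the inequality chain holds before and after the transformation.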

Games of Pure Competition
Players have exactly opposed interests.
- There must be precisely two players (otherwise they can't have exactly opposed interests)
- For all action profiles $a \in A$, $u_1(a) + u_2(a) = c$ for some constant $c$
  - Special case: zero sum ($c = 0$)
- Thus, we only need to store a utility function for one player
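
The last point can be made concrete: given player 1's utilities and the constant c, player 2's utilities follow as c minus player 1's. The sketch below uses Matching Pennies, the zero-sum example described later in these slides, with player 1 gaining 1 on a match and losing 1 on a mismatch; the function name is hypothetical.

```python
# Player 1's utility only; this is all that needs to be stored.
U1 = {
    ("Heads", "Heads"): 1,
    ("Heads", "Tails"): -1,
    ("Tails", "Heads"): -1,
    ("Tails", "Tails"): 1,
}


def player2_utility(u1, c=0):
    """Recover u_2 from u_1 in a constant-sum game: u_2(a) = c - u_1(a)."""
    return {a: c - v for a, v in u1.items()}


U2 = player2_utility(U1)
print(U2[("Heads", "Heads")])  # -1
print(all(U1[a] + U2[a] == 0 for a in U1))  # True
```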

Matching Pennies
One player wants to match; the other wants to mismatch.

          Heads   Tails
Heads       1      -1
Tails      -1       1

Figure 3.5: Matching Pennies game.
Play this game with someone near you, repeating five times.
... the abbreviation we must explicitly state whether this matrix represents a common-payoff game or a zero-sum one.
A classical example of a zero-sum game is the game of matching pennies. In this game, each of the two players has a penny, and independently chooses to display either heads or tails. The two players then compare their pennies. If they are the same then player 1 pockets both, and otherwise player 2 pockets them. The payoff matrix is shown in Figure 3.5.
The popular children's game of Rock, Paper, Scissors, also known as Rochambeau, provides a three-strategy generalization of the matching-pennies game. The payoff matrix of this zero-sum game is shown in Figure 3.6. In this game, each of the two players can choose either Rock, Paper, or Scissors. If both players choose the same ...

Rock-Paper-Scissors
Generalized matching pennies.

            Rock   Paper   Scissors
Rock          0     -1        1
Paper         1      0       -1
Scissors     -1      1        0

Figure 3.6: Rock, Paper, Scissors game.
...believe it or not, there's an annual international competition for this game!
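
The payoff matrix can be generated from the rules of the game rather than typed in. The sketch below assumes the usual rules (rock beats scissors, scissors beats paper, paper beats rock) and scores a win as 1, a loss as -1, and a tie as 0 for the row player; the names are illustrative.

```python
ACTIONS = ("Rock", "Paper", "Scissors")

# BEATS[x] is the action that x defeats.
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}


def u1(a1, a2):
    """Row player's utility: 1 for a win, -1 for a loss, 0 for a tie."""
    if a1 == a2:
        return 0
    return 1 if BEATS[a1] == a2 else -1


def u2(a1, a2):
    """Column player's utility; the game is zero-sum, so it is -u1."""
    return -u1(a1, a2)


matrix = [[u1(row, col) for col in ACTIONS] for row in ACTIONS]
for name, row in zip(ACTIONS, matrix):
    print(name, row)
# Rock [0, -1, 1]
# Paper [1, 0, -1]
# Scissors [-1, 1, 0]
```

Since the game is symmetric as well as zero-sum, swapping the two players' actions flips the sign of the payoff, which shows up as the matrix being the negative of its own transpose.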

Games of Cooperation
Players have exactly the same interests.
- no conflict: all players want the same things
- $\forall a \in A, \forall i, j, \; u_i(a) = u_j(a)$
- we often write such games with a single payoff per cell
- why are such games "noncooperative"?

Coordination Game
Which side of the road should you drive on?

         Left   Right
Left       1      0
Right      0      1

Figure 3.4: Coordination game.
Play this game with someone near you, repeating five times.
... the agents have no conflicting interests; their sole challenge is to coordinate on an action that is maximally beneficial to all.
Because of their special nature, we often represent common value games with an abbreviated form of the matrix in which we list only one payoff in each of the cells. As an example, imagine two drivers driving towards each other in a country without traffic rules, and who must independently decide whether to drive on the left or on the right. If the players choose the same side (left or right) they have some high utility, and otherwise they have a low utility. The game matrix is shown in Figure 3.4.
At the other end of the spectrum from pure coordination games lie zero-sum games, which (bearing in mind the comment we made earlier about positive affine transformations) are more properly called constant-sum games. Unlike common-payoff games, ...
(c) Shoham and Leyton-Brown, 2006

General Games: Battle of the Sexes
The most interesting games combine elements of cooperation and competition.

        B       F
B     2, 1    0, 0
F     0, 0    1, 2

Figure 3.7: Battle of the Sexes game.
Play this game with someone near you, repeating five times.
Strategies in normal-form games: We have so far defined the actions available to each player in a game, but not yet his set of strategies, or his available choices. Certainly one kind of strategy is to select ...
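
The three kinds of game seen in this lecture can be told apart by a simple test on the payoff cells. The sketch below is one way to do it for two-player games; the `classify` function and its labels are illustrative, and the Coordination and Matching Pennies payoffs are expanded from their single-payoff form into a pair per cell.

```python
def classify(payoffs):
    """Label a 2-player game given {(a1, a2): (u1, u2)}."""
    cells = list(payoffs.values())
    if all(u1 == u2 for u1, u2 in cells):
        return "common-payoff"  # pure cooperation
    if len({u1 + u2 for u1, u2 in cells}) == 1:
        return "constant-sum"  # pure competition
    return "general"  # mixes cooperation and competition


coordination = {
    ("Left", "Left"): (1, 1), ("Left", "Right"): (0, 0),
    ("Right", "Left"): (0, 0), ("Right", "Right"): (1, 1),
}
matching_pennies = {
    ("Heads", "Heads"): (1, -1), ("Heads", "Tails"): (-1, 1),
    ("Tails", "Heads"): (-1, 1), ("Tails", "Tails"): (1, -1),
}
battle_of_the_sexes = {
    ("B", "B"): (2, 1), ("B", "F"): (0, 0),
    ("F", "B"): (0, 0), ("F", "F"): (1, 2),
}

print(classify(coordination))         # common-payoff
print(classify(matching_pennies))     # constant-sum
print(classify(battle_of_the_sexes))  # general
```

In Battle of the Sexes both players prefer coordinating to miscoordinating, but they disagree about which coordinated outcome is better, so the game fits neither extreme.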