
Session 14

Two-person non-zero-sum games of perfect information

The analysis of zero-sum games is relatively straightforward because, for each player, maximizing his or her own utility is equivalent to minimizing the winnings of the other player. We will now move away from these confrontational games and consider games where the interests of the players are at least partially aligned (i.e., they can benefit without necessarily making their opponents worse off). This type of game sometimes leads to unexpected conclusions; in particular, we might need to stop thinking about Nash equilibria as providing optimal solutions for the game.

To motivate our move, consider the famous prisoner's dilemma (do you ever watch Law & Order on TV?). Two men suspected of committing a crime together are arrested and placed in separate interrogation rooms. Each suspect may either confess or remain silent, and each one knows the consequences of his actions. If one suspect confesses and his partner does not, the one who confessed turns state's evidence and goes free while the other goes to jail for twenty years. On the other hand, if both suspects confess, both of them go to jail for five years. Finally, if both suspects remain silent, they both go to jail for a year on a lesser charge. Assuming that each criminal only cares for his or her own well-being, the payoffs can be summarized in the following table (rows are Prisoner 1's actions, columns are Prisoner 2's, and each cell lists Prisoner 1's payoff first):

                                      Prisoner 2
                                Confess       Remain Silent
    Prisoner 1  Confess         (-5, -5)      (0, -20)
                Remain Silent   (-20, 0)      (-1, -1)

Note that the sum of the payoffs is not constant (it is -10 if both confess but -2 if both remain silent). Hence, this is not a zero-sum game. From a cursory examination of the table, it would seem that both prisoners remaining silent is the optimal solution for the game (at least from the point of view of the aggregate number of years that the criminals will spend in jail). However, this solution is not an equilibrium, and it is unlikely that the players will adopt such a strategy.
To see why, let's consider the set of best responses for each player.

    If prisoner 2 ...      prisoner 1 should ...
    confesses              confess
    remains silent         confess

Hence, confessing is a dominant strategy for prisoner 1 (and, by the symmetry of the game, for prisoner 2 as well). This can seem somewhat contradictory at first sight. The outcome in which both prisoners confess is an equilibrium, as no player has a unilateral incentive to change his or her behavior if he/she knows that the other player will confess. However, this is clearly a stupid strategy for both players, because both would be better off if they could coordinate their actions so that both remained silent.

This type of paradox (where Nash equilibria are not necessarily good solutions to the game) cannot arise in zero-sum games, but is very common in non-zero-sum games. It arises because in non-zero-sum games details such as the order of play and the ability of the players to communicate, make binding agreements, or set side payments can have a big effect on the outcome of the game.

As a second example, consider the following game and assume that no communication, agreements, or wealth transfers are allowed (rows are Player 1's actions, columns are Player 2's, and each cell lists Player 1's payoff first):

                        Player 2
                     A           B
    Player 1   a     (0, 0)      (10, 5)
               b     (5, 10)     (0, 0)

As with all other games, let's consider the set of best responses for each player. From player 1's perspective:

    If player 2 plays ...   player 1 should play ...
    A                       b
    B                       a

Similarly, from the point of view of player 2:

    If player 1 plays ...   player 2 should play ...
    a                       B
    b                       A

No strategy is dominant or dominated in this example. However, we can easily argue that the pairs (A, b) and (B, a) are Nash equilibria for this game. For the first pair, note that A is the best response to b, and that b is also the best response to A. Hence, there is no incentive for unilateral changes of strategy. A similar argument can be made for the pair (B, a).

In addition to these pure-strategy equilibria, the game admits a mixed-strategy equilibrium. Let $p$ be the probability that player 2 plays A (so that the probability that he/she will play B is $1 - p$). The expected payoffs for player 1 are:

    If player 1 plays a:   $0 \cdot p + 10 \cdot (1 - p) = 10 - 10p$
    If player 1 plays b:   $5 \cdot p + 0 \cdot (1 - p) = 5p$

Hence, for player 1 to be indifferent between his/her two actions we need

    $5p = 10 - 10p \;\Rightarrow\; 15p = 10 \;\Rightarrow\; p = \frac{10}{15} = \frac{2}{3}$

and the expected payoff for player 1 is $5p = \frac{10}{3}$. Now, let $q$ be the probability that player 1 plays a. Then:

    If player 2 plays A:   $0 \cdot q + 10 \cdot (1 - q) = 10 - 10q$
    If player 2 plays B:   $5 \cdot q + 0 \cdot (1 - q) = 5q$

Hence, using the same argument, $q = \frac{2}{3}$, and the expected payoff for player 2 is also $\frac{10}{3}$. Note that this is really a Nash equilibrium for the game. If player 1 plays a with probability 2/3, there is nothing that player 2 can do to improve his/her expected utility over the one he/she would get by playing A 2/3 of the time, and vice versa.

Unlike the pure-strategy equilibria, the mixed-strategy equilibrium is fair (in the sense that the payoff for both players is the same, $\frac{10}{3}$). However, the expected payoff of $\frac{10}{3}$ for each player is still well below the payoffs that the players could obtain by cooperating and moving to either (a, B) or (b, A). Indeed, the optimal outcome (if communication and transfers were allowed) would be for both to concentrate on one of those two options, with the player who receives the higher payoff transferring 2.5 units to the other player, so that both make a benefit of 7.5 units!

Finally, let's consider another example, often called the game of chicken. The name has its origins in a game in which two drivers drive towards each other on a collision course: one must swerve, or both may die in the crash, but if one driver swerves and the other does not, the one who swerved will be called a "chicken" (ever seen Rebel Without a Cause?). A payoff matrix associated with this game would be (rows are Player 1's actions, columns are Player 2's, and each cell lists Player 1's payoff first):

                              Player 2
                          Swerve       Straight
    Player 1  Swerve      (0, 0)       (-1, +1)
              Straight    (+1, -1)     (-100, -100)

The payoff of -100 for each player in case of a collision is meant to represent a big loss (at least when compared against the small profit/loss made when one player swerves and the other goes straight). The game of chicken has been used to model a number of real-life situations, including the doctrine of mutually assured destruction (or, for that matter, the current standoff between Republicans and Democrats in Congress!).

Let's consider the best responses of each player. From the perspective of player 1:

    If player 2 ...     player 1 should ...
    swerves             go straight
    goes straight       swerve (better chicken than dead!)

Similarly, from the point of view of player 2:

    If player 1 ...     player 2 should ...
    swerves             go straight
    goes straight       swerve (better chicken than dead!)

Hence, the pairs of strategies (swerve, go straight) and (go straight, swerve) are Nash equilibria. Indeed, if Player 1 knows that Player 2 will go straight, his optimal strategy is to swerve, and if Player 2 knows that Player 1 will swerve, then his best response is to go straight. However, these equilibria might not arise in real life because they imply that everybody will know who the chicken is!

The game also admits a mixed-strategy equilibrium, which corresponds to the players swerving 99% of the time and going straight 1% of the time. Indeed, if we let $p$ be the probability that player 2 swerves, the game from the point of view of player 1 looks like:

    If player 1 swerves:        $0 \cdot p + (-1) \cdot (1 - p) = p - 1$
    If player 1 goes straight:  $1 \cdot p + (-100) \cdot (1 - p) = 101p - 100$

Since the expected utilities need to be the same for both options,

    $p - 1 = 101p - 100 \;\Rightarrow\; 99 = 100p \;\Rightarrow\; p = \frac{99}{100}$
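The indifference argument used for both mixed-strategy equilibria can be checked mechanically. The following is only a minimal sketch for 2x2 games, assuming integer payoffs; the function name `opponent_mix` and the list-of-rows payoff layout are illustrative choices, not part of the notes.

```python
from fractions import Fraction


def opponent_mix(row_payoffs):
    """Solve the row player's indifference condition in a 2x2 game.

    row_payoffs[i][j] is the row player's payoff when the row player
    picks row i and the column player picks column j (hypothetical layout).
    Returns (p, value): p is the probability the column player must put on
    the first column, value is the row player's resulting expected payoff.
    """
    (a, b), (c, d) = row_payoffs
    # Indifference: a*p + b*(1 - p) == c*p + d*(1 - p)
    denom = (a - b) - (c - d)
    if denom == 0:
        raise ValueError("indifference condition has no unique solution")
    p = Fraction(d - b, denom)
    if not 0 < p < 1:
        # Typical when one row dominates, as in the prisoner's dilemma.
        raise ValueError("no interior mixed strategy makes the rows indifferent")
    value = a * p + b * (1 - p)
    return p, value


# Player 1's payoffs in the game with actions a/b (rows) and A/B (columns).
print(opponent_mix([[0, 10], [5, 0]]))
# Player 1's payoffs in chicken: rows and columns are (swerve, straight).
print(opponent_mix([[0, -1], [1, -100]]))
```

Exact fractions are used instead of floats so that the results can be compared directly with the hand-computed values 2/3 and 99/100. Passing the prisoner's dilemma payoffs for Prisoner 1, `[[-5, 0], [-20, -1]]`, raises `ValueError`, which is consistent with confessing being a dominant strategy.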

Because of the symmetry of the game, the other player must follow the same strategy. The expected value of the game is $p - 1 = \frac{99}{100} - 1 = -\frac{1}{100}$ for each of the players. This mixed-strategy equilibrium seems like a more plausible solution to the problem.
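As a cross-check on the three payoff tables in this session, the pure-strategy equilibria can also be found by brute force: a cell is a Nash equilibrium when neither player can gain by deviating alone. The sketch below assumes each game is stored as a dictionary from (row action, column action) to (row payoff, column payoff); the function name `pure_nash` and the action labels are hypothetical.

```python
from itertools import product


def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        # The row player cannot gain by switching rows while the column stays fixed.
        row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in rows)
        # The column player cannot gain by switching columns while the row stays fixed.
        col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


prisoners = {
    ("confess", "confess"): (-5, -5), ("confess", "silent"): (0, -20),
    ("silent", "confess"): (-20, 0), ("silent", "silent"): (-1, -1),
}
second_game = {
    ("a", "A"): (0, 0), ("a", "B"): (10, 5),
    ("b", "A"): (5, 10), ("b", "B"): (0, 0),
}
chicken = {
    ("swerve", "swerve"): (0, 0), ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1), ("straight", "straight"): (-100, -100),
}

for name, game in [("prisoners", prisoners), ("second game", second_game),
                   ("chicken", chicken)]:
    print(name, pure_nash(game))
```

Under these assumptions the enumeration reports (confess, confess) for the prisoner's dilemma and the two asymmetric pairs for each of the other games, matching the best-response tables. It only finds pure-strategy equilibria; the mixed-strategy ones still require the indifference argument.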