cs294-p29 Seminar on Algorithmic Game Theory, September 6, 2011

Lecture: September 6, 2011
Lecturer: Christos H. Papadimitriou
Scribes: Aloni Cohen and James Andrews

1 Game Representation

1.1 Tabular Form and the Problem of Succinctness

In the previous lecture, we generally dealt with games represented in tabular or normal form, in which each player's payoff is listed for all choices of strategies by all players. Table 1 represents the Prisoner's Dilemma in this form.

Table 1: Prisoner's Dilemma in Tabular Form

            silent   defect
  silent     3,3      0,4
  defect     4,0      1,1

If there are n players, each with s strategies to choose from, then the number of entries in the table is $n s^n$. With a representation this large, the goal of algorithmic efficiency becomes meaningless, as simply reading the game into memory takes exponential time. It is for this reason that alternative game representations have been developed.

1.2 Graphical Games

A graphical game is given by a graph $G = (V, E)$ in which the vertices correspond to the players, and the presence of an edge $e = (p_1, p_2)$ between players 1 and 2 means that the strategy each of them chooses (potentially) affects the other's payoff. For example, in Figure 1, the payoff of $p_2$ is given by a function $U_2(s_1, s_2, s_3, s_4)$ whose value depends only on the strategies of players 1, 2, 3, and 4. For a graphical game with n players, each of whom has s strategies, where the graph has maximum degree d, the representation requires $n s^{d+1}$ entries. For a sparse graph, this is a far smaller representation than normal form.

Figure 1: Graphical Game. Each player $p_i$ plays strategy $s_i$.
A decomposable graphical game is a graphical game with the further restriction that a player's payoff is affected by the strategy of each neighbor independently. For example, if the game in Figure 1 were a decomposable graphical game, the payoff of $p_2$ would be given by a function $U_2(s_1, s_2, s_3, s_4) = U_{2,1}(s_2, s_1) + U_{2,3}(s_2, s_3) + U_{2,4}(s_2, s_4)$. We will see below (Sec. 2) that zero-sum decomposable graphical games can be solved easily with linear programming.

1.3 Congestion Games

A congestion game is given by a graph where each of the n players is assigned 2 vertices and each edge is assigned an n-tuple defining a delay function for that edge. In the game, each player must choose a set of edges forming a path between his 2 vertices. Each edge of the graph has some delay, which is a function only of the number of players that choose to use that edge. A player's total delay is the sum of the delays of all edges used on the path. See Figure 2 for an example. For a congestion game formed by a graph with m edges, representing the game requires roughly $mn$ entries.

Figure 2: Congestion Game. In the congestion game above, suppose Player A uses edge 1 and Player B uses edge 2, and that Players D and E use neither edge 1 nor edge 2. If Player C uses edge 1, the delay on that edge for Players A and C will be 2, whereas the delay for Player B (on edge 2) will be 1. Alternatively, if Player C uses edge 2, the delays on both edge 1 and edge 2 will be 1.

1.4 Symmetric Games

A game in which all players are indistinguishable is a symmetric game. Examples of such games include Prisoner's Dilemma, Rock-Paper-Scissors, and 2/3 of the Majority. In a symmetric game all n players have the same strategies and the same payoffs, which are a function only of how many players chose each strategy, not of which players chose them. Nash proved (1951) that every symmetric game has a symmetric Nash equilibrium, one in which every player plays the same strategy.
To represent a symmetric game, we only need to store a single payoff value for each possible set of strategies played. Since only the number of players who choose a particular strategy matters, $s\binom{n+s-1}{s-1}$ entries are needed.

1.4.1 Anonymous Games

A generalization of a symmetric game is an anonymous game, one in which the players are distinguishable, but the payoffs (possibly different for each player) are still a function only of how many players choose each strategy rather than which players. In other words, each player sees all the other players as anonymous, or indistinguishable. An anonymous game requires $n\,s\binom{n+s-1}{s-1}$ entries to represent.
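To get a feel for how the representation sizes in Sections 1.1 to 1.4.1 compare, the counts can be evaluated directly. The following is a minimal sketch that simply encodes the formulas as stated in these notes; the function names are hypothetical and chosen only for illustration.

```python
from math import comb


def tabular_entries(n: int, s: int) -> int:
    """Normal form: n payoffs for each of the s**n strategy profiles."""
    return n * s ** n


def graphical_entries(n: int, s: int, d: int) -> int:
    """Graphical game with maximum degree d: n * s**(d + 1) entries."""
    return n * s ** (d + 1)


def congestion_entries(m: int, n: int) -> int:
    """Congestion game: one n-tuple of delays per edge, roughly m * n entries."""
    return m * n


def symmetric_entries(n: int, s: int) -> int:
    """Symmetric game: s * C(n + s - 1, s - 1), as stated in the notes."""
    return s * comb(n + s - 1, s - 1)


def anonymous_entries(n: int, s: int) -> int:
    """Anonymous game: one symmetric-style table per player."""
    return n * symmetric_entries(n, s)


if __name__ == "__main__":
    # Illustrative parameters only (not taken from the lecture).
    n, s, d = 20, 3, 4
    print("tabular  :", tabular_entries(n, s))
    print("graphical:", graphical_entries(n, s, d))
    print("symmetric:", symmetric_entries(n, s))
    print("anonymous:", anonymous_entries(n, s))
```

For the two-player, two-strategy Prisoner's Dilemma, `tabular_entries(2, 2)` gives 8, matching the eight numbers in Table 1.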
1.5 Extensive Form (Bayesian Games)

The following description of extensive-form games is taken from Wikipedia:

An extensive-form game is a specification of a game in game theory, allowing (as the name suggests) explicit representation of a number of important aspects, like the sequencing of players' possible moves, their choices at every decision point, the (possibly imperfect) information each player has about the other player's moves when he makes a decision, and his payoffs for all possible game outcomes. Extensive-form games also allow representation of incomplete information in the form of chance events encoded as "moves by nature". ... [A]n n-player extensive-form game thus consists of the following:

- A finite set of n (rational) players
- A rooted tree, called the game tree
- Each terminal (leaf) node of the game tree has an n-tuple of payoffs, meaning there is one payoff for each player at the end of every possible play
- A partition of the non-terminal nodes of the game tree in n+1 subsets, one for each (rational) player, and with a special subset for a fictitious player called Chance (or Nature). Each player's subset of nodes is referred to as the "nodes of the player". (A game of complete information thus has an empty set of Chance nodes.)
- Each node of the Chance player has a probability distribution over its outgoing edges.
- Each set of nodes of a rational player is further partitioned in information sets, which make certain choices indistinguishable for the player when making a move, in the sense that:
  - there is a one-to-one correspondence between outgoing edges of any two nodes of the same information set; thus the set of all outgoing edges of an information set is partitioned in equivalence classes, each class representing a possible choice for a player's move at some point, and
  - every (directed) path in the tree from the root to a terminal node can cross each information set at most once
- the complete description of the game specified by the above parameters is common knowledge among the players

A play is thus a path through the tree from the root to a terminal node.
At any given non-terminal node belonging to Chance, an outgoing branch is chosen according to the probability distribution. At any rational player's node, the player must choose one of the equivalence classes for the edges, which determines precisely one outgoing edge except (in general) the player doesn't know which one is being followed.

So far, every game we have seen in the above representations can easily be expanded into tabular form. But how can we reconcile Chance nodes in an extensive-form game? In the example in Figure 3, each player is assigned a type which can affect the choices made and payoffs earned. If each player gets a type from a set T of possible types, then the Chance node has a probability distribution on $T^n$, the set of all type assignments. Now we can generalize the idea of a player's strategy (e.g., do A with some probability $p_a$, do B with some probability $p_b$, etc.) to a function $f : T \to \text{strategy}$ that assigns a strategy to each type. Given an assignment of types, the game can take a tabular form.

1.6 Others

There are many other types of games and game representations, including but certainly not limited to Scheduling, Facility Location, and Network Design games.
Figure 3: Extensive Form - An even simpler poker game. In this game, Players 1 and 2 each pay $1 to play. Then Nature (the dealer) gives Player 1 a card that is High or Low. Player 1 can either fold (in which case Player 2 gets the pot) or bet an additional $1. If Player 1 bets, then Player 2 can either fold (in which case Player 1 gets the pot) or call by placing another $1 in the pot. If both players bet, then Player 1 wins if his card was High and loses if it was Low. After Player 1 bets, Player 2 has a single information set: the two circled nodes are indistinguishable from his point of view. This example was taken from http://www.u.arizona.edu/~mwalker/pokergame.pdf.
2 Decomposable Zero-Sum Games

In the homework for week 1, we saw a simple case of a decomposable graphical game. In that scenario, Player B was playing 2 zero-sum games simultaneously, one against Player A and one against Player C. The catch was that Player B had to choose a single strategy to apply to both games, and his payoff was the sum of his payoffs from the two separate games. As an exercise, we showed that this could be solved by a linear program. It turns out that this approach can be used more generally.

Theorem 1. In any zero-sum decomposable graphical game, a Nash equilibrium can be found in polynomial time.

Proof: First, we must prove that a Nash equilibrium exists at all. To do this, we simply use the sledgehammer that is Nash's theorem, namely that every game has a Nash equilibrium. Now our task is to find one. Define the following variables:

  $x_{u,j}$ = Pr[player u chooses action j]
  $U_{u,v}(i, j)$ = the payoff for player u if u plays i and v plays j
  $L_{u,i}(x)$ = the expected payoff for player u if he chooses action i, given that the other players fix their strategies $x_{v,j}$

Notice that $L_{u,i}$ is a linear function of x because the game is decomposable. In particular, $L_{u,i} = \sum_{v,j} U_{u,v}(i, j)\, x_{v,j}$.

Consider the following linear program:

  Minimize $\sum_u w_u$ subject to:
    $w_u \ge L_{u,i} \quad \forall u, i$
    $\sum_i x_{u,i} = 1 \quad \forall u$
    $x_{u,i} \ge 0 \quad \forall u, i$

The following claim completes the proof:

Claim 1. The minimum is achieved at 0, at a solution which is a Nash equilibrium.

Proof: Define $\bar{w}_u = \sum_i x_{u,i} L_{u,i}$; this is the average gain of player u. Obviously, $\sum_u \bar{w}_u = 0$ because this is a zero-sum game. From the linear program we have $w_u \ge \bar{w}_u$, since $w_u \ge L_{u,i}$ for every i and $\bar{w}_u$ is a convex combination of the $L_{u,i}$. Hence $\sum_u w_u \ge 0$ for every feasible solution. Taking x to be a Nash equilibrium and setting $w_u = \bar{w}_u$ for all u is feasible and yields $\sum_u w_u = 0$, which is therefore minimal. Conversely, any solution with $\sum_u w_u = 0$ must have $w_u = \bar{w}_u$ for all u. Thus, for every strategy i, $L_{u,i} \le w_u = \bar{w}_u$ is no better than the average $\bar{w}_u$, proving that x is indeed a Nash equilibrium.

3 An Ancient Algorithm: Fictitious Play

In addition to the linear programming method we learned about previously, there are several natural algorithms for solving zero-sum games. Of these, we first discuss fictitious play, a strategy in which we imagine the two players repeatedly playing with a naive strategy.
The first round of fictitious play is played randomly. For each subsequent round, both players look at the history of plays by the other, and assume that these historical plays represent the strategy that player will use: a strategy that was used in p% of the past rounds is assumed to be played with probability p%. Each player then plays the best response to that assumed history-based strategy. Over time, these historical strategies will converge to an equilibrium strategy.

As an example, consider a zero-sum game with row player payoff matrix:

$$R = \begin{pmatrix} 2 & 1 & 0 \\ 2 & 0 & 3 \\ -1 & 3 & -3 \end{pmatrix} \qquad (1)$$
(and column player payoff matrix $C = -R$). The fictitious play algorithm may then run as follows:

  t   row plays   col plays   u (avg. gain vector for row)   v (avg. loss vector for column)
  0       1           3       (0, 3, -3)                     (2, 1, 0)
  1       2           3       (0, 3, -3)                     (2, 1/2, 3/2)
  2       2           2       (1/3, 2, -1)                   (2, 1/3, 2)
  etc.

Note that this process bounds the equilibrium value, written $v^*$ here to distinguish it from the vector v, as:

$$\max_i u_i \ge v^* \ge \min_j v_j$$

Theorem 2. (Robinson, 1950) Fictitious play converges to the Nash equilibrium.

Robinson's proof of convergence indicates a rate of $(c/\epsilon)^{m+n}$ (in the worst case, not probabilistic). Karlin conjectured (1965) that a faster convergence rate of $(c/\epsilon)^2$ suffices.

4 Another Natural Algorithm: Experts

Another natural algorithm is based on a concept called boosting, or alternatively no-regret learning, hedging, experts, or multiplicative updates (MU). We will first introduce this concept, and then show how it can be applied to zero-sum games.

4.1 Introduction to Experts Algorithms

In these algorithms, n experts suggest strategies over time, and the player can decide which mix of expert strategies to use based on the historical losses observed from each expert's strategies. Specifically, at each time step t, the player produces an n-vector of weights $w^t$ indicating how much to follow each expert's strategy, and then nature produces an n-vector of losses $l^t$ caused by following each strategy. The player's loss is then computed as $\frac{w^t}{\|w^t\|_1} \cdot l^t$. Cumulative loss over time is $L^T = \sum_{t=1}^{T} \frac{w^t}{\|w^t\|_1} \cdot l^t$.

For example, we may imagine the following sequence of weights and resulting losses from 3 experts:

                     t = 1          t = 2          t = 3          t = 4
                  weight  loss   weight  loss   weight  loss   weight  loss
  expert 1           1      4       1      2       2      1       2      3
  expert 2           1      1       2      3       2      1       2      7
  expert 3           1      3       1      4       1      0       2      4
  player's loss        8/3           12/4            4/5           14/3

The goal is to choose weights $w^t$ to do well. Doing well could mean getting the smallest cumulative loss $L^T$, or predicting the best row (best expert). Given this setup, several algorithms for choosing the weight vectors are possible:

Follow the leader: For each round choose the expert i who has the lowest loss on average so far, and set $w_i = 1$, $w_{j \ne i} = 0$. It is easy to construct a worst-case scenario in which this algorithm incurs total loss which is n times worse than following the true best expert.
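Multiplicative updates (MU): Punish the i-th expert with a punishment function $\pi$, replacing its weight x by $x\,\pi(l_i)$, for example $x\,(1+\epsilon)^{-l_i}$, with $\pi$ bounded by

$$\alpha^x \le \pi(x) \le 1 - (1-\alpha)x \quad \text{for } x \in [0, 1].$$

Theorem 3. With multiplicative updates, the cumulative loss satisfies

$$L^T \le \min_i L_i^T \cdot \frac{\ln(1/\alpha)}{1-\alpha} + \frac{\ln n}{1-\alpha},$$

where $L_i^T = \sum_{t=1}^{T} l_i^t$ is the cumulative loss of expert i.

Proof:

$$\sum_{i=1}^{n} w_i^{t+1} = \sum_{i=1}^{n} w_i^t\,\pi(l_i^t)$$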
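$$\le \sum_i w_i^t\,\bigl(1-(1-\alpha)\,l_i^t\bigr) = \Bigl(\sum_i w_i^t\Bigr)\Bigl(\sum_i p_i^t\,\bigl(1-(1-\alpha)\,l_i^t\bigr)\Bigr),$$

where $p_i^t = w_i^t / \sum_j w_j^t$ are the normalized weights. Taking logarithms, and using $\ln(1-z) \le -z$:

$$\ln\Bigl(\sum_i w_i^{t+1}\Bigr) \le \ln\Bigl(\sum_i w_i^t\Bigr) + \ln\bigl(1-(1-\alpha)\,p^t \cdot l^t\bigr)$$

$$\ln\Bigl(\sum_i w_i^{t+1}\Bigr) \le \ln\Bigl(\sum_i w_i^t\Bigr) - (1-\alpha)\,p^t \cdot l^t$$

Summing over $t = 1, \dots, T$ with initial weights $w_i^1 = 1/n$ (so that $\ln \sum_i w_i^1 = 0$):

$$\ln\Bigl(\sum_i w_i^{T+1}\Bigr) \le -(1-\alpha)\,L^T$$

$$L^T \le -\frac{\ln\bigl(\sum_i w_i^{T+1}\bigr)}{1-\alpha} \le -\frac{\ln w_i^{T+1}}{1-\alpha} \quad \forall i$$

$$w_i^{T+1} \ge w_i^1\,\alpha^{\,l_i^1 + l_i^2 + \dots + l_i^T} = \frac{1}{n}\,\alpha^{L_i^T}$$

$$L^T \le \frac{\ln n}{1-\alpha} + \frac{\ln(1/\alpha)}{1-\alpha}\,L_i^T \quad \forall i$$

Further, note that setting $\alpha = 1 - \sqrt{\frac{\ln n}{T}}$ gives

$$L^T \le \min_i L_i^T + 2\sqrt{T \ln n}, \qquad \frac{1}{T} L^T \le \frac{1}{T}\min_i L_i^T + 2\sqrt{\frac{\ln n}{T}}.$$

4.2 Application to Zero Sum Games

Given a zero-sum game $(A, -A)$, we can let both players play the game repeatedly, using multiplicative updates to update their strategies, $x^t$ and $y^t$. Then we can analyze the convergence of this method as follows:

$$\sum_{t=1}^{T} (x^t)^\top A\, y^t \ge e_i^\top A \Bigl(\sum_t y^t\Bigr) - O(\sqrt{T}) \quad \forall i$$

$$\sum_{t=1}^{T} (x^t)^\top A\, y^t \le \Bigl(\sum_t x^t\Bigr)^{\!\top} A\, e_j + O(\sqrt{T}) \quad \forall j$$

$$\sum_{t=1}^{T} (x^t)^\top A\, y^t \le \Bigl(\sum_t x^t\Bigr)^{\!\top} A \Bigl(\frac{\sum_t y^t}{T}\Bigr) + O(\sqrt{T})$$

$$\Bigl(\sum_t x^t\Bigr)^{\!\top} A \Bigl(\frac{\sum_t y^t}{T}\Bigr) \ge e_i^\top A \Bigl(\sum_t y^t\Bigr) - O(\sqrt{T}) \quad \forall i$$

$$\Bigl(\frac{\sum_t x^t}{T}\Bigr)^{\!\top} A \Bigl(\frac{\sum_t y^t}{T}\Bigr) \ge e_i^\top A \Bigl(\frac{\sum_t y^t}{T}\Bigr) - O\Bigl(\frac{1}{\sqrt{T}}\Bigr) \quad \forall i$$

Therefore, $\frac{1}{T}\sum_t x^t$ approximates the Nash equilibrium strategy for the row player. Similarly, the algorithm also converges for the column player. Note that accuracy $\epsilon$ is reached once $T \approx (c/\epsilon)^2$, so this converges at the rate Karlin conjectured for fictitious play.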
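The self-play procedure of Section 4.2 can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not the lecture's own implementation: it uses the punishment function $\pi(l) = \alpha^l$, rescales payoffs to losses in $[0, 1]$, starts from uniform weights $1/n$, and picks $\alpha = 1 - \sqrt{\ln n / T}$ (clamped to stay inside $(0, 1)$). The function name `mu_self_play` and the 2×2 example matrix are hypothetical. For that matrix, equalizing payoffs by hand gives the equilibrium mix $(2/5, 3/5)$ for both players.

```python
import math


def mu_self_play(A, rounds):
    """Run multiplicative updates for both players of the zero-sum game (A, -A).

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    Returns the time-averaged mixed strategies (x_bar, y_bar).
    """
    m, n = len(A), len(A[0])
    lo = min(min(row) for row in A)
    hi = max(max(row) for row in A)
    span = (hi - lo) or 1.0  # guards against a constant payoff matrix

    # Assumed step size: alpha = 1 - sqrt(ln n / T), clamped into (0, 1).
    alpha = 1.0 - math.sqrt(math.log(max(m, n, 2)) / rounds)
    alpha = min(max(alpha, 0.5), 0.999999)

    x = [1.0 / m] * m  # row player's normalized weights
    y = [1.0 / n] * n  # column player's normalized weights
    x_sum = [0.0] * m
    y_sum = [0.0] * n

    for _ in range(rounds):
        x_sum = [a + b for a, b in zip(x_sum, x)]
        y_sum = [a + b for a, b in zip(y_sum, y)]

        # Expected payoff of every pure row against y and every pure column against x.
        row_pay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        col_pay = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]

        # Losses rescaled to [0, 1]: the row player wants high payoff, the column low.
        x = [w * alpha ** ((hi - p) / span) for w, p in zip(x, row_pay)]
        y = [w * alpha ** ((p - lo) / span) for w, p in zip(y, col_pay)]

        # Renormalize every round to avoid numerical underflow.
        sx, sy = sum(x), sum(y)
        x = [w / sx for w in x]
        y = [w / sy for w in y]

    return [v / rounds for v in x_sum], [v / rounds for v in y_sum]


if __name__ == "__main__":
    A = [[2, -1], [-1, 1]]  # hypothetical example game
    x_bar, y_bar = mu_self_play(A, rounds=20000)
    print("average row strategy   :", x_bar)
    print("average column strategy:", y_bar)
```

Note that it is the averaged strategies, not the last iterates, that the analysis above shows to be approximately optimal, which is why the sketch accumulates `x_sum` and `y_sum`.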