A Game-Theoretic Analysis of Strictly Competitive Multiagent Scenarios


Felix Brandt, Felix Fischer, Paul Harrenstein
Computer Science Department, University of Munich, 80538 Munich, Germany
{brandtf,fischerf,harrenst}@tcs.ifi.lmu.de

Yoav Shoham
Computer Science Department, Stanford University, Stanford CA 94305, USA
shoham@cs.stanford.edu

Abstract

This paper is a comparative study of game-theoretic solution concepts in strictly competitive multiagent scenarios, as commonly encountered in the context of parlor games, competitive economic situations, and some social choice settings. We model these scenarios as ranking games in which every outcome is a ranking of the players, with higher ranks being preferred over lower ones. Rather than confining our attention to one particular solution concept, we give matching upper and lower bounds for various comparative ratios of solution concepts within ranking games. The solution concepts we consider in this context are security level strategies (maximin), Nash equilibrium, and correlated equilibrium. We also examine quasi-strict equilibrium, an equilibrium refinement proposed by Harsanyi, which remedies some apparent shortcomings of Nash equilibrium when applied to ranking games. In particular, we compute the price of cautiousness, i.e., the worst-possible loss an agent may incur by playing maximin instead of the worst (quasi-strict) Nash equilibrium, the mediation value, i.e., the ratio between the social welfare obtained in the best correlated equilibrium and the best Nash equilibrium, and the enforcement value, i.e., the ratio between the highest obtainable social welfare and that of the best correlated equilibrium.

1 Introduction

Consider the following three-player game. Alice, Bob, and Charlie independently and simultaneously are to decide whether to raise their hand or not. Alice wins if the number of players raising their hand is odd, whereas Bob wins if it is even and positive. Should nobody raise his hand, Charlie wins. What would you recommend Alice to do?
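The rules of this game are simple enough to tabulate directly. The following is a minimal sketch, not taken from the paper: the function name `winner` and the encoding of "raises hand" as `True` are assumptions made for illustration.

```python
from itertools import product

PLAYERS = ("Alice", "Bob", "Charlie")

def winner(raised):
    """Return the winner for a tuple of booleans (Alice, Bob, Charlie),
    where True means that the player raises a hand."""
    count = sum(raised)
    if count % 2 == 1:
        return "Alice"      # odd number of raised hands
    if count > 0:
        return "Bob"        # even and positive
    return "Charlie"        # nobody raised a hand

# Enumerate all 2^3 outcomes and count how many each player wins.
outcomes = {profile: winner(profile) for profile in product((False, True), repeat=3)}
wins = {name: sum(1 for w in outcomes.values() if w == name) for name in PLAYERS}
```

Counting outcomes says nothing yet about how Alice should play, since the other players choose strategically; it only fixes the payoff table that the solution concepts discussed in the paper operate on.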
Clearly, this question lies at the heart of game theory, and game-theoretic solution concepts should be called upon when trying to give a sound answer (see Section 3 for formal definitions of the concepts used in the following paragraphs). In the game described above there can be just one winner; all the other players lose. As such it is an instance of a subclass of ranking games, which were recently introduced as models of strictly competitive multi-player scenarios [Brandt et al., 2006]. Outcomes of a ranking game are related to rankings of the players, i.e., orderings of the players according to how well they have done in the game relative to one another. Players are assumed to generally prefer higher ranks over lower ones and to be indifferent to the ranks of other players. Formally, ranking games are defined as normal-form games in which the payoff functions represent the preferences of the players regarding lotteries over rankings.

In this paper, we conduct a comparative study of game-theoretic solution concepts in ranking games. It is well-known that two-player strictly competitive games admit a unique rational solution (the maximin solution), i.e., a set of (possibly randomized) strategies for each player so that each player is best off playing one of the recommended strategies. Unfortunately, solution concepts for ranking games with more than two players are less appealing due to a lack of normative power. Nash equilibria, for example, which are defined as profiles of strategies that are mutual best responses to each other, may not be unique. Indeed, the game described above possesses numerous Nash equilibria: raising her hand, not raising her hand, and mixing uniformly between both actions are all optimal strategies for Alice in some equilibrium.

Acknowledgment: This material is based upon work supported by the Deutsche Forschungsgemeinschaft under grants BR 2312/1-1 and BR 2312/3-1, and by the National Science Foundation under ITR grant IIS-0205633.
The only pure, i.e., non-randomized, equilibrium of the game tells Alice not to raise her hand based on the belief that Bob will raise his hand and Charlie will not (see Figure 1 for an illustration). This assumption, however, is unreasonably strong. Both Bob and Charlie may deviate from their respective strategies to any other strategy without decreasing their chances of winning. After all, they cannot do any worse than losing. This weakness is due to the indifference of losers, which is inherent to ranking games. In fact, we argue that pure Nash equilibria are particularly weak solutions of such games and conjecture (and prove for certain sub-cases) that every single-winner game possesses at least one non-pure equilibrium, i.e., an equilibrium where at least one player randomizes.

Returning to the example given at the beginning of this section, it is still unclear which strategy Alice should adopt

in order to maximize her chances of winning. We consider three solution concepts in addition to Nash equilibria: maximin strategies, quasi-strict equilibria, and correlated equilibria. By playing her maximin strategy, Alice can guarantee a certain chance of winning, her so-called security level, no matter which actions her opponents choose. Alice's security level in this particular game is 0.5 and can be obtained by randomizing uniformly between both actions. The same expected payoff is achieved in the worst quasi-strict equilibrium of the game where Alice and Charlie randomize uniformly and Bob invariably raises his hand (see Figure 1). We will see that this equivalence is no mere coincidence, since in any single-winner game where a player has just two actions, the payoff in his worst quasi-strict equilibrium equals his (positive) security level.

However, none of the aforementioned solution concepts offers a solution for multi-player ranking games that is as obviously right as maximin is for strictly competitive two-player games. We nevertheless facilitate the analysis of ranking games by evaluating the following comparative ratios: the price of cautiousness, i.e., the worst-possible loss an agent may face when playing maximin instead of the worst Nash equilibrium; the price of cautiousness for quasi-strict equilibria, i.e., the worst-possible loss an agent may face when playing maximin instead of the worst quasi-strict equilibrium; the mediation value, i.e., the ratio between the social welfare obtainable in the best correlated equilibrium and the best Nash equilibrium; and the enforcement value, i.e., the ratio between the highest obtainable social welfare and that of the best correlated equilibrium.

Each of these values obviously equals 1 in the case of two-player ranking games, as these form a subclass of constant-sum games. The interesting question is how these values unfold for games with more than two players.

The remainder of this paper is organized as follows. After reviewing related work in Section 2, we formally introduce ranking games and game-theoretic solution concepts in Section 3.
Section 4 discusses a weakness of the Nash equilibrium concept that is characteristic for ranking games. Sections 5 and 6 introduce and evaluate the price of cautiousness and the value of correlation, respectively. The paper concludes with Section 7.

2 Related Work

Game playing research in AI has largely focused on two-player games [see, e.g., Marsland and Schaeffer, 1990]. As a matter of fact, in AI, games are usually of a rather specialized kind: what game theorists call deterministic, turn-taking, two-player, zero-sum games of perfect information [Russell and Norvig, 2003, p. 161]. Notable exceptions include cooperative games in the context of coalition formation [see, e.g., Sandholm et al., 1999] and complete information extensive-form games, a class of multi-player games for which efficient Nash equilibrium search algorithms have been investigated by the AI community [e.g., Luckhardt and Irani, 1986; Sturtevant, 2004]. In extensive-form games, players move consecutively and a pure (so-called subgame perfect) Nash equilibrium is guaranteed to exist [see, e.g., Myerson, 1991]. Normal-form games are more general than (perfect-information) extensive-form games because every extensive-form game can be mapped to a corresponding normal-form game, while the opposite is not the case.

Ranking games were introduced by Brandt et al. [2006], who also showed that finding Nash equilibria of ranking games with more than two players is just as hard as for general games and thus unlikely to be feasible in polynomial time. This further underlines the importance of alternative solution concepts such as maximin strategies and correlated equilibria, which can both be computed efficiently via linear programming.

Most work on comparative ratios in computational game theory has been inspired by the literature on the price of anarchy [Koutsoupias and Papadimitriou, 1999], i.e., the ratio between the highest obtainable social welfare and that of the worst Nash equilibrium. Similar ratios for correlated equilibria (the value of mediation and the enforcement value) were introduced by Ashlagi et al. [2005].
To our knowledge, Tennenholtz [2002] was the first to conduct a numerical comparison of Nash equilibrium payoff and the security level. This work is inspired by an intriguing example game due to Aumann [1985] where the only Nash equilibrium yields each player no more than his security level, but the equilibrium strategies are actually different from the maximin strategies. In other words, the equilibrium merely yields security level payoffs but fails to guarantee them.

3 Preliminaries

3.1 Ranking Games

An accepted way to model situations of conflict and social interaction is by means of a normal-form game [see, e.g., Myerson, 1991].

Definition 1 (Normal-form game) A game in normal-form is a tuple $\Gamma = (N, (A_i)_{i \in N}, (p_i)_{i \in N})$ where $N$ is a set of players and for each player $i \in N$, $A_i$ is a nonempty set of actions available to player $i$, and $p_i : (\prod_{i \in N} A_i) \to \mathbb{R}$ is a function mapping each action profile of the game (i.e., combination of actions) to a real-valued payoff for player $i$.

[Figure 1: Three-player single-winner game. Alice (1) chooses row $a_1$ or $a_2$, Bob (2) chooses column $b_1$ or $b_2$, and Charlie (3) chooses matrix $c_1$ or $c_2$. Outcomes are denoted by the winner's index. The dashed square marks the only pure Nash equilibrium. Dotted rectangles mark a quasi-strict equilibrium in which Alice and Charlie randomize uniformly over their respective actions.]

Unless stated otherwise, we will henceforth assume that every player has at least two different actions. A combination of actions $s \in A = \prod_{i \in N} A_i$ is also called a profile of pure strategies. This concept can be generalized to mixed strategy profiles $s \in S = \prod_{i \in N} S_i$, by letting players randomize over their actions. Here, $S_i = \Delta(A_i)$ denotes the set of probability distributions over player $i$'s actions, or mixed strategies available to player $i$. Payoff functions naturally extend to mixed strategy profiles, and we will frequently write $p_i(s)$ for the expected payoff of player $i$, and $p(s) = \sum_{i \in N} p_i(s)$ for the social welfare, under profile $s$. In the following, we further write $n = |N|$ for the number of players in a game, $A_{-i}$ and $S_{-i}$ for the set of action or strategy profiles for all players but $i$, $s_i$ for the $i$th strategy in profile $s$, $s_{-i}$ for the vector of all strategies in $s$ but $s_i$, and $s_i(a)$ for the probability assigned to action $a$ by player $i$ in strategy profile $s$.

The situations of social interaction this paper is concerned with are such that outcomes are related to a ranking of the players, i.e., an ordering of the players according to how well they have done in the game relative to one another. We assume that players generally prefer higher ranks over lower ones and that they are indifferent to the ranks of other players. Moreover, we hypothesize that the players entertain qualitative preferences over lotteries, i.e., probability distributions over ranks [cf. von Neumann and Morgenstern, 1947]. For example, one player may prefer to be ranked second to having a fifty-fifty chance of being ranked first or being ranked third, whereas another player may judge quite differently. We arrive at the following definition of the rank payoff to a player.

Definition 2 (Rank payoff) The rank payoff of a player $i$ is defined as vector $r_i = (r_i^1, r_i^2, \ldots, r_i^n) \in \mathbb{R}^n$ such that $r_i^k \geq r_i^{k+1}$ for all $k \in \{1, 2, \ldots, n-1\}$, and $r_i^1 > r_i^n$. For convenience, we assume rank payoffs to be normalized so that $r_i^1 = 1$ and $r_i^n = 0$.

In other words, higher ranks are weakly preferred, and for at least one rank the preference is strict.
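As an illustration of Definition 2, the sketch below checks whether a vector is a normalized rank payoff and reads off each player's payoff from a ranking. The function names and the dictionary layout are assumptions made for this example; the rank payoff vectors $(1, 0, 0)$ and $(1, 1, 0)$ and the ranking $[2, 3, 1]$ are taken from the game discussed later in the proof of Theorem 3.

```python
def is_rank_payoff(r):
    """Check Definition 2 for a vector r = (r^1, ..., r^n):
    non-increasing entries, r^1 > r^n, normalized to r^1 = 1 and r^n = 0."""
    non_increasing = all(r[k] >= r[k + 1] for k in range(len(r) - 1))
    return non_increasing and r[0] > r[-1] and r[0] == 1 and r[-1] == 0

def payoffs_from_ranking(ranking, rank_payoffs):
    """ranking lists the players (numbered from 1) from first to last place;
    rank_payoffs[i] is the rank payoff vector of player i.
    Returns the payoff vector ordered by player number."""
    return tuple(rank_payoffs[i][ranking.index(i)] for i in sorted(rank_payoffs))

# Players 1 and 2 only value first place; player 3 also values second place.
rank_payoffs = {1: (1, 0, 0), 2: (1, 0, 0), 3: (1, 1, 0)}
example = payoffs_from_ranking([2, 3, 1], rank_payoffs)
```

A player who prefers second place to a fifty-fifty lottery between first and third place would have a middle entry above 0.5, for instance $(1, 0.6, 0)$.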
Intuitively, $r_i^k$ represents player $i$'s payoff for being ranked $k$th. Building on Definition 2, defining ranking games is straightforward.

Definition 3 (Ranking game) A ranking game is a game where for any strategy profile $s \in S$ there is a permutation $(\pi_1, \pi_2, \ldots, \pi_n)$ of the players so that $p_i(s) = r_i^{\pi_i}$ for all $i \in N$.

A binary ranking game is one where each rank payoff vector only consists of zeros and ones. An important subclass of binary ranking games are single-winner games, i.e., games where $r_i = (1, 0, \ldots, 0)$ for all $i \in N$. When considering mixed strategies, the expected payoff in a single-winner game equals the probability of winning. An example single-winner game with three players (the game introduced at the beginning of this paper) is given in Figure 1. A convenient way of representing these games is to just denote the index of the winning player for each outcome. For general ranking games, we will sometimes write $[i_1, i_2, \ldots, i_n]$ to denote the outcome where player $i_1$ is ranked first, $i_2$ is ranked second, and so forth.

3.2 Solution Concepts

Over the years, game theory has produced a number of solution concepts that identify reasonable or desirable strategy profiles in a given game. Perhaps the most cautious way for a player to play a game is to try to maximize his own payoff regardless of which strategies the other players choose, i.e., even when the other players (collaboratively) try to minimize his payoff. Such a strategy is called a maximin strategy, and the corresponding (guaranteed minimum) payoff is called the maximin payoff or security level of that player.

Definition 4 (Maximin strategy) A strategy $s_i \in S_i$ is called a maximin strategy for player $i \in N$ if $s_i \in \arg\max_{r_i \in S_i} \min_{t_{-i} \in S_{-i}} p_i(r_i, t_{-i})$. The value $v_i = \max_{r_i \in S_i} \min_{t_{-i} \in S_{-i}} p_i(r_i, t_{-i})$ is called the security level for player $i$.

Given a particular game $\Gamma$, we will write $v_i(\Gamma)$ for the security level of player $i$ in $\Gamma$. In the game of Figure 1, Alice can achieve her security level of 0.5 by uniform randomization over her actions, i.e., by raising her hand with probability 0.5. The security level for players 2 and 3 is zero.

One of the best-known solution concepts is Nash equilibrium [Nash, 1951].
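The security levels quoted for the hand-raising game can be reproduced with a short exact computation. The sketch below is an illustration under stated assumptions rather than the linear-programming approach mentioned in the related work: it only handles players with two actions, encodes "raise hand" as action 1, and uses hypothetical function names. It exploits that, for two actions, the guaranteed payoff is the minimum of finitely many linear functions of the probability $x$ of raising a hand, so the maximum lies at $x = 0$, $x = 1$, or at a crossing of two lines. Restricting the opponents to pure action profiles loses nothing, because the payoff is linear in each opponent's mixed strategy.

```python
from fractions import Fraction
from itertools import product

def wins(player, profile):
    """Payoff of `player` (0 = Alice, 1 = Bob, 2 = Charlie) at a pure profile."""
    count = sum(profile)
    winner = 0 if count % 2 == 1 else (1 if count > 0 else 2)
    return Fraction(1 if winner == player else 0)

def security_level(player):
    """Return (security level, maximin probability of raising a hand)."""
    lines = []  # one (intercept, slope) pair per pure opponent profile
    for opp in product((0, 1), repeat=2):
        at0 = wins(player, opp[:player] + (0,) + opp[player:])
        at1 = wins(player, opp[:player] + (1,) + opp[player:])
        lines.append((at0, at1 - at0))
    candidates = {Fraction(0), Fraction(1)}
    for (c1, m1), (c2, m2) in product(lines, repeat=2):
        if m1 != m2:
            x = (c2 - c1) / (m1 - m2)   # crossing point of the two lines
            if 0 <= x <= 1:
                candidates.add(x)

    def guaranteed(x):
        return min(c + m * x for c, m in lines)

    best = max(candidates, key=guaranteed)
    return guaranteed(best), best
```

For Alice this yields a security level of 1/2 at $x = 1/2$; for Bob and Charlie the value is 0, in line with the text.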
In a Nash equilibrium, no player is able to increase his payoff by unilaterally changing his strategy.

Definition 5 (Nash equilibrium) A strategy profile $s^* \in S$ is called a Nash equilibrium if for each player $i \in N$ and each strategy $s_i \in S_i$, $p_i(s^*) \geq p_i((s^*_{-i}, s_i))$. A Nash equilibrium is called pure if it is a pure strategy profile.

Nash [1951] has shown that every normal-form game possesses at least one equilibrium. There are infinitely many Nash equilibria in the single-winner game of Figure 1; the only pure equilibrium is denoted by a dashed square. A weakness of Nash equilibrium as a normative solution concept (besides the multiplicity of equilibria) is that players may be indifferent between actions they play with non-zero probability and actions they do not play at all. For example, in the pure Nash equilibrium of the game in Figure 1, players 2 and 3 might as well deviate without decreasing their chances of winning the game. Quasi-strict equilibrium as introduced by Harsanyi [1973] tries to alleviate this phenomenon by demanding that every best response be played with positive probability. (It follows from the definition of Nash equilibrium that every action played with positive probability yields the same expected payoff.)

Footnote 1: Harsanyi originally referred to quasi-strict equilibrium as quasi-strong. However, this term has been dropped to distinguish the concept from Aumann's strong equilibrium.

Definition 6 (Quasi-strict Nash equilibrium) A Nash equilibrium $s^* \in S$ is called quasi-strict if for all $i \in N$ and all $a, b \in A_i$ with $s^*_i(a) > 0$ and $s^*_i(b) = 0$, $p_i(s^*_{-i}, a) > p_i(s^*_{-i}, b)$.
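Definitions 5 and 6 can be checked mechanically for the hand-raising game by comparing the expected payoff of every pure action against the other players' mixed strategies. The following sketch is illustrative only: the names `action_values` and `classify`, the tolerance `tol`, and the encoding (action 0 = keep hand down, action 1 = raise hand) are assumptions, not part of the paper.

```python
from itertools import product

def payoff(player, profile):
    """1.0 if `player` (0 = Alice, 1 = Bob, 2 = Charlie) wins at the pure profile."""
    count = sum(profile)
    winner = 0 if count % 2 == 1 else (1 if count > 0 else 2)
    return 1.0 if winner == player else 0.0

def action_values(player, strategies):
    """Expected payoff of each pure action of `player` against the others' mixtures."""
    values = []
    for action in (0, 1):
        total = 0.0
        for profile in product((0, 1), repeat=3):
            if profile[player] != action:
                continue
            prob = 1.0
            for other in range(3):
                if other != player:
                    prob *= strategies[other][profile[other]]
            total += prob * payoff(player, profile)
        values.append(total)
    return values

def classify(strategies, tol=1e-9):
    """Return (is_nash, is_quasi_strict) for a mixed strategy profile."""
    is_nash = is_quasi_strict = True
    for player in range(3):
        values = action_values(player, strategies)
        best = max(values)
        for action, value in enumerate(values):
            played = strategies[player][action] > 0
            if played and value + tol < best:
                is_nash = False           # a played action is not a best response
            if not played and value + tol >= best:
                is_quasi_strict = False   # an unplayed action is also a best response
    return is_nash, is_nash and is_quasi_strict

# Alice keeps her hand down, Bob raises, Charlie keeps his hand down.
pure = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
# Alice and Charlie randomize uniformly, Bob always raises his hand.
mixed = [(0.5, 0.5), (0.0, 1.0), (0.5, 0.5)]
```

Under these assumptions, `classify(pure)` reports a Nash equilibrium that is not quasi-strict (Bob and Charlie lose nothing by deviating), whereas `classify(mixed)` reports a quasi-strict equilibrium.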

Figure 1 shows a quasi-strict equilibrium of the game between Alice, Bob, and Charlie (see footnote 2). While quasi-strict equilibria have recently been shown to always exist in two-player games [Norde, 1999], this is not the case for games with more than two players (see footnote 3).

Nash equilibrium assumes that players randomize between their actions independently from each other. Aumann [1974] introduced the notion of a correlated strategy, where players are allowed to coordinate their actions by means of a device or agent that randomly selects one of several action profiles and recommends the actions of this profile to the respective players. The corresponding equilibrium concept is defined as follows.

Definition 7 (Correlated equilibrium) A correlated strategy $\mu \in \Delta(A)$ is called a correlated equilibrium if for all $i \in N$ and all $s_i, a_i \in A_i$, $\sum_{s_{-i} \in A_{-i}} \mu(s_{-i}, s_i) \, (p_i(s_{-i}, s_i) - p_i(s_{-i}, a_i)) \geq 0$.

In other words, a correlated equilibrium of a game is a probability distribution $\mu$ over the set of action profiles, such that, if a particular action profile $s \in A$ is chosen according to this distribution, and every player $i \in N$ is only informed about his own action $s_i$, it is optimal for $i$ to play $s_i$, given that the other players play $s_{-i}$. Correlated equilibrium is based upon the assumption that there exists a trustworthy party who can recommend behavior but cannot enforce it.

It can easily be seen from the definition that the Nash equilibria of any game form a subset of the correlated equilibria, with the additional property of being a product of strategies for the individual players. The existence result by Nash [1951] thus carries over to correlated equilibria.

Again consider the game of Figure 1. It is easily verified that the correlated strategy that assigns probability 0.25 each to action profiles $(a_1, b_1, c_1)$, $(a_1, b_2, c_1)$, $(a_2, b_1, c_1)$, and $(a_2, b_1, c_2)$ is a correlated equilibrium in which the expected payoff is 0.5 for player 1 and 0.25 for players 2 and 3. In this particular case, the correlated equilibrium is a convex combination of Nash equilibria, and correlation can be achieved by means of a publicly observable random variable.
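The correlated equilibrium just described can be verified by evaluating the inequality of Definition 7 for every player, every recommended action, and every possible deviation. The sketch below assumes that index 1 of each action stands for "keep hand down" (encoded as 0) and index 2 for "raise hand" (encoded as 1); this reading is an assumption, chosen because it reproduces the payoffs quoted above. The function names are hypothetical.

```python
from itertools import product

def payoff(player, profile):
    """1.0 if `player` (0 = Alice, 1 = Bob, 2 = Charlie) wins at the pure profile."""
    count = sum(profile)
    winner = 0 if count % 2 == 1 else (1 if count > 0 else 2)
    return 1.0 if winner == player else 0.0

def is_correlated_equilibrium(mu, tol=1e-9):
    """mu maps pure action profiles (tuples of 0/1) to probabilities."""
    for player in range(3):
        for recommended, deviation in product((0, 1), repeat=2):
            gain = 0.0  # expected gain from obeying rather than deviating
            for profile, prob in mu.items():
                if profile[player] != recommended:
                    continue
                deviated = profile[:player] + (deviation,) + profile[player + 1:]
                gain += prob * (payoff(player, profile) - payoff(player, deviated))
            if gain + tol < 0:
                return False
    return True

# (a1,b1,c1), (a1,b2,c1), (a2,b1,c1), (a2,b1,c2) with probability 0.25 each.
mu = {(0, 0, 0): 0.25, (0, 1, 0): 0.25, (1, 0, 0): 0.25, (1, 0, 1): 0.25}
expected = [sum(prob * payoff(i, s) for s, prob in mu.items()) for i in range(3)]
```

With this encoding the check succeeds and `expected` equals the payoffs 0.5, 0.25, and 0.25 stated in the text. Putting all probability on the profile where nobody raises a hand fails the check, because Alice would rather raise hers.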
Perhaps surprisingly, Aumann [1974] has shown that in general the (expected) social welfare of a correlated equilibrium may exceed that of every Nash equilibrium, and that correlated equilibrium payoffs may in fact be outside the convex hull of the Nash equilibrium payoffs. This is of course not possible if social welfare is identical in all outcomes, as is the case for the game in Figure 1.

4 Equilibrium Points in Ranking Games

As we have already seen in Section 1, the stability of some Nash equilibria in ranking games is questionable because losing players are assumed to play certain strategies even though they could as well play any other strategy without decreasing their payoff. By definition, there is at least one player, the one ranked lowest in any outcome, who receives his minimum payoff of zero and therefore has no incentive to actually play that particular action. As a consequence, all pure equilibria are weak in this sense, especially in single-winner games where $n - 1$ players are indifferent over which action to play. Quasi-strict equilibrium mitigates this phenomenon by additionally requiring that actions played with positive probability yield strictly more payoff than non-equilibrium actions. Thus, quasi-strict equilibrium can be used to formally illustrate the weakness of pure Nash equilibrium.

Footnote 2: Observe that Charlie plays a weakly dominated action with positive probability in this equilibrium.

[Figure 2: Three-player single-winner game. Dashed boxes denote all Nash equilibria (one player may mix arbitrarily in boxes that span two outcomes).]

Fact 1 Quasi-strict equilibria in ranking games are never pure, i.e., in any quasi-strict equilibrium there is at least one player who randomizes over some of his actions.

There is at least one quasi-strict equilibrium in every two-player game (and thus also in every two-player ranking game) [Norde, 1999]. In games with more than two players, there may be no quasi-strict equilibrium. Figure 2 shows that this even holds for single-winner games (see footnote 3). It appears as if most ranking games possess non-pure equilibria, i.e., mixed strategy equilibria where at least one player randomizes.
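For the hand-raising game of the introduction, the weakness of pure equilibria can be made concrete by brute force: enumerate all pure profiles, keep those from which no player gains by deviating, and list the players who would lose nothing by deviating. This is a small sketch with hypothetical names; action 1 is assumed to mean "raise hand".

```python
from itertools import product

def payoff(player, profile):
    """1 if `player` (0 = Alice, 1 = Bob, 2 = Charlie) wins at the pure profile."""
    count = sum(profile)
    winner = 0 if count % 2 == 1 else (1 if count > 0 else 2)
    return 1 if winner == player else 0

def deviations(player, profile):
    """All profiles in which only `player` changes his action."""
    return [profile[:player] + (a,) + profile[player + 1:]
            for a in (0, 1) if a != profile[player]]

pure_equilibria = []
indifferent = {}
for profile in product((0, 1), repeat=3):
    if all(payoff(i, d) <= payoff(i, profile)
           for i in range(3) for d in deviations(i, profile)):
        pure_equilibria.append(profile)
        # Players who lose nothing by deviating; a quasi-strict equilibrium has none.
        indifferent[profile] = [i for i in range(3)
                                if any(payoff(i, d) == payoff(i, profile)
                                       for d in deviations(i, profile))]
```

The enumeration finds a single pure equilibrium, in which only Bob raises his hand, and reports Bob and Charlie as indifferent. This is exactly why that equilibrium fails to be quasi-strict.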
We prove this claim for three subclasses of ranking games.

Theorem 1 The following classes of ranking games always possess at least one non-pure equilibrium: (i) two-player ranking games, (ii) three-player single-winner games where each player has two actions, and (iii) $n$-player single-winner games where the security level of at least two players is positive.

Proof: Statement (i) follows from Fact 1 and the existence result by Norde [1999]. For reasons of completeness, we give a simple alternative proof. Assume for contradiction that there is a two-player ranking game that only possesses pure equilibria and consider, without loss of generality, a pure equilibrium $e^*$ in which player 1 wins. Since player 2 must be incapable of increasing his payoff by deviating from $e^*$, player 1 has to win no matter which action the second player chooses.

Footnote 3: There are few examples in the literature for games without quasi-strict equilibria (essentially there is one example by van Damme [1983] and another one by Cubitt and Sugden [1994]). For this reason, the game depicted in Figure 2 might be of independent interest.

As a consequence, the strategies in $e^*$ remain in equilibrium even if player 2's strategy is replaced with an arbitrary randomization among his actions.

As for (ii), consider a three-player single-winner game with actions $A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, and $A_3 = \{c_1, c_2\}$. Assume for contradiction that there are only pure equilibria in the game and consider, without loss of generality, a pure equilibrium $e^* = (a_1, b_1, c_1)$ in which player 1 wins. In the following, we say that a pure equilibrium is semi-strict if at least one player strictly prefers his equilibrium action over all his other actions given that the other players play their equilibrium actions. In single-winner games, this player has to be the winner in the pure equilibrium.

We first show that if $e^*$ is semi-strict, i.e., player 1 does not win in action profile $(a_2, b_1, c_1)$, then there must exist a non-pure equilibrium. For this, consider the strategy profiles $e^1$ where player 2 mixes uniformly between $e^*$ and $(a_1, b_2, c_1)$ and $e^2$ where player 3 mixes uniformly between $e^*$ and $(a_1, b_1, c_2)$. Since player 1 does not win in $(a_2, b_1, c_1)$, he will not deviate from either $e^1$ or $e^2$ even when he wins in $(a_2, b_2, c_1)$ and $(a_2, b_1, c_2)$. Consequently, player 3 must win in $(a_1, b_2, c_2)$ in order for $e^1$ not to be an equilibrium. Analogously, for $e^2$ not to be an equilibrium, player 2 has to win in the same action profile $(a_1, b_2, c_2)$, contradicting the assumption that the game is a single-winner game. Thus, the existence of a semi-strict pure equilibrium implies that of a non-pure equilibrium.

Conversely assume that $e^*$ is not semi-strict. When any of the action profiles in $E = \{(a_2, b_1, c_1), (a_1, b_2, c_1), (a_1, b_1, c_2)\}$ is a pure equilibrium, this also yields a non-pure equilibrium because two pure equilibria that only differ by the action of a single player can be combined into infinitely many mixed equilibria. For $E$ not to contain any pure equilibria, there must be (exactly) one player for every profile in $E$ who deviates to a profile in $D = \{(a_2, b_2, c_1), (a_2, b_1, c_2), (a_1, b_2, c_2)\}$ because the game is a single-winner game and because $e^*$ is not semi-strict.
This implies two facts: First, action profile $e' = (a_2, b_2, c_2)$ is a pure equilibrium because no player will deviate from $e'$ to any profile in $D$. Second, the player who wins in $e'$ strictly prefers the equilibrium outcome over the corresponding action profile in $D$, implying that $e'$ is semi-strict. The above observation that every semi-strict equilibrium also yields a non-pure equilibrium completes the proof.

As for (iii), recall that the payoff a player obtains in equilibrium must be at least his security level. Thus, a positive security level for player $i$ rules out all equilibria in which player $i$ receives zero payoff, in particular all pure equilibria in which he does not win. If there are two players with positive security levels, both of them have to win with positive probability in any equilibrium of the game. In single-winner games, this can only be the case in a non-pure equilibrium.

We were unable to find a single-winner game that only contains pure equilibria, even when employing a computer program that checked tens of thousands of games. However, a general existence result has so far tenaciously resisted proof.

5 The Price of Cautiousness

Despite its conceptual elegance and simplicity, Nash equilibrium has been criticized on various grounds. In the common case of multiple equilibria, it is unclear which one to play; coalitions might benefit from jointly deviating; and recent complexity-theoretic results indicate that there might exist no polynomial-time algorithm for finding Nash equilibria [Chen and Deng, 2006]. Adding the indifference of players, which is particularly problematic in ranking games, a compelling question is how much worse a player can be off when reverting to the most defensive choice (his maximin strategy) instead of hoping for an equilibrium outcome. We refer to this value as the price of cautiousness. In the following, let $\mathcal{G}$ denote the set of all normal-form games and for $\Gamma \in \mathcal{G}$ let $N(\Gamma)$ denote the set of Nash equilibria of $\Gamma$.

Definition 8 Let $\Gamma$ be a normal-form game with non-negative payoffs, $i \in N$ a player such that $v_i(\Gamma) > 0$. The price of cautiousness for player $i$ in $\Gamma$ is defined as

$$PC_i(\Gamma) = \frac{\min \{ p_i(s) \mid s \in N(\Gamma) \}}{v_i(\Gamma)}.$$

For any class $\mathcal{C} \subseteq \mathcal{G}$ of games involving player $i$, we further write $PC_i(\mathcal{C}) = \sup_{\Gamma \in \mathcal{C}} PC_i(\Gamma)$.

In other words, the price of cautiousness of a player is the ratio between his minimum payoff in a Nash equilibrium and his security level, thus capturing the worst-case loss the player may experience by playing his maximin strategy instead of a Nash equilibrium. For a player whose security level is zero, every strategy is a maximin strategy. Since we are mainly interested in a comparison of normative solution concepts, we will only consider games where the security level of at least one player is positive.

As already mentioned in Section 1, the price of cautiousness in two-player ranking games is 1 due to the Minimax Theorem [von Neumann and Morgenstern, 1947]. In general ranking games, the price of cautiousness is unbounded. The proof of the following theorem is omitted for reasons of limited space.

Theorem 2 Let $\mathcal{R}$ be the class of ranking games with more than two players that involve player $i$. Then, $PC_i(\mathcal{R}) = \infty$, even if $\mathcal{R}$ only contains games without weakly dominated actions.

We proceed to show that, due to the structural limitations of binary ranking games, the price of cautiousness in these games is bounded from above by the number of actions of the respective player. We also derive a matching lower bound.

Theorem 3 Let $\mathcal{R}_b$ be the class of binary ranking games with more than two players involving a player $i$ with exactly $k$ actions. Then, $PC_i(\mathcal{R}_b) = k$, even if $\mathcal{R}_b$ only contains single-winner games or games without weakly dominated actions.

Proof: By definition, the price of cautiousness takes its maximum for maximum payoff in a Nash equilibrium, which is bounded by 1 in a ranking game, and minimum security level. By the requirement that the security level must be strictly positive, we have that for every opponent action profile $s_{-i}$ there must be some action $a \in A_i$ such that $p_i(a, s_{-i}) > 0$, i.e., $p_i(a, s_{-i}) = 1$. It is then easily verified that player $i$ can ensure a security level of $1/k$ by uniform randomization over his $k$ actions, resulting in a price of cautiousness of at most $k$.
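The upper-bound argument can be illustrated on a payoff table for a single player. In the sketch below, `payoff_rows[a][j]` is the player's 0/1 payoff for his own action `a` against the `j`-th pure action profile of the opponents; this layout, the function names, and the small example matrix are hypothetical. If every column contains a 1, uniform mixing over the $k$ rows wins with probability at least $1/k$ against every column.

```python
from fractions import Fraction

def has_positive_security_level(payoff_rows):
    """In a binary game this holds exactly when every column contains a 1:
    an all-zero column would let the opponents hold the player to payoff 0."""
    return all(any(column) for column in zip(*payoff_rows))

def uniform_guarantee(payoff_rows):
    """Worst-case expected payoff when mixing uniformly over the k own actions."""
    k = len(payoff_rows)
    return min(Fraction(sum(column), k) for column in zip(*payoff_rows))

# Hypothetical player with k = 3 actions facing four opponent profiles.
example = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
# Alice in the hand-raising game: rows "hand down" / "raise hand",
# columns are the four pure profiles of Bob and Charlie.
alice = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
```

Since equilibrium payoffs in a ranking game never exceed 1, a guarantee of at least $1/k$ bounds the ratio of Definition 8 by $k$.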

                c_1                        c_2
          b_1         b_2            b_1         b_2
  a_1   (0, 1, 1)   (1, 0, 0)      (0, 1, 0)   (1, 0, 0)
  a_2   (1, 0, 0)   (0, 1, 0)      (1, 0, 1)   (1, 0, 1)

Figure 3: Three-player ranking game $\Gamma$ used in the proof of Theorem 3

For a matching lower bound, again consider the single-winner game of Figure 2. We will argue that all Nash equilibria of this game are mixtures of the action profiles $(a_2, b_1, c_2)$, $(a_2, b_2, c_2)$, and $(a_1, b_2, c_2)$ and yield payoff 1 for player 1, twice as much as his security level of 0.5. For this, we look at the possible strategies for player 3. If player 3 plays $c_1$, the game reduces to the well-known matching pennies game for players 1 and 2, in which they will randomize uniformly over both of their actions. In this case, player 3 will deviate to $c_2$. If player 3 plays $c_2$, we immediately obtain the equilibria described above. Finally, if player 3 randomizes between actions $c_1$ and $c_2$, the payoff obtained from both of these actions must be the same. This can only be the case if either player 1 plays $a_2$ and player 2 randomizes between $b_1$ and $b_2$, or if player 1 randomizes between $a_1$ and $a_2$ and player 2 plays $b_2$. In the former case, player 2 will play $b_2$, causing player 1 to deviate to $a_1$. In the latter case, player 1 will play $a_1$, causing player 2 to deviate to $b_1$.

The above construction can be generalized to $k > 2$ by virtue of a single-winner game with actions $A_1 = \{a_1, \dots, a_k\}$, $A_2 = \{b_1, \dots, b_k\}$, and $A_3 = \{c_1, c_2\}$, and payoffs

$$p((a_i, b_j, c_l)) = \begin{cases} (0, 1, 0) & \text{if } l = 1 \text{ and } i \neq k - j + 1, \\ (0, 0, 1) & \text{if } l = 2 \text{ and } i = j = 1, \\ (1, 0, 0) & \text{otherwise.} \end{cases}$$

It is easily verified that the security level of player 1 in this game is $1/k$ while, by the same arguments as above, his payoff in every Nash equilibrium equals 1. This shows tightness of the upper bound of $k$ on the price of cautiousness for single-winner games.

Now consider the game $\Gamma$ of Figure 3, which is a ranking game for rank payoff vectors $r_1 = r_2 = (1, 0, 0)$ and $r_3 = (1, 1, 0)$, and rankings $[2, 3, 1]$, $[1, 2, 3]$, $[2, 1, 3]$, and $[1, 3, 2]$. It is easily verified that none of the actions of $\Gamma$ is weakly dominated and that $v_1(\Gamma) = 0.5$.
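The two "easily verified" properties of the Figure 3 game can be checked by brute force. The sketch below is an illustration under stated assumptions: the names `GAME`, `weakly_dominated_by_pure`, and `security_bounds_player1` are hypothetical, and the dominance test only considers domination by pure actions, not by mixed strategies. The security level is pinned down by two matching bounds: player 1 mixing uniformly gives a lower bound, and player 3 playing $c_1$ while player 2 mixes uniformly gives an upper bound.

```python
from fractions import Fraction
from itertools import product

# Payoff vectors of the Figure 3 game, keyed by 0-based action indices
# (i, j, l), standing for the profile (a_{i+1}, b_{j+1}, c_{l+1}).
GAME = {
    (0, 0, 0): (0, 1, 1), (0, 1, 0): (1, 0, 0),
    (1, 0, 0): (1, 0, 0), (1, 1, 0): (0, 1, 0),
    (0, 0, 1): (0, 1, 0), (0, 1, 1): (1, 0, 0),
    (1, 0, 1): (1, 0, 1), (1, 1, 1): (1, 0, 1),
}
SIZES = (2, 2, 2)


def with_action(profile, player, action):
    """Copy of profile in which `player` switches to `action`."""
    changed = list(profile)
    changed[player] = action
    return tuple(changed)


def weakly_dominated_by_pure(player, action):
    """True if another pure action is never worse and sometimes better."""
    contexts = [p for p in product(*map(range, SIZES)) if p[player] == action]
    for other in range(SIZES[player]):
        if other == action:
            continue
        diffs = [
            GAME[with_action(p, player, other)][player] - GAME[p][player]
            for p in contexts
        ]
        if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
            return True
    return False


def security_bounds_player1():
    """Lower and upper bound on player 1's security level."""
    # Lower bound: player 1 mixes uniformly, opponents pick the worst profile.
    lower = min(
        Fraction(sum(GAME[(i, j, l)][0] for i in range(2)), 2)
        for j, l in product(range(2), range(2))
    )
    # Upper bound: player 3 plays c_1 and player 2 mixes uniformly.
    upper = max(
        Fraction(sum(GAME[(i, j, 0)][0] for j in range(2)), 2)
        for i in range(2)
    )
    return lower, upper


dominated = [
    (player, action)
    for player in range(3)
    for action in range(SIZES[player])
    if weakly_dominated_by_pure(player, action)
]
print(dominated)                  # []
print(security_bounds_player1())  # (Fraction(1, 2), Fraction(1, 2))
```

Both bounds coincide at 1/2, which matches the stated security level of 0.5 for player 1.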
On the other hand, we will argue that all Nash equilibria of $\Gamma$ are mixtures of action profiles $(a_2, b_1, c_2)$ and $(a_2, b_2, c_2)$, corresponding to a payoff of 1 for player 1. For this, we again look at the possible strategies for player 3. If player 3 plays $c_1$, players 1 and 2 will again randomize uniformly over both of their actions, causing player 3 to deviate to $c_2$. If player 3 plays $c_2$, we immediately obtain the equilibria described above. Finally, if player 3 randomizes between actions $c_1$ and $c_2$, he must again get the same payoff from both of these actions. This can only be the case if either player 1 plays $a_1$ and player 2 plays $b_2$, or if player 1 randomizes between $a_1$ and $a_2$ and player 2 plays $b_1$. In the former case, player 2 will deviate to $b_1$. In the latter case, player 1 will deviate to $a_2$.

This construction can be generalized to $k > 2$ by virtue of a game with actions $A_1 = \{a_1, \dots, a_k\}$, $A_2 = \{b_1, \dots, b_k\}$, and $A_3 = \{c_1, c_2\}$, and payoffs

$$p((a_i, b_j, c_l)) = \begin{cases} (0, 1, 1) & \text{if } i = j = l = 1, \\ (1, 0, 0) & \text{if } l = 1 \text{ and } i = k - j + 1, \text{ or } l = 2,\ i = 1 \text{ and } j > 1, \\ (1, 0, 1) & \text{if } l = 2 \text{ and } i \geq 2, \\ (0, 1, 0) & \text{otherwise.} \end{cases}$$

Again, it is easily verified that the security level of player 1 in this game is $1/k$ while, by the same arguments as above, his payoff is 1 in every Nash equilibrium. Thus, the upper bound of $k$ for the price of cautiousness is tight as well for binary ranking games without weakly dominated actions.

Informally, the previous theorem states that the payoff a player with $k$ actions can obtain in Nash equilibrium can be at most $k$ times his security level. The proof relies on equilibria in which the payoff of at least one player is 1. As we have already pointed out in Section 4, such equilibria (like pure equilibria) are particularly weak. We therefore also study the price of cautiousness with respect to quasi-strict equilibria.

Definition 9 Let $\Gamma$ be a normal-form game with non-negative payoffs, $i \in N$ a player such that $v_i(\Gamma) > 0$. The price of cautiousness with respect to quasi-strict equilibria for player $i$ in $\Gamma$ is defined as

$$PC^{QS}_i(\Gamma) = \frac{\min\{\, p_i(s) \mid s \in N_{QS}(\Gamma) \,\}}{v_i(\Gamma)},$$

where $N_{QS}(\Gamma)$ denotes the set of quasi-strict equilibria in $\Gamma$.
As before, $PC^{QS}_i(\mathcal{C}) = \sup_{\Gamma \in \mathcal{C}} PC^{QS}_i(\Gamma)$.

Returning to the binary ranking game of Figure 3 and its generalizations, it turns out that player 2 can do nothing about the fact that he always loses in every Nash equilibrium. As a consequence, all Nash equilibria where every action profile with payoff $(1, 0, 1)$ is played with positive probability are quasi-strict, and the price of cautiousness in binary ranking games remains $k$ when restricting attention to quasi-strict equilibria.

In single-winner games, on the other hand, a slight decrease in the price of cautiousness can be witnessed. This is due to the fact that there can be no quasi-strict equilibrium in which only one player wins (see also Fact 1).

Theorem 4 Let $\mathcal{R}_b$ be the class of single-winner games with more than two players involving a player $i$ with exactly $k$ actions. Then, $PC^{QS}_i(\mathcal{R}_b) = k - 1$.

Proof: As in the proof of Theorem 3, an upper bound for the price of cautiousness can be found by letting the numerator and denominator take their maximum and minimum, respectively. As before, the lowest positive security value is $1/k$ for a player with $k$ actions. The argument for a useful upper bound on the payoff in a quasi-strict equilibrium is slightly more delicate. We start by observing that the existence of a quasi-strict equilibrium in which a player (say, player 1) obtains payoff 1 implies that this player has a winning action, i.e., an action which always yields payoff 1 regardless of the other players' actions. This is seen as follows. In a single-winner game, a payoff of 1 for player 1 means that all other players get payoff zero. In a quasi-strict equilibrium, all players have to receive strictly more payoff for equilibrium actions than for actions that are not contained in the equilibrium's support. For this reason, all losing players have to randomize over all their actions in a quasi-strict equilibrium in which player 1 wins. This implies that player 1 must have an action that guarantees him a win, and thus his security level is 1.

Since a maximum security level is useless for finding a reasonable upper bound, we restrict our attention to games where no player has a security level of 1. According to our previous argument, there can be no quasi-strict equilibrium in such games where only one player wins. We claim that the highest payoff less than 1 that player 1 may obtain in the worst equilibrium is $(k - 1)/k$. First, we observe that we can restrict our attention to equilibria $e$ that do not contain an action profile $b \in A_{-1}$ by all players except player 1 for which player 1 wins no matter which action he chooses. Whenever such a $b$ is part of an equilibrium, there must be another equilibrium where $b$ is not played, but that is otherwise identical. Obviously, player 1 cannot get more payoff in this new equilibrium than in the original one.

Now assume for contradiction that the payoff to player 1 in $e$ is greater than $(k - 1)/k$. For any action $a$ that player 1 plays in equilibrium, the sum of probabilities that the other players put on all action profiles $b \in A_{-1}$ such that player 1 wins in action profile $(a, b)$ must be greater than $(k - 1)/k$. Let $Z_j \subseteq A_{-1}$ denote the set of all remaining action profiles, i.e., those combinations of actions by other players where player 1 loses. Clearly, the sum of probabilities for all action profiles in $Z_j$ must be strictly less than $1/k$. On the other hand, since player 1 loses at least once for every action profile of the other players, the union of all sets $Z_j$ equals the set of all action profiles played in $e$, and the probabilities of these actions must sum up to 1, yielding a contradiction.
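The two extremes used in this argument combine into the upper bound in one line. The following display is only a restatement of the reasoning above, under its assumption that no player has a security level of 1:

```latex
\[
  PC^{QS}_i(\Gamma)
  \;=\; \frac{\min\{\, p_i(s) \mid s \in N_{QS}(\Gamma) \,\}}{v_i(\Gamma)}
  \;\le\; \frac{(k-1)/k}{1/k}
  \;=\; k - 1 .
\]
```

For $k = 2$ the bound equals 1, so in that case the worst quasi-strict equilibrium payoff cannot exceed the security level.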
As for a matching lower bound, consider the single-winner game involving Alice, Bob, and Charlie that is shown in Figure 1. Alice's payoff in the quasi-strict equilibrium marked by the dotted rectangles is 0.5, while her security level of 0.5 implies that there cannot be an equilibrium with lower payoff. For $k > 2$, we instead use a game with actions $A_1 = \{a_1, \dots, a_k\}$, $A_2 = \{b_1, \dots, b_k\}$, and $A_3 = \{c_1, c_2\}$, and payoffs

$$p((a_i, b_j, c_l)) = \begin{cases} (0, 1, 0) & \text{if } l = 1 \text{ and } i \neq j, \\ (0, 0, 1) & \text{if } l = 2 \text{ and } i = j, \\ (1, 0, 0) & \text{otherwise.} \end{cases}$$

It is easily verified that the strategy profile where player 3 plays $c_2$ and players 1 and 2 randomize uniformly between all of their actions is a quasi-strict Nash equilibrium and in fact the only Nash equilibrium of this game. The payoff of player 1 in this equilibrium is $(k - 1)/k$, which is $(k - 1)$ times his security level of $1/k$.

Applying this theorem to a single-winner game which contains a quasi-strict equilibrium, a player with only two actions at his disposal will not obtain more payoff than his (positive) security level in some quasi-strict equilibrium.

6 The Value of Correlation

We will now turn to the question whether, and by which amount, social welfare can be improved by allowing players in a ranking game to correlate their actions. Just as the payoff of a player in any Nash equilibrium is at least his security level, social welfare in the best correlated equilibrium is at least as high as social welfare in the best Nash equilibrium. In order to quantify the value of correlation in strategic games with non-negative payoffs, Ashlagi et al. [2005] recently introduced the mediation value of a game as the ratio between the maximum social welfare in a correlated versus that in a Nash equilibrium, and the enforcement value as the ratio between the maximum social welfare in any outcome versus that in a correlated equilibrium. Whenever social welfare, i.e., the sum of all players' payoffs, is used as a measure of global satisfaction, one implicitly assumes the inter-agent comparability of payoffs.
While this assumption is controversial, social welfare is nevertheless commonly used in the definitions of comparative ratios such as the price of anarchy [Koutsoupias and Papadimitriou, 1999]. For $\Gamma \in \mathcal{G}$ and $X \subseteq \Delta(S)$, let $C(\Gamma)$ denote the set of correlated equilibria of $\Gamma$ and let $v_X(\Gamma) = \max\{\, \sum_{i \in N} p_i(s) \mid s \in X \,\}$.

Definition 10 Let $\Gamma$ be a normal-form game with non-negative payoffs. The mediation value $MV(\Gamma)$ and the enforcement value $EV(\Gamma)$ of $\Gamma$ are defined as

$$MV(\Gamma) = \frac{v_{C(\Gamma)}(\Gamma)}{v_{N(\Gamma)}(\Gamma)} \quad \text{and} \quad EV(\Gamma) = \frac{v_S(\Gamma)}{v_{C(\Gamma)}(\Gamma)}.$$

If both numerator and denominator are 0 for one of the values, the respective value is defined to be 1. If only the denominator is 0, the value is defined to be $\infty$. For any class $\mathcal{C} \subseteq \mathcal{G}$ of games, we further write $MV(\mathcal{C}) = \sup_{\Gamma \in \mathcal{C}} MV(\Gamma)$ and $EV(\mathcal{C}) = \sup_{\Gamma \in \mathcal{C}} EV(\Gamma)$.

Ashlagi et al. [2005] have shown that both the mediation value and the enforcement value cannot be bounded for any class of games with an arbitrary payoff structure, as soon as there are more than two players or some player has more than two actions. This holds even if payoffs are normalized to the interval $[0, 1]$. Ranking games also satisfy this normalization criterion, but here social welfare is also strictly positive for every outcome of the game. Ranking games with identical rank payoff vectors for all players, i.e., ones where $r_i^k = r_j^k$ for all $i, j \in N$ and $1 \leq k \leq n$, are constant-sum games. Hence, the social welfare is the same in every outcome so that both the mediation value and the enforcement value are 1. This particularly concerns all ranking games with two players. In general, social welfare in an arbitrary outcome of a ranking game is bounded by $n - 1$ from above and 1 from below. Since the Nash and correlated equilibrium payoffs must lie in the convex hull of the feasible payoffs of the game, we obtain trivial lower and upper bounds of 1 and $n - 1$, respectively, on both the mediation and the enforcement value. It turns out that the upper bound of $n - 1$ is tight for both the mediation value and the enforcement value.
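The conventions of Definition 10 and the welfare bounds can be made concrete with a small Python sketch. The helper names `ratio` and `social_welfare` are hypothetical; the payoff vectors are those of the three-player game of Figure 3. The final two values rest on an assumption taken from the paper's own claim that all Nash equilibria of that game are mixtures of profiles with payoff $(1, 0, 1)$, so that the best Nash welfare already equals the best welfare of any outcome.

```python
import math
from fractions import Fraction


def ratio(numerator, denominator):
    """Quotient with the conventions of Definition 10: 0/0 = 1, x/0 = infinity."""
    if denominator == 0:
        return Fraction(1) if numerator == 0 else math.inf
    return Fraction(numerator) / Fraction(denominator)


def social_welfare(payoff_vector):
    """Sum of all players' payoffs in one outcome."""
    return sum(payoff_vector)


# Payoff vectors of the eight outcomes of the Figure 3 game (n = 3 players).
FIGURE_3_OUTCOMES = [
    (0, 1, 1), (1, 0, 0), (1, 0, 0), (0, 1, 0),
    (0, 1, 0), (1, 0, 0), (1, 0, 1), (1, 0, 1),
]

welfare = [social_welfare(p) for p in FIGURE_3_OUTCOMES]
lowest, highest = min(welfare), max(welfare)
print(lowest, highest)  # 1 2

# Assumption: the best Nash equilibrium already attains welfare `highest`.
# Correlated equilibria lie between the best Nash equilibrium and the best
# outcome, so all three maxima coincide and both ratios equal 1.
mediation_value = ratio(highest, highest)
enforcement_value = ratio(highest, highest)
```

Here welfare ranges over $[1, n - 1] = [1, 2]$, in line with the general bounds stated above.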
For the former, we show that for any $n \geq 3$ there is a ranking game where all Nash equilibria have social welfare 1 while there is a correlated equilibrium with social welfare $n - 1$. In particular, we exploit the fact that a Nash equilibrium has to be a product of strategies for the individual players and design a game where one of the players strictly prefers a designated action given that the other players play a strategy profile involving an outcome with high social welfare, while the same is not the case for a certain correlated strategy. The proof itself is rather involved, and is omitted for reasons of limited space.

Theorem 5 Let $\mathcal{R}$ be the class of ranking games with more than two players such that at least one player has more than two actions when there are only three players. Then, $MV(\mathcal{R}) = n - 1$.

In order to match the upper bound of the enforcement value, we design a ranking game that has social welfare $n - 1$ for a single action profile and social welfare $1 + \epsilon$ for all others. To show that there exists no correlated equilibrium with social welfare larger than $1 + \epsilon$, the problem of finding a social welfare maximizing correlated equilibrium is written as a linear program and then transformed into its dual. Since the dual constitutes a minimization problem, it suffices to find a feasible solution with objective value $1 + \epsilon$. We again omit the details of the proof.

Theorem 6 Let $\mathcal{R}$ be the class of ranking games with more than two players. Then, $EV(\mathcal{R}) = n - 1$, even if $\mathcal{R}$ only contains games without weakly dominated actions.

7 Conclusion

We have quantified and bounded comparative ratios between various solution concepts in ranking games. It turned out that playing one's maximin strategy in binary ranking games with only few actions available might be a prudent choice, not only because this strategy guarantees a certain payoff even when playing against irrational opponents, but also because of the limited price of cautiousness and the inherent weakness of Nash equilibria in ranking games. Moreover, maximin strategies can be computed in polynomial time while all known algorithms for computing Nash equilibria have exponential worst-case complexity. In the second part of the paper, we have investigated the relationship between correlated and Nash equilibria.
While correlation can never decrease social welfare, it is an important question which (especially competitive) scenarios permit an increase. In scenarios with many players and asymmetric preferences over ranks (i.e., non-identical rank payoff vectors) overall satisfaction can be improved substantially by allowing players to correlate their actions. Furthermore, correlated equilibria have the advantage of being polynomial-time computable and do not suffer from the equilibrium selection problem since the equilibrium to be played is selected by a mediator.

References

I. Ashlagi, D. Monderer, and M. Tennenholtz. On the value of correlation. In Proc. of 21st UAI Conference, pages 34–41. AUAI Press, 2005.
R. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.
R. Aumann. On the non-transferable utility value: A comment on the Roth-Shafer examples. Econometrica, 53(3):667–678, 1985.
F. Brandt, F. Fischer, and Y. Shoham. On strictly competitive multi-player games. In Y. Gil and R. Mooney, editors, Proc. of 21st AAAI Conference, pages 605–612. AAAI Press, 2006.
X. Chen and X. Deng. Settling the complexity of 2-player Nash-equilibrium. In Proc. of 47th FOCS Symposium. IEEE Press, 2006. To appear.
R. Cubitt and R. Sugden. Rationally justifiable play and the theory of non-cooperative games. Economic Journal, 104(425):798–803, 1994.
J. C. Harsanyi. Oddness of the number of equilibrium points: A new proof. International Journal of Game Theory, 2:235–250, 1973.
E. Koutsoupias and C. Papadimitriou. Worst-case equilibria. In Proc. of 16th STACS, volume 1563 of LNCS, pages 404–413. Springer, 1999.
C. Luckhardt and K. Irani. An algorithmic solution of n-person games. In Proc. of 5th AAAI Conference, pages 158–162. AAAI Press, 1986.
A. T. Marsland and J. Schaeffer, editors. Computers, Chess, and Cognition. Springer, 1990.
R. B. Myerson. Game Theory: Analysis of Conflict. Harvard University Press, 1991.
J. F. Nash. Non-cooperative games. Annals of Mathematics, 54(2):286–295, 1951.
H. Norde. Bimatrix games have quasi-strict equilibria. Mathematical Programming, 85:35–49, 1999.
S. J. Russell and P. Norvig. Artificial Intelligence. A Modern Approach. Prentice Hall, 2nd edition, 2003.
T. Sandholm, K. Larson, M. Andersson, O. Shehory, and F. Tohmé. Coalition structure generation with worst case guarantees. Artificial Intelligence, 111(1–2):209–238, 1999.
N. Sturtevant. Current challenges in multi-player game search. In Proc. of 4th International Conference on Computers and Games (CG), volume 3846 of LNCS. Springer, 2004.
M. Tennenholtz. Competitive safety analysis: Robust decision-making in multi-agent systems. Journal of Artificial Intelligence Research, 17:363–378, 2002.
E. van Damme. Refinements of the Nash Equilibrium Concept. Springer, 1983.
J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 2nd edition, 1947.