Lecture September 6, 2011

cs294-p29 Seminar on Algorithmic Game Theory, September 6, 2011
Lecture September 6, 2011
Lecturer: Christos H. Papadimitriou
Scribes: Aloni Cohen and James Andrews

1 Game Representation

1.1 Tabular Form and the Problem of Succinctness

In the previous lecture, we generally dealt with games represented in tabular or normal form, in which each player's payoff is listed for all choices of strategies by all players. Table 1 represents the Prisoner's Dilemma in this form.

Table 1: Prisoner's Dilemma in Tabular Form

          silent   defect
silent    3,3      0,4
defect    4,0      1,1

If there are $n$ players, each with $s$ strategies to choose from, then the number of entries in the table is $n s^n$. With a representation this large, the goal of algorithmic efficiency becomes meaningless, as simply reading the game into memory takes exponential time. It is for this reason that alternative game representations have been developed.

1.2 Graphical Games

A graphical game is given by a graph $G = (V, E)$ in which the vertices correspond to the players and the presence of an edge $e = (p_1, p_2)$ between players 1 and 2 means that the strategy each player chooses (potentially) affects the other's payoff. For example, in Figure 1, the payoff of $p_2$ is given by a function $U_2(s_1, s_2, s_3, s_4)$ whose value depends only on the strategies of players 1, 2, 3, and 4. For a graphical game with $n$ players, each of which has $s$ strategies, where the graph has maximum degree $d$, the representation requires $n s^{d+1}$ entries. For a sparse graph, this is a far smaller representation than normal form.

Figure 1: Graphical Game. Each player $p_i$ plays strategy $s_i$.

A decomposable graphical game is a graphical game with the further restriction that the payoff for a player is affected by the strategies that each neighbor plays independently. For example, if the game in Figure 1 were a decomposable graphical game, the payoff of $p_2$ would be given by a function $U_2(s_1, s_2, s_3, s_4) = U_{2,1}(s_2, s_1) + U_{2,3}(s_2, s_3) + U_{2,4}(s_2, s_4)$. We will see below (Sec. 2) that zero-sum decomposable graphical games can be solved easily with linear programming.

1.3 Congestion Games

A congestion game is given by a graph where each of the $n$ players is assigned 2 vertices and each edge is assigned an $n$-tuple defining a delay function for that edge. In the game, each player must choose a set of edges forming a path between his 2 vertices. Each edge of the graph has some delay, which is a function only of the number of players that choose to use that edge. A player's total delay is the sum of the delays of all edges used on the path. See Figure 2 for an example. For a congestion game formed by a graph with $m$ edges, representing the game requires roughly $m n$ entries.

Figure 2: Congestion Game. In the congestion game above, suppose Player A uses edge 1 and Player B uses edge 2, and that Players D and E use neither edge 1 nor edge 2. If Player C uses edge 1, the delay on that edge for Players A and C will be 2, whereas the delay for Player B (on edge 2) will be 1. Alternatively, if Player C uses edge 2, the delays on both edge 1 and edge 2 will be 1.

1.4 Symmetric Games

A game in which all players are indistinguishable is a symmetric game. Examples of such games include Prisoner's Dilemma, Rock-Paper-Scissors, and 2/3 of the Majority. In a symmetric game all $n$ players have the same strategies and the same payoffs, which are a function only of how many players chose each strategy, not which players chose them. Nash proved that every symmetric game has a symmetric Nash equilibrium - one in which every player shares a single strategy (1951).
To represent a symmetric game, we only need to store a single payoff value for each possible set of strategies played. Since only the number of players who choose a particular strategy matters, $s \binom{n+s-1}{s-1}$ entries are needed.

1.4.1 Anonymous Games

A generalization of a symmetric game is an anonymous game, one in which the players are distinguishable, but the payoffs (possibly different for each player) are still a function only of how many players choose each strategy rather than which players. In other words, each player sees all the other players as anonymous, or indistinguishable. An anonymous game requires $n s \binom{n+s-1}{s-1}$ entries to represent.
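To get a feel for how much the succinct representations save, the entry counts quoted in Sections 1.1 to 1.4.1 can be evaluated directly. The following is a minimal sketch; the function names and the sample values of `n`, `s`, `d`, and `m` are illustrative assumptions, not part of the lecture.

```python
from math import comb

# Entry counts for each representation, using the formulas quoted in the notes.
# All function names here are hypothetical helpers chosen for illustration.

def tabular_entries(n, s):
    """Normal form: n payoffs for each of the s^n strategy profiles."""
    return n * s ** n

def graphical_entries(n, s, d):
    """Graphical game with maximum degree d: n tables of s^(d+1) entries."""
    return n * s ** (d + 1)

def congestion_entries(m, n):
    """Congestion game: one n-tuple of delays per edge, m edges."""
    return m * n

def symmetric_entries(n, s):
    """Symmetric game: s * C(n+s-1, s-1), as stated in the notes."""
    return s * comb(n + s - 1, s - 1)

def anonymous_entries(n, s):
    """Anonymous game: n times the symmetric count."""
    return n * symmetric_entries(n, s)

# Sample sizes (assumed): 10 players, 2 strategies, degree 3, 15 edges.
sizes = {
    "tabular": tabular_entries(10, 2),
    "graphical": graphical_entries(10, 2, 3),
    "congestion": congestion_entries(15, 10),
    "symmetric": symmetric_entries(10, 2),
    "anonymous": anonymous_entries(10, 2),
}
```

Even at this small size the tabular form is already far larger than any of the alternatives, and the gap grows exponentially in `n`.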

1.5 Extensive Form (Bayesian Games)

The following description of extensive-form games is taken from Wikipedia:

An extensive-form game is a specification of a game in game theory, allowing (as the name suggests) explicit representation of a number of important aspects, like the sequencing of players' possible moves, their choices at every decision point, the (possibly imperfect) information each player has about the other player's moves when he makes a decision, and his payoffs for all possible game outcomes. Extensive-form games also allow representation of incomplete information in the form of chance events encoded as "moves by nature". ... [A]n n-player extensive-form game thus consists of the following:

- A finite set of n (rational) players
- A rooted tree, called the game tree
- Each terminal (leaf) node of the game tree has an n-tuple of payoffs, meaning there is one payoff for each player at the end of every possible play
- A partition of the non-terminal nodes of the game tree in n+1 subsets, one for each (rational) player, and with a special subset for a fictitious player called Chance (or Nature). Each player's subset of nodes is referred to as the "nodes of the player". (A game of complete information thus has an empty set of Chance nodes.)
- Each node of the Chance player has a probability distribution over its outgoing edges.
- Each set of nodes of a rational player is further partitioned in information sets, which make certain choices indistinguishable for the player when making a move, in the sense that:
  - there is a one-to-one correspondence between outgoing edges of any two nodes of the same information set; thus the set of all outgoing edges of an information set is partitioned in equivalence classes, each class representing a possible choice for a player's move at some point, and
  - every (directed) path in the tree from the root to a terminal node can cross each information set at most once
- the complete description of the game specified by the above parameters is common knowledge among the players

A play is thus a path through the tree from the root to a terminal node.
At any given non-terminal node belonging to Chance, an outgoing branch is chosen according to the probability distribution. At any rational player's node, the player must choose one of the equivalence classes for the edges, which determines precisely one outgoing edge except (in general) the player doesn't know which one is being followed.

So far, every game we have seen in the above representations can easily be expanded into tabular form. But how can we reconcile Chance nodes in an extensive form game? In the example in Figure 3, each player is assigned a type which can affect the choices made and payoffs earned. If each player gets a type, then the Chance node has a probability distribution on $T^n$, the set of all type assignments (with $T$ the set of types). Now we can generalize the idea of a player's strategy (e.g., do A with some probability $p_a$, do B with some probability $p_b$, etc.) to a function $f : T \to \text{strategy}$ that assigns a strategy to each type. Given an assignment of types, the game can take a tabular form.

1.6 Others

There are many other types of games and game representations, including but certainly not limited to Scheduling, Facility Location, and Network Design games.

Figure 3: Extensive Form - An even simpler poker game. In this game, Players 1 and 2 each pay $1 to play. Then Nature (the dealer) gives Player 1 a card that is High or Low. Player 1 can either fold (in which case Player 2 gets the pot) or bet an additional $1. If Player 1 bets, then Player 2 can either fold (in which case Player 1 gets the pot) or call by placing another $1 in the pot. If both players bet, then Player 1 wins if his card was High and loses if it was Low. After Player 1 bets, Player 2 has a single Information Set - the two circled nodes are indistinguishable from his point of view. This example was taken from http://www.u.arizona.edu/~mwalker/pokergame.pdf.
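The reduction from a game with a Chance node to tabular form can be sketched on this poker game. Player 1's strategy becomes a function from his type (the card) to an action, and each table entry is an expectation over Nature's move. The sketch below assumes the two cards are equally likely and records Player 1's net winnings (Player 2 receives the negation); neither assumption is stated in the notes, and all names are illustrative.

```python
from fractions import Fraction
from itertools import product

TYPES = ("High", "Low")
# Assumption: Nature deals High or Low with equal probability.
PRIOR = {"High": Fraction(1, 2), "Low": Fraction(1, 2)}

def payoff_to_player1(card, p1_action, p2_action):
    """Net winnings of Player 1 for one play of the game."""
    if p1_action == "fold":
        return -1            # Player 1 loses his $1 ante
    if p2_action == "fold":
        return 1             # Player 1 wins Player 2's $1 ante
    # Both players put in $2 in total; the card decides the winner.
    return 2 if card == "High" else -2

def tabular_form():
    """Expected payoff to Player 1 for every pair of pure strategies."""
    table = {}
    # A Player 1 strategy assigns an action to each type: f : T -> action.
    for high_act, low_act in product(("bet", "fold"), repeat=2):
        f = {"High": high_act, "Low": low_act}
        for p2_action in ("call", "fold"):
            table[(high_act, low_act), p2_action] = sum(
                PRIOR[card] * payoff_to_player1(card, f[card], p2_action)
                for card in TYPES
            )
    return table

table = tabular_form()
```

Player 2 has a single information set, so his pure strategies are just "call" and "fold", while Player 1 has four (one action per card), giving a 4 x 2 table.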

2 Decomposable Zero-Sum Games

In the homework for week 1, we saw a simple case of a decomposable graphical game. In that scenario, Player B was playing 2 zero-sum games simultaneously, one against Player A and one against Player C. The catch was that Player B had to choose a single strategy to apply to both games, and his payoff was to be the sum of his payoffs from the two separate games. As an exercise, we showed that this could be solved by a linear program. It turns out that this approach can be used more generally.

Theorem 1. In any zero-sum decomposable graphical game, a Nash Equilibrium can be found in polynomial time.

Proof: First, we must prove that a Nash Equilibrium exists at all. To do this, we simply use the sledgehammer that is Nash's theorem - namely that every game has a Nash Equilibrium. Now our task is to find one. Define the following variables:

$x_{u,j} = \Pr[\text{player } u \text{ chooses action } j]$
$U_{u,v}(i, j) =$ the payoff for player $u$ if $u$ plays $i$ and $v$ plays $j$
$L_{u,i}(x) =$ the expected payoff for player $u$ if he chooses action $i$ given that the other players fix their strategies $x_{v,j}$

Notice that $L_{u,i}$ is a linear function because the game is decomposable. In particular, $L_{u,i} = \sum_{v,j} U_{u,v}(i, j)\, x_{v,j}$.

Consider the following linear program:

Minimize $\sum_u w_u$ subject to:
$w_u \ge L_{u,i} \quad \forall u, i$
$\sum_i x_{u,i} = 1 \quad \forall u$
$x_{u,i} \ge 0 \quad \forall u, i$

The following claim completes the proof:

Claim 1. The minimum is achieved at 0, which is a Nash Equilibrium.

Proof: Define $\bar{w}_u = \sum_i x_{u,i} L_{u,i}$ - this is the average gain of player $u$. Obviously, $\sum_u \bar{w}_u = 0$ because this is a zero-sum game. From the linear program we have $w_u \ge \bar{w}_u$, since $\bar{w}_u$ is an average of values $L_{u,i}$ that are each at most $w_u$; hence $\sum_u w_u \ge 0$ at every feasible point. At a Nash Equilibrium, which exists, no action beats the average, so we can set $w_u = \bar{w}_u$ $\forall u$, yielding $\sum_u w_u = 0$, which is minimal. Conversely, at any solution with $\sum_u w_u = 0$ we must have $w_u = \bar{w}_u$ $\forall u$. Thus, for every strategy $i$, $L_{u,i} \le w_u$ is no better than the average $\bar{w}_u$, proving that $x$ is indeed a Nash Equilibrium.

3 An Ancient Algorithm: Fictitious Play

In addition to the linear programming method we learned about previously, there are several natural algorithms for solving zero sum games. Of these, we first discuss fictitious play, a strategy in which we imagine the two players repeatedly playing with a naive strategy.
The first round of fictitious play is played randomly. For each subsequent round, both players look at the history of plays by the other, and assume that these historical plays represent the strategy that player will use: a strategy that was used in p% of the past rounds is assumed to be played with probability p%. Each player then plays the best response to that assumed history-based strategy. Over time, these historical strategies will converge to an equilibrium strategy.

As an example, consider a zero-sum game with row player payoff matrix:

$$R = \begin{pmatrix} 2 & 1 & 0 \\ 2 & 0 & 3 \\ 1 & 3 & -3 \end{pmatrix} \qquad (1)$$
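The procedure just described can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the opening moves are passed in explicitly instead of being drawn at random, ties are broken toward the lowest index, strategies are 0-indexed, and the entries of `R` are an assumption about the example matrix (in particular the bottom-right entry is taken as -3).

```python
from fractions import Fraction

# Row player's payoff matrix (assumed reading of the example);
# the column player's payoff is the negation.
R = [[2, 1, 0],
     [2, 0, 3],
     [1, 3, -3]]

def fictitious_play(R, first_row, first_col, rounds):
    """Run fictitious play and return (row_play, col_play, u, v) per round.

    u[a] is the average gain of row strategy a against the column history;
    v[b] is the average loss of column strategy b against the row history.
    """
    m, n = len(R), len(R[0])
    row_counts = [0] * m
    col_counts = [0] * n
    history = []
    i, j = first_row, first_col
    for t in range(rounds):
        row_counts[i] += 1
        col_counts[j] += 1
        total = t + 1
        u = [sum(Fraction(R[a][b] * col_counts[b], total) for b in range(n))
             for a in range(m)]
        v = [sum(Fraction(R[a][b] * row_counts[a], total) for a in range(m))
             for b in range(n)]
        history.append((i, j, u, v))
        # Best responses to the opponent's empirical mixture so far.
        i = max(range(m), key=lambda a: u[a])   # row maximizes gain
        j = min(range(n), key=lambda b: v[b])   # column minimizes loss
    return history

# Start with row strategy 1 and column strategy 3 (indices 0 and 2).
history = fictitious_play(R, first_row=0, first_col=2, rounds=3)
```

Exact fractions are used so that the averages can be compared with hand calculations; for long runs plain floats would be the more practical choice.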

The column player's payoff matrix is $C = -R$. The fictitious play algorithm may then run as follows:

t   row plays   col plays   u (ave gain vector by row)   v (ave loss)
0   1           3           0, 3, -3                     2, 1, 0
1   2           3           0, 3, -3                     2, 1/2, 3/2
2   2           2           1/3, 2, -1                   2, 1/3, 2
etc.

Note that this process bounds the equilibrium value, written $v^*$ here to distinguish it from the vector $v$, as: $\max_i u_i \ge v^* \ge \min_j v_j$.

Theorem 2. (Robinson, 1950) Fictitious play converges to the Nash equilibrium.

Robinson's proof of convergence indicates a rate of $(c/\epsilon)^{m+n}$ (in the worst case, not probabilistic). Karlin conjectured (1965) that a faster convergence rate of $(c/\epsilon)^2$ suffices.

4 Another Natural Algorithm: Experts

Another natural algorithm is based on a concept called boosting, or alternatively no regret learning, hedging, experts, or multiplicative updates (MU). We will first introduce this concept, and then show how it can be applied to zero sum games.

4.1 Introduction to Experts Algorithms

In these algorithms, $n$ experts suggest strategies over time, and the player can decide which mix of expert strategies to use based on the historical losses observed from each expert's strategies. Specifically, at each time step $t$, the player produces an $n$-vector of weights $w^t$ indicating how much to follow each expert's strategy, and then nature produces an $n$-vector of losses $l^t$ caused by following each strategy. The player's loss is then computed as $\frac{w^t}{\|w^t\|_1} \cdot l^t$. Cumulative loss over time is $L^T = \sum_{t=1}^{T} \frac{w^t}{\|w^t\|_1} \cdot l^t$.

For example, we may imagine the following sequence of strategies, weights, and resulting losses from 3 experts, with one weight column and one loss column for each of t = 1, t = 2, t = 3, t = 4:

1 4 1 2 2 1 2 3 1 1 2 3 2 1 2 7 1 3 1 4 1 0 2 4 8 12 4 14 3 4 5 3

The goal is to choose weights $w^t$ to do well. Doing well could mean getting the smallest cumulative loss $L^T$, or predicting the best row (best expert). Given this setup, several algorithms for choosing the weight vectors are possible:

Follow the leader: For each round choose the expert $i$ who has the lowest loss on average so far, and set $w_i = 1$, $w_j = 0$ for $j \ne i$. It is easy to construct a worst case scenario in which this algorithm incurs total loss which is $n$ times worse than following the true best expert.
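Multiplicative updates (MU): Punish the $i$-th expert by a punishment function: $x \mapsto x\,\pi(l_i) = x(1 + \epsilon)^{-l_i}$, with $\pi$ bounded for losses $x \in [0, 1]$ by $\alpha^x \le \pi(x) \le 1 - (1 - \alpha)x$.

Theorem 3. With multiplicative updates, cumulative loss

$$L^T \le \min_i L_i^T \, \frac{\ln\frac{1}{\alpha}}{1 - \alpha} + \frac{\ln(n)}{1 - \alpha}$$

Proof: Let $p_i^t = w_i^t / \sum_j w_j^t$, let $\ell^t = \sum_i p_i^t l_i^t$ be the player's loss at step $t$, and let the initial weights be $w_i^1 = 1/n$. Then

$$\sum_{i=1}^{n} w_i^{t+1} = \sum_{i=1}^{n} w_i^t \pi(l_i^t) \le \sum_i w_i^t \left(1 - (1 - \alpha) l_i^t\right) = \Big(\sum_i w_i^t\Big)\Big(\sum_i p_i^t \left(1 - (1 - \alpha) l_i^t\right)\Big)$$

$$\ln\Big(\sum_i w_i^{t+1}\Big) \le \ln\Big(\sum_i w_i^t\Big) + \ln\left(1 - (1 - \alpha)\ell^t\right)$$

$$\ln\Big(\sum_i w_i^{t+1}\Big) \le \ln\Big(\sum_i w_i^t\Big) - (1 - \alpha)\ell^t$$

Summing over $t = 1, \dots, T$ and using $\sum_i w_i^1 = 1$:

$$\ln\Big(\sum_i w_i^{T+1}\Big) \le -(1 - \alpha) L^T$$

$$L^T \le -\frac{\ln\big(\sum_i w_i^{T+1}\big)}{1 - \alpha} \le -\frac{\ln w_i^{T+1}}{1 - \alpha} \quad \forall i$$

$$\sum_i w_i^{T+1} \ge w_i^{T+1} \ge w_i^1 \, \alpha^{\,l_i^1 + l_i^2 + \dots + l_i^T} = \frac{1}{n}\, \alpha^{L_i^T}$$

$$L^T \le \frac{\ln(n)}{1 - \alpha} + \frac{\ln\frac{1}{\alpha}}{1 - \alpha}\, L_i^T \quad \forall i$$

Further, note that setting $\alpha = 1 - \sqrt{\frac{\ln n}{T}}$ gives

$$L^T \le \min_i L_i^T + 2\sqrt{T \ln n}, \qquad \text{i.e.} \qquad \frac{1}{T} L^T \le \frac{1}{T} \min_i L_i^T + 2\sqrt{\frac{\ln n}{T}}$$

4.2 Application to Zero Sum Games

Given a zero sum game $(A, -A)$, we can let both players play the game repeatedly, using multiplicative updates to update their strategies, $x^t$ and $y^t$. Then we can analyze the convergence of this method as follows:

$$L^T = \sum_{t=1}^{T} x^t A y^t \ge e_i A \Big(\sum_t y^t\Big) - O(\sqrt{T}) \quad \forall i$$

$$\sum_t x^t A y^t \le \Big(\sum_t x^t\Big) A e_j + O(\sqrt{T}) \quad \forall j$$

$$\sum_t x^t A y^t \le \Big(\sum_t x^t\Big) A \Big(\frac{\sum_t y^t}{T}\Big) + O(\sqrt{T})$$

$$\Big(\sum_t x^t\Big) A \Big(\frac{\sum_t y^t}{T}\Big) \ge e_i A \Big(\sum_t y^t\Big) - O(\sqrt{T})$$

$$\Big(\frac{\sum_t x^t}{T}\Big) A \Big(\frac{\sum_t y^t}{T}\Big) \ge e_i A \Big(\frac{\sum_t y^t}{T}\Big) - O\Big(\sqrt{\tfrac{1}{T}}\Big) \quad \forall i$$

Therefore, $\frac{1}{T}\sum_t x^t$ approximates the Nash equilibrium for the row player, and similarly the algorithm also converges for the column player. Note that an error of $\epsilon$ requires $T = (c/\epsilon)^2$ rounds, so this converges at the rate Karlin conjectured for fictitious play.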
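As a concrete illustration of Section 4.2, the sketch below lets both players run multiplicative updates against each other and reports the time-averaged strategies. It makes several assumptions that are not in the notes: payoffs in `A` are scaled to lie in [0, 1], the row player is the maximizer (so his loss for action i is one minus his gain), the punishment function is $\pi(l) = \alpha^l$ with $\alpha = 1 - \sqrt{\ln n / T}$, and the 2 x 2 matrix is a made-up example whose equilibrium value is 1/3.

```python
import math

def mu_zero_sum(A, T):
    """Both players run multiplicative updates for T rounds on the game (A, -A).

    Returns the time-averaged mixed strategies (x_bar, y_bar).
    Assumes every entry of A lies in [0, 1].
    """
    m, n = len(A), len(A[0])
    alpha_row = 1 - math.sqrt(math.log(m) / T)
    alpha_col = 1 - math.sqrt(math.log(n) / T)
    wx = [1.0 / m] * m          # row player's expert weights
    wy = [1.0 / n] * n          # column player's expert weights
    x_sum = [0.0] * m
    y_sum = [0.0] * n
    for _ in range(T):
        sx, sy = sum(wx), sum(wy)
        x = [w / sx for w in wx]            # current mixed strategies
        y = [w / sy for w in wy]
        x_sum = [a + b for a, b in zip(x_sum, x)]
        y_sum = [a + b for a, b in zip(y_sum, y)]
        Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        xA = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]
        # Row action i has loss 1 - (Ay)_i; column action j has loss (xA)_j.
        wx = [x[i] * alpha_row ** (1 - Ay[i]) for i in range(m)]
        wy = [y[j] * alpha_col ** xA[j] for j in range(n)]
    return [v / T for v in x_sum], [v / T for v in y_sum]

# Hypothetical example game; its equilibrium value is 1/3.
A = [[1.0, 0.0],
     [0.0, 0.5]]
x_bar, y_bar = mu_zero_sum(A, T=2000)

# Best-response values against the averaged strategies bracket the game value.
upper = max(sum(A[i][j] * y_bar[j] for j in range(2)) for i in range(2))
lower = min(sum(x_bar[i] * A[i][j] for i in range(2)) for j in range(2))
```

The difference `upper - lower` is at most the sum of the two players' average regrets, so it shrinks on the order of $\sqrt{\ln n / T}$ as `T` grows.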