Policy Teaching Through Reward Function Learning. Haoqi Zhang, David Parkes, and Yiling Chen

1 Policy Teaching Through Reward Function Learning Haoqi Zhang, David Parkes, and Yiling Chen School of Engineering and Applied Sciences Harvard University ACM EC 2009 Haoqi Zhang (Harvard University) Policy Teaching ACM EC / 23

Interested party and agent
- A Web 2.0 site wants a user to...
- An online retailer wants a customer to...
- An ad-network wants a publisher to...
- Often, the agent does not behave as desired.
- Idea: provide incentives. Effective incentives depend on the agent's preferences.

Policy Teaching
- An agent performs a sequence of observable actions, modeled as a Markov decision process (MDP).
- The interested party can associate limited rewards with states.
- The interested party can interact with the agent multiple times, but cannot impose actions.
- Goal: induce the desired behavior quickly and at low cost.
- Policy teaching is an example of environment design.

Mechanism Design vs. Environment Design

Mechanism Design                       | Environment Design
elicit preferences via direct queries  | infer preferences from behavior
center implements outcomes             | agents take actions, not center
equilibrium analysis                   | agent is myopic to incentives

Understanding preferences
- Direct preference elicitation is costly and intrusive.
- Passive indirect elicitation is insufficient.
- Instead, use an active, indirect elicitation method [Z. & Parkes, 2008].

This paper
- Objective: induce a fixed target policy.
- It is easier for the interested party to specify a policy than a utility function.
- This is more tractable than value-based policy teaching [Z. & Parkes, 2008].

Main results
- Finding limited incentives that induce a pre-specified policy is in P.
- With unknown rewards, a polynomial-time algorithm finds incentives that induce the desired policy after a logarithmic number of interactions.
- A tractable slack-based heuristic, with empirical results in a simulated ad-network setting.
- Extension to partial observations and partial target policies.
- Game-theoretic analysis to handle strategic agents.

Markov Decision Process
Definition: An infinite-horizon MDP is a model $M = \{S, A, R, P, \gamma\}$, where
- $S$ is the finite set of states.
- $A$ is the finite set of available actions.
- $R : S \to \mathbb{R}$ is the reward function.
- $P : S \times A \times S \to [0, 1]$ is the transition function.
- $\gamma \in (0, 1)$ is the discount factor.
We assume bounded rewards: $R(s) < R_{\max}$ for all $s \in S$.
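To make the definition concrete, here is a minimal sketch of such an MDP with state-based rewards, together with plain value iteration to compute the agent's optimal policy. The class name `MDP`, the function names, and the two-state example are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for M = {S, A, R, P, gamma} with state rewards."""
    R: list        # R[s]: reward in state s
    P: list        # P[s][a][t]: probability of moving from s to t under action a
    gamma: float   # discount factor in (0, 1)


def q_values(mdp, V):
    """One-step lookahead: Q[s][a] = R(s) + gamma * sum_t P(s, a, t) * V(t)."""
    n = len(mdp.R)
    return [[mdp.R[s] + mdp.gamma * sum(mdp.P[s][a][t] * V[t] for t in range(n))
             for a in range(len(mdp.P[s]))]
            for s in range(n)]


def value_iteration(mdp, tol=1e-10, max_iter=100_000):
    """Return (V, policy), where policy[s] is a greedy optimal action in s."""
    V = [0.0] * len(mdp.R)
    for _ in range(max_iter):
        V_new = [max(row) for row in q_values(mdp, V)]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    Q = q_values(mdp, V)
    policy = [max(range(len(row)), key=row.__getitem__) for row in Q]
    return V, policy


# Toy example (assumed): two states; action 0 stays, action 1 moves to the
# other state. Only state 1 is rewarding.
example = MDP(
    R=[0.0, 1.0],
    P=[[[1.0, 0.0], [0.0, 1.0]],
       [[0.0, 1.0], [1.0, 0.0]]],
    gamma=0.9,
)
V, policy = value_iteration(example)
print(policy)  # the agent moves out of state 0 and stays in state 1
```

In this toy model the agent's optimal policy heads for the rewarding state and stays there, which is the kind of behavior an interested party might want to change.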

Example: hyperlink design in ad-network
- MDP model, cf. Immorlica et al., 2006.
[Figures for this example are not recoverable from the transcript.]

Policy Teaching with known rewards
- The agent performs its optimal policy $\pi$.
- The interested party can provide an admissible incentive $\Delta : S \to \mathbb{R}$ (e.g., within budget and without punishment).
- The agent then performs its optimal policy with respect to $R + \Delta$.
- Goal: provide a minimal admissible $\Delta$ that induces the target policy $\pi_T$.
- Use Inverse Reinforcement Learning (IRL) [Ng and Russell, 2000]: the set of rewards consistent with a policy is given by linear constraints.
- This yields a linear programming formulation.
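The slide does not spell out the linear program, so the following is only a sketch of one plausible form. It writes $\Delta$ for the incentive function and uses the Ng and Russell characterization: with state rewards, a policy is optimal exactly when a set of linear inequalities in the reward vector holds. Here $P_{\pi_T}$ is the $|S| \times |S|$ transition matrix under the target policy and $P_a$ the one under always taking action $a$. The total-incentive objective, the budget $D_{\max}$, and the no-punishment constraint $\Delta(s) \ge 0$ are assumptions chosen to illustrate "minimal" and "admissible".

```latex
\begin{aligned}
\min_{\Delta} \quad & \sum_{s \in S} \Delta(s) \\
\text{s.t.} \quad
  & (P_{\pi_T} - P_a)\,(I - \gamma P_{\pi_T})^{-1}\,(R + \Delta) \succeq 0
    && \forall a \in A, \\
  & \Delta(s) \ge 0 && \forall s \in S, \\
  & \sum_{s \in S} \Delta(s) \le D_{\max}.
\end{aligned}
```

Every constraint is linear in $\Delta$ once $R$, $P$, $\gamma$, and $\pi_T$ are fixed, which is why the known-reward problem can be solved in polynomial time. To make $\pi_T$ the strict optimum, the zero on the right-hand side of the first constraint can be replaced by a small positive slack in the rows where $a \ne \pi_T(s)$.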

Policy Teaching with unknown rewards
- Typically, the interested party won't know the agent's reward.
- Idea: provide incentives, observe behavior, and repeat.

An indirect approach
[Figure sequence; only the labels survive: $R$, $R_T$, "IRL space", "IRL $\pi$", and "IRL $\pi_T$".]

Convergence result
Theorem: The elicitation method terminates after finitely many steps with admissible incentives that induce $\pi_T$, if such incentives exist.
Intuition: a pigeonhole argument on the number of hypercubes that fit in the IRL space.
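The provide-observe-repeat loop can be sketched in a deliberately simplified form. The sketch below replaces the continuous IRL space with a finite list of reward hypotheses and the space of admissible incentives with a finite menu. Both are assumptions made for illustration, as are all function names. Each failed round eliminates at least the hypothesis that produced the offer, so the loop terminates, loosely mirroring the pigeonhole intuition.

```python
def q_values(R, P, gamma, iters=500):
    """Optimal Q-values for state rewards R, computed by value iteration."""
    n, m = len(P), len(P[0])

    def backup(V, s, a):
        return R[s] + gamma * sum(P[s][a][t] * V[t] for t in range(n))

    V = [0.0] * n
    for _ in range(iters):
        V = [max(backup(V, s, a) for a in range(m)) for s in range(n)]
    return [[backup(V, s, a) for a in range(m)] for s in range(n)]


def best_response(R, P, gamma):
    """Greedy optimal policy for rewards R (simulates the agent)."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_values(R, P, gamma)]


def is_optimal(policy, R, P, gamma, margin=0.0, tol=1e-9):
    """True if policy[s] beats every other action by at least `margin`."""
    Q = q_values(R, P, gamma)
    return all(Q[s][policy[s]] >= Q[s][a] + margin - tol
               for s in range(len(Q)) for a in range(len(Q[s]))
               if a != policy[s])


def plus(R, delta):
    return [r + d for r, d in zip(R, delta)]


def teach(agent_R, hypotheses, menu, P, gamma, target, margin=1e-3):
    """Return (incentive, rounds); incentive is None if no hypothesis works.

    agent_R is used only to simulate the agent's observable behavior.
    """
    observed = best_response(agent_R, P, gamma)
    # Keep hypotheses consistent with the behavior seen without incentives.
    live = [R for R in hypotheses if is_optimal(observed, R, P, gamma)]
    rounds = 0
    while True:
        # Cheapest menu item that would strictly induce the target policy
        # under the first live hypothesis for which such an item exists.
        offer = next((d for R in live for d in sorted(menu, key=sum)
                      if is_optimal(target, plus(R, d), P, gamma, margin)), None)
        if offer is None:
            return None, rounds
        rounds += 1
        observed = best_response(plus(agent_R, offer), P, gamma)
        if observed == target:
            return offer, rounds
        # Drop hypotheses that cannot explain the agent's response.
        live = [R for R in live if is_optimal(observed, plus(R, offer), P, gamma)]


# Toy run (assumed data): two states; action 0 stays, action 1 moves.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
hypotheses = [[0.0, 0.5], [0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
menu = [[0.75, 0.0], [1.5, 0.0], [2.5, 0.0], [3.5, 0.0]]
delta, rounds = teach([0.0, 1.0], hypotheses, menu, P, 0.9, target=[0, 1])
print(delta, rounds)
```

In the toy run the first offer is too small for the true agent, which rules out the lowest hypothesis; the second offer induces the target policy.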

Choosing an objective function
- Few elicitation rounds
- Tractable

Centroid-based approach
Theorem (Grünbaum, 1960): Any halfspace containing the centroid of a convex set contains at least $1/e$ of its volume.
- Pick the centroid of the IRL space as the guess for $R$ at every iteration.
- Adding IRL constraints then eliminates at least $1/e$ of its volume.
- Idea: maintain a volume around the true reward that is never eliminated.
- Algorithm: use a relaxation of the IRL constraints for observations.
- Obtain a logarithmic bound on the number of elicitation rounds.
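A back-of-the-envelope calculation shows why a constant-fraction volume cut leads to a logarithmic number of rounds. The sketch assumes, purely for illustration, that the IRL space starts as an n-dimensional cube of side `side_initial`, that a cube of side `side_protected` around the true reward is never eliminated, and that every failed round removes at least a $1/e$ fraction of the remaining volume. It is not the paper's exact bound, and the function and parameter names are hypothetical.

```python
import math


def max_rounds(n_dims, side_initial, side_protected):
    """Largest k with V0 * (1 - 1/e)**k >= v, for cube volumes V0 and v."""
    # log(V0 / v) for cubes: n * log(side_initial / side_protected)
    log_volume_ratio = n_dims * math.log(side_initial / side_protected)
    # Each round keeps at most a (1 - 1/e) fraction of the volume.
    log_shrink = -math.log(1.0 - 1.0 / math.e)
    return math.floor(log_volume_ratio / log_shrink)


for n in (10, 20, 40):
    print(n, max_rounds(n, side_initial=100.0, side_protected=1.0))
```

The count grows linearly in the dimension and only logarithmically in the ratio of the two side lengths.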

Computing the centroid
- The centroid is #P-hard to compute [Rademacher, 2007].
- It can be approximated via sampling in polynomial time [Bertsimas and Vempala, 2004].
- Finding an approximate centroid in polynomial time gives the logarithmic convergence bound with arbitrarily high probability.
- But: the algorithm is about $O(|S|^6)$, and the bound is not representative of actual performance.
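To illustrate the idea of approximating a centroid by sampling, here is a minimal hit-and-run sketch for a bounded polytope $\{x : Ax \le b\}$. Hit-and-run is one standard random walk for sampling from convex bodies; this simplified version is an assumption-laden illustration and does not carry the guarantees of the cited algorithm. The function name and parameters are hypothetical.

```python
import math
import random


def hit_and_run_centroid(A, b, x0, n_samples=20000, burn_in=1000, seed=0):
    """Estimate the centroid of the bounded polytope {x : A x <= b}.

    x0 must be a strictly interior point. The estimate is the average of
    the points visited by a hit-and-run random walk after burn-in.
    """
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    total = [0.0] * n
    for step in range(burn_in + n_samples):
        # Uniformly random direction on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Feasible segment {x + t d : t_lo <= t <= t_hi} inside the polytope.
        t_lo, t_hi = -math.inf, math.inf
        for row, bi in zip(A, b):
            ad = sum(r * v for r, v in zip(row, d))
            slack = bi - sum(r * v for r, v in zip(row, x))
            if ad > 1e-12:
                t_hi = min(t_hi, slack / ad)
            elif ad < -1e-12:
                t_lo = max(t_lo, slack / ad)
        # Jump to a uniformly random point on that segment.
        t = rng.uniform(t_lo, t_hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        if step >= burn_in:
            total = [s + xi for s, xi in zip(total, x)]
    return [s / n_samples for s in total]


# Unit square 0 <= x, y <= 1, whose exact centroid is (0.5, 0.5).
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]
centroid = hit_and_run_centroid(A, b, x0=[0.2, 0.7])
print(centroid)
```

In the policy teaching setting the polytope would be the current IRL space, with one row of `A` per linear constraint.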

Two-sided slack maximization heuristic
[Figure; surviving labels: $R$, $R_T$, "IRL space", "IRL $\pi_T$".]
- Idea from Z. & Parkes.
- Here the formulation is a linear program.
- Generates $|S||A|$ new constraints at each round.

Example: An ad-network setting
- A publisher designs the link structure on its website to maximize utility.
- An ad-network provides incentives to influence the link design.
- Results for the slack-based heuristics, 20 to 100 web pages:
  - Elicitation takes 8 to 12 rounds.
  - The number of rounds stays about constant as the number of states increases.

Extension: partial observation and partial target policy
- Partial observation: observe the agent's actions in only some of the states.
- Partial target policy: care about the agent's policy only in certain states.
- Goal: induce the desired partial policy in observable states.
- Can be formulated as a mixed integer program, with convergence.

Handling forward-looking agents
- Consider the interactions as an infinitely repeated game.
- A strategic agent may misrepresent its preferences.
- Consider a trigger strategy: provide maximal admissible incentives; if the agent does not perform $\pi_T$, provide no future incentives.
- This approach is unsatisfying:
  - Doesn't work for myopic agents
  - Commitment issues
  - Hard to implement in applications

Handling forward-looking agents: a solution
- Idea: use the elicitation method until it elicits the desired policy or no possible rewards remain.
- The agent may misrepresent, but if patient it will want to keep receiving incentives.
- The agent will follow the desired policy as a best response.
- Benefits:
  - Works for both myopic and strategic agents
  - Can still get fast convergence

The message
- There are tractable methods for finding limited incentives that quickly induce desired sequential agent behavior when:
  - direct queries about agent preferences are unavailable, and
  - agent behavior can be observed over time.
- Many interesting open questions remain, both theoretical and practical.

Thank you
I would love to get your comments and suggestions! hq@eecs.harvard.edu


Geometric Programming and its Application in Network Resource Allocation. Presented by: Bin Wang

Geometric Programming and its Application in Network Resource Allocation. Presented by: Bin Wang Geometric Programming and its Application in Network Resource Allocation Presented by: Bin Wang Why this talk? Nonlinear and nonconvex problem, can be turned into nonlinear convex problem Global optimal,

More information

Game Theory and Algorithms Lecture 3: Weak Dominance and Truthfulness

Game Theory and Algorithms Lecture 3: Weak Dominance and Truthfulness Game Theory and Algorithms Lecture 3: Weak Dominance and Truthfulness March 1, 2011 Summary: We introduce the notion of a (weakly) dominant strategy: one which is always a best response, no matter what

More information

Contents. MA 327/ECO 327 Introduction to Game Theory Fall 2017 Notes. 1 Wednesday, August Friday, August Monday, August 28 6

Contents. MA 327/ECO 327 Introduction to Game Theory Fall 2017 Notes. 1 Wednesday, August Friday, August Monday, August 28 6 MA 327/ECO 327 Introduction to Game Theory Fall 2017 Notes Contents 1 Wednesday, August 23 4 2 Friday, August 25 5 3 Monday, August 28 6 4 Wednesday, August 30 8 5 Friday, September 1 9 6 Wednesday, September

More information

An Empirical Evaluation of Policy Rollout for Clue

An Empirical Evaluation of Policy Rollout for Clue An Empirical Evaluation of Policy Rollout for Clue Eric Marshall Oregon State University M.S. Final Project marshaer@oregonstate.edu Adviser: Professor Alan Fern Abstract We model the popular board game

More information

Extensive Form Games. Mihai Manea MIT

Extensive Form Games. Mihai Manea MIT Extensive Form Games Mihai Manea MIT Extensive-Form Games N: finite set of players; nature is player 0 N tree: order of moves payoffs for every player at the terminal nodes information partition actions

More information

CSE 591: Human-aware Robotics

CSE 591: Human-aware Robotics CSE 591: Human-aware Robotics Instructor: Dr. Yu ( Tony ) Zhang Location & Times: CAVC 359, Tue/Thu, 9:00--10:15 AM Office Hours: BYENG 558, Tue/Thu, 10:30--11:30AM Nov 8, 2016 Slides adapted from Subbarao

More information

Monte Carlo Tree Search and AlphaGo. Suraj Nair, Peter Kundzicz, Kevin An, Vansh Kumar

Monte Carlo Tree Search and AlphaGo. Suraj Nair, Peter Kundzicz, Kevin An, Vansh Kumar Monte Carlo Tree Search and AlphaGo Suraj Nair, Peter Kundzicz, Kevin An, Vansh Kumar Zero-Sum Games and AI A player s utility gain or loss is exactly balanced by the combined gain or loss of opponents:

More information

An Artificially Intelligent Ludo Player

An Artificially Intelligent Ludo Player An Artificially Intelligent Ludo Player Andres Calderon Jaramillo and Deepak Aravindakshan Colorado State University {andrescj, deepakar}@cs.colostate.edu Abstract This project replicates results reported

More information

Fictitious Play applied on a simplified poker game

Fictitious Play applied on a simplified poker game Fictitious Play applied on a simplified poker game Ioannis Papadopoulos June 26, 2015 Abstract This paper investigates the application of fictitious play on a simplified 2-player poker game with the goal

More information

Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.)

Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.) Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.) Eric B. Laber February 12, 2008 Eric B. Laber () Introduction to Neuro-Dynamic Programming (Or,

More information

CS 188 Introduction to Fall 2014 Artificial Intelligence Midterm

CS 188 Introduction to Fall 2014 Artificial Intelligence Midterm CS 88 Introduction to Fall Artificial Intelligence Midterm INSTRUCTIONS You have 8 minutes. The exam is closed book, closed notes except a one-page crib sheet. Please use non-programmable calculators only.

More information

CMU Lecture 22: Game Theory I. Teachers: Gianni A. Di Caro

CMU Lecture 22: Game Theory I. Teachers: Gianni A. Di Caro CMU 15-781 Lecture 22: Game Theory I Teachers: Gianni A. Di Caro GAME THEORY Game theory is the formal study of conflict and cooperation in (rational) multi-agent systems Decision-making where several

More information

1890 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 30, NO. 10, NOVEMBER 2012

1890 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 30, NO. 10, NOVEMBER 2012 1890 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 30, NO. 10, NOVEMBER 2012 Dynamic Spectrum Sharing Among Repeatedly Interacting Selfish Users With Imperfect Monitoring Yuanzhang Xiao and Mihaela

More information

Alternation in the repeated Battle of the Sexes

Alternation in the repeated Battle of the Sexes Alternation in the repeated Battle of the Sexes Aaron Andalman & Charles Kemp 9.29, Spring 2004 MIT Abstract Traditional game-theoretic models consider only stage-game strategies. Alternation in the repeated

More information

Wireless Network Pricing Chapter 7: Network Externalities

Wireless Network Pricing Chapter 7: Network Externalities Wireless Network Pricing Chapter 7: Network Externalities Jianwei Huang & Lin Gao Network Communications and Economics Lab (NCEL) Information Engineering Department The Chinese University of Hong Kong

More information

Chapter 3 Learning in Two-Player Matrix Games

Chapter 3 Learning in Two-Player Matrix Games Chapter 3 Learning in Two-Player Matrix Games 3.1 Matrix Games In this chapter, we will examine the two-player stage game or the matrix game problem. Now, we have two players each learning how to play

More information

Optimizing Media Access Strategy for Competing Cognitive Radio Networks Y. Gwon, S. Dastangoo, H. T. Kung

Optimizing Media Access Strategy for Competing Cognitive Radio Networks Y. Gwon, S. Dastangoo, H. T. Kung Optimizing Media Access Strategy for Competing Cognitive Radio Networks Y. Gwon, S. Dastangoo, H. T. Kung December 12, 2013 Presented at IEEE GLOBECOM 2013, Atlanta, GA Outline Introduction Competing Cognitive

More information

ECON 282 Final Practice Problems

ECON 282 Final Practice Problems ECON 282 Final Practice Problems S. Lu Multiple Choice Questions Note: The presence of these practice questions does not imply that there will be any multiple choice questions on the final exam. 1. How

More information

UMBC 671 Midterm Exam 19 October 2009

UMBC 671 Midterm Exam 19 October 2009 Name: 0 1 2 3 4 5 6 total 0 20 25 30 30 25 20 150 UMBC 671 Midterm Exam 19 October 2009 Write all of your answers on this exam, which is closed book and consists of six problems, summing to 160 points.

More information

OPPORTUNISTIC spectrum access (OSA), first envisioned

OPPORTUNISTIC spectrum access (OSA), first envisioned IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 5, MAY 2008 2053 Joint Design and Separation Principle for Opportunistic Spectrum Access in the Presence of Sensing Errors Yunxia Chen, Student Member,

More information

Fast Online Learning of Antijamming and Jamming Strategies

Fast Online Learning of Antijamming and Jamming Strategies Fast Online Learning of Antijamming and Jamming Strategies Y. Gwon, S. Dastangoo, C. Fossa, H. T. Kung December 9, 2015 Presented at the 58 th IEEE Global Communications Conference, San Diego, CA This

More information

Game Theory and Economics of Contracts Lecture 4 Basics in Game Theory (2)

Game Theory and Economics of Contracts Lecture 4 Basics in Game Theory (2) Game Theory and Economics of Contracts Lecture 4 Basics in Game Theory (2) Yu (Larry) Chen School of Economics, Nanjing University Fall 2015 Extensive Form Game I It uses game tree to represent the games.

More information

Deep Learning for Autonomous Driving

Deep Learning for Autonomous Driving Deep Learning for Autonomous Driving Shai Shalev-Shwartz Mobileye IMVC dimension, March, 2016 S. Shalev-Shwartz is also affiliated with The Hebrew University Shai Shalev-Shwartz (MobilEye) DL for Autonomous

More information

Outline for today s lecture Informed Search Optimal informed search: A* (AIMA 3.5.2) Creating good heuristic functions Hill Climbing

Outline for today s lecture Informed Search Optimal informed search: A* (AIMA 3.5.2) Creating good heuristic functions Hill Climbing Informed Search II Outline for today s lecture Informed Search Optimal informed search: A* (AIMA 3.5.2) Creating good heuristic functions Hill Climbing CIS 521 - Intro to AI - Fall 2017 2 Review: Greedy

More information

Communication over a Time Correlated Channel with an Energy Harvesting Transmitter

Communication over a Time Correlated Channel with an Energy Harvesting Transmitter Communication over a Time Correlated Channel with an Energy Harvesting Transmitter Mehdi Salehi Heydar Abad Faculty of Engineering and Natural Sciences Sabanci University, Istanbul, Turkey mehdis@sabanciuniv.edu

More information

Index Terms Deterministic channel model, Gaussian interference channel, successive decoding, sum-rate maximization.

Index Terms Deterministic channel model, Gaussian interference channel, successive decoding, sum-rate maximization. 3798 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 58, NO 6, JUNE 2012 On the Maximum Achievable Sum-Rate With Successive Decoding in Interference Channels Yue Zhao, Member, IEEE, Chee Wei Tan, Member,

More information

Foundations of AI. 3. Solving Problems by Searching. Problem-Solving Agents, Formulating Problems, Search Strategies

Foundations of AI. 3. Solving Problems by Searching. Problem-Solving Agents, Formulating Problems, Search Strategies Foundations of AI 3. Solving Problems by Searching Problem-Solving Agents, Formulating Problems, Search Strategies Luc De Raedt and Wolfram Burgard and Bernhard Nebel Contents Problem-Solving Agents Formulating

More information

Name: Your EdX Login: SID: Name of person to left: Exam Room: Name of person to right: Primary TA:

Name: Your EdX Login: SID: Name of person to left: Exam Room: Name of person to right: Primary TA: UC Berkeley Computer Science CS188: Introduction to Artificial Intelligence Josh Hug and Adam Janin Midterm I, Fall 2016 This test has 8 questions worth a total of 100 points, to be completed in 110 minutes.

More information

The Game-Theoretic Approach to Machine Learning and Adaptation

The Game-Theoretic Approach to Machine Learning and Adaptation The Game-Theoretic Approach to Machine Learning and Adaptation Nicolò Cesa-Bianchi Università degli Studi di Milano Nicolò Cesa-Bianchi (Univ. di Milano) Game-Theoretic Approach 1 / 25 Machine Learning

More information

TUD Poker Challenge Reinforcement Learning with Imperfect Information

TUD Poker Challenge Reinforcement Learning with Imperfect Information TUD Poker Challenge 2008 Reinforcement Learning with Imperfect Information Outline Reinforcement Learning Perfect Information Imperfect Information Lagging Anchor Algorithm Matrix Form Extensive Form Poker

More information

U strictly dominates D for player A, and L strictly dominates R for player B. This leaves (U, L) as a Strict Dominant Strategy Equilibrium.

U strictly dominates D for player A, and L strictly dominates R for player B. This leaves (U, L) as a Strict Dominant Strategy Equilibrium. Problem Set 3 (Game Theory) Do five of nine. 1. Games in Strategic Form Underline all best responses, then perform iterated deletion of strictly dominated strategies. In each case, do you get a unique

More information

Outline for this presentation. Introduction I -- background. Introduction I Background

Outline for this presentation. Introduction I -- background. Introduction I Background Mining Spectrum Usage Data: A Large-Scale Spectrum Measurement Study Sixing Yin, Dawei Chen, Qian Zhang, Mingyan Liu, Shufang Li Outline for this presentation! Introduction! Methodology! Statistic and

More information

CS188 Spring 2011 Written 2: Minimax, Expectimax, MDPs

CS188 Spring 2011 Written 2: Minimax, Expectimax, MDPs Last name: First name: SID: Class account login: Collaborators: CS188 Spring 2011 Written 2: Minimax, Expectimax, MDPs Due: Monday 2/28 at 5:29pm either in lecture or in 283 Soda Drop Box (no slip days).

More information

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH 2010 1401 Decomposition Principles and Online Learning in Cross-Layer Optimization for Delay-Sensitive Applications Fangwen Fu, Student Member,

More information

Asynchronous Best-Reply Dynamics

Asynchronous Best-Reply Dynamics Asynchronous Best-Reply Dynamics Noam Nisan 1, Michael Schapira 2, and Aviv Zohar 2 1 Google Tel-Aviv and The School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel. 2 The

More information

DeepMind Self-Learning Atari Agent

DeepMind Self-Learning Atari Agent DeepMind Self-Learning Atari Agent Human-level control through deep reinforcement learning Nature Vol 518, Feb 26, 2015 The Deep Mind of Demis Hassabis Backchannel / Medium.com interview with David Levy

More information

ACRUCIAL issue in the design of wireless sensor networks

ACRUCIAL issue in the design of wireless sensor networks 4322 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 8, AUGUST 2010 Coalition Formation for Bearings-Only Localization in Sensor Networks A Cooperative Game Approach Omid Namvar Gharehshiran, Student

More information

Convergence in competitive games

Convergence in competitive games Convergence in competitive games Vahab S. Mirrokni Computer Science and AI Lab. (CSAIL) and Math. Dept., MIT. This talk is based on joint works with A. Vetta and with A. Sidiropoulos, A. Vetta DIMACS Bounded

More information

Theory of Moves Learners: Towards Non-Myopic Equilibria

Theory of Moves Learners: Towards Non-Myopic Equilibria Theory of s Learners: Towards Non-Myopic Equilibria Arjita Ghosh Math & CS Department University of Tulsa garjita@yahoo.com Sandip Sen Math & CS Department University of Tulsa sandip@utulsa.edu ABSTRACT

More information

Reinforcement Learning Simulations and Robotics

Reinforcement Learning Simulations and Robotics Reinforcement Learning Simulations and Robotics Models Partially observable noise in sensors Policy search methods rather than value functionbased approaches Isolate key parameters by choosing an appropriate

More information

Scheduling. Radek Mařík. April 28, 2015 FEE CTU, K Radek Mařík Scheduling April 28, / 48

Scheduling. Radek Mařík. April 28, 2015 FEE CTU, K Radek Mařík Scheduling April 28, / 48 Scheduling Radek Mařík FEE CTU, K13132 April 28, 2015 Radek Mařík (marikr@fel.cvut.cz) Scheduling April 28, 2015 1 / 48 Outline 1 Introduction to Scheduling Methodology Overview 2 Classification of Scheduling

More information

Game Theory and Randomized Algorithms

Game Theory and Randomized Algorithms Game Theory and Randomized Algorithms Guy Aridor Game theory is a set of tools that allow us to understand how decisionmakers interact with each other. It has practical applications in economics, international

More information

Optimal Coded Information Network Design and Management via Improved Characterizations of the Binary Entropy Function

Optimal Coded Information Network Design and Management via Improved Characterizations of the Binary Entropy Function Optimal Coded Information Network Design and Management via Improved Characterizations of the Binary Entropy Function John MacLaren Walsh & Steven Weber Department of Electrical and Computer Engineering

More information

Localization (Position Estimation) Problem in WSN

Localization (Position Estimation) Problem in WSN Localization (Position Estimation) Problem in WSN [1] Convex Position Estimation in Wireless Sensor Networks by L. Doherty, K.S.J. Pister, and L.E. Ghaoui [2] Semidefinite Programming for Ad Hoc Wireless

More information

Guess the Mean. Joshua Hill. January 2, 2010

Guess the Mean. Joshua Hill. January 2, 2010 Guess the Mean Joshua Hill January, 010 Challenge: Provide a rational number in the interval [1, 100]. The winner will be the person whose guess is closest to /3rds of the mean of all the guesses. Answer:

More information

AI Plays Yun Nie (yunn), Wenqi Hou (wenqihou), Yicheng An (yicheng)

AI Plays Yun Nie (yunn), Wenqi Hou (wenqihou), Yicheng An (yicheng) AI Plays 2048 Yun Nie (yunn), Wenqi Hou (wenqihou), Yicheng An (yicheng) Abstract The strategy game 2048 gained great popularity quickly. Although it is easy to play, people cannot win the game easily,

More information

Incentive design for social computing: Interdisciplinarity time!

Incentive design for social computing: Interdisciplinarity time! Incentive design for social computing: Interdisciplinarity time! ARPITA GHOSH Cornell University June 2015 Incentive design for social computing: Interdisciplinarity time! 1 Hello (world) Question: Please

More information

Chapter 2 Distributed Consensus Estimation of Wireless Sensor Networks

Chapter 2 Distributed Consensus Estimation of Wireless Sensor Networks Chapter 2 Distributed Consensus Estimation of Wireless Sensor Networks Recently, consensus based distributed estimation has attracted considerable attention from various fields to estimate deterministic

More information

Elements of Artificial Intelligence and Expert Systems

Elements of Artificial Intelligence and Expert Systems Elements of Artificial Intelligence and Expert Systems Master in Data Science for Economics, Business & Finance Nicola Basilico Dipartimento di Informatica Via Comelico 39/41-20135 Milano (MI) Ufficio

More information

Multi-user Space Time Scheduling for Wireless Systems with Multiple Antenna

Multi-user Space Time Scheduling for Wireless Systems with Multiple Antenna Multi-user Space Time Scheduling for Wireless Systems with Multiple Antenna Vincent Lau Associate Prof., University of Hong Kong Senior Manager, ASTRI Agenda Bacground Lin Level vs System Level Performance

More information

Optimization of On-line Appointment Scheduling

Optimization of On-line Appointment Scheduling Optimization of On-line Appointment Scheduling Brian Denton Edward P. Fitts Department of Industrial and Systems Engineering North Carolina State University Tsinghua University, Beijing, China May, 2012

More information