Some Complexity Results for Subclasses of Stochastic Games

Some Complexity Results for Subclasses of Stochastic Games
Krishnendu Chatterjee
Workshop on Stochastic Games, Singapore, Nov 30, 2015

Stochastic Games
This talk gives a glimpse of two types of results:
- Computational complexity.
- Strategy complexity.
For stochastic games as well as many different subclasses.

Stochastic Game Graphs
A stochastic game graph is a tuple $G = (S, M, \Gamma_1, \Gamma_2, \delta)$:
- $S$ is a finite set of states.
- $M$ is a finite set of moves or actions.
- $\Gamma_i : S \to 2^M \setminus \emptyset$ is an action assignment function that assigns the non-empty set $\Gamma_i(s)$ of actions to player $i$ at $s$, where $i \in \{1, 2\}$.
- $\delta : S \times M \times M \to D(S)$ is a stochastic transition function that, given a state and the actions of both players, gives a distribution over the next state.
For deterministic games, the transition function is deterministic.
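The tuple definition above can be written down directly as a data structure. The following is a minimal sketch, not part of the talk: the class name `StochasticGame`, its field names, and the toy instance `toy` are all hypothetical, and distributions are stored as plain dictionaries from next state to probability.

```python
import random
from dataclasses import dataclass


@dataclass
class StochasticGame:
    """Hypothetical container for G = (S, M, Gamma_1, Gamma_2, delta)."""
    states: set   # S
    moves: set    # M
    gamma1: dict  # state -> non-empty set of moves available to player 1
    gamma2: dict  # state -> non-empty set of moves available to player 2
    delta: dict   # (state, move1, move2) -> {next_state: probability}

    def validate(self):
        """Check non-empty action sets and that delta yields distributions over S."""
        for s in self.states:
            assert self.gamma1[s] and self.gamma1[s] <= self.moves
            assert self.gamma2[s] and self.gamma2[s] <= self.moves
            for a1 in self.gamma1[s]:
                for a2 in self.gamma2[s]:
                    dist = self.delta[(s, a1, a2)]
                    assert set(dist) <= self.states
                    assert abs(sum(dist.values()) - 1.0) < 1e-9
        return True

    def step(self, s, a1, a2, rng=random):
        """Sample the next state after both players pick their moves at s."""
        dist = self.delta[(s, a1, a2)]
        successors, probs = zip(*dist.items())
        return rng.choices(successors, weights=probs)[0]


# Toy instance (an assumption for illustration): state "t" is absorbing.
toy = StochasticGame(
    states={"s", "t"},
    moves={"a", "b"},
    gamma1={"s": {"a", "b"}, "t": {"a"}},
    gamma2={"s": {"a", "b"}, "t": {"a"}},
    delta={
        ("s", "a", "a"): {"s": 1.0},
        ("s", "a", "b"): {"s": 0.5, "t": 0.5},
        ("s", "b", "a"): {"t": 1.0},
        ("s", "b", "b"): {"s": 1.0},
        ("t", "a", "a"): {"t": 1.0},
    },
)
```

A deterministic game is the special case where every distribution in `delta` puts probability 1 on a single successor.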

Example game
[Figure, built up over several slides: an example game graph. The only surviving labels are two transition probabilities of ½.]

Strategies
Recipes to play the game: $\sigma : (S \times M \times M)^* \times S \to D(M)$.
Complexity of strategies:
- Memory.
- Randomization.
Stationary strategies (no memory): $\sigma : S \to D(M)$.

Mean-payoff Objective
Every transition is assigned a rational reward in the interval [0,1] by a reward function r.
Mean-payoff objective: the payoff of a play (an infinite path) is the long-run average of the rewards along the path, taken as either LimSupAvg or LimInfAvg.
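For intuition, the long-run average is easy to compute on an eventually periodic play (a finite prefix followed by a cycle repeated forever): the prefix has no effect in the limit, and LimSupAvg and LimInfAvg both equal the average reward on the cycle. The sketch below illustrates this; the function names are hypothetical, and general plays need not be eventually periodic.

```python
from fractions import Fraction


def mean_payoff_of_lasso(prefix_rewards, cycle_rewards):
    """Long-run average reward of the play prefix . cycle . cycle . ...

    The finite prefix is ignored because its contribution vanishes in the limit.
    """
    if not cycle_rewards:
        raise ValueError("the cycle must contain at least one transition")
    return sum(Fraction(r) for r in cycle_rewards) / len(cycle_rewards)


def running_averages(rewards):
    """Averages of the first 1, 2, ..., k rewards of a finite play prefix."""
    total = Fraction(0)
    averages = []
    for count, reward in enumerate(rewards, start=1):
        total += Fraction(reward)
        averages.append(total / count)
    return averages
```

For example, `mean_payoff_of_lasso([1, 1, 1], [0, 1])` is 1/2 no matter how long the all-ones prefix is.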

Existence of Value
Fundamental result on the existence of values [MN81]:
$$\sup_{\sigma} \inf_{\pi} \mathbb{E}_s^{\sigma,\pi}[\mathrm{LimInfAvg}] = \inf_{\pi} \sup_{\sigma} \mathbb{E}_s^{\sigma,\pi}[\mathrm{LimSupAvg}]$$
The order of the strategies can be exchanged. The common quantity is the value of the game, $v(s)$.
Value problem: the basic computational problem is to decide whether $v(s)$ is at least a given threshold.

Survey of Results
- Computational complexity of the value problem.
- Strategy complexity: strategies that witness the value problem.
- General stochastic games and various subclasses.

General Problem Result
Decision problem:
- First result: exponential time, $2^{\mathrm{poly}(m,n)}$, where m is the number of actions and n is the number of states [CMH08].
- Second result: doubly exponential, $m^{2^n}$ [HKLMT11]. For a constant number of states this is polynomial. A nice generalization of zero-sum matrix games.
Strategy complexity: very complicated even for simple games like the Big Match.

Towards Subclasses
Starting point: concurrent games with mean-payoff objectives.
- Structural restrictions: turn-based stochastic; turn-based deterministic; ergodic.
- Objective restrictions: reachability objectives.

Reachability and Safety Games
Reachability/safety games: a set T of terminal (absorbing) states has reward 1, and all other states have reward 0. Hence the reachability player wishes to reach T, and the safety player wishes to avoid T.
- Most basic objectives in computer science: reactive safety-critical systems.
- Positive recursive games: reachability player. The safety player is the opponent.

Computational Classes
- Polynomial time (P): efficient, e.g. linear or quadratic.
- Non-deterministic polynomial time (NP): given a witness of polynomial length, it can be checked in polynomial time.
- coNP: in some sense the complement of NP. Given a counter-witness (showing that the answer is no) of polynomial length, it can be checked in polynomial time.

Computational Classes
[Figure: diagram of the classes P, NP, coNP, NP and coNP, NP-complete (NP-c), and coNP-complete (coNP-c).]

TURN-BASED (STOCH. & DET.) GAMES

Turn-based Games: Computational Complexity
1. Turn-based deterministic:
   a) Reach: linear time.
   b) Mean-payoff [EM79, ZP96, Karp79]:
      I. $O(n \cdot m \cdot W)$;
      II. in NP and coNP; not known to be in P.
2. Turn-based stochastic:
   a) Reach:
      I. in NP and coNP; not known to be in P.
      II. at least as hard as 1b [Con92].
   b) Mean-payoff:
      I. equivalent to 2a [AM09].
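The linear-time bound for turn-based deterministic reachability (item 1a) is typically obtained by a backward "attractor" computation. The sketch below is one standard way to do it and is not taken from the talk: the function name `attractor` and the input encoding (successor lists plus an `owner` map with values 1 and 2) are assumptions, and every state is assumed to have at least one successor.

```python
from collections import deque


def attractor(states, edges, owner, target):
    """Set of states from which player 1 can force a visit to `target`.

    edges: dict state -> list of successor states (each list non-empty).
    owner: dict state -> 1 (reachability player) or 2 (safety player).
    Each edge is inspected a constant number of times, so the running time
    is linear in the size of the graph.
    """
    predecessors = {s: [] for s in states}
    for s in states:
        for t in edges[s]:
            predecessors[t].append(s)

    # For player-2 states: number of successors not yet known to be winning.
    escapes = {s: len(edges[s]) for s in states}
    winning = set(target)
    queue = deque(target)
    while queue:
        t = queue.popleft()
        for s in predecessors[t]:
            if s in winning:
                continue
            if owner[s] == 1:
                # Player 1 needs only one edge into the winning set.
                winning.add(s)
                queue.append(s)
            else:
                # Player 2 is caught only when every edge leads into the set.
                escapes[s] -= 1
                if escapes[s] == 0:
                    winning.add(s)
                    queue.append(s)
    return winning


# Small illustrative graph (an assumption): "d" is a safe sink for player 2.
example_states = ["a", "b", "c", "d", "t"]
example_edges = {"a": ["b", "d"], "b": ["t"], "c": ["t", "d"], "d": ["d"], "t": ["t"]}
example_owner = {"a": 1, "b": 2, "c": 2, "d": 2, "t": 1}
example_winning = attractor(example_states, example_edges, example_owner, {"t"})
```

In the example, player 2 can escape from "c" to the sink "d", so "c" is not in the winning set, while "a" wins by moving to "b".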

Turn-based Stochastic Games
Strategy complexity [LL69]: positional (deterministic and stationary).
The NP and coNP bound:
- Polynomial witness: a positional strategy, i.e. an action for every state.
- Polynomial-time verification: once a positional strategy is fixed, we obtain an MDP. Values in MDPs can be computed in polynomial time by linear programming [FV97].
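The verification step can be illustrated on a small turn-based stochastic reachability game. Note the swap: the slide uses linear programming, whereas this sketch uses value iteration, which is simpler to write without a solver but only approximates the value in the limit. The function name, the state kinds "max", "min" and "rand", and the example game are all assumptions made for illustration.

```python
def mdp_reach_value(states, kind, succ, prob, sigma, target, iterations=1000):
    """Approximate reachability values after fixing player 1's positional strategy.

    kind[s] is "max" (player 1), "min" (player 2) or "rand" (probabilistic).
    sigma maps each "max" state to its chosen successor, which leaves an MDP
    in which only the minimizing player still makes choices.
    Value iteration from below is used here instead of linear programming.
    """
    value = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iterations):
        updated = {}
        for s in states:
            if s in target:
                updated[s] = 1.0
            elif kind[s] == "max":
                updated[s] = value[sigma[s]]
            elif kind[s] == "min":
                updated[s] = min(value[t] for t in succ[s])
            else:
                updated[s] = sum(p * value[t] for t, p in prob[s].items())
        value = updated
    return value


# Illustrative game (an assumption): player 1 at "p1" picks between two lotteries.
game_states = ["p1", "p2", "r1", "r2", "r3", "goal", "sink"]
game_kind = {"p1": "max", "p2": "min", "r1": "rand", "r2": "rand",
             "r3": "rand", "goal": "rand", "sink": "rand"}
game_succ = {"p1": ["r1", "r2"], "p2": ["goal", "r3"]}
game_prob = {"r1": {"goal": 0.5, "sink": 0.5}, "r2": {"p2": 1.0},
             "r3": {"goal": 0.75, "sink": 0.25},
             "goal": {"goal": 1.0}, "sink": {"sink": 1.0}}

via_r1 = mdp_reach_value(game_states, game_kind, game_succ, game_prob,
                         {"p1": "r1"}, {"goal"})
via_r2 = mdp_reach_value(game_states, game_kind, game_succ, game_prob,
                         {"p1": "r2"}, {"goal"})
```

Here the witness that sends "p1" to "r2" certifies a value of at least 0.75 at "p1", while the one that sends it to "r1" certifies only 0.5.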

Some Hardness Results
Hardness results are stated relative to the following problems:
- TBD mean-payoff value problem.
- TBS reach value problem.
- SQUARE-ROOT-SUM problem: given positive integers $a_1, a_2, \ldots, a_n$ and $b$, decide whether $\sum_{i=1}^{n} \sqrt{a_i} \geq b$. This problem is not even known to be in NP.
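A naive numerical attempt at SQUARE-ROOT-SUM shows where the difficulty lies. The sketch below (hypothetical function name, arbitrary default precision) compares the sum with b at a fixed number of significant digits. It is only a heuristic: when the sum is extremely close to b the answer can be wrong, and no polynomial bound on the number of digits needed is known.

```python
from decimal import Decimal, localcontext


def sqrt_sum_at_least(a, b, digits=50):
    """Heuristically decide whether sqrt(a_1) + ... + sqrt(a_n) >= b.

    Uses `digits` significant decimal digits. The result is reliable only
    when the sum and b differ by more than the accumulated rounding error.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        total = sum((Decimal(x).sqrt() for x in a), Decimal(0))
        return total >= Decimal(b)
```

With perfect squares the arithmetic is exact, so `sqrt_sum_at_least([4, 9], 5)` holds with equality; the hard instances are non-squares whose sum lands very close to an integer.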

ERGODIC GAMES

Ergodic Games
- For all strategies, all states appear infinitely often with probability 1.
- Stationary optimal strategies exist [HK66]. However, they are not positional: randomization is needed.
- Strategy complexity of stationary strategies: how complex is it to represent the probability distributions of a stationary strategy?

Stationary Strategy Representation
A stationary strategy is a distribution in every state.
Representation of distributions:
- Exponential numbers have a polynomial-size representation, thanks to binary encoding.
- Doubly exponential numbers cannot be explicitly represented in polynomial size.
- Distributions that can be expressed with exponential numbers have a polynomial representation.
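The gap between the two cases can be seen by counting bits. The short sketch below (the function name is hypothetical) returns the binary length of 2^n and of 2^(2^n) for a given n.

```python
def representation_sizes(n):
    """Bit lengths of an exponential and a doubly exponential number in n."""
    exponential = 2 ** n
    doubly_exponential = 2 ** (2 ** n)
    return exponential.bit_length(), doubly_exponential.bit_length()
```

For n = 10 this gives 11 bits for 2^10 but 1025 bits for 2^(2^10): the first grows linearly in n, the second exponentially.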

Stationary Strategies Complexity
Complexity measures:
- Patience: the inverse of the minimum non-zero probability [Eve57].
- Roundedness: the number r such that all probabilities are multiples of 1/r.
- Pat ≤ Rou.
Significance:
- Exponential roundedness implies a polynomial witness.
- Doubly exponential patience implies that an explicit representation requires exponential space (so there is no polynomial witness in explicit representation).
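Both measures are easy to compute for a stationary strategy given with rational probabilities. The following sketch assumes a strategy is a dictionary from states to dictionaries from actions to `Fraction` probabilities, and takes roundedness to be the smallest valid r (the least common multiple of the denominators); the function names and the example strategy are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def patience(strategy):
    """Inverse of the smallest non-zero probability used by the strategy."""
    smallest = min(p for dist in strategy.values()
                   for p in dist.values() if p > 0)
    return 1 / smallest


def roundedness(strategy):
    """Smallest r such that every probability is a multiple of 1/r."""
    denominators = [p.denominator for dist in strategy.values()
                    for p in dist.values()]
    return reduce(lambda x, y: x * y // gcd(x, y), denominators, 1)


# Illustrative stationary strategy over two states (an assumption).
example_strategy = {
    "s": {"a": Fraction(1, 3), "b": Fraction(2, 3)},
    "t": {"a": Fraction(1, 4), "b": Fraction(3, 4)},
}
```

For this strategy the patience is 4 and the roundedness is 12, consistent with Pat ≤ Rou: every non-zero probability is a multiple of 1/r, hence at least 1/r.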

Ergodic Games Results [CI14]
Reachability is not relevant.
Strategy complexity: for ε-optimal strategies, for ε > 0, we show that exponential patience is necessary (lower bound) and exponential roundedness is sufficient (upper bound).
- The lower bound is based on a family of games.
- The upper bound is based on a coupling argument.

Ergodic Games Results [CI14]
Computational complexity:
- Value problem (precise decision question): SQUARE-ROOT-SUM-hard.
- Value problem (precise or approximate): at least as hard as the TBS value problem.
- The approximation problem is in NP.

Ergodic Games Results [CI14]
Strategy complexity of optimal strategies: we don't know a precise answer. We have the following result:
- Exponential patience for optimal strategies would imply that the SQUARE-ROOT-SUM problem is in P. Hence proving exponential patience would be a major breakthrough.
- Proving a super-exponential lower bound would separate optimal and ε-optimal strategies.

Summary of Results
Objective | TB Det | TB Stoch (value) | Conc. Ergodic (value)
Reach | Linear | NP and coNP; open question: in P? | ----
Mean-payoff | NP and coNP; open question: in P? | NP and coNP; open question: in P? | NP and coNP (approx); hardness (approx); SQRT-SUM-hard (exact)

CONCURRENT REACH/SAFE GAMES

Reachability and Safety Games
Reachability/safety games: a set T of terminal (absorbing) states has reward 1, and all other states have reward 0. Hence the reachability player wishes to reach T, and the safety player wishes to avoid T.
- Positive stochastic games: reachability player. The safety player is the opponent.

Reachability and Safety Games
Computational complexity:
- Value problem: exponential time [dAM01]; SQUARE-ROOT-SUM-hard [EY06].
- Approximation problem: $\mathrm{NP}^{\mathrm{NP}}$ [FM13].

Reachability and Safety Games
Strategy complexity:
- Reachability player [Eve57]: optimal strategies need not exist, but ε-optimal strategies exist for all ε > 0. The ε-optimal strategies, for ε > 0, are stationary.
- Safety player [Par71]: optimal stationary strategies exist. Locally optimal strategies are optimal.

Reachability and Safety Games
Strategy complexity, reachability player results: doubly-exponential patience is necessary and doubly-exponential roundedness is sufficient [HKM09].

Reachability and Safety Games
Strategy complexity: reachability/safety player comparison (based on the number of value classes). New results [CHI15], including a surprising result.

Surprising Results
3-state lower bound:
- Two terminal states and one other state.
- Local optimality implies optimality, so the player basically plays strategies of matrix games.
- In matrix games, only logarithmic patience is necessary.
- For safety games, the matrix contains a variable that depends on the value. This causes an increase from logarithmic to exponential.

The Doubly Exponential LB
The lower bound for safety is surprising. Two other games share properties with safety:
- Discounted games: local optimality implies optimality, and there exponential roundedness suffices.
- Ergodic games: optimal stationary strategies exist, and again exponential roundedness suffices.
First we explain the lower bound for reachability, then the lower bound for safety.

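An Example: Snow-ball Game [dAHK98], [Eve57]
[Figure: a game with a state s and a target T. Edges are labeled (run, throw), (run, wait), (hide, throw), and (hide, wait). One player chooses between Hide and Run, the other between Throw and Wait.]
Play hide with probability 1-ε and run with probability ε.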
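The effect of the strategy "hide with probability 1-ε, run with probability ε" can be checked with a short calculation. The transitions used below are an assumption based on the usual description of this game, since the figure did not survive in the transcript: (run, wait) and (hide, throw) reach T, (run, throw) loses, and (hide, wait) stays in s. The function name is hypothetical, and the opponent is restricted to stationary strategies that throw with probability q in every round.

```python
from fractions import Fraction


def reach_probability(eps, q):
    """Probability of eventually reaching T from s.

    eps: probability that the reachability player runs in each round.
    q:   probability that the opponent throws in each round.
    Assumed transitions: (run, wait) and (hide, throw) reach T,
    (run, throw) loses, (hide, wait) stays in s.
    """
    eps, q = Fraction(eps), Fraction(q)
    win = (1 - eps) * q + eps * (1 - q)
    lose = eps * q
    if win + lose == 0:
        # Both players repeat (hide, wait) forever, so T is never reached.
        return Fraction(0)
    return win / (win + lose)
```

Under these assumptions, running with probability ε = 1/10 guarantees 9/10 against an opponent who always throws and 1 against one who always waits, while ε = 0 yields 0 against always waiting. This matches the statement that ε-optimal strategies exist for every ε > 0 although no optimal strategy does.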

Snow-ball in Stages: Purgatory [HKM09]
- Success event: move forward one step.
- Mistake event: lose the game.
- Stay event: back to the start state.
To reduce clutter, the arrows are omitted in the next slides.

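Snow-ball in Stages: Purgatory [HKM09]
[Figure: per-stage distributions of the reachability player, labeled $(1-\epsilon, \epsilon)$, $(1-\epsilon^2, \epsilon^2)$, and so on up to a last stage whose exponent is garbled in the transcript.]
- Reachability player: doubly exponential patience is necessary.
- In this game, the safety player has positional optimal strategies.
We will call this game Pur(n): n stages.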

Towards the Safety Game Counter Example
1. Consider Pur(n+1).
2. Simplify the start state by making it deterministically go to the next state: SimPur(n).
3. Take its mirror image, exchanging the roles of the players: MirSimPur(n).
4. Merge the start states (labeled 1/2 in the figure): PurDuel(n).
In SimPur(n) the safety player has positional strategies, and likewise in MirSimPur(n).

Towards the Safety Game Counter Example
PurDuel(n): the safety player requires doubly exponential patience.
By merging two games in which positional strategies suffice, we get a game in which doubly exponential patience is necessary.

Summary: Concurrent Reachability and Safety Games
Computational complexity:
- Value problem: exponential time (polynomial space) [dAM01]; SQUARE-ROOT-SUM-hard [EY06].
- Approximation problem: $\mathrm{NP}^{\mathrm{NP}}$ [FM13].
Strategy complexity: [table not recoverable from the transcript]

CONCLUSION AND OPEN PROBLEMS

Conclusion
Strategy and computational complexity of the value problem for stochastic games.
Two restrictions:
- Structural: turn-based, ergodic.
- Objective: reachability.
Other restrictions:
- Value-1 problem.
- Special classes of strategies.
Survey of results: some polynomial time, some open questions.

Major Open Questions
- Is the value problem for TBD mean-payoff games in P?
- Is the value problem for TBS reach games in P?

Collaborators
- Kristoffer Arnsfelt Hansen
- Thomas A. Henzinger
- Rasmus Ibsen-Jensen
- Rupak Majumdar

References
[MN81] J. Mertens and A. Neyman. Stochastic games. IJGT, 10:53-66, 1981.
[CMH08] K. Chatterjee, R. Majumdar, and T. A. Henzinger. Stochastic limit-average games are in EXPTIME. IJGT, 37(2):219-234, 2008.
[HKLMT11] K. A. Hansen, M. Koucky, N. Lauritzen, P. B. Miltersen, and E. P. Tsigaridas. Exact algorithms for solving stochastic games: extended abstract. In STOC, pages 205-214, 2011.
[EM79] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. IJGT, 8(2):109-113, 1979.
[ZP96] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343-359, 1996.
[Con92] A. Condon. The complexity of stochastic games. I&C, 96(2):203-224, 1992.
[AM09] D. Andersson and P. B. Miltersen. The complexity of solving stochastic games on graphs. In ISAAC, pages 112-121, 2009.
[HK66] A. J. Hoffman and R. M. Karp. On nonterminating stochastic games. Management Science, 12(5):359-370, 1966.
[Eve57] H. Everett. Recursive games. In CTG, volume 39 of AMS, pages 47-78, 1957.
[CI14] K. Chatterjee and R. Ibsen-Jensen. The complexity of ergodic mean-payoff games. In ICALP, pages 122-133, 2014.
[MS07] P. B. Miltersen and T. B. Sørensen. A near-optimal strategy for a heads-up no-limit Texas Hold'em poker tournament. In AAMAS, pages 191-197, 2007.
[dAM01] L. de Alfaro and R. Majumdar. Quantitative solution of omega-regular games. In STOC, pages 675-683. ACM Press, 2001.
[EY06] K. Etessami and M. Yannakakis. Recursive concurrent stochastic games. In ICALP (2), pages 324-335, 2006.
[FM13] S. K. S. Frederiksen and P. B. Miltersen. Approximating the value of a concurrent reachability game in the polynomial time hierarchy. In ISAAC, pages 457-467, 2013.
[dAHK98] L. de Alfaro, T. A. Henzinger, and O. Kupferman. Concurrent reachability games. In FOCS, 1998.
[Par71] T. Parthasarathy. Discounted and positive stochastic games. Bull. Amer. Math. Soc., 77:134-136, 1971.
[HKM09] K. A. Hansen, M. Koucky, and P. B. Miltersen. Winning concurrent reachability games requires doubly-exponential patience. In LICS, pages 332-341, 2009.
[CHI15] K. Chatterjee, K. A. Hansen, and R. Ibsen-Jensen. Strategy complexity of concurrent stochastic games with safety and reachability objectives. CoRR abs/1506.02434, 2015.
[CI15a] K. Chatterjee and R. Ibsen-Jensen. Qualitative analysis of concurrent mean-payoff games. I&C, 242:2-24, 2015.
[CI15b] K. Chatterjee and R. Ibsen-Jensen. The value 1 problem under finite-memory strategies for concurrent mean-payoff games. In SODA, pages 1018-1029, 2015.
[HIK15] K. A. Hansen, R. Ibsen-Jensen, and M. Koucky. Personal communication. For a copy contact Ibsen-Jensen.

QUESTIONS?