Adversarial Search and Game Theory. CS 510 Lecture 5 October 26, 2017

Reminders
- Proposals due today
- Midterm next week; past midterms are online
- Midterm is online on BBLearn, available Thurs-Sun, ~2 hours

Overview
- Game theory (simultaneous games)
- Games (Stackelberg)

What is game theory?
- Game theory is a formal way to analyze strategic interactions among a group of rational players (or agents)
- Game theory has applications in economics, politics, and computer science

What is game theory?
- Games are a form of multi-agent environment
- Key question: how do the actions of other agents affect me?
- Multi-agent environments can be cooperative or competitive
- Games are generally (but not always) applied in competitive/adversarial environments
- Each agent is completely self-interested

Relation of Games to Search
Search: no adversary
- Solution is a (heuristic) method for finding the goal
- Evaluation function: estimate of the cost from start to goal through a given node
- Examples: path planning, scheduling activities
Games: adversary
- Solution is a strategy (a strategy specifies a move for every possible opponent reply)
- Evaluation function: evaluates the "goodness" of a game position
- Examples: chess, checkers, Othello, backgammon

Types of Games

Assumptions
Features of a game:
- There are at least two rational players
- Each player has more than one choice
- The outcome depends on the strategies chosen by all players; there is strategic interaction
Example: Six people go to a restaurant.
- Each person pays for his/her own meal: a single-agent decision problem
- Before the meal, everyone agrees to split the bill evenly among them: a game

Assumptions (cont)
- Simultaneous-move: each player chooses his/her strategy without knowledge of the others' choices
- No cooperation
- Each player receives his/her payoff at the end of the game
- Complete information: each player's strategies and payoff function are common knowledge among all the players
- Assumptions on the players: rationality

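Formal Definition of a Game
- Players $P = \{P_1, P_2, \ldots, P_n\}$
- Actions $S = \{S_1, S_2, \ldots, S_n\}$, where $S_i$ is the set of actions available to player $P_i$
- Payoff matrix $M$: each player chooses an action $s_1 \in S_1, s_2 \in S_2, \ldots, s_n \in S_n$, and $M(s_1, s_2, \ldots, s_n) \to (u_1, u_2, \ldots, u_n)$, where $u_i$ is the payoff for player $P_i$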

Game Representations

Example: Remote Control Wars
- Players: Chris and Pat
- Actions: watch the soccer game or watch the soap opera
- Chris prefers the soap opera; Pat prefers soccer
- Both want to hang out together
- Complete information: both know the matrix

Example: Rock, Paper, Scissors
- Two players; each simultaneously chooses Rock, Paper, or Scissors
- Rock beats Scissors, Scissors beats Paper, Paper beats Rock
- When $\sum_i u_i = 0$ for every outcome, we call the game zero-sum; otherwise, it is general-sum
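The zero-sum condition can be checked mechanically once the payoff matrix is written down. The sketch below is illustrative only: the slide gives no numeric payoffs, so the values +1 for a win, -1 for a loss, and 0 for a tie are an assumption, as are the names `payoff`, `M`, and `is_zero_sum`.

```python
# Assumed payoffs (not from the slides): win = +1, loss = -1, tie = 0.
ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def payoff(a, b):
    """Return (u1, u2) when player 1 plays a and player 2 plays b."""
    if a == b:
        return (0, 0)
    return (1, -1) if BEATS[a] == b else (-1, 1)


# Payoff matrix M, keyed by the joint action (s1, s2).
M = {(a, b): payoff(a, b) for a in ACTIONS for b in ACTIONS}


def is_zero_sum(matrix):
    """A game is zero-sum if the payoffs sum to 0 in every outcome."""
    return all(sum(u) == 0 for u in matrix.values())


print(is_zero_sum(M))
```

Under these assumed payoffs every cell sums to zero, so Rock, Paper, Scissors is zero-sum; changing any single cell so that it no longer sums to zero would make the game general-sum.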

Definition: Strategy
- An action selection strategy for a given game specifies (probabilistically) the action a player should take
- Let $\pi_i$ denote the strategy for player $i$; $\pi_i(s)$ denotes the probability with which player $i$ should choose action $s$
- If there exists an $s$ such that $\pi_i(s) = 1$, $\pi_i$ is called a pure strategy; otherwise, $\pi_i$ is called a mixed strategy
Example:
- Pure strategy $\pi_i$: $\pi_i(\text{rock}) = 1$, $\pi_i(\text{scissors}) = 0$, $\pi_i(\text{paper}) = 0$
- Mixed strategy $\pi_i$: $\pi_i(\text{rock}) = 0.3$, $\pi_i(\text{scissors}) = 0.3$, $\pi_i(\text{paper}) = 0.4$

Definition: Strategy Profile
- Strategy profile: a collection of strategies $\pi_i$, one for each player $i$
- Example: strategy profile $\langle \pi_i, \pi_j \rangle$
  $\pi_i(\text{rock}) = 0.5$, $\pi_i(\text{scissors}) = 0.5$, $\pi_i(\text{paper}) = 0.0$
  $\pi_j(\text{rock}) = 0.2$, $\pi_j(\text{scissors}) = 0.6$, $\pi_j(\text{paper}) = 0.2$

Definition: Expected Value
- The expected value (reward) of a game for player $i$ is given by
  $$\sum_{s_i \in S_i} \sum_{s_j \in S_j} \mathrm{Prob}(s_i, s_j)\, u_i(s_i, s_j)$$
  where, because the players choose independently, $\mathrm{Prob}(s_i, s_j) = \pi_i(s_i)\, \pi_j(s_j)$
- Given the strategy profile $\langle \pi_1, \pi_2 \rangle$, what is the expected value for player 1?
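As a worked instance of the double sum, the sketch below evaluates the Rock, Paper, Scissors strategy profile from the Strategy Profile slide. The numeric payoffs (+1 win, -1 loss, 0 tie) are an assumption, since the slides do not fix them, and `expected_value` is a hypothetical helper name.

```python
# Assumed payoffs for player i (not from the slides): win = +1, loss = -1, tie = 0.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def u_i(s_i, s_j):
    """Payoff to player i when i plays s_i and j plays s_j."""
    if s_i == s_j:
        return 0
    return 1 if BEATS[s_i] == s_j else -1


def expected_value(pi_i, pi_j, utility):
    """Sum over all joint actions of Prob(s_i, s_j) * u_i(s_i, s_j).

    Players randomize independently, so Prob(s_i, s_j) = pi_i(s_i) * pi_j(s_j).
    """
    return sum(
        p_i * p_j * utility(s_i, s_j)
        for s_i, p_i in pi_i.items()
        for s_j, p_j in pi_j.items()
    )


# Strategy profile taken from the Strategy Profile slide.
pi_i = {"rock": 0.5, "scissors": 0.5, "paper": 0.0}
pi_j = {"rock": 0.2, "scissors": 0.6, "paper": 0.2}

print(round(expected_value(pi_i, pi_j, u_i), 3))  # 0.2
```

By hand: playing rock is worth 0.6 - 0.2 = 0.4 against this opponent, playing scissors is worth 0.2 - 0.2 = 0, and player i mixes them equally, giving 0.5 * 0.4 + 0.5 * 0 = 0.2.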

Definition: Best Response
- Strategy $\pi_i$ is a best response for agent $i$ if, given the strategies of the other agents, $\pi_i$ maximizes the expected value for agent $i$
- What is the best response for agent $i$ when agent $j$ plays the following strategy? $\pi_j(\text{b0}) = 0.2$, $\pi_j(\text{b1}) = 0.8$

Payoffs are listed as (player i, player j); rows are player i's actions, columns are player j's actions.

            b0         b1
  a0      10, 10      0, 0
  a1       0, 0      12, 12
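One way to answer the question on this slide is to compute player i's expected value for each of its actions against the given $\pi_j$ and pick the largest. The sketch below does that; the names `U_I`, `PI_J`, and `action_values` are hypothetical. Checking only pure actions is enough here, because the expected value of a mixed strategy is a weighted average of the values of the actions it mixes.

```python
# Player i's payoffs from the slide's matrix, keyed by (i's action, j's action).
U_I = {
    ("a0", "b0"): 10,
    ("a0", "b1"): 0,
    ("a1", "b0"): 0,
    ("a1", "b1"): 12,
}

# Player j's mixed strategy from the slide.
PI_J = {"b0": 0.2, "b1": 0.8}


def action_values(u_i, pi_j, actions_i):
    """Expected value to player i of each pure action against pi_j."""
    return {
        a: sum(p * u_i[(a, b)] for b, p in pi_j.items())
        for a in actions_i
    }


values = action_values(U_I, PI_J, ["a0", "a1"])
best = max(values, key=values.get)
print(best)  # a1
```

Action a0 is worth 0.2 * 10 = 2 and action a1 is worth 0.8 * 12 = 9.6, so the best response is the pure strategy a1.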

Dominated Strategies
- Strategy $\pi_i$ is strictly dominated by $\pi_i'$ if $u_i(\pi_i, \pi_j) < u_i(\pi_i', \pi_j)$ for all $\pi_j$

Prisoner's Dilemma
- Two suspects held in separate cells are charged with a major crime; however, there is not enough evidence
- Both suspects are told the following policy:
  - If neither confesses, both will be convicted of a minor offense and sentenced to one month in jail
  - If both confess, both will be sentenced to 3 months in jail
  - If one confesses but the other does not, the confessor will be released and the other will be sentenced to 5 months in jail
- The dominant strategy (confess) is clearly not the best: if both play it, each serves 3 months, whereas if both stayed silent, each would serve only 1 month
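The claim that confessing is dominant can be verified directly from the sentences on the slide. In the sketch below, payoffs are written as negative months in jail (an encoding assumption, so that larger is better), and `strictly_dominates` is a hypothetical helper. Comparing against the opponent's pure actions is sufficient, because a payoff against a mixed strategy is a weighted average of the payoffs against pure actions.

```python
ACTIONS = ["silent", "confess"]

# (payoff to suspect 0, payoff to suspect 1), as negative months in jail.
U = {
    ("silent", "silent"): (-1, -1),
    ("silent", "confess"): (-5, 0),
    ("confess", "silent"): (0, -5),
    ("confess", "confess"): (-3, -3),
}


def strictly_dominates(a, b, player):
    """True if action a is strictly better than b for `player` (0 or 1)
    against every action of the other suspect."""
    for other in ACTIONS:
        profile_a = (a, other) if player == 0 else (other, a)
        profile_b = (b, other) if player == 0 else (other, b)
        if U[profile_a][player] <= U[profile_b][player]:
            return False
    return True


for player in (0, 1):
    print(player, strictly_dominates("confess", "silent", player))
```

Both checks come out true, so (confess, confess) is the dominant strategy equilibrium, even though (silent, silent) would leave both suspects better off.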

Dominant strategy equilibrium
- A dominant strategy does not always exist, but if it does, it is irrational not to play it
- Inferior strategies are called dominated
- A dominant strategy equilibrium is a strategy profile in which each agent has picked its dominant strategy
- Requires no counterspeculation
- But it doesn't always exist, so: Nash equilibrium (Nash is the "A Beautiful Mind" guy)

Nash Equilibrium

Nash equilibrium
- A strategy profile is a Nash equilibrium if no player has an incentive to deviate from his strategy, given that the others do not deviate
- Or equivalently: a set of strategies, one for each player, such that each player's strategy is best for her, given that all other players are playing their equilibrium strategies
- Note: dominant strategy equilibria are Nash equilibria, but not vice versa
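For small two-player games, the "no incentive to deviate" test can be applied to every cell of the payoff matrix. The following is a minimal sketch that finds pure-strategy Nash equilibria only (it will not find mixed equilibria); `pure_nash_equilibria` is a hypothetical name, and the Prisoner's Dilemma payoffs are encoded as negative months in jail, which is an assumption about representation rather than something the slides specify.

```python
from itertools import product


def pure_nash_equilibria(payoffs, rows, cols):
    """Return all (row, col) cells where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in rows)
        col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Coordination game from the Best Response slide.
coordination = {
    ("a0", "b0"): (10, 10), ("a0", "b1"): (0, 0),
    ("a1", "b0"): (0, 0), ("a1", "b1"): (12, 12),
}

# Prisoner's Dilemma, payoffs as negative months in jail.
dilemma = {
    ("silent", "silent"): (-1, -1), ("silent", "confess"): (-5, 0),
    ("confess", "silent"): (0, -5), ("confess", "confess"): (-3, -3),
}

print(pure_nash_equilibria(coordination, ["a0", "a1"], ["b0", "b1"]))
print(pure_nash_equilibria(dilemma, ["silent", "confess"], ["silent", "confess"]))
```

The coordination game has two pure equilibria, (a0, b0) and (a1, b1), and neither player has a dominant strategy there, which illustrates the "not vice versa" note. The Prisoner's Dilemma has the single equilibrium (confess, confess), which is also its dominant strategy equilibrium.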

Why Study Game Theory?
Helps us in two ways:
- Agent design: help design agents that reason strategically and perform optimally
- Mechanism (game) design: design multiagent systems that maximize collective (global) goals, e.g., Internet routing, robot teams, traffic congestion

Alternating move games
- Chess (Deep Blue): 1997, not quite 1957; b = 35, d = 100
- Checkers (Chinook): solved
- Backgammon, Othello, Go
- Poker? Adds uncertainty

Game Trees
Games as search:
- Initial state
- Successor function: (move, state) pairs
- Terminal test
- Utility function

Perfect play for deterministic games
- Assumption: my opponent will make the best possible move
- Solution: minimax, i.e., minimize the maximum possible loss
- Theorem: for every two-person, zero-sum game with finitely many strategies, there exists a value V and a mixed strategy for each player such that (a) given player 2's strategy, the best payoff possible for player 1 is V, and (b) given player 1's strategy, the best payoff possible for player 2 is -V
- Same as the mixed-strategy Nash equilibrium for zero-sum games

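Minimax value for a node
- Minimax value: utility (for MAX) of reaching a given state
$$\text{Minimax-value}(n) = \begin{cases} \text{Utility}(n) & \text{if } n \text{ is a terminal node} \\ \max_{s \in \text{successors}(n)} \text{Minimax-value}(s) & \text{if } n \text{ is a max node} \\ \min_{s \in \text{successors}(n)} \text{Minimax-value}(s) & \text{if } n \text{ is a min node} \end{cases}$$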

Minimax Algorithm
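The recursive definition of the minimax value translates almost line for line into code. The sketch below is a minimal version under a simplifying assumption: the game tree is given explicitly as nested Python lists, where a number is a terminal utility (for MAX) and a list is an internal node whose children alternate between MAX and MIN. The example tree is made up for illustration and is not the one from the class exercise.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of `node`.

    A leaf is a number (utility for MAX); an internal node is a list of children.
    """
    if not isinstance(node, list):  # terminal test
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Illustrative two-ply tree: MAX at the root, three MIN nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

print(minimax(tree))  # 3
```

The three MIN nodes evaluate to 3, 2, and 2, and MAX picks the largest of those, so the root value is 3. A real game would replace the nested lists with a successor function and a terminal test, as listed on the Game Trees slide.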

Class exercise: Fill in values

Properties of minimax
- Complete? Yes (if the tree is finite)
- Optimal? Yes (against an optimal opponent)
- Time complexity? $O(b^m)$
- Space complexity? $O(bm)$ (depth-first exploration)
- For chess, $b \approx 35$ and $m \approx 100$ for "reasonable" games, so an exact solution is completely infeasible

Alpha-beta pruning
- Same result as minimax, but more efficient
- Insight: we do not need to look at all nodes to find the minimax value at the root of a game tree
- α: the best (highest) score the maximizing player is guaranteed so far; starts at -inf
- β: the best (lowest) score the minimizing player is guaranteed so far; starts at +inf
- If β <= α, there is no need to explore further

Alpha-beta example
- When we reach the 5, we know the root R >= 5, so α = 5 at R
- N is a min node, so N <= 4 and β = 4 at N
- But 4 < 5, so there is no need to continue looking here (R never chooses N)

α-β pruning example

Alpha-Beta Pruning
Algorithm:
- Explore the game tree in a depth-first manner
- Record and update the alpha and beta values
- Discontinue search below a node when alpha >= beta: at a max node this happens when alpha rises to beta, at a min node when beta falls to alpha
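The steps above can be sketched as a depth-first search that carries alpha and beta down the tree. This is an illustrative version, not the slide's own pseudocode: the tree is assumed to be nested Python lists (numbers are leaf utilities for MAX), the cutoff test used is alpha >= beta, and the optional `visited` list is a hypothetical addition that records which leaves were actually evaluated.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of `node`, skipping subtrees that cannot affect the result."""
    if not isinstance(node, list):  # leaf: utility for MAX
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # MIN above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:  # MAX above already has something at least as good
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, visited=seen))  # 3
print(seen)  # [3, 12, 8, 2, 14, 5, 2]
```

On this made-up tree the result matches plain minimax (3), but the leaves 4 and 6 are never examined: after the first subtree sets alpha to 3 at the root, seeing the 2 in the second subtree makes beta = 2 <= alpha, so the rest of that subtree is pruned.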

Class exercise: Redo with alpha-beta

Monte Carlo Tree Search
- Heuristic for estimating the end-state value of a node
- Monte Carlo rollouts: simulations with random play from a node to the end of the game
- Use backpropagation to estimate the value of intermediate nodes based on the simulations
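The rollout idea can be illustrated on its own, without the rest of the tree search. The sketch below covers only the simulation step and a simple average of its results; it is not a full MCTS implementation. The game is an assumption chosen for brevity (a pile of stones, players alternately take 1 to 3, whoever takes the last stone wins), and `rollout` and `estimate_value` are hypothetical names.

```python
import random


def rollout(stones, rng):
    """Play uniformly random moves to the end of the game.

    Returns 1 if the player who moves first from this state takes the last
    stone (and so wins), otherwise 0.
    """
    player = 0
    while True:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player


def estimate_value(stones, n_sims=1000, seed=0):
    """Estimate a state's value as the fraction of random rollouts won."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = sum(rollout(stones, rng) for _ in range(n_sims))
    return wins / n_sims


print(estimate_value(1))  # 1.0
print(estimate_value(10))
```

With one stone left the mover always wins, so the estimate is exactly 1.0; for larger piles the estimate reflects random play, not optimal play. In a full tree search, rollout results like these would be propagated back up to update the value estimates of the ancestor nodes.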

Steps of MCTS

Readings for 11/9
- Rodney A. Brooks. Intelligence without representation
- Tambe. Beliefs, Desires, Intentions (BDI), Chapter 2 of CS 499 course reader