CS188: Artificial Intelligence, Fall 2011 Written 2: Games and MDPs


CS188: Artificial Intelligence, Fall 2011 Written 2: Games and MDPs

Due: 10/5, submitted electronically by 11:59pm (no slip days)

Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually.

Instructions for submitting your assignment can be found on the website via the assignments page: http://inst.eecs.berkeley.edu/~cs188/fa11/assignments.html

1 Pacmen Competing

Consider a game where multiple Pacmen are competing for dots. Each Pacman's score is the number of dots it has eaten minus the number of dots that any other Pacman has eaten. Whoever has the highest score when all the food is gone is the winner. One Pacman moves each turn, with p1 first, then p2, etc. Assume that Pacmen can move past each other in the maze and share a square. Finally, assume that they cannot stop.

First consider the simple one-player case, in the board below. Pacman can move either West (left), East (right), South (down), or North (up) and is using limited-depth minimax search to choose his next move, with a basic evaluation function consisting of Pacman's score. There is no time step penalty.

[Board figure not reproduced in this transcription.]

(a) (1 point) For what search depths, if any, will East be an apparently optimal action (i.e., an action that could be returned by depth-limited search)?

(b) (1 point) For what search depths, if any, will West be an apparently optimal action (i.e., an action that could be returned by depth-limited search)?

(c) (1 point) For what search depths, if any, will the minimax value returned be the actual minimax value of the game?

(d) (1 point) Now, Pacman is using an evaluation function of (score + 2 · d_closest), where d_closest denotes the Manhattan distance to the closest dot. For what search depths, if any, is West an apparently optimal action?
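For intuition about why different depths can return different actions, here is a minimal Python sketch of single-agent depth-limited search. The `successors`, `evaluate`, and `is_terminal` helpers are hypothetical placeholders (the handout specifies no interface); the key point is that `evaluate` is applied at the cutoff, so the apparent value of an action depends on the depth.

```python
def depth_limited_value(state, depth, evaluate, is_terminal, successors):
    """Back up the best evaluation reachable within `depth` moves.
    `successors(state)` is assumed to yield (action, next_state) pairs."""
    if is_terminal(state) or depth == 0:
        return evaluate(state)
    return max(depth_limited_value(s, depth - 1, evaluate, is_terminal, successors)
               for _, s in successors(state))

def apparently_optimal_actions(state, depth, evaluate, is_terminal, successors):
    """All actions tied for the best depth-limited value; any one of them
    could be returned by depth-limited search."""
    values = {a: depth_limited_value(s, depth - 1, evaluate, is_terminal, successors)
              for a, s in successors(state)}
    best = max(values.values())
    return [a for a, v in values.items() if v == best]
```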

A second, adversarial Pacman (p2) enters the game! The evaluation function p1 uses is again simply its score. Remember, p1's score is the number of dots p1 has eaten minus the number of dots p2 has eaten (and so p1 and p2 always have opposite scores). It is currently p1's turn. Again, there is no time step penalty, for either agent.

(e) (1 point) If p1 uses minimax search to depth 10, what will be the minimax value of the root node in his search tree?

(f) (1 point) Now suppose p1 knows that p2 just moves randomly. If p1 uses a depth 10 expectimax search, what will be his optimal action or actions?
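When the opponent is known to move uniformly at random, the min nodes of minimax become chance nodes. A minimal sketch under the same hypothetical interface as above, with `successors(state, agent)` assumed to yield the moving agent's options:

```python
def expectimax_value(state, depth, agent, evaluate, is_terminal, successors):
    """Expectimax for agent 0 (the maximizer) against an opponent assumed
    to pick uniformly at random among its legal moves."""
    if is_terminal(state) or depth == 0:
        return evaluate(state)
    children = [expectimax_value(s, depth - 1, 1 - agent,
                                 evaluate, is_terminal, successors)
                for _, s in successors(state, agent)]
    if agent == 0:
        return max(children)              # our move: maximize
    return sum(children) / len(children)  # random opponent: average
```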

2 Minimax/Expectimax

Consider the following minimax tree, where x, y, z, w, a, and b correspond to the utilities at the leaf nodes.

[Minimax tree figure: leaves x, y, z, w, a, and b, with an internal node m in the subtree containing a and b; the exact tree structure is not recoverable from this transcription.]

(a) (1 point) Assume that alpha-beta evaluates w; write the condition on the values x, y, z, and w (or a subset of those) that allows the evaluation of a to be skipped.

(b) (1 point) Assume that alpha-beta evaluates node m; write the condition on the values x, y, z, w, and a (or a subset of those) that allows the evaluation of b to be skipped. Do not treat m as a number!

Consider the following expectimax tree. The outcomes at the chance node are equally likely. For all terminal states, it holds that 0 ≤ U ≤ 16.

[Expectimax tree figure: a max root whose left child is a leaf with value 10 and whose right child is a chance node with two equally likely leaves y and c.]

(c) (1 point) Write the condition on y that guarantees that the computation can be safely stopped without considering the c-leaf.

Now consider the following, slightly more general expectimax tree. The chance node has N (terminal) outcomes. For each outcome i of the chance node, the probability is p_i and the utility is u_i. Again, all terminal utilities obey 0 ≤ U ≤ 16.

[Expectimax tree figure: a max root whose left child is a leaf with value 10 and whose right child is a chance node with leaves u_1, u_2, ..., u_N.]

(d) (1 point) Write the condition that allows the computation of the chance node's value to stop after the first k leaves, without considering the rest of the leaves of the chance node.
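Both pruning questions instantiate alpha-beta's general rule: stop expanding a node as soon as its running bound crosses a bound an ancestor already guarantees. For reference, a minimal sketch of that rule, again with hypothetical helper functions:

```python
def alphabeta(state, depth, alpha, beta, maximizing,
              evaluate, is_terminal, successors):
    """Minimal alpha-beta sketch. alpha is the best value the maximizer
    can already guarantee above this node; beta is the minimizer's."""
    if is_terminal(state) or depth == 0:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for _, child in successors(state):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         evaluate, is_terminal, successors))
            if value >= beta:      # the minimizer above will never allow this
                return value       # prune the remaining siblings
            alpha = max(alpha, value)
        return value
    value = float("inf")
    for _, child in successors(state):
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     evaluate, is_terminal, successors))
        if value <= alpha:         # the maximizer above will never allow this
            return value
        beta = min(beta, value)
    return value
```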

3 Blackjack

In the game of Blackjack, a player gets cards, one at a time, with the goal of achieving a total value as close to 21 as possible without exceeding it. If the current value of a player's hand is less than 21, the player can hit (be dealt a single card) in the hope of acquiring a hand with higher value. However, the player runs the risk of busting, or going over 21, which results in an immediate loss. The player can also decide to stay (stop getting cards). The end result is either a win or a loss, as described below.

In the CS188 casino, several variants of blackjack are played; in what follows, you will formulate each variant as an MDP. In all variants, there is only one player (you) and one dealer. The deck is always reset and shuffled after every game. Also, each card type always has a fixed numeric value (so aces always have value 1, face cards have value 10, etc.). Finally, the utility for the player winning is 10, while the utility for the player losing is -5, and the utility of a tie is 0.

Fixed Dealer, Infinite Deck: For this variant, the deck is infinite, which means that the probability of being dealt any given card is independent of the cards already dealt. In addition, the dealer's hand is fixed to the value 15, i.e., staying at 16 or higher is a win.

(a) (1 point) Describe a minimal state representation for this problem variant.

(b) (1 point) For how many states s′ is T(start, hit, s′) non-zero?

(c) (1 point) Will value iteration converge exactly on this MDP (i.e., is there some finite k for which V_k(s) = V*(s) for all s)? Briefly justify your answer.

Fixed Dealer, Finite Deck: Now assume the infinite deck is replaced by a finite deck, which means that the cards already dealt are not being replaced. Again, the dealer's hand is fixed to the value 15.

(d) (1 point) Describe a minimal state representation for this problem.
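To see what a state must record here, it can help to write the hit transition down explicitly. The sketch below is one reading of the infinite-deck rule, assuming 13 equally likely card types per draw (ace worth 1, face cards worth 10); the handout does not pin down the per-card probabilities, so treat the 1/13 weights as an assumption.

```python
from fractions import Fraction

# Assumed deck model: 13 card types (A, 2-10, J, Q, K), each drawn with
# probability 1/13 from the infinite deck; values are min(rank, 10).
CARD_VALUES = [min(rank, 10) for rank in range(1, 14)]

def hit_transition(hand_value):
    """Distribution over next states after hitting with the given total;
    totals above 21 collapse into a single terminal 'bust' state."""
    probs = {}
    for v in CARD_VALUES:
        nxt = hand_value + v
        state = nxt if nxt <= 21 else "bust"
        probs[state] = probs.get(state, Fraction(0)) + Fraction(1, 13)
    return probs

print(hit_transition(16))  # totals 17-21, plus a lumped bust probability
```

Using exact `Fraction` weights rather than floats keeps the distribution summing to exactly 1, which is convenient when checking a transition model by hand.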

Dealer Showing, Infinite Deck: Assume once more that the deck is infinite, but now the dealer actually plays (stays and hits) in alternation with the player, and the dealer always shows her cards. The dealer has fixed behavior: if her cards total less than 15, she hits, while if they total 15 or more, she stays. The player may keep hitting after the dealer stops, and vice versa.

(e) (1 point) Describe a minimal state representation for this problem.

(f) (1 point) For how many states s′ is T(start, hit, s′) non-zero? Assume that the start state is before any card has been dealt.

Dealer Hiding, Infinite Deck: Assume once more that the deck is infinite, and the dealer still plays (stays and hits) in alternation with the player, according to the same fixed behavior. However, now the dealer does not show her cards' value (though the player does know how many cards the dealer has).

(g) (1 point) Can this problem still be formalized as an MDP? If so, describe a minimal state representation for this problem. If not, justify why not.

4 MDP: Walk or Jump?

Consider an MDP with states {0, 1, 2, 3, 4}, where 0 is the starting state and 4 is a terminal state. In the terminal state, there are no actions that can be taken and the value of that state is defined to be zero. In states k ≤ 3, you can Walk (W) and T(k, W, k+1) = 1. In states k ≤ 2, you can also Jump (J) and T(k, J, k+2) = 2/3, T(k, J, k) = 1/3 (usually jumping is faster, but sometimes you trip and don't make progress). The reward R(s, a, s′) = (s′ − s)² for all (s, a, s′). Use a discount of γ = 1/2.

[Figure: a five-state chain 0 → 1 → 2 → 3 → 4, with Walk edges between neighbors and Jump edges skipping a state (probability 2/3, self-loop with probability 1/3).]

(a) (1 point) Consider the policy π that chooses the action Walk in every state. Compute V^π(2).

(b) (1 point) Compute V*(2).

(c) (1 point) Compute Q*(1, W).

Now consider a similar MDP, but with N+1 states {0, ..., N}, where 0 is the starting state and N is the terminal state. Now the transition probabilities are T(k, J, k+2) = 0.1 and T(k, J, k) = 0.9 for k ≤ N − 2, and T(k, W, k+1) = 0.9 and T(k, W, k) = 0.1 for k ≤ N − 1. Again, you cannot jump from N − 1. However, now R(s, a, s′) = 100 for s′ = N and 0 otherwise. Furthermore, the discount has changed: γ = 0.99.

(d) (1 point) What is the smallest k such that V_k(0) > 0, where V_k is the value function after k iterations of Bellman updates? Assume N is even.

(e) (1 point) Will value iteration converge exactly (i.e., is there some finite k for which V_k(s) = V*(s) for all s)? Briefly justify your answer.
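A generic value-iteration sketch makes the Bellman updates behind (a)-(e) concrete. The `T(s, a)` and `actions(s)` helpers are assumed interfaces (a list of (next_state, probability) pairs, and the legal actions, respectively); terminal states simply have no actions and keep value 0.

```python
def value_iteration(states, actions, T, R, gamma, iterations):
    """Synchronous Bellman updates:
    V_{k+1}(s) = max_a sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V_k(s'))."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        # The comprehension reads the old V, so all states update together.
        V = {s: max((sum(p * (R(s, a, s2) + gamma * V[s2])
                         for s2, p in T(s, a))
                     for a in actions(s)), default=0.0)
             for s in states}
    return V
```

Swapping the floats for `fractions.Fraction` gives exact values, which is handy when checking hand computations like V^π(2) against the code.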

Now consider the same MDP as above (with non-zero rewards at state N only). However, there is now a time limit: the game ends after a known, finite number of actions M, where M < N. Formally, after the first M steps, no further rewards are obtained. One way to treat this situation is with time-dependent values and policies, as optimal values and actions could change depending on the time remaining.

(f) (1 point) Consider the value function V*(s, t), where t is the number of time steps left and s is the current state. Write a general Bellman optimality equation for V*(s, t) in terms of other V*(s′, t′) quantities. A full-credit answer will not specialize to this MDP.

(g) (1 point) For this MDP, will the optimal time-dependent policy π*(s, t) ever be different (for any state s or time t) from the optimal policy π*(s) for the original, unlimited-time version of the problem? Either provide a specific example of a difference or justify why there will be no difference.
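Time-indexed values of this kind correspond to finite-horizon value iteration. A sketch under the same hypothetical `T`/`R`/`actions` interface as before, keeping the discount γ from the original problem: with no steps left the value is 0, and each additional step of horizon backs up the previous layer.

```python
def finite_horizon_values(states, actions, T, R, gamma, horizon):
    """V[t][s]: value of state s with t steps left; V[0][s] = 0 by definition."""
    V = [{s: 0.0 for s in states}]                 # t = 0: no reward remains
    for t in range(1, horizon + 1):
        V.append({s: max((sum(p * (R(s, a, s2) + gamma * V[t - 1][s2])
                              for s2, p in T(s, a))
                          for a in actions(s)), default=0.0)
                  for s in states})
    return V
```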