CS188 Spring 2011 Written 2: Minimax, Expectimax, MDPs


Last name:    First name:    SID:    Class account login:    Collaborators:

Due: Monday 2/28 at 5:29pm, either in lecture or in the 283 Soda drop box (no slip days).

Policy: Can be solved in groups (acknowledge your collaborators) but must be written up individually. Remember to make a photocopy of your solutions so that you can resubmit for partial credit recovery. See the course webpage for details.

1 [11 pts] Minimax Search and Pruning

Consider the zero-sum game tree shown below. Trapezoids that point up, such as at the root, represent choices for the player seeking to maximize; trapezoids that point down represent choices for the minimizer.

[1 pt] (a) Assuming both opponents act optimally, carry out the minimax search algorithm. Write the value of each node inside the corresponding trapezoid and highlight the action the maximizer would take in the tree.

[3 pts] (b) Now reconsider the same game tree, but use α-β pruning (the tree is printed on the next page). Expand successors from left to right. In the brackets [ , ], record the [α, β] pair that is passed down each edge (through a call to MIN-VALUE or MAX-VALUE). In the parentheses ( ), record the value v that is passed up the edge (the value returned by MIN-VALUE or MAX-VALUE). Circle all leaf nodes that are visited. Put an X through edges that are pruned off.

[1 pt] (c) True / False: Minimax and α-β pruning are guaranteed to find the same value for the top node.

[4 pts] (d) Consider again the same game tree, searched using α-β pruning. This time, rather than expanding successors from left to right, assume you can decide the order in which you expand successors. Find the order that results in exploring as few nodes as possible for this particular game. As in part (b), record the [α, β] values passed down the tree and the (v) return values passed up. Circle all leaf nodes that are visited. Put an X through edges that are pruned off.

[2 pts] (e) Assume you have an evaluation function which, for each node, can provide an estimate of its minimax value (though the estimate will not be perfect). How can you use these minimax value estimates to guide the order in which successors are expanded, with the goal of minimizing the number of leaf nodes visited while running the α-β pruning algorithm?

(b) [game tree for part (b) printed here]

(d) [game tree for part (d) printed here]
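For reference, a minimal Python sketch (not part of the original handout) of the α-β bookkeeping described in part (b): the [α, β] pair is passed down each edge, the value v is returned up, and successors whose edges would be pruned are skipped. The nested-list example tree at the bottom is made up and is not the tree from the handout.

```python
# Alpha-beta pruning sketch: [alpha, beta] passed down, value v returned up.
import math

def max_value(node, alpha, beta):
    if isinstance(node, (int, float)):        # leaf node: its value is "visited"
        return node
    v = -math.inf
    for child in node:                        # expand successors left to right
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                         # remaining successors are pruned
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                        # remaining successors are pruned
            return v
        beta = min(beta, v)
    return v

example_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical game tree
print(max_value(example_tree, -math.inf, math.inf))  # minimax value of the root
```

With good move ordering, as asked about in parts (d) and (e), the cutoffs fire earlier and fewer leaf nodes are visited.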

2 [8 pts] Expectimax for cs188-blackjack

Blackjack is the most widely played casino betting game in the world. The goal of the game is to be dealt a hand whose value is as close to 21 as possible without exceeding it. If the current value of a player's hand is less than 21, the player can hit, i.e., be dealt a single card, in hopes of acquiring a hand with higher value. However, the player runs the risk of busting, or going over 21, which results in an immediate loss. In casino play, players bet independently against a dealer, who plays according to a fixed set of rules that govern when he should hit or stay.

In this problem set, we consider a simplified variant called cs188-blackjack. There are only 3 cards in the deck: 5's, 10's, and 11's. Each card appears with equal probability. The casino has invented an infinite deck: the probability of being dealt any given card is independent of the cards already dealt. To model the action of a dealer, we assume the casino gives fixed payoffs according to the following schedule (in dollars):

Hand Value   Payoff
0-14         0
15           3
16           3
17           3
18           3
19           3
20           9
21           12
Bust         -6

There are two actions available: Hit, which draws a card uniformly at random and adds its value to your current score, and Stay, which ends the game and yields the above payoff. If your score goes above 21, the game ends immediately with a payoff of -6. It is not possible to hit on 21; thus, if you ever arrive at a hand value of 21, there are no actions possible.

You are playing a hand of cs188-blackjack. You have been dealt 1 card, and its value is 11.

[2 pts] (a) Build the expectimax tree for this game, starting from your current hand and including all chance and max nodes. In your tree, you should put hit actions to the left of stay actions, and you should order the max nodes below the same chance node in increasing order of the hand's value (from left to right). Write the value of each state next to the given node. What is your optimal strategy? Specify your actions at all max nodes in the tree.
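A minimal Python sketch (not part of the handout) of the expectimax recursion described above, which can be used to check part (a). The helper names payoff and value are our own; the sketch assumes exactly the payoff schedule and the uniform 5/10/11 infinite deck stated in the problem.

```python
# Expectimax sketch for cs188-blackjack, part (a).
CARDS = [5, 10, 11]          # infinite deck, each card equally likely

def payoff(hand):
    """Fixed payoff for ending the game with the given hand value."""
    if hand > 21:
        return -6
    if hand == 21:
        return 12
    if hand == 20:
        return 9
    if 15 <= hand <= 19:
        return 3
    return 0                  # hand value 0-14

def value(hand):
    """Expectimax value of a max node where the player holds `hand`."""
    if hand >= 21:            # reaching 21 or busting ends the game immediately
        return payoff(hand)
    stay = payoff(hand)
    hit = sum(value(hand + c) for c in CARDS) / len(CARDS)   # chance node
    return max(stay, hit)

print(value(11))              # value of the game after being dealt an 11
```

Running it from a hand of 11 gives the value of the root of the tree you are asked to draw.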

[2 pts] (b) Unfortunately, you are playing at a table with an unscrupulous dealer who is rigging the deck. Every time he deals a card, instead of dealing you a random card, he gives you the worst possible card you could get at that moment. What is the value of the game now, and what is your optimal strategy?

[2 pts] (c) When you complain about the cheating dealer to the pit boss, a new dealer is brought in. This dealer is extremely nice: half of the time, when his boss is watching, he deals you a random card; the other half of the time, he deals you the best possible card you could get at that moment. Draw out the game tree for this (using the same instructions as in (a)). What is the value of the game now, and what is your optimal strategy?

[2 pts] (d) The casino owner, anxious about dwindling interest in cs188-blackjack, asks you to help him rework the game. He would like to increase the payout for a value of 21 to $x. What is the minimal value of $x such that the optimal strategy for a player holding 16 changes? Assume fair dealers (as in part (a)).
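Again not from the handout, a sketch of how parts (b) and (c) only change the "dealing" node: the fair dealer averages successor values, the rigged dealer takes the minimum, and the nice dealer mixes the two half-and-half. The helper names are our own.

```python
# Sketch for parts (b) and (c): only the dealing node's aggregation changes.
CARDS = [5, 10, 11]

def payoff(hand):
    if hand > 21:
        return -6
    return {21: 12, 20: 9}.get(hand, 3 if hand >= 15 else 0)

def value(hand, deal):
    """`deal` aggregates the successor values the dealer can hand out."""
    if hand >= 21:
        return payoff(hand)
    stay = payoff(hand)
    hit = deal([value(hand + c, deal) for c in CARDS])
    return max(stay, hit)

def expectation(vs):   # part (a): fair dealer, uniformly random card
    return sum(vs) / len(vs)

def worst_card(vs):    # part (b): rigged dealer, worst possible card
    return min(vs)

def half_nice(vs):     # part (c): 50% random card, 50% best possible card
    return 0.5 * expectation(vs) + 0.5 * max(vs)

for deal in (expectation, worst_card, half_nice):
    print(value(11, deal))
```

For part (d), the same recursion can be rerun with the payoff for 21 replaced by a variable x, starting from a hand of 16.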

3 [12 pts] Mission to Mars

You control a solar-powered Mars rover. At any time it can drive fast or slow. You get a reward for the distance crossed, so fast gives +10 while slow gives +4. Your rover can be in one of three states: cool, warm, or off. Driving fast tends to heat up the rover, while driving slow tends to cool it down. If the rover overheats, it shuts off, forever. The transitions are shown below. Because critical research depends on the observations of the rover, there is a discount of γ = 0.9.

s      a      s'     T(s, a, s')
cool   slow   cool   1
cool   fast   cool   1/4
cool   fast   warm   3/4
warm   slow   cool   1/4
warm   slow   warm   3/4
warm   fast   warm   7/8
warm   fast   off    1/8

[1 pt] (a) How many possible deterministic stationary policies are there?

[1 pt] (b) What is the value of the cool state under the policy that always goes slow?

[1 pt] (c) Fill in the following table of depth-limited values from value iteration for this MDP. Note that this part concerns (optimal) value iteration, not evaluation of the always-slow policy.

s      V0(s)   V1(s)   V2(s)
cool   0
warm   0
off    0       0       0

[1 pt] (d) How many rounds of value iteration will it take for the values of all states to converge to their exact values? (State "infinitely many" if you think they will only have converged after infinitely many rounds.)

[1 pt] (e) What is the optimal policy for γ = 0.9?

s      π*(s)
cool
warm

[1 pt] (f) What are the optimal values for the optimal policy when γ = 0.9?

s      V*(s)
cool
warm
off    0
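A small value-iteration sketch (not part of the handout) over the transition table above, useful for checking parts (c), (f), and (g). The dictionary encoding and function name are our own, and the reward is modeled as depending only on the action taken (+4 for slow, +10 for fast), per the problem statement.

```python
# Value iteration sketch for the rover MDP.
# T[(s, a)] = list of (s', probability); R[a] = reward for taking action a.
T = {
    ("cool", "slow"): [("cool", 1.0)],
    ("cool", "fast"): [("cool", 0.25), ("warm", 0.75)],
    ("warm", "slow"): [("cool", 0.25), ("warm", 0.75)],
    ("warm", "fast"): [("warm", 7 / 8), ("off", 1 / 8)],
}
R = {"slow": 4, "fast": 10}

def value_iteration(gamma, iterations):
    """Run depth-limited value iteration; returns V_k after k rounds."""
    V = {"cool": 0.0, "warm": 0.0, "off": 0.0}
    for _ in range(iterations):
        V_new = {"off": 0.0}              # off is terminal: no actions, value 0
        for s in ("cool", "warm"):
            V_new[s] = max(
                R[a] + gamma * sum(p * V[s2] for s2, p in T[(s, a)])
                for a in ("slow", "fast")
            )
        V = V_new
    return V

print(value_iteration(gamma=0.9, iterations=2))     # V_2, as in part (c)
print(value_iteration(gamma=0.9, iterations=1000))  # approximately V*, part (f)
```

Evaluating the always-slow policy for part (b) amounts to replacing the max over actions with the fixed action slow.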

[1 pt] (g) Central command, demanding results faster, tells you that they don't care about the future of the rover. In particular, they say that your discount parameter γ should be 0.5. What are the optimal policy and values now?

s      π*(s)
cool
warm

s      V*(s)
cool
warm
off    0

[2 pts] (h) Now imagine that you do not know in advance what the thermal responses of the rover will be, so you decide to do Q-learning. You observe the following sequence of transitions:

1. (, slow, 4)
2. (, fast, 10)
3. (, fast, 10)
4. (, fast, 10)
5. (, slow, 4)

Give the Q-values for each step in this sequence as it is processed by Q-learning, assuming a learning rate α = 0.5 and a discount factor γ = 0.9. For example, Q3(s, a) should be the Q-values after processing transitions 1, 2, and 3.

s      a      Q0(s,a)   Q1(s,a)   Q2(s,a)   Q3(s,a)   Q4(s,a)   Q5(s,a)
cool   slow   0
cool   fast   0
warm   slow   0
warm   fast   0

[3 pts] (i) An ε-greedy policy may not be the right choice for Q-learning in this situation, given that the rover, once off, is lost forever. On the other hand, it may not be optimal to never risk going fast from a warm state: perhaps the planet is very cold and there is little risk. Imagine that you know that T(warm, fast, warm) = T(warm, fast, off) for all environments. (Note: this property is not true for the transitions above!) State a modified Q-learning update and procedure that exploits this knowledge and from which you will learn all optimal Q-values without ever visiting the Q-state (warm, fast), assuming you do visit all other Q-states infinitely often. Be precise (i.e., use math).
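For part (h), a sketch (not from the handout) of the standard Q-learning update with α = 0.5 and γ = 0.9. The function name is ours, and the sample call at the end uses a hypothetical transition; substitute the actual observed transitions when filling in the table.

```python
# Q-learning update sketch for part (h), alpha = 0.5 and gamma = 0.9 as given.
# Update: Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a')).
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9
Q = defaultdict(float)                      # Q_0(s, a) = 0 everywhere

def q_update(s, a, r, s_next, actions=("slow", "fast")):
    """Apply one Q-learning update for the observed transition (s, a, r, s')."""
    if s_next == "off":                     # terminal state: no future value
        sample = r
    else:
        sample = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * sample

# Hypothetical example call, not one of the handout's transitions:
q_update("cool", "slow", 4, "cool")
print(dict(Q))                              # {('cool', 'slow'): 2.0}
```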