Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.)
1 Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.)
Eric B. Laber, February 12, 2008
2 Framework / Introduction
Objectives:
- Define Neuro-Dynamic Programming (NDP)
- Understand how NDP is used by learning to cheat at blackjack
- Learn other (more noble) applications of NDP
3 Framework / What is NDP?
NDP is about sequential decision making.
- An agent (decision maker) is faced with a series of decisions
- Each decision results in a reward
- Each decision changes the environment
- Agent's objective: maximize accumulated reward over time
4-7 Framework / What is NDP?
The interaction sequence, built up one element per slide: initial state S0, first decision A1, first reward R1, second state S1.
S0 -> A1 -> R1 -> S1
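The state, decision, reward, next-state cycle above can be sketched as a simple loop. This is only an illustration: `run_episode`, `toy_step`, and `always_a` are hypothetical names, and the toy environment is invented for the example rather than taken from the slides.

```python
def run_episode(policy, step, initial_state, horizon):
    """Follow `policy` for `horizon` decisions and return the accumulated reward."""
    state, total = initial_state, 0.0
    for _ in range(horizon):
        action = policy(state)               # A_t chosen from the current state
        reward, state = step(state, action)  # environment returns R_t and the next state
        total += reward
    return total


# Hypothetical toy environment: action "a" pays 1, anything else pays 2,
# and the state is just a step counter.
def toy_step(state, action):
    reward = 1.0 if action == "a" else 2.0
    return reward, state + 1


def always_a(state):
    return "a"


if __name__ == "__main__":
    print(run_episode(always_a, toy_step, initial_state=0, horizon=3))
```

The agent's objective from the earlier slide corresponds to choosing `policy` so that the value returned by `run_episode` is as large as possible.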
8-12 Framework / What is NDP?
Actions affect future states, so myopic decision making is NOT sufficient.
S0, decision A1:
- a11: r1 = 1, leads to S1, decision A2:
  - a21: r2 = 50
  - a22: r2 = 100
- a12: r1 = 2, leads to S1', decision A2':
  - a21': r2 = -100
  - a22': r2 = 0
A myopic agent picks a12 because its immediate reward (2) beats that of a11 (1), and thereby gives up the later reward of 100.
13-17 Framework / What is NDP?
Solution: Go backwards!
Starting from the last decision, replace each first-stage reward by r1 plus the best r2 that can follow it:
- a11: r1 = 1, then a21 (r2 = 50) or a22 (r2 = 100); backed-up value 1 + 100 = 101
- a12: r1 = 2, then a21' (r2 = -100) or a22' (r2 = 0); backed-up value 2 + 0 = 2
The best first decision is therefore a11, followed by a22.
Backup diagrams put the DP in NDP.
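The backward pass can be checked in a few lines. This is a sketch of the two-stage tree only, not the author's code. The lower-branch reward of -100 is an assumption: the transcript prints 100, but the backed-up value of 2 shown on the slides implies the minus sign was lost. The primed action names and all function names are likewise illustrative.

```python
# Two-stage decision tree: first-stage reward r1, then the second-stage rewards r2.
# ASSUMPTION: the -100 below is inferred from the backed-up value of 2 on the slides.
TREE = {
    "a11": {"r1": 1, "next": {"a21": 50, "a22": 100}},
    "a12": {"r1": 2, "next": {"a21'": -100, "a22'": 0}},
}


def myopic_choice(tree):
    """Pick the first action with the largest immediate reward."""
    return max(tree, key=lambda a: tree[a]["r1"])


def backed_up_values(tree):
    """Back up from the last stage: r1 plus the best achievable r2."""
    return {a: node["r1"] + max(node["next"].values()) for a, node in tree.items()}


def backward_choice(tree):
    """Pick the first action with the largest backed-up value."""
    values = backed_up_values(tree)
    return max(values, key=values.get)


if __name__ == "__main__":
    print(myopic_choice(TREE), backed_up_values(TREE), backward_choice(TREE))
```

The myopic rule selects a12, while the backed-up values (101 for a11, 2 for a12) select a11.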
18 Framework / What is NDP?
Real sequential problems are more sophisticated:
- Systems are stochastic
- System dynamics are unknown:
  - Reward function is unknown
  - Transition probabilities between states are unknown
- Number of states and actions may be large or even infinite
- Must use data to estimate some (or all) of the above
19 Framework / What is NDP?
NDP is a method for approximating the backup diagram method for sequential decision problems with unknown system dynamics, large state or action spaces, or both.
- The term Neuro in Neuro-Dynamic Programming refers to the approximation of elements in the backup diagram (it uses something called Neural Networks in computer science)
- The term Dynamic Programming refers to solving the system with the approximated components using the backup diagram approach
- Often the above steps of approximation and evaluation are alternated repeatedly
20 Example: Cheating at Blackjack
Example: Counting cards in Blackjack
21 Example: Cheating at Blackjack / Intro to Blackjack
Blackjack, also known as Twenty-one or Pontoon, is a popular casino game.
- The object is to obtain cards whose numerical sum is large without exceeding 21
- The player draws cards until he is satisfied with his total or it exceeds 21 (in which case he loses)
- The dealer draws cards according to a fixed policy: hit until the total is 17 or higher
- The winner is the person with the highest numerical total less than or equal to 21
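The dealer's fixed policy is simple enough to simulate directly, which is part of what makes the game attractive for simulation-based methods. The sketch below is a minimal illustration under stated assumptions: an Ace counts as 11 unless that would bust the hand, ties are scored as 0, and cards come from an infinite deck. None of these details are specified on the slides, and all function names are hypothetical.

```python
import random

# One suit's worth of card values; an Ace starts at 11 and is demoted to 1 if needed.
CARD_VALUES = [11, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def hand_total(cards):
    """Best total for a hand, counting Aces as 1 where 11 would exceed 21."""
    total = sum(cards)
    soft_aces = cards.count(11)
    while total > 21 and soft_aces:
        total -= 10
        soft_aces -= 1
    return total


def dealer_play(cards, draw):
    """Dealer's fixed policy: hit until the total is 17 or higher."""
    cards = list(cards)
    while hand_total(cards) < 17:
        cards.append(draw())
    return hand_total(cards)


def settle(player_total, dealer_total):
    """Return +1 for a player win, -1 for a loss, 0 for a tie (assumed)."""
    if player_total > 21:
        return -1
    if dealer_total > 21 or player_total > dealer_total:
        return 1
    if player_total == dealer_total:
        return 0
    return -1


def draw_from_infinite_deck(rng=random):
    """Assumed card source: every rank is equally likely on every draw."""
    return rng.choice(CARD_VALUES)


if __name__ == "__main__":
    dealer_total = dealer_play([10, 6], draw_from_infinite_deck)
    print(dealer_total, settle(18, dealer_total))
```

A card-counting simulation would replace `draw_from_infinite_deck` with draws from a finite shoe, since the whole point of counting is that the remaining cards are not uniform.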
22 Example: Cheating at Blackjack / Intro to Blackjack
Available information at time t:
- All cards used prior to time t
- The player's current total
- One of the dealer's cards
23 Example: Cheating at Blackjack / Intro to Blackjack
Beginning of a blackjack hand as a sequential decision problem:
- First state: Card History (Number Aces, Number Twos, Number Threes, ..., Number Kings)
- First action: choose bet b in {minbet, maxbet}
- Reward: r1 = 0
- Second state: Card History (Number Aces, Number Twos, Number Threes, ..., Number Kings), Player's Hand, One Dealer Card
- Second action: hit (take another card) or stand (take no more cards)
Notice we essentially need two strategies:
- One for deciding which bet to place
- One for deciding when to hit/stand
Should the strategies be independent?
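One way to make the two kinds of state concrete is with small immutable records. This is an illustrative sketch only: the class and field names are hypothetical, and the player's hand is summarized by its total, as on the "available information" slide, rather than by the individual cards.

```python
from dataclasses import dataclass
from typing import Tuple

RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")


@dataclass(frozen=True)
class BetState:
    """State before the bet: only the card history is known."""
    seen: Tuple[int, ...]  # number of cards of each rank seen so far, in RANKS order


@dataclass(frozen=True)
class PlayState:
    """State after the deal: card history plus the current hand information."""
    seen: Tuple[int, ...]
    player_total: int
    dealer_card: str


BET_ACTIONS = ("minbet", "maxbet")
PLAY_ACTIONS = ("hit", "stand")


def actions(state):
    """The available actions depend on which kind of state we are in."""
    return BET_ACTIONS if isinstance(state, BetState) else PLAY_ACTIONS


if __name__ == "__main__":
    history = (0,) * len(RANKS)
    print(actions(BetState(seen=history)))
    print(actions(PlayState(seen=history, player_total=16, dealer_card="10")))
```

Because both records carry the same `seen` history, a betting strategy and a playing strategy defined on them can share information, which bears on the question of whether the two strategies should be independent.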
24 Example: Cheating at Blackjack / Intro to Blackjack
Is the following hand a good one?
[Figure: a blackjack hand]
25 Example: Cheating at Blackjack / Intro to Blackjack
- The goodness of a particular hand depends on the strategy being employed
- Betting strategy depends on the estimated goodness of the next hand
- Formally, we define the goodness of a hand using a particular strategy as the expected total winnings from that hand and all future hands
- We must estimate betting and playing strategies simultaneously
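The verbal definition of goodness can be written as a value function. The notation below is an assumption introduced for illustration and does not appear on the slides: s is the current state, π = (π_bet, π_play) is the joint betting and playing strategy, W_j is the net winnings of the j-th hand from now on, and J is the number of hands remaining, assumed finite.

```latex
% Goodness of state s under the joint strategy \pi = (\pi_{bet}, \pi_{play}):
% expected total winnings from the current hand (j = 1) and all future hands.
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{j=1}^{J} W_j \;\middle|\; S = s \right]
```

Since each W_j depends on both the size of the bet and the way the hand is played, V depends on both components of π at once, which is one way to see why the two strategies have to be estimated together.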
26-27 Example: Cheating at Blackjack / NDP and Blackjack
Why solving blackjack directly is difficult:
1. No explicit model
2. Large number of states and actions
3. Variable number of decks (1, 2, 4, or 8)
What makes this a good NDP problem:
1. Easy to simulate blackjack
2. Important features of the game are easy to summarize
3. Can simultaneously solve for any number of decks
28 Example: Cheating at Blackjack / NDP and Blackjack
Features for blackjack:
- We could keep track of the total number of Aces, Twos, Threes, etc.
- Better to keep track of the total percentage of Aces, Twos, Threes, etc. that have appeared (e.g. at time t we've seen 23% of all Aces)
- It is usually sufficient to keep track of less information
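A percentage feature of this kind is straightforward to compute from raw counts. The sketch below assumes standard 52-card decks with four cards of each rank; the function name and the dictionary representation of the card history are hypothetical.

```python
RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")


def fraction_seen(seen_counts, num_decks):
    """Fraction of each rank that has already appeared, given raw counts.

    seen_counts maps a rank to the number of cards of that rank seen so far;
    ranks that have not appeared may be omitted.
    """
    per_rank = 4 * num_decks  # four suits per deck (assumed standard decks)
    return {rank: seen_counts.get(rank, 0) / per_rank for rank in RANKS}


if __name__ == "__main__":
    # Two Aces seen in a two-deck game: 2 of 8 Aces, i.e. 25%.
    print(fraction_seen({"A": 2}, num_decks=2)["A"])
```

Dividing by the number of cards per rank puts games with different numbers of decks on the same scale, which is presumably why percentages are preferred to raw counts here.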
29 Example: Cheating at Blackjack / NDP and Blackjack
NDP Algorithm: For k = 1, 2, ...:
1. Choose a strategy π_k, which decides an action for EVERY possible state, so that it improves on the previous strategy π_{k-1}
2. Estimate the expected performance of π_k on every possible scenario using computer simulation
30 Example: Cheating at Blackjack / NDP and Blackjack
NDP Algorithm: Improve -> Evaluate -> Improve -> ... (the two steps alternate)
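The alternation of improvement and simulation-based evaluation can be sketched for a small problem in which every state can be enumerated. This is a tabular illustration under stated assumptions, not the NDP method itself, which would replace the table by an approximation. All names are hypothetical, the toy rewards mirror the earlier two-stage tree with the lower-branch -100 assumed, and the improvement step here is greedy at every state at once.

```python
import random


def evaluate(policy, states, actions, simulate, n_rollouts, rng):
    """Estimate by simulation the return of taking action a in state s, then following policy."""
    q = {}
    for s in states:
        for a in actions:
            returns = [simulate(s, a, policy, rng) for _ in range(n_rollouts)]
            q[(s, a)] = sum(returns) / n_rollouts
    return q


def improve(q, states, actions):
    """Choose, in every state, the action with the best estimated return."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


def improve_evaluate_loop(states, actions, simulate, n_rollouts=10, max_iters=20, seed=0):
    """Alternate evaluation and improvement until the strategy stops changing."""
    rng = random.Random(seed)  # passed to the simulator so stochastic games can draw from it
    policy = {s: actions[0] for s in states}
    for _ in range(max_iters):
        q = evaluate(policy, states, actions, simulate, n_rollouts, rng)
        new_policy = improve(q, states, actions)
        if new_policy == policy:
            break
        policy = new_policy
    return policy


# Hypothetical deterministic toy problem modeled on the two-stage tree.
REWARD = {("S0", "up"): 1, ("S0", "down"): 2,
          ("S1", "up"): 50, ("S1", "down"): 100,
          ("S1'", "up"): -100, ("S1'", "down"): 0}
NEXT = {("S0", "up"): "S1", ("S0", "down"): "S1'"}


def toy_simulate(s, a, policy, rng):
    """Take action a in state s, then follow policy until the episode ends."""
    total = REWARD[(s, a)]
    s = NEXT.get((s, a))
    while s is not None:
        a = policy[s]
        total += REWARD[(s, a)]
        s = NEXT.get((s, a))
    return total


if __name__ == "__main__":
    print(improve_evaluate_loop(["S0", "S1", "S1'"], ["up", "down"], toy_simulate))
```

For blackjack, `toy_simulate` would be replaced by a simulator that deals cards at random, and the averages in `evaluate` would then be noisy estimates rather than exact values.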
31-33 Example: Cheating at Blackjack / NDP and Blackjack
Questions:
- How to improve a strategy π_k? Suppose at time t we observe state s_t.
  1. We estimate the performance of choosing action π_k(s_t) and following π_k afterward
  2. We also estimate the performance of choosing alternate actions when faced with s_t and following π_k afterward
  If improvement can be made at any state s_t, we can improve π_k by choosing the optimal action at s_t and leaving π_k unchanged at other states.
- How long to run the algorithm?
  - We run the algorithm until no further improvements can be made
  - Convergence to a near-optimal strategy is guaranteed
- How to choose a starting policy?
  - Any starting policy will do, but some choices will lead to faster convergence
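The improvement step described above can be stated compactly. The symbol Q is an assumption introduced here for illustration and is not used on the slides; it denotes the estimated performance of taking action a in state s and following π_k afterward.

```latex
% Estimated performance of action a in state s, following \pi_k afterward
Q^{\pi_k}(s, a) \;=\; \mathbb{E}\left[\, \text{total reward} \;\middle|\; \text{take } a \text{ in } s,\ \text{then follow } \pi_k \right]

% Improvement at the observed state s_t, leaving \pi_k unchanged elsewhere
\pi_{k+1}(s) \;=\;
\begin{cases}
  \arg\max_{a} \, Q^{\pi_k}(s_t, a) & \text{if } s = s_t, \\
  \pi_k(s) & \text{otherwise.}
\end{cases}
```

An improvement is available at s_t exactly when max_a Q^{π_k}(s_t, a) exceeds Q^{π_k}(s_t, π_k(s_t)); when no state admits such an improvement, the algorithm stops.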
34 Example: Cheating at Blackjack / NDP and Blackjack
The preceding algorithm produces a strategy π which is near optimal. However, using π requires memorizing every possible scenario!
Fortunately, NDP allows us to restrict ourselves to simpler strategies.
Linear strategies are currently popular, for example:
Bet Large if: 2 * NumberAcesLeft + NumberFaceCardsLeft - NumberLowCardsLeft > 0
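A linear rule of this kind is a one-line computation. The sketch reads the transcribed rule as 2 * Aces + FaceCards - LowCards > 0; the minus sign is an assumption, since the transcript drops the operator, though it matches the usual logic that remaining high cards favor the player. The function name and the default weights tuple are hypothetical.

```python
def bet_large(aces_left, face_cards_left, low_cards_left, weights=(2, 1, -1)):
    """Linear betting rule: bet large when the weighted count is positive.

    The default weights encode 2 * aces + face cards - low cards.
    ASSUMPTION: the -1 weight reflects a minus sign lost in the transcript.
    """
    w_aces, w_faces, w_low = weights
    score = w_aces * aces_left + w_faces * face_cards_left + w_low * low_cards_left
    return score > 0


if __name__ == "__main__":
    print(bet_large(aces_left=4, face_cards_left=12, low_cards_left=10))  # score 10
    print(bet_large(aces_left=0, face_cards_left=2, low_cards_left=10))   # score -8
```

Restricting attention to rules of this form means the algorithm only has to find a few weights rather than an action for every possible scenario.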
35 Other NDP Applications / Other Applications
NDP is utilized in a large number of applications, including:
- Autonomous flight
- Tailored medical treatments for chronic illness
- Adaptive standardized tests (e.g. GRE)
36 End / Further Information
There exist several standard references for NDP:
- Dynamic Programming and Optimal Control by Bertsekas, Athena Scientific
- Neuro-Dynamic Programming by Bertsekas and Tsitsiklis, Athena Scientific
- Reinforcement Learning by Sutton and Barto, MIT Press
More informationTo play the game player has to place a bet on the ANTE bet (initial bet). Optionally player can also place a BONUS bet.
ABOUT THE GAME OBJECTIVE OF THE GAME Casino Hold'em, also known as Caribbean Hold em Poker, was created in the year 2000 by Stephen Au- Yeung and is now being played in casinos worldwide. Live Casino Hold'em
More information3. If you can t make the sum with your cards, you must draw one card. 4. Players take turns rolling and discarding cards.
1 to 10 Purpose: The object of the game is to get rid of all your cards. One player gets all the red cards, the other gets all the black cards. Players: 2-4 players Materials: 2 dice, a deck of cards,
More informationPolicy Teaching. Through Reward Function Learning. Haoqi Zhang, David Parkes, and Yiling Chen
Policy Teaching Through Reward Function Learning Haoqi Zhang, David Parkes, and Yiling Chen School of Engineering and Applied Sciences Harvard University ACM EC 2009 Haoqi Zhang (Harvard University) Policy
More informationChapter 2. Games of Chance. A short questionnaire part 1
Chapter 2 Games of Chance A short questionnaire part Question Rank the following gambles: A: win $5 million with probability win $ million with probability win $ with probability B: win $5 million with
More informationLecture 11 Strategic Form Games
Lecture 11 Strategic Form Games Jitesh H. Panchal ME 597: Decision Making for Engineering Systems Design Design Engineering Lab @ Purdue (DELP) School of Mechanical Engineering Purdue University, West
More informationAn Artificially Intelligent Ludo Player
An Artificially Intelligent Ludo Player Andres Calderon Jaramillo and Deepak Aravindakshan Colorado State University {andrescj, deepakar}@cs.colostate.edu Abstract This project replicates results reported
More informationLearning to play Dominoes
Learning to play Dominoes Ivan de Jesus P. Pinto 1, Mateus R. Pereira 1, Luciano Reis Coutinho 1 1 Departamento de Informática Universidade Federal do Maranhão São Luís,MA Brazil navi1921@gmail.com, mateus.rp.slz@gmail.com,
More informationTRIAL-BASED HEURISTIC TREE SEARCH FOR FINITE HORIZON MDPS. Thomas Keller and Malte Helmert Presented by: Ryan Berryhill
TRIAL-BASED HEURISTIC TREE SEARCH FOR FINITE HORIZON MDPS Thomas Keller and Malte Helmert Presented by: Ryan Berryhill Outline Motivation Background THTS framework THTS algorithms Results Motivation Advances
More informationUMBC 671 Midterm Exam 19 October 2009
Name: 0 1 2 3 4 5 6 total 0 20 25 30 30 25 20 150 UMBC 671 Midterm Exam 19 October 2009 Write all of your answers on this exam, which is closed book and consists of six problems, summing to 160 points.
More informationCS 210 Fundamentals of Programming I Fall 2015 Programming Project 8
CS 210 Fundamentals of Programming I Fall 2015 Programming Project 8 40 points Out: November 17, 2015 Due: December 3, 2015 (Thursday after Thanksgiving break) Problem Statement Many people like to visit
More informationComparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage
Comparison of Monte Carlo Tree Search Methods in the Imperfect Information Card Game Cribbage Richard Kelly and David Churchill Computer Science Faculty of Science Memorial University {richard.kelly, dchurchill}@mun.ca
More informationfinal examination on May 31 Topics from the latter part of the course (covered in homework assignments 4-7) include:
The final examination on May 31 may test topics from any part of the course, but the emphasis will be on topic after the first three homework assignments, which were covered in the midterm. Topics from
More informationUsing Artificial intelligent to solve the game of 2048
Using Artificial intelligent to solve the game of 2048 Ho Shing Hin (20343288) WONG, Ngo Yin (20355097) Lam Ka Wing (20280151) Abstract The report presents the solver of the game 2048 base on artificial
More informationFebruary 11, 2015 :1 +0 (1 ) = :2 + 1 (1 ) =3 1. is preferred to R iff
February 11, 2015 Example 60 Here s a problem that was on the 2014 midterm: Determine all weak perfect Bayesian-Nash equilibria of the following game. Let denote the probability that I assigns to being
More informationBuster Blackjack. BGC ID: GEGA (October 2011)
*Pure 21.5 Blackjack is owned, patented and/or copyrighted by TXB Industries Inc. *Buster Blackjack is owned, patented and/or copyrighted by Betwiser Games, LLC. Please submit your agreement with the Owner
More informationStatistical House Edge Analysis for Proposed Casino Game Jacks
Statistical House Edge Analysis for Proposed Casino Game Jacks Prepared by: Precision Consulting Company, LLC Date: October 1, 2011 228 PARK AVENUE SOUTH NEW YORK, NEW YORK 10003 TELEPHONE 646/553-4730
More informationBlackjack Card Counting Primer
Blackjack Card Counting Primer Because of the dynamic nature of the Internet, any web addresses or links contained in this book may have changed since publication and may no longer be valid. 1 Table of
More informationAgenda. Intro to Game Theory. Why Game Theory. Examples. The Contractor. Games of Strategy vs other kinds
Agenda Intro to Game Theory AUECO 220 Why game theory Games of Strategy Examples Terminology Why Game Theory Provides a method of solving problems where each agent takes into account how others will react
More informationGame Theory two-person, zero-sum games
GAME THEORY Game Theory Mathematical theory that deals with the general features of competitive situations. Examples: parlor games, military battles, political campaigns, advertising and marketing campaigns,
More informationLearning in 3-Player Kuhn Poker
University of Manchester Learning in 3-Player Kuhn Poker Author: Yifei Wang 3rd Year Project Final Report Supervisor: Dr. Jonathan Shapiro April 25, 2015 Abstract This report contains how an ɛ-nash Equilibrium
More informationCorticon - Making Change Possible
Corticon - Making Change Possible Decision Modeling Challenge February 2015 Use Case How can a given amount of money be made with the least number of coins of given denominations? Let S be a given sum
More informationStat 20: Intro to Probability and Statistics
Stat 20: Intro to Probability and Statistics Lecture 17: Using the Normal Curve with Box Models Tessa L. Childers-Day UC Berkeley 23 July 2014 By the end of this lecture... You will be able to: Draw and
More informationCasino Lab AP Statistics
Casino Lab AP Statistics Casino games are governed by the laws of probability (and those enacted by politicians, too). The same laws (probabilistic, not political) rule the entire known universe. If the
More informationCMU-Q Lecture 20:
CMU-Q 15-381 Lecture 20: Game Theory I Teacher: Gianni A. Di Caro ICE-CREAM WARS http://youtu.be/jilgxenbk_8 2 GAME THEORY Game theory is the formal study of conflict and cooperation in (rational) multi-agent
More informationMIT 15.S50 LECTURE 5. Friday, January 27 th, 2012
MIT 15.S50 LECTURE 5 Friday, January 27 th, 2012 INDEPENDENT CHIP MODEL (ICM) In a cash game, clearly you should make decisions that maximize your expected # of chips (dollars). I ve always told you do
More information4. Games and search. Lecture Artificial Intelligence (4ov / 8op)
4. Games and search 4.1 Search problems State space search find a (shortest) path from the initial state to the goal state. Constraint satisfaction find a value assignment to a set of variables so that
More informationWelcome to the Best of Poker Help File.
HELP FILE Welcome to the Best of Poker Help File. Poker is a family of card games that share betting rules and usually (but not always) hand rankings. Best of Poker includes multiple variations of Home
More informationReinforcement Learning
Reinforcement Learning Reinforcement Learning Assumptions we made so far: Known state space S Known transition model T(s, a, s ) Known reward function R(s) not realistic for many real agents Reinforcement
More informationBLACKJACK Perhaps the most popular casino table game is Blackjack.
BLACKJACK Perhaps the most popular casino table game is Blackjack. The object is to draw cards closer in value to 21 than the dealer s cards without exceeding 21. To play, you place a bet on the table
More information