Ultimate X Bonus Streak Analysis

Gary J. Koehler
John B. Higdon Eminent Scholar, Emeritus
Department of Information Systems and Operations Management, 35 BUS, The Warrington College of Business, University of Florida, Gainesville, FL 32611 (koehler@ufl.edu)

This paper extends an analysis of Ultimate X Video Poker to a new variation on its theme. Instead of an outcome generating an immediate return plus establishing a multiplier of the next round's return, in Bonus Streak a set of multipliers is established for subsequent hands. This paper analyzes this new type of game.

Key words: Gambling, non-discounted Markov Decision Problem, Video Poker, Ultimate X.

January 2017. Revised March 2017. Revised September 2018 (added vulturing section).
Copyright 2017 Gary J. Koehler

1. Introduction

We refer the reader to our earlier paper analyzing Ultimate X Poker [2] for basic concepts. Ultimate X Bonus Streak alters the basic idea of Ultimate X Poker by offering a stream of multipliers (a streak) for different outcomes to be applied to subsequent hands of play, not just a single multiplier for the next hand as in the original Ultimate X games. Like Ultimate X, this game costs twice the underlying game's normal maximum bet to activate the Bonus Streak. For example, the normal maximum bet is 5 coins per line in Jacks or Better, so it costs 10 coins per line in Ultimate X. As is usual for multi-line games, each game starts with the same hand dealt to all lines of play and the held cards apply to each line. The outcomes come from independent draws from decks with the cards of the initial hand removed.

Table 1 shows per-coin payouts (based on the initial 5 coins) and multiplier streaks for each possible outcome for a Deuces Wild game. For example, if on a line of play the current multiplier is 1 and one gets a Straight Flush, the player is paid 65 coins (5 times the outcome payout of 13). The 5 is because we are showing payouts on a per-coin bet basis and 5 coins were bet (the additional 5 coins wagered were to enable the bonus streak feature). This win sets up a streak so the next hand's multiplier will be 2, the subsequent one 4, and so forth. However, if the player gets an outcome with another non-unit streak while in the midst of using a streak's multipliers, then the current streak's remaining multipliers are changed to multipliers of 12.

Outcome                      Per Coin Payout   Streak
Royal Straight Flush         800               2,4,7,10,12
Four Deuces                  200               2,4,7,10,12
Wild Royal Straight Flush    25                2,4,7,10,12
Five of a Kind               16                2,4,7,10,12
Straight Flush               13                2,4,7,10,12
Four of a Kind (4K)          4                 2,2,4
Full House (FH)              3                 2,2,4
Flush                        2                 2,2,4
Straight                     2                 1
Three of a Kind (3K)         1                 1
Nothing                      0                 1

Table 1: Ultimate X Bonus Streak Multiples, Deuces Wild

Both Ultimate X and Ultimate X Bonus Streak were created by IGT (https://www.igt.com/) and are offered in their video poker machines.

For example, suppose there is just one multiplier in place in the current streak for a line of play. Then let's track what happens with the sequence of hands and outcomes shown in Table 2. The first hand results in a Three of a Kind and the payout is multiplied by the Outcome Multiplier of 1. The new streak is just 1. The Straight Flush with Hand 2 sets up a streak of future multipliers (2,4,7,10,12). We see these successively applied in the next two hands. However, the Full House outcome at Hand 4 would normally establish a streak of 2,2,4 but since we already have a streak longer than one element, the current remaining streak (7,10,12) is changed to all 12 multipliers (i.e., to 12,12,12).

Hand   Starting Streak   Outcome           Outcome Multiplier   New Streak
1      1                 Three of a Kind   1                    1
2      1                 Straight Flush    1                    2,4,7,10,12
3      2,4,7,10,12       Nothing           2                    4,7,10,12
4      4,7,10,12         Full House        4                    12,12,12
5      12,12,12          Three of a Kind   12                   12,12
6      12,12             Nothing           12                   12
7      12                Nothing           12                   1
8      1                 Nothing           1                    1

Table 2: Example of Multiplier Evolution

Table 3 shows the possible streaks one might see at the start of a hand.

Streak   Streak Values
1        1
2        4
3        12
4        2,4
5        10,12
6        12,12
7        2,2,4
8        7,10,12
9        12,12,12
10       4,7,10,12
11       12,12,12,12
12       2,4,7,10,12

Table 3: Possible Observable Multiplier Streaks
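The streak rules above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the names `next_streak` and `reachable_streaks` are hypothetical, and outcomes are collapsed into the three streak classes of Table 1.

```python
# Streak awarded by each outcome class (Table 1, Deuces Wild).
# UNIT means the outcome awards no bonus multipliers.
LONG = (2, 4, 7, 10, 12)
SHORT = (2, 2, 4)
UNIT = (1,)
TOP = max(LONG)  # 12, the value the remaining multipliers are reset to


def next_streak(current, awarded):
    """Return the streak in force at the start of the next hand.

    current: multipliers before this hand (the first entry applies now)
    awarded: streak attached to this hand's outcome in Table 1
    """
    remaining = current[1:]              # what is left after using one multiplier
    if awarded == UNIT:
        return remaining if remaining else UNIT
    if remaining:                        # bonus outcome in mid-streak: reset to 12s
        return (TOP,) * len(remaining)
    return awarded                       # nothing pending: start the new streak


def reachable_streaks():
    """Enumerate every streak that can appear at the start of a hand."""
    seen, frontier = {UNIT}, [UNIT]
    while frontier:
        cur = frontier.pop()
        for awarded in (UNIT, SHORT, LONG):
            nxt = next_streak(cur, awarded)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    for streak in sorted(reachable_streaks(), key=lambda s: (len(s), s)):
        print(streak)
```

Starting from the unit streak, the enumeration visits the twelve streaks listed in Table 3, and replaying the outcomes of Table 2 through `next_streak` reproduces its New Streak column.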

2. Expected Value Analysis

Let $M$ be the set of possible starting multiplier streaks. For example, for the streaks in Table 3 we have

$$M = \{(1), (2,2,4), (2,4), (4), (12,12), (12), (2,4,7,10,12), (4,7,10,12), (7,10,12), (10,12), (12,12,12,12), (12,12,12)\}.$$

Likewise, let $\Omega$ be the set of permutations of the elements of $M$ taken $L$ (the number of lines) at a time with repetition. So for a 3-Line game, each $\omega \in \Omega$ looks like $\omega = (\omega_1, \omega_2, \omega_3)$ where $\omega_\ell \in M$ and the $j$-th multiplier of $\omega_\ell$ is $\omega_\ell(j)$. $\Omega$ gives all of the possible streak states a player might see for the $L$ lines before starting a hand of play.

Technically, the starting state of each round of play is $(\omega, H)$ where $\omega \in \Omega$ results from the previous hand's outcomes, $H \in \mathcal{H}$ is a randomly generated next hand, and $\mathcal{H}$ is the set of all possible starting hands. Since the outcome of any action depends on just $(\omega, H)$ and what a decision maker chooses to hold in $H$, and not the history leading to this state, the Markov property holds and the resulting problem is a Markov Decision problem. (The associated Markov chains are readily shown to be ergodic.) This is not to say that all states can be reached in one step, as was the case with the Ultimate X game in [2]. For example, for a one-line game, if the starting state is $((2,2,4), H)$, the only states that could be reached are $((2,4), *)$ and $((12,12), *)$. That is, the only realizable ending streaks are $(2,4)$ and $(12,12)$.

As in [2], we choose to study the non-discounted stream of returns and, for practical matters, assume the horizon is infinite. Thus we focus on solving the infinite horizon, non-discounted, Markov Decision problem (ndMDP) which is represented by

$$v_\omega + g = \sum_{H \in \mathcal{H}} P_H \max_{i} \Big[ \sum_{\ell=1}^{L} \omega_\ell(1)\, R_{H_i} + \sum_{\gamma \in \Omega} P_{\omega,\gamma}(H_i)\, v_\gamma \Big], \quad \omega \in \Omega$$

$$\sum_{\omega \in \Omega} P_\omega v_\omega = 0.$$

Here $g$ is the maximal gain per round of play, $v_\omega$ is the relative bias for state $\omega \in \Omega$, $P_\omega$ is the steady-state probability of being in state $\omega$ (before a hand is dealt) under optimal decisions, and $P_H$ is the probability of being dealt hand $H$. Note that $g/(2L)$ is the optimal expected return per bet unit for the game, the value we wish to compute. The 2 comes from the game costing twice the normal amount on which the payouts are based.

For each hand, one must decide which of the possible $i = 1, 2, \ldots, 32$ ways to hold subsets of $H$, designated by $H_i$. Each possible decision results in an expected outcome for the hand, $R_{H_i}$, and a probability of transitioning to state $\gamma$ of $P_{\omega,\gamma}(H_i)$. Note that in the formulation above, we have reduced the starting state from $(\omega, H)$ to $\omega$ by averaging out the impact of the random starting hand (hence the $\sum_{H} P_H$).

Since $R_{H_i}$ is independent of the multipliers, let $m(\omega) = \sum_{\ell=1}^{L} \omega_\ell(1)$ and we can rewrite the problem as

$$v_\omega + g = \sum_{H \in \mathcal{H}} P_H \max_{i} \Big[ m(\omega)\, R_{H_i} + \sum_{\gamma \in \Omega} P_{\omega,\gamma}(H_i)\, v_\gamma \Big], \quad \omega \in \Omega, \qquad \sum_{\omega \in \Omega} P_\omega v_\omega = 0. \quad (1)$$

Consider $P_{\omega,\gamma}(H_i)$. This is the probability of starting in state $\omega$ and transitioning to state $\gamma$. This depends on which cards in $H$ are held (designated by decision $i$ leading to holding $H_i$) and the various possible outcomes (Straight, Flush, etc.) afterwards. Let $\mathcal{O}$ be the set of possible outcomes and $P_o(H_i)$ be the probability of outcome $o \in \mathcal{O}$ when cards $H_i$ are held from hand $H$. For each outcome there is a payout and a streak (see Table 1 for example). The resulting streak is a function of the starting streak and the outcome, represented by $s(o, \omega_\ell)$. Note that for regular Ultimate X, $s(o, \omega_\ell)$ is independent of $\omega_\ell$; it depends only on the hand's outcome. States in Bonus Streak having only single-length streaks also exhibit this property.

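For example in a 2-Line game, if the starting state is $\omega = ((2,2,4), (1))$, the possible resulting streaks are

$$\gamma_1 = \begin{cases} (2,4) & o \in \{\text{Straight}, 3K, \text{Nothing}\} \\ (12,12) & \text{otherwise} \end{cases}
\qquad
\gamma_2 = \begin{cases} (1) & o \in \{\text{Straight}, 3K, \text{Nothing}\} \\ (2,2,4) & o \in \{4K, FH, \text{Flush}\} \\ (2,4,7,10,12) & \text{otherwise.} \end{cases}$$

Thus

$$P_{\omega_\ell,\gamma_\ell}(H_i) = \sum_{o \,:\, \gamma_\ell = s(o,\omega_\ell)} P_o(H_i)$$

and

$$P_{\omega,\gamma}(H_i) = \prod_{\ell=1}^{L} P_{\omega_\ell,\gamma_\ell}(H_i) = \prod_{\ell=1}^{L} \sum_{o \,:\, \gamma_\ell = s(o,\omega_\ell)} P_o(H_i)$$

is the probability of outcomes having an associated multiplier streak of $\gamma$ given one starts in state $(\omega, H)$ and chooses to hold $H_i$.

As in [2], we can iteratively solve (1) by

$$v_\omega^{n+1} + g^{n+1} = e_\omega^{n+1} = \sum_{H \in \mathcal{H}} P_H \max_{i} \Big[ m(\omega)\, R_{H_i} + \sum_{\gamma \in \Omega} P_{\omega,\gamma}(H_i)\, v_\gamma^{n} \Big], \quad \omega \in \Omega$$

$$g^{n+1} = \sum_{\omega \in \Omega} P_\omega^{n+1} e_\omega^{n+1} \quad (2)$$

$$P_\gamma^{n+1} = \sum_{\omega \in \Omega} P_\omega^{n} \sum_{H \in \mathcal{H}} P_H\, P_{\omega,\gamma}(H_{i^*}), \quad \gamma \in \Omega.$$

The term $P_{\omega,\gamma}(H_{i^*})$ stands for the value of $P_{\omega,\gamma}(H_i)$ with an optimal decision.

As discussed in [2], the number of permutations (with repetition) of $|M|$ things $L$ at a time is $|M|^L$, so a 10-Line version of Ultimate X Bonus Streak with the multipliers shown in Table 1 has $12^{10} = 61{,}917{,}364{,}224$ multiplier patterns a player may see. So the true number of states is $\binom{n}{5} |M|^L$ where $n$ is the size of the deck of cards used (assuming the order of the cards is not important). For example, for decks of 52 cards and a 10-line game, the number of states is on the order of $10^{17}$, over 100 quadrillion.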
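The state counts quoted above are easy to check. The following is a small sketch; the helper names `streak_patterns` and `full_state_count` are illustrative and not from the paper.

```python
from math import comb


def streak_patterns(num_streaks: int, lines: int) -> int:
    """Ordered streak assignments across the lines: |M|^L."""
    return num_streaks ** lines


def full_state_count(deck_size: int, num_streaks: int, lines: int) -> int:
    """Starting hands times streak patterns: C(n, 5) * |M|^L."""
    return comb(deck_size, 5) * streak_patterns(num_streaks, lines)


if __name__ == "__main__":
    print(f"{streak_patterns(12, 10):,}")        # streak patterns, 10-Line game
    print(f"{comb(52, 5):,}")                    # five-card starting hands
    print(f"{full_state_count(52, 12, 10):.3e}")  # full (omega, H) state count
```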

Fortunately, some of the problem size reductions discussed in [2] can be used in the Bonus Streak game. In particular, the reductions are:

1. Use equivalent suit permutations of hands to reduce $\mathcal{H}$ to unique hands $\mathcal{H}_S$. This is easily implemented by letting $P_H$ reflect the number of different suit permutations for a given hand. For games with 52 cards, this reduces the size of $\mathcal{H}$ from 2,598,960 hands to 134,459 in $\mathcal{H}_S$.

2. Use state permutations to reduce the state space. For example, in a 3-Line game, state $\{(1),(2,4),(12,12)\}$ will give the same expected payouts as state $\{(2,4),(1),(12,12)\}$ and state $\{(1),(12,12),(2,4)\}$ since the order of the multipliers across the lines of play is not important. As in [2] we let $C \subset \Omega$ contain just the unique combinations (say those in sorted order) and denote the equivalent states in $\Omega$ for each $\gamma \in C$ by $\Omega_\gamma$.

Unfortunately, a third reduction in [2], first suggested by Michael Shackleford [4], is not valid here. That reduction stated that all states having the same value of $m(\omega)$ are equivalent. The proof given in [2] relied on the fact that $P_{\omega,\gamma}(H_i)$ was independent of $\omega$, which is not the case with Bonus Streak unless the states are composed of single-length streaks.

Since $C \subset \Omega$ contains just the unique combinations,

$$|C| = \binom{|M| + L - 1}{|M| - 1} = \binom{|M| + L - 1}{L}.$$

With the reductions, we wish to solve

$$v_\omega^{n+1} + g^{n+1} = e_\omega^{n+1} = \sum_{H \in \mathcal{H}_S} P_H \max_{i} \Big[ m(\omega)\, R_{H_i} + \sum_{\gamma \in C} P_{\omega,\Omega_\gamma}(H_i)\, v_\gamma^{n} \Big], \quad \omega \in C$$

$$g^{n+1} = \sum_{\omega \in C} P_{\Omega_\omega}^{n+1} e_\omega^{n+1} \quad (3)$$

$$P_{\Omega_\gamma}^{n+1} = \sum_{\omega \in C} P_{\Omega_\omega}^{n} \sum_{H \in \mathcal{H}_S} P_H\, P_{\omega,\Omega_\gamma}(H_{i^*}), \quad \gamma \in C.$$

With the reductions, we need to adjust our definition of $P_{\omega,\gamma}(H_i)$. Let

$$P_{\omega,\Omega_\gamma}(H_i) = \sum_{\eta \in \Omega_\gamma} P_{\omega,\eta}(H_i) = \sum_{\eta \in \Omega_\gamma} \prod_{\ell=1}^{L} \sum_{o \,:\, \eta_\ell = s(o,\omega_\ell)} P_o(H_i).$$

Note that the values for the original states are $v_\gamma^{n+1} = v_\omega^{n+1}$ for $\gamma \in \Omega \setminus C$, $\gamma \in \Omega_\omega$. As in [2], we stop (3) when

$$\left| g^{n+1} - g^{n} \right| + \sum_{\omega \in C} \left| v_\omega^{n+1} - v_\omega^{n} \right| + \sum_{\omega \in C} \left| P_{\Omega_\omega}^{n+1} - P_{\Omega_\omega}^{n} \right| < 10^{-[\ldots]}. \quad (4)$$

We solved a hypothetical 1-Line version of Deuces Wild in Table 1 to get a gain ($g$) of 1.94665 and steady state values shown in Table 4. The Expected Value (EV) is 1.94665/2 = 0.973325. (Although we have not seen a 1-Line version of the game, we anticipate their introduction just as 1-Line games of Ultimate X were eventually released by IGT.)

Deuces Wild 1-Line
Streak          v           P
1               -2.506      0.680794
4               0.363605    0.06955
12              8.08679     0.03268
2,4             .3052       0.079446
10,12           5.866       0.006048
12,12           7.759       0.04929
2,2,4           3.38067     0.09454
7,10,12         20.867      0.006858
12,12,12        27.47       0.002
4,7,10,12       23.5686     0.007795
12,12,12,12     37.0822     0.0074
2,4,7,10,12     25.2629     0.008969

Table 4: Solution to the one-line version of the game with multiples in Table 1

Table 5 gives the outcomes for the 1-3 Line versions of this Deuces Wild game. Actual machines in casinos currently only offer 3, 5 and 10-Line versions, so the 1-Line and 2-Line versions are hypothetical.

Deuces Wild Video Poker   g          EV
1-Line                    1.94665    0.973325
2-Lines                   3.88404    0.971010
3-Lines                   5.81832    0.96972

Table 5: Optimal expected returns for Deuces Wild Ultimate X Bonus Streak.

Interestingly, the Bonus Streak game appears to exhibit the same phenomenon that the Ultimate X games showed (Page 6, [2]): the impact on expected return as the number of lines increases is negative. Note that the EVs decrease as the number of lines increases in Table 5.

As another example, Table 6 gives the payouts and streaks for 7-5 Bonus Poker Deluxe.

Outcome                  Payout   Streak
Royal Straight Flush     800      2,5,8,10,12
Straight Flush           50       2,5,8,10,12
Four of a Kind (4K)      80       2,5,8,10,12
Full House (FH)          7        2,5,8,10,12
Flush                    5        2,5,8
Straight                 4        2,5
Three of a Kind (3K)     3        2,5
Two Pair                 1        1
Jacks or Better Pair     1        1
Nothing                  0        1

Table 6: Ultimate X Bonus Streak Multiples, Bonus Poker Deluxe

Table 7 gives the outcomes for the 1-3 Line versions of Bonus Poker Deluxe and Table 8 its steady-state values for 1-Line.

Bonus Deluxe   g          EV
1-Line         1.93818    0.969092
2-Lines        3.86879    0.967198
3-Lines        5.79847    0.966412

Table 7: Optimal expected returns for Bonus Poker Deluxe Ultimate X Bonus Streak.
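The EV columns in these tables follow from the gain through EV = g/(2L), since each line costs twice the base bet. As a quick check, a minimal sketch using the Deuces Wild gains quoted for Table 5 (the function name `ev_per_unit` is illustrative):

```python
def ev_per_unit(gain: float, lines: int) -> float:
    """Expected return per unit wagered: g / (2L)."""
    return gain / (2 * lines)


if __name__ == "__main__":
    # Gains g for the 1-, 2- and 3-Line Deuces Wild games.
    for lines, gain in ((1, 1.94665), (2, 3.88404), (3, 5.81832)):
        print(lines, round(ev_per_unit(gain, lines), 6))
```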

Bonus Deluxe 1-Line
Streak          v           P
1               -2.43554    0.750595
5               .3749       0.064623
8               4.2542      0.0538
12              8.0293      0.02368
2,5             2.4009      0.073308
5,8             7.5652      0.03025
10,12           5.79        0.008086
12,12           7.75        0.00536
2,5,8           8.7366      0.04784
8,10,12         2.7695      0.0097
12,12,12        27.3272     0.00257
5,8,10,12       25.2758     0.00296
12,12,12,12     36.9393     0.0039
2,5,8,10,12     26.6274     0.0688

Table 8: Optimal relative biases and steady state probabilities for Bonus Deluxe.

The challenge with analyzing games beyond 3-Lines is easily seen in Table 9 where we show the sizes of the state sets for the Deuces Wild game of Table 1.

                                      1-Line   3-Lines   5-Lines   10-Lines
$|\Omega| = |M|^L$                    12       1,728     248,832   61,917,364,224
$|C| = \binom{|M|+L-1}{L}$            12       364       4,368     352,716

Table 9: Size of Sets for Ultimate X Bonus Streak Deuces Wild

For example, using the state reduction to $C$ for a 10-Line game gives 352,716 states. For each state we need to find the optimal hold of 134,459 hands, each requiring 32 probability vectors and expected value calculations. That is, over 1.5 trillion calculations of each kind are needed at each iteration in (3).

With Ultimate X, the third state size reduction (which is not generally applicable here) to set $D$ (in [2]) reduced the state space size dramatically. For the Deuces Wild game examined in [2], the sizes are as shown in Table 10. Notice that the 10-Line Ultimate X game was easier to solve than the 3-Line game of Bonus Streak Ultimate X.
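The reduced state counts and the per-iteration workload can be reproduced with a stars-and-bars count. This is an illustrative sketch; `unique_states` and `holds_per_iteration` are hypothetical names, and the workload simply multiplies states, unique hands, and the 32 hold choices.

```python
from math import comb


def unique_states(num_streaks: int, lines: int) -> int:
    """Multisets of size L drawn from |M| streaks: C(|M| + L - 1, L)."""
    return comb(num_streaks + lines - 1, lines)


def holds_per_iteration(num_streaks: int, lines: int,
                        unique_hands: int = 134_459, holds: int = 32) -> int:
    """Number of (state, hand, hold) triples evaluated in one sweep of (3)."""
    return unique_states(num_streaks, lines) * unique_hands * holds


if __name__ == "__main__":
    for lines in (1, 3, 5, 10):
        print(lines, f"{unique_states(12, lines):,}")
    print(f"{holds_per_iteration(12, 10):,}")
```

With 12 streaks this gives the |C| row of Table 9, and with the 7 multipliers of the Ultimate X Deuces Wild game in [2] the same function gives 8,008 states for 10 lines.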

                                      1-Line   3-Lines   5-Lines   10-Lines
$|\Omega| = |M|^L$                    7        343       16,807    282,475,249
$|C| = \binom{|M|+L-1}{L}$            7        84        462       8,008
$|D|$                                 7        29        51        106

Table 10: Size of Sets in [2] for Ultimate X Deuces Wild

In short, without some massively parallel computing platform, some new insights are needed to solve the Bonus Streak versions of Ultimate X for 10-Line games. 5-Line games are within reach but will take weeks to solve.

3. Possible Speed-ups

Some obvious computational speed-ups include precomputing the following values, which don't change from iteration to iteration:

1. $P_H\, m(\omega)\, R_{H_i}$, for $H \in \mathcal{H}_S$, $i = 1, 2, \ldots, 32$
2. $P_H\, P_{\omega,\gamma}(H_i)$, for $H \in \mathcal{H}_S$, $\omega, \gamma \in C$, $i = 1, 2, \ldots, 32$

The second suggestion above may be impractical because $C$ grows so fast and is large. Similarly, dividing the iterations into parallel computations over $\mathcal{H}_S$ and $C$ is easily done. With most processors implementing multiple cores and hyper-threading, parallel computing is possible. (We used 10 of our 12 cores on a Xeon E5645 Intel processor.)

As mentioned when discussing state-space reductions, we can have a small reduction of states by collapsing those states having all single-length streaks and equal $m(\omega)$ values. The impact is minimal, however. For example, in the Jacks or Better game shown in Section 4 below, the 3-Line game has 560 states in $C$ and only 7 can be reduced using this equivalence. The overhead to implement this reduction hardly covers the slight reduction in state space size.

Another possible speed-up can be achieved using a termination criterion first suggested by Odoni [3]. He showed that

$$\underline{L}^{n} \le \underline{L}^{n+1} \le g \le \overline{L}^{n+1} \le \overline{L}^{n}$$

where

$$\overline{L}^{n} = \max_{\omega} \left( e_\omega^{n+1} - v_\omega^{n} \right), \qquad \underline{L}^{n} = \min_{\omega} \left( e_\omega^{n+1} - v_\omega^{n} \right).$$

So, stopping when $\overline{L}^{n+1} - \underline{L}^{n+1} < \varepsilon$ will provide a good estimate of $g$ for small enough $\varepsilon$. For example, for the first Jacks or Better game shown later, using the $\varepsilon$ values shown in the table below, we found the following number of iterations needed to achieve the stopping condition:

Lines                                      1     2     3
Iterations with Condition (4)              28    29    30
Iterations with $\varepsilon = 10^{-8}$    25    25    26
Iterations with $\varepsilon = 10^{-7}$    22    24    24
Iterations with $\varepsilon = 10^{-6}$    2     2     2

This stopping criterion may not leave us with as accurate estimates of the steady state probabilities or relative bias values as the stopping criterion discussed earlier with Equation (4), but it could save iteration rounds if we are interested in just computing the gain of a game.

In [2] we discussed some additional computational reductions. One was to use other forms of iteration where both storage requirements and rate of convergence improved when applicable. Such methods exist for solving discounted, infinite-horizon, Markov Decision problems. However, we know of no way to implement these for the non-discounted problem without first converting it to a form where they can be applied (as done by Koehler et al. in [1]), which itself requires solving a Markov decision problem. We also mentioned it is possible to permanently eliminate sub-optimal decisions as the iteration proceeds, thus, in principle, reducing the problem size. In our explorations of this approach, the overhead introduced did not justify the improvement in convergence speed.
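To illustrate the Odoni-style stopping rule described earlier in this section, here is a minimal sketch of relative value iteration on a toy two-state problem. It is not the paper's solver: the toy transition matrices and rewards are made up for illustration, and the bias is normalized against a reference state rather than by the steady-state condition used in (1).

```python
def relative_value_iteration(P, R, eps=1e-8, max_iter=10_000):
    """Relative value iteration with min/max bounds on the gain.

    P[a][s][t]: probability of moving from state s to t under action a
    R[a][s]:    expected one-step reward in state s under action a
    Returns (lower, upper, v, iterations) with lower <= g <= upper.
    """
    n_actions, n_states = len(R), len(R[0])
    v = [0.0] * n_states
    for it in range(1, max_iter + 1):
        # e[s]: best immediate reward plus expected relative bias
        e = [max(R[a][s] + sum(P[a][s][t] * v[t] for t in range(n_states))
                 for a in range(n_actions))
             for s in range(n_states)]
        diffs = [e[s] - v[s] for s in range(n_states)]
        lower, upper = min(diffs), max(diffs)   # bounds on the gain g
        v = [x - e[0] for x in e]               # keep v bounded (state 0 is the reference)
        if upper - lower < eps:
            return lower, upper, v, it
    raise RuntimeError("no convergence within max_iter")


# Toy data (hypothetical): action 0 mixes between states, action 1 parks in state 0.
P = [
    [[0.5, 0.5], [0.25, 0.75]],
    [[1.0, 0.0], [0.25, 0.75]],
]
R = [
    [1.0, 4.0],
    [2.0, 4.0],
]

if __name__ == "__main__":
    lower, upper, v, iters = relative_value_iteration(P, R)
    print(lower, upper, v, iters)
```

For this toy problem the optimal policy always uses action 0, whose stationary distribution is (1/3, 2/3), so the gain is 1/3 + 8/3 = 3 and the bounds close in on that value from both sides.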

4. Results

Below are the results we found for a selection of games, pay tables and bonus streaks for 1-Line and 3-Line versions of the game.

                                     Streak                                 EVs
Game                  Pays           3K     STR, FL   HIGHER        Regular    1-Line     3-Line
Double Double Bonus   9-5            2,4    2,4,8     2,4,8,10,12   0.978729   0.987373   0.98409
Double Double Bonus   8-5            2,4    2,4,8     2,4,8,10,12   0.967861   0.976225   0.972826
Double Double Bonus   7-5            2,4    2,4,8     2,4,8,10,12   0.957120   0.96594    0.96655
Double Double Bonus   6-5            2,4    2,4,8     2,4,8,10,12   0.946569   0.954333   0.950554
Triple Double Bonus   9-6            2,4    2,4,8     2,4,8,10,12   0.981540   0.99389    0.990460
Triple Double Bonus   9-5            2,4    2,4,8     2,4,8,10,12   0.970204   0.978236   0.974948
Triple Double Bonus   8-5            2,4    2,4,8     2,4,8,10,12   0.959687   0.967222   0.963846
Triple Double Bonus   7-5            2,4    2,4,8     2,4,8,10,12   0.949178   0.956277   0.952820
Double Bonus          9-6-5          2,4    2,4,7     2,4,7,11,12   0.978062   0.982587   0.980935
Double Bonus          9-6-4          2,4    2,4,8     2,4,8,10,12   0.963754   0.976847   0.97479
Double Bonus          9-5-4          2,4    2,4,8     2,4,8,10,12   0.952738   0.96297    0.959335
Double Bonus          8-5-4          2,4    2,4,8     2,4,8,10,12   0.941897   0.95099    0.947926
Bonus Poker           7-5            2,4    2,4,8     2,4,8,10,12   0.980147   0.987757   0.98463
Bonus Poker           6-5            2,4    2,4,8     2,4,8,10,12   0.968687   0.97627    0.97329
Jacks or Better       9-5            2,4    2,4,8     2,4,8,10,12   0.984498   0.992208   0.989064
Jacks or Better       8-5            2,4    2,4,8     2,4,8,10,12   0.972984   0.980650   0.977559
Jacks or Better       7-5            2,4    2,4,8     2,4,8,10,12   0.961472   0.969092   0.966057
Jacks or Better       6-5            2,4    2,4,8     2,4,8,10,12   0.949961   0.957538   0.954556

Game                  Pays           3K, STR   FLUSH   HIGHER        Regular    1-Line     3-Line
Bonus Poker Deluxe    8-6            2,5       2,5,7   2,5,7,11,12   0.984928   0.99525    0.987909
Bonus Poker Deluxe    8-5            2,5       2,5,8   2,5,8,10,12   0.974009   0.980375   0.977853
Bonus Poker Deluxe    7-5            2,5       2,5,8   2,5,8,10,12   0.962526   0.969092   0.966412
Bonus Poker Deluxe    6-5            2,5       2,5,8   2,5,8,10,12   0.953611   0.95830    0.955063

Game                  Pays             FL, FH, 4K   HIGHER        (third streak)   Regular    1-Line     3-Line
Deuces Wild           20-12-10-4-4-3   2,2,4        2,4,4,11,12   n/a              0.975791   0.98447    0.98442
Deuces Wild           20-12-10-4-4-3   2,2,4        2,4,4,10,12   n/a              0.975791   0.98346    0.978687
Deuces Wild           25-16-13-4-3-2   2,2,4        2,4,7,10,12   n/a              0.967651   0.973327   0.96972
Deuces Wild           20-10-8-4-4-3    2,2,4        2,4,5,10,12   n/a              0.959638   0.966627   0.963748
Deuces Wild           25-15-10-4-3-2   2,2,4        2,4,8,10,12   n/a              0.948182   0.95486    0.950898
Bonus Deuces Wild     10-4-3-3         2,2,4        2,4,5,10,12   n/a              0.973644   0.98850    0.983837
Bonus Deuces Wild     12-4-3-2         2,2,4        2,4,8,10,12   n/a              0.962183   0.975882   0.9770
Bonus Deuces Wild     12-4-3-2         2,2,4        2,4,6,10,12   n/a              0.962183   0.969432   0.965357
Bonus Deuces Wild     10-4-3-2         2,2,4        2,4,6,10,12   n/a              0.953368   0.958227   0.953696

Tables from actual 3- and 5-Line Games

                                       Streak                        EVs
Game                  Pays             3K, STR, FL   HIGHER       Regular    1-Line     3-Line
Double Double Bonus   9-5              2,3,4         2,3,4,8,12   0.978729   0.986802   0.984845
Double Double Bonus   8-5              2,3,4         2,3,4,8,12   0.967861   0.975780   0.973698
Triple Double Bonus   9-6              2,3,4         2,3,4,8,12   0.981540   0.99724    0.99054
Triple Double Bonus   9-5              2,3,4         2,3,4,8,12   0.970204   0.977546   0.975763
Double Bonus          9-6-5            2,3,4         2,3,4,8,12   0.978062   0.99708    0.99053
Double Bonus          9-6-4            2,3,4         2,3,4,8,12   0.963754   0.97594    0.973749
Bonus Poker           7-5              2,3,4         2,3,4,8,12   0.980147   0.98683    0.985326
Bonus Poker           6-5              2,3,4         2,3,4,8,12   0.968687   0.975253   0.973772
Jacks or Better       8-6              2,3,4         2,3,4,7,12   0.983927   0.989636   0.98834
Jacks or Better       8-5              2,3,4         2,3,4,8,12   0.972984   0.979645   0.97864
Bonus Poker Deluxe    8-5              2,3,4         2,3,4,8,12   0.974009   0.983523   0.98652
Bonus Poker Deluxe    7-5              2,3,4         2,3,4,8,12   0.962526   0.972368   0.970263

Game                  Pays             FL, FH, 4K    HIGHER        Regular    1-Line     3-Line
Deuces Wild           20-12-9-4-4-3    2,2,4         2,4,8,10,12   0.970554   0.988058   0.9857
Deuces Wild           25-16-13-4-3-2   2,2,4         2,4,8,10,12   0.967651   0.976643   0.972999
Bonus Deuces Wild     10-4-3-3         2,2,4         2,4,7,10,12   0.973644   0.993026   0.99008
Bonus Deuces Wild     12-4-3-2         2,2,4         2,4,8,10,12   0.962183   0.975882   0.9770

Tables from actual 10-Line Games

5. Vulturing

Vulturing refers to the process of scavenging left-over multipliers from previous players. For Ultimate X games, if there are any multipliers greater than one, the expected value of playing a hand at a 5 coin bet is positive. In Bonus Streak games, that strategy doesn't work because the multipliers are disabled for any bet less than the maximum bet. However, the left-over multipliers may still lead to a positive expected value.

Suppose one finds the following left-over multipliers in a 9-5 Jacks or Better game: $\{(2,4),(1),(1)\}$. Should one vulture this? If only one hand will be played (at a max bet) for a 3-line game, the expected value is 0.984498 * 4/6 = 0.656332. The 0.984498 is the expected value for the normal 9-5 game (since we are playing just one hand). The 4/6 is the average multiplier per coin-in. So this is not attractive.

However, since we are playing at the max bet, we have the potential of new streaks and the next state may compensate for the expected loss for the current state. So, like normal play, we must anticipate future hands, even in vulturing.
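The one-hand calculation above generalizes to any set of left-over streaks: sum the leading multipliers, scale the base game's EV, and divide by the 2L units wagered. A minimal sketch follows; the function name `one_hand_ev` is illustrative, and it deliberately ignores any streaks the hand itself may create.

```python
def one_hand_ev(streaks, base_ev):
    """EV per unit wagered for a single max-bet hand played on left-over streaks.

    streaks: one tuple of multipliers per line; only the first entry applies now
    base_ev: expected return of the underlying game under perfect play
    """
    lines = len(streaks)
    m = sum(streak[0] for streak in streaks)   # m(omega): sum of leading multipliers
    return m * base_ev / (2 * lines)


if __name__ == "__main__":
    # The 9-5 Jacks or Better example: streaks (2,4), (1), (1) on three lines.
    print(round(one_hand_ev([(2, 4), (1,), (1,)], 0.984498), 6))
```

A value below 1 means the single hand on its own loses money in expectation, which is why the future hands have to be taken into account.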

So which states should we vulture? Two conditions can be considered. From a solution to (1) we have one condition. Vulture state $\omega$ if

C1: $v_\omega + g \ge 2L$.

This rule takes into account future hands. Another condition is also obvious; vulture a state if

C2: $m(\omega) \sum_{H \in \mathcal{H}_S} P_H \max_i R_{H_i} \ge 2L$.

This latter condition considers only one hand (played perfectly for the underlying game) and ignores any future possibilities.

It is possible that a state satisfies the second condition without satisfying the first one. For example, in the Deuces Wild game used throughout this paper, the state $\{(1),(4),(2,4)\}$ has

$$m(\omega) \sum_{H} P_H \max_i R_{H_i} = 7 \times 0.9676505 = 6.7735535 > 6$$

but

$$v_\omega + g = -0.8098685 + 5.8183247 = 5.0084562 < 6.$$

This means that playing the hand myopically for one hand is better than using perfect play for the regular Bonus Streak game. Of course, one may get lucky with a new set of attractive multipliers.

Likewise, it is possible a state satisfies the first condition without satisfying the second. For example, the state $\{(1),(1),(4,7,10,12)\}$ has

$$m(\omega) \sum_{H} P_H \max_i R_{H_i} = 6 \times 0.9676505 = 5.805903 < 6$$

but

$$v_\omega + g = 18.4245963 + 5.8183247 = 24.242921 > 6.$$

One can see that, although one hand will be played with a negative expected value, the subsequent three hands will all have a positive expected value.

Strictly speaking, condition (C1) assumes play will follow the normal optimal play for the Bonus Streak game. However, we won't be playing a normal Bonus Streak game but rather one that terminates with unattractive states. Likewise, (C2) assumes we will play the hand myopically, ignoring any future hands (at least until we see the next state, which might be good). We propose using C1 where the gain and relative bias values are determined by what we call the Optimal Vulturing problem, and C2 only when a state does not satisfy C1 but does satisfy C2.

Here we give a formulation for the Optimal Vulturing problem. For any state $\omega \in C$ let $\delta(\omega)$ be 1 if we should vulture the game under rule C1 in state $\omega$ and 0 otherwise.

$$\max_{\delta}\; g$$

$$v_\omega + g = \sum_{H} P_H \max_{i} \Big[ m(\omega)\, R_{H_i} + \sum_{\gamma \in \Omega} P_{\omega,\gamma}(H_i)\, v_\gamma\, \delta(\gamma) \Big], \quad \omega \in \Omega \quad (4)$$

$$\sum_{\omega \in \Omega} P_\omega v_\omega = 0$$

These gain and bias values may be different from those determined by (1).

Define the following for a fixed set $\Delta \subseteq C$:

$$v_\omega + g = \sum_{H} P_H \max_{i} \Big[ m(\omega)\, R_{H_i} + \sum_{\gamma \in \Omega} P_{\omega,\gamma}(H_i)\, v_\gamma\, \delta(\gamma) \Big], \quad \omega \in \Omega$$

$$\sum_{\omega \in \Omega} P_\omega v_\omega = 0, \qquad \delta(\omega) = \begin{cases} 1 & \omega \in \Delta \\ 0 & \text{otherwise.} \end{cases}$$

Theorem 1. Given an optimal solution to the system above, then for any $\omega$

$$v_\omega^* < 0 \Rightarrow \delta(\omega) = 0, \qquad v_\omega^* > 0 \Rightarrow \delta(\omega) = 1.$$

Proof: We have for any state $s$

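$$v_s + g = \sum_{H} P_H \max_{i} \Big[ m(s)\, R_{H_i} + \sum_{\gamma \in \Omega} P_{s,\gamma}(H_i)\, v_\gamma\, \delta(\gamma) \Big]$$

$$= \sum_{H} P_H \max_{i} \Big\{ m(s)\, R_{H_i} + P_{s,\omega}(H_i)\, v_\omega\, \delta(\omega) + \sum_{\gamma \in \Omega \setminus \{\omega\}} P_{s,\gamma}(H_i)\, v_\gamma\, \delta(\gamma) \Big\}$$

$$= v_\omega\, \delta(\omega) \sum_{H} P_H\, P_{s,\omega}(H_{i^*(s)}) + \sum_{H} P_H \Big[ m(s)\, R_{H_{i^*(s)}} + \sum_{\gamma \in \Omega \setminus \{\omega\}} P_{s,\gamma}(H_{i^*(s)})\, v_\gamma\, \delta(\gamma) \Big].$$

So $\sum_s P_s v_s = 0$ gives

$$g = v_\omega\, \delta(\omega) \sum_{s} P_s \sum_{H} P_H\, P_{s,\omega}(H_{i^*(s)}) + \sum_{s} P_s \sum_{H} P_H \Big[ m(s)\, R_{H_{i^*(s)}} + \sum_{\gamma \in \Omega \setminus \{\omega\}} P_{s,\gamma}(H_{i^*(s)})\, v_\gamma\, \delta(\gamma) \Big].$$

Now, if $v_\omega^* < 0$ and $\delta(\omega) = 1$, or $v_\omega^* > 0$ and $\delta(\omega) = 0$, we reach a contradiction of the optimality of our solution since, with all the other decisions held constant, we could achieve a better gain value because $\sum_{s} P_s \sum_{H} P_H\, P_{s,\omega}(H_{i^*(s)}) > 0$.

Theorem 1 suggests a greedy algorithm to solve the Optimal Vulturing problem (4). Given a solution to (1), let

$$\delta(\omega) = \begin{cases} 0 & v_\omega^* < 0 \\ 1 & v_\omega^* \ge 0 \end{cases} \qquad \omega \in C.$$

If any conditions of Theorem 1 are not satisfied for some state, then use the following greedy algorithm:

1. Solve for the steady state values using the following iterative approach:

$$v_\omega^{n+1} + g^{n+1} = e_\omega^{n+1} = \sum_{H \in \mathcal{H}_S} P_H \max_{i} \Big[ m(\omega)\, R_{H_i} + \sum_{\gamma \in C} P_{\omega,\Omega_\gamma}(H_i)\, v_\gamma^{n}\, \delta(\gamma) \Big], \quad \omega \in C$$

$$g^{n+1} = \sum_{\omega \in C} P_{\Omega_\omega}^{n+1} e_\omega^{n+1}$$

$$P_{\Omega_\gamma}^{n+1} = \sum_{\omega \in C} P_{\Omega_\omega}^{n} \sum_{H \in \mathcal{H}_S} P_H\, P_{\omega,\Omega_\gamma}(H_{i^*}), \quad \gamma \in C.$$

2. If any values for a state do not meet the conditions of Theorem 1, pick one, change its $\delta$ value, and return to Step 1. Otherwise stop.

Since the gain increases with each cycle, the solution monotonically increases until no further opportunities exist. This does not guarantee that the greedy algorithm stops with an optimal solution. However, we have not seen any solutions better than the ones we have found using the greedy algorithm.

Here are the steady-state results for the Deuces Wild game highlighted in this paper.

Deuces Wild Video Poker   EV
1-Line                    .65837
2-Lines                   .527503
3-Lines                   tba

These values are not indicative of one's vulturing EV since no simple scheme dictates what collection of multipliers a person might abandon. The reasons one stops playing a game and leaves unused multipliers are varied and indeterminate and could easily include factors like fatigue, alcohol consumption, financial resources, superstition, other obligations, unacceptable conditions (like an obnoxious player, too cold, too much noise, etc.), and so forth. So, not knowing the probability of finding an abandoned game state, computing an overall expected value is impossible.

So, assuming we have an optimal solution to the Optimal Vulturing problem (4), we have the following vulturing rules. Vulture a state if

C1: $v_\omega + g \ge 2L$

and play according to optimal decisions using (4) values. Otherwise, vulture a state if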

C2: $m(\omega) \sum_{H \in \mathcal{H}_S} P_H \max_i R_{H_i} \ge 2L$

and play it myopically using perfect play for the underlying video poker game.

Of some interest is the size of the set $\{\omega \in C : m(\omega) \sum_{H \in \mathcal{H}_S} P_H \max_i R_{H_i} < 2L\}$ for the Deuces Wild Bonus Streak game used earlier. Table 11 shows the sizes of these sets.

                                                              1-Line   3-Lines   5-Lines   10-Lines
$|\Omega| = |M|^L$                                            12       1,728     248,832   61,917,364,224
$|C| = \binom{|M|+L-1}{L}$                                    12       364       4,368     352,716
$|\{\omega \in C : m(\omega) \sum_H P_H \max_i R_{H_i} < 2L\}|$   4        22        76        ?

Table 11: Size of Sets for Ultimate X Bonus Streak Deuces Wild

6. Summary

This paper presented an analysis of Ultimate X Bonus Streak games. This generalizes the results for Ultimate X games [2] since Ultimate X can be considered a special case of Ultimate X Bonus Streak. However, Ultimate X can be solved faster using reductions that can't be used with Bonus Streak games. At the present time, we are unable to solve Bonus Streak games with 10 lines because the state space is so large. 5-Line games are within reach, but we have not solved them yet. We are working on new insights and algorithmic improvements. Lastly, various conditions for determining profitable vulturing states were determined.

7. Acknowledgements

We appreciate the many e-mail discussions with Michael Shackleford, The Wizard of Odds. We also thank Rick Percy from Columbus, Ohio, who caught a typo in one pay table and an inconsistency in another table where our streaks didn't match what we used in our code. Thanks go also to Neil Shatz, whose comments on vulturing inspired the new section on vulturing.

References

[1] Koehler, G. J., Whinston, A. B., and Wright, G. P., "An Iterative Procedure for Non-Discounted Discrete-Time Markov Decisions," Naval Research Logistics Quarterly, pp. 719-723, December 1974.
[2] Koehler, G. J., 2010. Ultimate X Poker Analysis. http://playperfectllc.com/uploads/3/4/9/0/34902374/ultimatex.pdf
[3] Odoni, A. R., "On Finding the Maximal Gain for Markov Decision Processes," Operations Research, 17, pp. 857-860 (1969).
[4] Shackleford, M., 2010. http://wizardofodds.com/ultimatex