Learning in 3-Player Kuhn Poker

University of Manchester

Learning in 3-Player Kuhn Poker

Author: Yifei Wang
3rd Year Project Final Report
Supervisor: Dr. Jonathan Shapiro
April 25, 2015

Abstract

This report describes how an ɛ-Nash Equilibrium of 3-Player Kuhn Poker, a simple three-player hidden-information game, was found. Temporal Difference Learning and the Policy Hill-Climbing algorithm were first applied to sample games: a 2-player perfect-information game, a 3-player perfect-information game and a 2-player hidden-information game. These tests showed that Policy Hill-Climbing is the better of the two and can be applied to a 3-player hidden-information game. When this algorithm was applied to 3-Player Kuhn Poker, an ɛ-Nash Equilibrium was obtained.

Contents

1 Introduction
  1.1 Overview of Project
  1.2 Overview of Kuhn Poker and 3-Player Kuhn Poker
2 Game Theory
  2.1 Nash Equilibrium
  2.2 ɛ-Nash Equilibrium
  2.3 Other Important Concepts in Game
  2.4 Game Tree
  2.5 Strategy
3 Reinforcement Learning in Games
  3.1 Reinforcement Learning
  3.2 Temporal-Difference Learning
  3.3 Policy Hill-Climbing Algorithm
  3.4 WoLF Policy Hill-Climbing Algorithm
4 Algorithm Test
  4.1 Tic-Tac-Toe
  4.2 Gunmen Dilemma
  4.3 2-Player Kuhn Poker
5 Design and Implementation
  5.1 Structure
  5.2 State
  5.3 AI
  5.4 Game
6 Result and Evaluation
  6.1 Result
  6.2 Evaluation
    6.2.1 Mathematical Calculation
    6.2.2 Best Response Comparison
7 Conclusion
  7.1 Achievement
  7.2 Future Works

List of Figures

1  Game Tree for (2-Player) Kuhn Poker
2  Game Tree for 3-Player Kuhn Poker
3  One Possible Trace for Tic-Tac-Toe
4  Tic-Tac-Toe board
5  Test Result for Tic-Tac-Toe using TD-learning
6  Game Tree for Gunmen Dilemma
7  The convergence of the strategy of player 1 when using TD-learning in Kuhn Poker
8  The convergence of the strategies generated when using PHC in Kuhn Poker
9  Game Tree for 3-Player Kuhn Poker with Labels
10 System diagram of 3-Player Kuhn Poker
11 The convergence of the expected payoff for 3 players in 3-Player Kuhn Poker

List of Tables

1  Pseudo code of Temporal Difference Learning
2  Pseudo code of Policy Hill-Climbing
3  Pseudo code of WoLF Policy Hill-Climbing
4  Test result of Tic-Tac-Toe using TD-learning
5  Test result of Gunmen Dilemma (Policy when all alive)
6  Test result of Gunmen Dilemma (Simulation results)
7  Strategies in Kuhn Poker
8  Strategies generated by TD-learning in Kuhn Poker
9  Strategies generated by WoLF PHC in Kuhn Poker
10 Information in Decision Nodes of 3-Player Kuhn Poker
11 Information in Decision Nodes of 3-Player Kuhn Poker (Simplified)
12 Pseudo code of the game process of 3-Player Kuhn Poker
13 Pseudo code of the learning process of 3-Player Kuhn Poker
14 Strategies generated by WoLF PHC in 3-Player Kuhn Poker
15 Formula table for the probability to reach each ending node
16 Expected payoff for each player
17 Equilibrium of 3-Player Kuhn Poker
18 Expected Payoff and Best Response Payoff

1 Introduction

1.1 Overview of Project

The aim of this project is to use machine learning to develop AIs that play 3-Player Kuhn Poker well; more precisely, AIs whose strategies lead the game to a Nash Equilibrium. However, since machine learning does not give an exact result, finding an ɛ-Nash Equilibrium with ɛ < 0.01 is a suitable goal for this project. Developing AIs for games can also provide useful information, such as how to maximise a player's payoff in gambling and whether a game is fair or banker-favoured.

1.2 Overview of Kuhn Poker and 3-Player Kuhn Poker

Kuhn Poker is a simplified poker game introduced by Harold W. Kuhn. [7] It is often used to test learning algorithms because it has a very simple set of rules:

- Each player antes 1 chip before the cards are dealt.
- Each player is dealt one card from a deck of 3 ranked cards.
- If there is no outstanding bet, a player can check or bet 1 chip.
- If there is an outstanding bet, a player can fold or call.
- The betting round ends when there is an outstanding bet and both players have decided whether to join it, or when both players have checked.
- If there was a bet, the betting players show their cards and the one holding the highest card wins all chips in the pot. (Thus if one player folds, the other betting player is treated as holding the highest card and takes the pot.)
- If there was no bet, both players show their cards and the one holding the highest card wins all chips.

Like the poker games that are usually played, Kuhn Poker requires players to bet, check, fold and call, and the players who bet compare their cards at the end. However, to simplify the rules, each player holds only 1 card and the comparison is based purely on the rank of that card, so players do not have to apply complex rules to evaluate their hands. Only one round of betting is allowed and a player cannot raise, which shortens the game. 3-Player Kuhn Poker applies similar rules:

- Each player antes 1 chip before the cards are dealt.
- Each player is dealt one card from a deck of 4 ranked cards.
- If there is no outstanding bet, a player can check or bet 1 chip.
- If there is an outstanding bet, a player can fold or call.
- The betting round ends when there is an outstanding bet and the next player to act has already bet, or when there is no outstanding bet and the next player to act has already checked.
- If there was a bet, the betting players show their cards and the one holding the highest card wins all chips in the pot.
- If there was no bet, all players show their cards and the one holding the highest card wins all chips.

For convenience, in this report 2-Player Kuhn Poker uses a deck containing J, Q and K with K > Q > J, and 3-Player Kuhn Poker uses a deck containing J, Q, K and A with A > K > Q > J.

Both 2-Player and 3-Player Kuhn Poker are only models; nobody would play them in the real world. They are commonly used in game theory to test algorithms on hidden-information games, because their rules are so simple. Any theory or algorithm that applies to hidden-information games in general should also apply to 2-Player or 3-Player Kuhn Poker, so applying to Kuhn Poker is a prerequisite for a theory or algorithm that is meant to apply to all hidden-information games. 3-Player Kuhn Poker was also one of the games in the Annual Computer Poker Competition [1] in 2014 and 2015, which encourages researchers to solve it.

2-Player Kuhn Poker has already been solved by Harold W. Kuhn: it has a Nash Equilibrium strategy for both players. [7] For 3-Player Kuhn Poker, the best attempt so far is by Risk and Szafron, who obtained strategies that lead to an ɛ-Nash Equilibrium with a very small ɛ. [9] Besides Kuhn Poker, Leduc Hold'em and Texas Hold'em are also hidden-information games that can be used for testing. They are much more complex than Kuhn Poker, and an algorithm that performs well on Kuhn Poker may still perform poorly on them. [9]

2 Game Theory

2.1 Nash Equilibrium

A game in game theory has 7 main elements: players, actions, information, strategies, payoffs, outcomes and equilibria. [8] The players are the individuals who make decisions, and their goal is to make their payoff as large as possible. They therefore adjust their strategies to increase their expected payoff, and eventually the players drive the game to a situation in which no player can improve his payoff by changing his strategy. This situation is called a Nash Equilibrium. In other words, a Nash Equilibrium is a strategy set (s_1, s_2, s_3, ..., s_k) for k players such that no player has an incentive to deviate from his strategy given that the other players do not deviate. [8] It can also be written mathematically:

    \forall i: \pi_i(s_i, s_{-i}) \ge \pi_i(s_i', s_{-i}) \quad \forall s_i'   [8]

where \pi_i is the payoff of player i, s_i is the strategy followed by player i, and s_{-i} are the strategies followed by his opponents.

2.2 ɛ-Nash Equilibrium

In some games it is not possible to find strategies that lead to an exact Nash Equilibrium, but we can find strategies that lead the game close to one, in the sense that no player can improve his payoff by more than ɛ by changing his strategy. This is called an ɛ-Nash Equilibrium. Written mathematically:

    \forall i: \pi_i(s_i, s_{-i}) + \epsilon \ge \pi_i(s_i', s_{-i}) \quad \forall s_i'

Obviously, when ɛ = 0 this is a Nash Equilibrium. The aim of this project is to find an ɛ-Nash Equilibrium for 3-Player Kuhn Poker with ɛ < 0.01.

2.3 Other Important Concepts in Game

A zero-sum game is a game in which the sum of the payoffs of all players is zero whatever strategies they choose. A game that is not zero-sum is called a non-zero-sum game. [8]

A hidden-information game is a game in which at least one player does not know all the information in the game. A game in which all information is known by every player is a perfect-information game. The information set of a player in a hidden-information game contains all the possible states that the player cannot distinguish by direct observation.

Nature is a non-player actor who takes actions with specified probabilities at specified points. [8] In a card game, nature is usually the deck, which performs the dealing process with a uniform distribution.

With the rules described in 1.2, 3-Player Kuhn Poker is a zero-sum hidden-information game: the sum of the payoffs of the 3 players is always 0, and players do not know the cards the others hold.

2.4 Game Tree

A game tree is a tree that contains all possible strategy combinations in a game. A node in the tree is a point at which either a player or nature takes an action. A node with no successors is an ending node, where the game ends and the payoffs are given to all players.

Figure 1: Game Tree for (2-Player) Kuhn Poker

In the game trees of 2-Player and 3-Player Kuhn Poker, the branches for the different card-dealing combinations share the same structure, so the nodes in the same position of all branches in which a player holds the same card form one information set of that player.

Figure 2: Game Tree for 3-Player Kuhn Poker

We can obtain a lot of information from the game tree. For example, from the game tree of 3-Player Kuhn Poker we can infer that each player has at most 2 decision nodes in a single round, and that there are 13 different ending nodes for each card combination. Since there are 24 different card combinations, there are 13 × 24 = 312 different ending nodes in 3-Player Kuhn Poker. This is a large number compared to the 2-player game, which has only 5 × 6 = 30.

A game tree is a common tool for representing a game and makes the process of the game clear. When the game tree is combined with the payoffs at each ending node, it becomes the extensive-form representation of the game. Most games described in this report are shown with their game trees.

2.5 Strategy

A strategy is a rule that tells the player which action should be chosen at each instant of the game. [8] In some solved games it is reasonable for a player always to take the same action in the same situation. For example, in Tic-Tac-Toe, player 1 can always choose the centre at the beginning and he will not lose. This kind of strategy is called a pure strategy.

However, in some hidden-information games, such as poker games, if a player always takes the same action, his opponent may infer which card he is holding. A player therefore needs to bluff by choosing his action according to a probability distribution, so that even in the same state he may choose different actions and the opponent cannot infer the card he holds from the actions performed. This kind of strategy is called a mixed strategy. And, of course, to play 3-Player Kuhn Poker a mixed strategy is required.

3 Reinforcement Learning in Games

3.1 Reinforcement Learning

Reinforcement learning is an area of machine learning that is often used in game theory research. In supervised learning, the machine learns from examples given by a supervisor, so the performance depends on the quality of those examples, and supervised learning usually performs badly on interactive problems. In reinforcement learning, the machine adjusts its actions based on the environment and on how it performed before in the same situation, so it is more effective for interactive problems such as card games. [10]

There are 4 main elements in reinforcement learning: a policy, a reward function, a value function and a model of the environment.

A policy defines the learning agent's way of behaving at a given time. [10] For a game, a policy is the probability distribution over the different actions at a given decision node, and all these policies for one player together form that player's strategy.

A reward function defines the goal in a reinforcement learning problem. [10] For a game, it gives the payoff of a player for the current state and the actions that player has taken.

A value function specifies what is good in the long run. "The value of a state is the total amount of the reward an agent can expect to accumulate over the future, starting from that state." [10] Generally speaking, the value of a state shows how much payoff the player can expect to receive eventually if he visits that state.

A model of the environment is something that mimics the behaviour of the environment; for example, given a state and an action, the model should predict the resultant next state and next reward. [10] For a game, that is the game rule: from the rules we can easily predict the next state and the reward if there is one. However, the model is an optional element of a reinforcement learning algorithm; not every method needs a model to tell it how the game works, meaning that some methods just use experience to learn the best way to solve the problem.

When reinforcement learning is applied to a game, a large number of rounds are played, and the policies of the players at all decision nodes are then updated by adjusting the value function to make more accurate predictions. Each player thereby tries to obtain the largest payoff at each node, and a Nash Equilibrium can be reached.

3.2 Temporal-Difference Learning

Temporal-Difference Learning (TD-learning) is a common algorithm in reinforcement learning. It takes ideas from both Monte Carlo methods and dynamic programming. [10] TD-learning does not need a model; it adjusts the value function from the final payoff it receives. The value V(s) of state s is updated by the following formula:

    V(s) := V(s) + α[r + γV(s') − V(s)]

where s' is the next state, r is the payoff (usually 0 unless it is the final state), α ∈ [0, 1] is the learning rate, and γ ∈ [0, 1] is the discount factor, which says how important the value of the next state is. The pseudo code of the learning phase is shown below:

    for all states s do
        V(s) := 0
    end for
    repeat
        do self-play with the policy derived from the state values,
        and record all visited states S and the reward r
        for all s in S do
            V(s) := V(s) + α[r + γV(s') − V(s)]
        end for
    until enough rounds of self-play

Table 1: Pseudo code of Temporal Difference Learning

A naive way to derive the policy is always to choose the next state with the highest value among all possible next states; however, this results in a pure strategy. A more appropriate approach is to use the Boltzmann distribution:

    π(s) = e^{V(s)/τ} / Σ_{i=1}^{n_s} e^{V(s_i)/τ}

where π(s) is the probability of choosing s as the next state, n_s is the number of possible next states, and τ is a constant. The machine then chooses a high-value state with high probability, but still has a chance to choose a low-value state. Even with this distribution, however, there is no guarantee that the output will be a mixed strategy. A minimal sketch of the update and the Boltzmann policy is given below.
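To make the update rule and the Boltzmann policy concrete, the following Python sketch shows one way they could be implemented. It is only an illustration of the formulas above, not the code used in this project; the function names and the dictionary-based value table are assumptions.

```python
import math
import random
from collections import defaultdict

def td_update(V, episode, reward, alpha=0.1, gamma=1.0):
    """Apply V(s) := V(s) + alpha * [r + gamma * V(s') - V(s)] along one
    episode (a list of visited states). Only the last transition carries
    the reward; intermediate rewards are 0, as in Table 1."""
    for i, s in enumerate(episode):
        if i + 1 < len(episode):
            target = 0.0 + gamma * V[episode[i + 1]]
        else:
            target = reward
        V[s] += alpha * (target - V[s])

def boltzmann_choice(V, candidates, tau=1.0):
    """Choose a next state with probability proportional to e^{V(s)/tau}."""
    weights = [math.exp(V[s] / tau) for s in candidates]
    return random.choices(candidates, weights=weights)[0]

V = defaultdict(float)   # value table, V(s) = 0 initially
```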

3.3 Policy Hill-Climbing Algorithm

The Policy Hill-Climbing algorithm (PHC) is an extension of Q-learning introduced by Michael Bowling and Manuela Veloso. [3] In order to obtain a mixed strategy, PHC does not derive the policy from the values of the possible next states. Instead, PHC attaches values to actions, so a value in PHC is the expected payoff of taking that action, and each action also carries a probability; the probabilities of all possible actions in one state sum to 1. PHC therefore outputs a mixed strategy. The pseudo code of PHC is shown below:

    Let α and δ be learning rates.
    for all states s and actions a do
        Q(s, a) := 0
        π(s, a) := 1/|A_i|
    end for
    repeat
        a. From state s select action a with probability π(s, a), with some exploration
        b. Observe reward r and next state s', then
               Q(s, a) := (1 − α)Q(s, a) + α(r + γ max_{a'} Q(s', a'))
        c. Update π(s, a) and constrain it to a legal probability distribution:
               π(s, a) := π(s, a) + δ              if a = argmax_{a'} Q(s, a')
               π(s, a) := π(s, a) − δ/(|A_i| − 1)   otherwise
    until enough rounds of self-play

Table 2: Pseudo code of Policy Hill-Climbing

From the pseudo code we can see that PHC updates its values in a similar way to TD-learning, and that at each step it moves its policy a little further towards the action with the largest value. A sketch of such an agent is given below.
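The following Python sketch shows what an agent following Table 2 might look like. It is a minimal illustration rather than the project's implementation; the class name, the ɛ-greedy exploration and the dictionary-based tables are assumptions.

```python
import random
from collections import defaultdict

class PHCAgent:
    """Policy Hill-Climbing as in Table 2, with tabular Q values and policy."""

    def __init__(self, actions, alpha=0.1, delta=0.01, gamma=1.0):
        self.actions = actions
        self.alpha, self.delta, self.gamma = alpha, delta, gamma
        self.Q = defaultdict(float)                          # Q(s, a)
        self.pi = defaultdict(lambda: 1.0 / len(actions))    # pi(s, a)

    def act(self, s, epsilon=0.05):
        # step a: sample from the mixed policy, with epsilon exploration
        if random.random() < epsilon:
            return random.choice(self.actions)
        weights = [self.pi[(s, a)] for a in self.actions]
        return random.choices(self.actions, weights=weights)[0]

    def update(self, s, a, r, next_state_values=()):
        # step b: Q-learning style value update
        best_next = max(next_state_values) if next_state_values else 0.0
        self.Q[(s, a)] = (1 - self.alpha) * self.Q[(s, a)] + \
                         self.alpha * (r + self.gamma * best_next)
        # step c: hill-climb towards the greedy action, then renormalise
        greedy = max(self.actions, key=lambda b: self.Q[(s, b)])
        for b in self.actions:
            step = self.delta if b == greedy else -self.delta / (len(self.actions) - 1)
            self.pi[(s, b)] = min(1.0, max(0.0, self.pi[(s, b)] + step))
        total = sum(self.pi[(s, b)] for b in self.actions)
        for b in self.actions:
            self.pi[(s, b)] /= total
```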

3.4 WoLF Policy Hill-Climbing Algorithm

In testing, plain PHC does not perform well: the policy needs a very long time to converge. Bowling and Veloso therefore introduced an improved version, the WoLF Policy Hill-Climbing algorithm. [3] The WoLF (Win-or-Learn-Fast) principle is simple: the agent should learn quickly while losing and slowly while winning. It is not suitable to use the game rules to decide whether an agent is losing, because many games are unfair and some players may be at a disadvantage. WoLF Policy Hill-Climbing therefore maintains an average policy, and if an agent gets more payoff than it would by playing the average policy, the agent is considered to be winning, and vice versa. The pseudo code of WoLF PHC is shown below:

    Let α and δ_l > δ_w be learning rates.
    for all states s and actions a do
        Q(s, a) := 0
        π(s, a) := 1/|A_i|
        C(s) := 0
    end for
    repeat
        a. From state s select action a with probability π(s, a), with some exploration
        b. Observe reward r and next state s', then
               Q(s, a) := (1 − α)Q(s, a) + α(r + γ max_{a'} Q(s', a'))
        c. Update the estimate of the average policy π̄:
               C(s) := C(s) + 1
               for all a' in A_i: π̄(s, a') := π̄(s, a') + (1/C(s)) (π(s, a') − π̄(s, a'))
        d. Update π(s, a) and constrain it to a legal probability distribution:
               π(s, a) := π(s, a) + δ              if a = argmax_{a'} Q(s, a')
               π(s, a) := π(s, a) − δ/(|A_i| − 1)   otherwise
           where
               δ = δ_w   if Σ_a π(s, a)Q(s, a) > Σ_a π̄(s, a)Q(s, a)
               δ = δ_l   otherwise
    until enough rounds of self-play

Table 3: Pseudo code of WoLF Policy Hill-Climbing
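Only steps c and d differ from plain PHC, so a WoLF variant of the earlier sketch needs just the average-policy bookkeeping and the choice of step size. The method below is an illustrative extension of the hypothetical PHCAgent class above (assuming its __init__ also creates self.C = defaultdict(int) and self.avg_pi = defaultdict(lambda: 1.0 / len(actions))).

```python
def wolf_delta(self, s, delta_w=0.01, delta_l=0.04):
    """Return the step size for state s according to the WoLF principle:
    learn slowly (delta_w) while 'winning', i.e. while the current policy
    scores better than the average policy under the current Q values."""
    # step c: update the visit count and the running average policy
    self.C[s] += 1
    for a in self.actions:
        self.avg_pi[(s, a)] += (self.pi[(s, a)] - self.avg_pi[(s, a)]) / self.C[s]
    current = sum(self.pi[(s, a)] * self.Q[(s, a)] for a in self.actions)
    average = sum(self.avg_pi[(s, a)] * self.Q[(s, a)] for a in self.actions)
    return delta_w if current > average else delta_l
```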

4 Algorithm Test

In order to choose a suitable algorithm, some tests should first be done on other games. The 3 tests here show whether an algorithm can be applied, and performs well, on a 2-player perfect-information game, a 3-player perfect-information game and a 2-player hidden-information game. Ideally there would also be a test on another 3-player hidden-information game, the type of game that 3-Player Kuhn Poker is, but it is hard to find a suitable one, since 3-Player Kuhn Poker is itself one of the simplest 3-player hidden-information games.

4.1 Tic-Tac-Toe

Tic-Tac-Toe (or Noughts and Crosses) is a simple 2-player game. The two players draw O and X in a 9-cell board in turn, and the one who puts 3 of his symbols in a line wins.

Figure 3: One Possible Trace for Tic-Tac-Toe

Tic-Tac-Toe has already been solved, so the test here is to check whether the algorithm outputs the optimal strategy, which is to occupy the centre first and then a corner for player 1, and to occupy a corner first and then block player 1 for player 2. If both players play the optimal strategies, the game is a draw, and that is the Nash Equilibrium of Tic-Tac-Toe.

When using TD-learning, the first thing to decide is how many states there are in this game. There are 9 cells on the board, and each of them can be empty, occupied by player 1 or occupied by player 2, so there are at most 3^9 = 19683 states. Some of these states are not reachable, such as all cells being occupied by player 1, but for convenience of looking up a specific state among all states, and because this amount is still acceptable, all of them are used. It also matters whose turn it is next, but this can be inferred from the numbers of O and X on the board, so the state does not need to include it. The cells of the board are numbered as shown in Figure 4.

Figure 4: Tic-Tac-Toe board

A sketch of this state encoding is given below. The test result for TD-learning is shown after it, supposing each player always chooses the cell with the highest probability.
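As an illustration of indexing the 3^9 board states, a board can be encoded as a base-3 integer, one digit per cell (0 = empty, 1 = player 1, 2 = player 2). This is only a sketch of the counting argument above, not the project's code; the cell ordering follows the numbering of Figure 4 and is otherwise an assumption.

```python
def board_to_index(board):
    """Encode a board (list of 9 cells with values 0/1/2, in the order of
    the cell numbering in Figure 4) as an integer in [0, 3^9 - 1]."""
    index = 0
    for cell in board:
        index = index * 3 + cell
    return index

def index_to_board(index):
    """Inverse mapping: recover the 9 cell values from the index."""
    cells = []
    for _ in range(9):
        cells.append(index % 3)
        index //= 3
    return list(reversed(cells))

# Example: player 1 has taken the centre (cell 5), everything else is empty.
board = [0, 0, 0, 0, 1, 0, 0, 0, 0]
assert index_to_board(board_to_index(board)) == board
```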

[Table 4: Test Result of Tic-Tac-Toe using TD-learning — for each cell, the probability of choosing it at the 1st to 4th steps for player 1 and player 2; the numeric entries are omitted here.]

From the test data we can see that player 1 almost always chooses cell 5, the centre of the board, at the first step. If player 1 does so, player 2 chooses cell 7 or cell 9, the corners of the board, with high probability. Player 1 then chooses another corner, giving him 2 of his symbols in one line, and player 2 has only one choice, to block it. Both of them then keep blocking each other until the 4th step. The following figure shows the whole flow of the game if both players always choose the move with the highest probability.

Figure 5: Test Result for Tic-Tac-Toe using TD-learning

As shown, they finally draw, which is the Nash Equilibrium of this game. This means that using TD-learning on Tic-Tac-Toe makes both players play the optimal strategies and finally leads to the Nash Equilibrium.

4.2 Gunmen Dilemma

The Gunmen Dilemma (or truel) is a duel between three players (say A, B and C). A truel can have many different rules; the rule used in this section is a sequential (fixed-order) rule with some extra rules [6]:

- The players fire one at a time in a fixed, repeating sequence, such as A, B, C, A, B, C, A, B, ...

- If it is a dead player's turn to shoot, it is skipped.
- Players may shoot at others or shoot into the air.
- Each player has infinite bullets.
- The hit rates of A, B and C are 1/3, 1/2 and 1 respectively.
- To avoid an infinite game, C, the one who never misses, is not allowed to shoot into the air.
- The player who survives to the end is the winner.

Note: this game is not as violent as it is described.

Figure 6: Game Tree for Gunmen Dilemma

The game tree above shows that this game is fairly complex and that many states can turn into one another. The reason the Gunmen Dilemma is chosen as the 3-player perfect-information test game is that it has nature, which decides whether a player's shot hits or not. So, besides checking that an algorithm works on a 3-player game, it also checks whether the algorithm works on a game with nature. In this game, A has the highest probability of surviving if he uses the optimal strategy, which is to shoot into the air first. [12] A simple proof of the optimal strategy is as follows [2]:

a. If A shoots at B and hits, then C can simply shoot at A and win the game.
b. If A shoots at C and hits, then it becomes a duel between A and B with B shooting first.
c. So if A decides to shoot at someone, shooting at C is the better choice.
d. Shooting into the air and shooting but missing both lead to the same situation: all 3 players are still alive and B shoots next. If B decides to shoot at someone, A would not be his first choice, since A has a low hit rate, by the same reasoning as in c.
   (a) If B shoots at C and hits, it becomes a duel between A and B with A shooting first.
   (b) If B misses or shoots into the air, it is C's turn. By the same reasoning as in c., C shoots at B and hits, and it then becomes a duel between A and C with A shooting first.
e. Comparing the results of b. and d., A should choose d., since in a duel the first one to shoot has a huge advantage.

So the test for this game is to see whether an algorithm produces the optimal strategy for A (i.e. shooting into the air first) and whether A then has the highest probability of winning. Just as for Tic-Tac-Toe, we first need to determine the number of states and what a state should contain. From the game tree it can be seen that there are 9 states in total; listing who is alive and who shoots next is one possible way to specify them. The result of applying TD-learning is shown below:

    Target   Probability for A   Probability for B   Probability for C
    A        N/A                 …                   0.00
    B        0.00                N/A                 1.00
    C        0.00                …                   N/A
    air      1.00                …                   N/A

Table 5: Test result of Gunmen Dilemma (policy when all three are alive)

We can see that, with this strategy, A always shoots into the air (the probability rounds to 1.00 at 2 significant figures).
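The simulation reported in the next table can be reproduced with a short Monte Carlo sketch like the one below. It assumes the hit rates given above and the learned policy from Table 5 (A fires into the air while all three are alive; every other shooter aims at the strongest living opponent); it is only an illustration, not the project's simulator.

```python
import random

HIT = {"A": 1 / 3, "B": 1 / 2, "C": 1.0}

def choose_target(shooter, alive):
    """Policy from Table 5: A fires into the air while all three are alive;
    otherwise everyone aims at the best shot still standing."""
    others = [p for p in alive if p != shooter]
    if shooter == "A" and len(alive) == 3:
        return None                                # shoot into the air
    return max(others, key=lambda p: HIT[p])       # aim at the strongest opponent

def play_truel():
    alive = ["A", "B", "C"]
    while len(alive) > 1:
        for shooter in ["A", "B", "C"]:            # fixed firing order
            if shooter not in alive or len(alive) == 1:
                continue
            target = choose_target(shooter, alive)
            if target is not None and random.random() < HIT[shooter]:
                alive.remove(target)
    return alive[0]

wins = {"A": 0, "B": 0, "C": 0}
rounds = 100_000
for _ in range(rounds):
    wins[play_truel()] += 1
print({p: wins[p] / rounds for p in wins})   # A should win roughly 0.417 of the time
```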

If the strategies produced by TD-learning are used to simulate a large number of rounds, the statistics of the winners are as shown below:

[Table 6: Test result of Gunmen Dilemma (simulation results) — winning counts for A, B and C; the numeric entries are omitted here.]

Toral and Amengual give formulas for the winning probability of each player given their hit rates. [12] With the hit rates of this example (i.e. 1/3, 1/2 and 1), the winning probability of A is 0.417, and together with the corresponding values for B and C this is quite close to the data in Table 6. These results mean that TD-learning can be applied to the Gunmen Dilemma to obtain an optimal strategy, so TD-learning can be used on a 3-player perfect-information game with nature.

4.3 2-Player Kuhn Poker

The rules of 2-Player Kuhn Poker were described in 1.2; it is a 2-player hidden-information game. Since Kuhn has determined the equilibrium strategies for both players [7] [5], the test here is to see whether an algorithm can generate a mixed strategy for a Nash Equilibrium in a 2-player hidden-information game, and whether that strategy converges.

              Bet with K   Call with K   Bet with Q   Call with Q   Bet with J   Call with J
    Player 1  γ            1             0            β             α            0
    Player 2  1            1             η            0             ξ            0

Table 7: Strategies in Kuhn Poker

Kuhn suggests that the strategy of player 1 can be represented by 3 parameters and that of player 2 by 2 parameters, as shown above. The equilibrium strategy for player 1 is (α, β, γ) = (γ/3, (1 + γ)/3, γ), and the equilibrium for player 2 is (ξ, η) = (1/3, 1/3).

Fig. 1 shows that there are 8 different states for each card-dealing combination. Since there are 6 different combinations, the number of states in 2-Player Kuhn Poker is 8 × 6 = 48. However, because it is a hidden-information game, a player does not know the exact state he is in, so the states used in TD-learning differ from the real states: they are the information sets of each player, and thus there are 8 × 3 = 24 states per player. A state should contain the card held by this player, whether each player is in the bet or not, and who is the next one to make a decision.

When using TD-learning, the strategies generated are shown below:

[Table 8: Strategies generated by TD-learning in Kuhn Poker — the same six columns as Table 7; the numeric entries are omitted here.]

This is very far from the Nash Equilibrium strategy, which means TD-learning cannot produce a proper mixed strategy for a hidden-information game. The line chart below shows the convergence of the strategy of player 1 during the training process.

Figure 7: The convergence of the strategy of player 1 when using TD-learning in Kuhn Poker

The chart shows that the strategy does not converge at all; almost every probability keeps oscillating during the learning process. This is further evidence that TD-learning is not suitable for a hidden-information game. Because PHC is an algorithm designed to generate mixed strategies, the following results were generated by PHC (WoLF PHC was used for faster convergence).

[Table 9: Strategies generated by WoLF PHC in Kuhn Poker — the same six columns as Table 7; the numeric entries are omitted here.]

The strategies generated here are very close to the Nash Equilibrium strategy stated above. The following line charts show the convergence of the strategies.

(a) Player 1   (b) Player 2

Figure 8: The convergence of the strategies generated when using PHC in Kuhn Poker

As shown, α (bet with J for player 1), β (call with Q for player 1), γ (bet with K for player 1), ξ (bet with J for player 2) and η (bet with Q for player 2) clearly converge. So PHC can be used on a 2-player hidden-information game to generate a Nash Equilibrium mixed strategy, and PHC is therefore the algorithm used in the 3-Player Kuhn Poker implementation.

5 Design and Implementation

5.1 Structure

To implement the game, at least 3 classes are required: a State class, an AI class and a Game class. The State class is used to create State objects and to handle simple problems based on a State, such as determining the winner of the game and the final payoff of each player. The AI class is used to create an AI for each player and contains the methods for updating values and policies and for moving based on the policy. The Game class is the main class, dealing with the main process of the game and the training process.

5.2 State

As with Tic-Tac-Toe, the Gunmen Dilemma and 2-Player Kuhn Poker, the first thing to do in the implementation is to decide how many states should be considered for each player and what information each state should contain in order to identify it. So we return to the game tree of 3-Player Kuhn Poker and label each decision node as follows:

Figure 9: Game Tree for 3-Player Kuhn Poker with Labels

Nodes I to IV are the decision nodes of player 1, V to VIII those of player 2, and IX to XII those of player 3.

All the information about each player in this game consists of the card he holds and whether he has bet, checked, folded or called before. Since the cards held decide the branch of the game tree, and all 12 nodes above lie in the same branch, the card held must be part of the state. For each branch, the table below shows the other information at each decision node.

            Player 1               Player 2               Player 3
    Node    Bet Check Call Fold    Bet Check Call Fold    Bet Check Call Fold
    I       F   F     F    F       F   F     F    F       F   F     F    F
    II      F   T     F    F       F   T     F    F       T   F     F    F
    III     F   T     F    F       T   F     F    F       F   F     T    F
    IV      F   T     F    F       T   F     F    F       F   F     F    T
    V       F   T     F    F       F   F     F    F       F   F     F    F
    VI      F   T     T    F       F   T     F    F       T   F     F    F
    VII     F   T     F    T       F   T     F    F       T   F     F    F
    VIII    T   F     F    F       F   F     F    F       F   F     F    F
    IX      F   T     F    F       F   T     F    F       F   F     F    F
    X       F   T     F    F       T   F     F    F       F   F     F    F
    XI      T   F     F    F       T   F     F    F       F   F     F    F
    XII     T   F     F    F       F   T     F    F       F   F     F    F

Table 10: Information in Decision Nodes of 3-Player Kuhn Poker

As shown, each state has a different combination of the 12 Boolean variables, so these 12 Boolean variables can be used to identify the state. However, unlike TD-learning, in PHC each player does not care about all the states in the game; he only considers the decision nodes he can visit. So in PHC it is enough for a state object to contain the information needed to identify the states a player can meet, in this case to distinguish I to IV, V to VIII and IX to XII, and the state does not need all 12 Boolean variables. To simplify the state, many of the Boolean variables can be merged: the method used here is to keep only one Boolean per player, indicating whether he is in a bet or not (i.e. has bet or called before). The resulting information at each decision node in one branch is shown in Table 11 below. As it shows, this is enough to identify the states each player can meet, because each of his 4 decision nodes has a different combination of the 3 Boolean variables. So a state object in 3-Player Kuhn Poker contains the cards held by the 3 players and whether each player is in a bet or not, as sketched below.
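As an illustration of this simplified state, the following Python sketch shows one possible State object and the view of it that a single player is given (his own card plus the three in-bet flags). The class and field names are assumptions, not the project's actual code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Full game state: one card per player plus an 'in a bet' flag per player."""
    cards: tuple    # e.g. ("Q", "A", "J") for players 1, 2 and 3
    in_bet: tuple   # e.g. (False, True, True)

    def view(self, player):
        """Information set seen by `player`: his own card plus the flags
        (the other players' cards are hidden information)."""
        return (self.cards[player], self.in_bet)

# Example: the deal J, K, Q with nobody in a bet yet.
s = State(cards=("J", "K", "Q"), in_bet=(False, False, False))
print(s.view(0))   # ('J', (False, False, False)) -> node I for player 1
```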

    Node    Player 1 in Bet   Player 2 in Bet   Player 3 in Bet
    I       F                 F                 F
    II      F                 F                 T
    III     F                 T                 T
    IV      F                 T                 F
    V       F                 F                 F
    VI      T                 F                 T
    VII     F                 F                 T
    VIII    T                 F                 F
    IX      F                 F                 F
    X       F                 T                 F
    XI      T                 T                 F
    XII     T                 F                 F

Table 11: Information in Decision Nodes of 3-Player Kuhn Poker (Simplified)

5.3 AI

Each player in the game has an AI object, which stores that player's strategy. It therefore contains all the possible states this player can meet, together with the policy at each state; for PHC, the number of visits and the values and probabilities of all possible actions are also stored. One stored record, for example, begins with "false false false", meaning that player 1 holds the lowest card (i.e. the Jack), does not know what player 2 and player 3 are holding, and that none of the players is in a bet yet. The record also stores how many times this state has been visited and, for each of player 1's 2 actions in this state (bet and check), the action's value, its current probability and its average probability; in this example the probability of betting is 0.000 and the probability of checking is 1.000, with the average probabilities the same.

5.4 Game

The Game class is the main class of the program; it contains the main process of the game and the training process. Table 12 below gives the pseudo code of the game process. The learning phase is similar to the game process except that the states visited and the actions taken must be stored; its pseudo code is given in Table 13.

    Initialise the state of the game.
    Shuffle the deck.
    Each player draws one card from the deck.
    while the game is not finished do
        Based on the strategy of the current player, make him bet or not.
        Set the next player as the current player.
    end while

Table 12: Pseudo code of the game process of 3-Player Kuhn Poker

    Create 6 arrays to store the states and actions of the 3 players.
    Initialise the state of the game.
    Shuffle the deck.
    Each player draws one card from the deck.
    while the game is not finished do
        Add the current player's view of the current state to his state array.
        Based on the strategy of the current player, make him bet or not.
        Add that action to the current player's action array.
        Set the next player as the current player.
    end while
    Based on the stored states and actions, update the strategy of each player.

Table 13: Pseudo code of the learning process of 3-Player Kuhn Poker

The overall system diagram of the whole program is shown below (the name of the game is used as the name of the Game class, i.e. Kuhn):

Figure 10: System diagram of 3-Player Kuhn Poker
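The learning process of Table 13 could be sketched in Python roughly as follows. The helpers finished() and settle(), the action names and the agent interface are hypothetical; the sketch only illustrates how the per-player state and action histories feed the strategy update.

```python
import random

CARDS = ["J", "Q", "K", "A"]

def training_round(agents, finished, settle):
    """One round of self-play following Table 13. `agents` are three PHC-style
    agents; `finished(in_bet, decisions)` and `settle(hands, in_bet)` are
    hypothetical rule helpers (game-over test and showdown payoffs)."""
    deck = CARDS[:]
    random.shuffle(deck)
    hands = [deck.pop() for _ in range(3)]
    in_bet = [False, False, False]           # one "in a bet" flag per player
    histories = [[] for _ in range(3)]       # (state, action) pairs per player

    current, decisions = 0, 0
    while not finished(in_bet, decisions):
        view = (hands[current], tuple(in_bet))    # the player's information set
        action = agents[current].act(view)        # "bet"/"call" or "check"/"fold"
        histories[current].append((view, action))
        if action in ("bet", "call"):
            in_bet[current] = True
        current = (current + 1) % 3
        decisions += 1

    payoffs = settle(hands, in_bet)           # payoff of each player at showdown
    for i, agent in enumerate(agents):
        for view, action in histories[i]:
            agent.update(view, action, payoffs[i])
    return payoffs
```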

6 Result and Evaluation

6.1 Result

The following table shows the strategy generated by WoLF PHC (the 1st node for players 1, 2 and 3 is node I, V and IX respectively in Figure 9, the 2nd node is node II, VI and X, and so on):

[Table 14: Strategies generated by WoLF PHC in 3-Player Kuhn Poker — for each player, the probability of betting/calling at his 1st to 4th nodes when holding J, Q, K or A; the numeric entries are omitted here.]

Based on these strategies, we can calculate the expected payoff of each player in a single round of the game with the following formula:

    p_i = Σ_{c_1} Σ_{c_2} Σ_{c_3} Σ_{N=1}^{13} P(c_1, c_2, c_3, N) · p_i(c_1, c_2, c_3, N)

where p_i is the expected payoff of player i, p_i(c_1, c_2, c_3, N) is the payoff of player i at ending node N when player 1 holds c_1, player 2 holds c_2 and player 3 holds c_3, and P(c_1, c_2, c_3, N) is the probability of reaching ending node N with these holdings, which is calculated using Table 15 below. In that table, P(N, c_i) is the probability that player i bets or calls at node N while holding card c_i, which can be found in Table 14. The expected payoff calculated for each player is shown in Table 16.

    Ending Node   Formula
    1             (1 − P(I, c_1)) · (1 − P(V, c_2)) · (1 − P(IX, c_3))
    2             (1 − P(I, c_1)) · (1 − P(V, c_2)) · P(IX, c_3) · P(II, c_1) · P(VI, c_2)
    3             (1 − P(I, c_1)) · (1 − P(V, c_2)) · P(IX, c_3) · P(II, c_1) · (1 − P(VI, c_2))
    4             (1 − P(I, c_1)) · (1 − P(V, c_2)) · P(IX, c_3) · (1 − P(II, c_1)) · P(VII, c_2)
    5             (1 − P(I, c_1)) · (1 − P(V, c_2)) · P(IX, c_3) · (1 − P(II, c_1)) · (1 − P(VII, c_2))
    6             (1 − P(I, c_1)) · P(V, c_2) · P(X, c_3) · P(III, c_1)
    7             (1 − P(I, c_1)) · P(V, c_2) · P(X, c_3) · (1 − P(III, c_1))
    8             (1 − P(I, c_1)) · P(V, c_2) · (1 − P(X, c_3)) · P(IV, c_1)
    9             (1 − P(I, c_1)) · P(V, c_2) · (1 − P(X, c_3)) · (1 − P(IV, c_1))
    10            P(I, c_1) · P(VIII, c_2) · P(XI, c_3)
    11            P(I, c_1) · P(VIII, c_2) · (1 − P(XI, c_3))
    12            P(I, c_1) · (1 − P(VIII, c_2)) · P(XII, c_3)
    13            P(I, c_1) · (1 − P(VIII, c_2)) · (1 − P(XII, c_3))

Table 15: Formula table for the probability to reach each ending node

[Table 16: Expected payoff for each player — the numeric entries are omitted here; only player 3's expected payoff is positive.]

Figure 11: The convergence of the expected payoff for 3 players in 3-Player Kuhn Poker

For convergence, it is not a good idea to plot all 48 strategy variables on one line chart, so the expected payoffs of the 3 players are plotted instead. If the strategies converge, the expected payoffs should also converge, and if the expected payoffs converge there is a high probability that the strategies converge as well. Figure 11 shows the convergence of the expected payoffs.

6.2 Evaluation

Looking at the expected payoffs, we can see that player 3 has an advantage in this game: he is the only player with a positive expected payoff. This is sensible in 3-Player Kuhn Poker, because player 3 is the last to make a decision and may therefore learn more from the actions of the other players than player 1 and player 2 can. However, unlike Tic-Tac-Toe, the Gunmen Dilemma and 2-Player Kuhn Poker, the equilibrium or optimal strategy for 3-Player Kuhn Poker is not known in closed form, so it is hard to check whether this strategy is a Nash Equilibrium strategy. The following sections suggest different ways of verifying it.

6.2.1 Mathematical Calculation

One way to check a strategy is simply to calculate it. As said above, we can calculate the expected payoff of each player if we know the policy of every player at every decision node. If all these policy parameters are treated as variables, then reaching a Nash Equilibrium is an optimisation problem, maximising the expected payoff of each player. Szafron, Gibson and Sturtevant have calculated Nash Equilibria of 3-Player Kuhn Poker by a purely mathematical method. [11] They suggest that one family of Nash Equilibrium strategies must obey the rules in Table 17 below, in which β = max(b_11, b_21). Substituting the strategy of Table 14 into Table 17, we find that, except for b_23, c_41 (which is 1 rather than 0) and c_33 (which falls below the required bound of 0.181), all other variables satisfy the rules in Table 17. So it is not exactly a Nash Equilibrium.

                                    Player 1          Player 2          Player 3
    Bet/Call at 1st Node with J     a_11 = 0          b_11 (see [11])   c_11 (see [11])
    Bet/Call at 1st Node with Q     a_21 (see [11])   b_21 (see [11])   c_21 (see [11])
    Bet/Call at 1st Node with K     a_31 = 0          b_31 = 0          c_31 = 0
    Bet/Call at 1st Node with A     a_41 = 0          b_41 (see [11])   c_41 = 0
    Bet/Call at 2nd Node with J     a_12 = 0          b_14 = 0          c_12 = 0
    Bet/Call at 2nd Node with Q     a_22 = 0          b_24 = 0          c_22 = 0
    Bet/Call at 2nd Node with K     a_32 = 0          b_34 = 0          c_32 = 0
    Bet/Call at 2nd Node with A     a_42 = 1          b_44 = 1          c_42 = 1
    Bet/Call at 3rd Node with J     a_14 = 0          b_13 = 0          c_14 = 0
    Bet/Call at 3rd Node with Q     a_24 = 0          b_23 (see [11])   c_24 = 0
    Bet/Call at 3rd Node with K     a_34 = 0          b_33 (see [11])   c_34 (see [11])
    Bet/Call at 3rd Node with A     a_44 = 1          b_43 = 1          c_44 = 1
    Bet/Call at 4th Node with J     a_13 = 0          b_12 = 0          c_13 = 0
    Bet/Call at 4th Node with Q     a_23 = 0          b_22 = 0          c_23 = 0
    Bet/Call at 4th Node with K     a_33 (see [11])   b_32 (see [11])   c_33 (see [11])
    Bet/Call at 4th Node with A     a_43 = 1          b_42 = 1          c_43 = 1

Table 17: Equilibrium of 3-Player Kuhn Poker. Here b_11 and b_21 are free parameters and β = max(b_11, b_21); the entries marked "see [11]" must satisfy constraints, expressed in terms of b_11 and b_21, that are given in full in [11].

6.2.2 Best Response Comparison

As said in the previous section, the strategy generated is not a Nash Equilibrium. However, since it is quite close to the equilibrium point, is it an ɛ-Nash Equilibrium, the aim of this project? And if so, what is the value of ɛ? Risk and Szafron suggest a way to check whether a strategy is an ɛ-Nash Equilibrium and what ɛ is. [9] The idea is to calculate a best response strategy against the strategies of the opponents. A best response strategy, as its name suggests, is the strategy that maximises this player's payoff against the strategies of the opponents. Suppose that applying the best response strategy would increase the payoff of this player by δ while the other players fix their strategies, and let δ_1, δ_2 and δ_3 be the increases for players 1, 2 and 3 respectively. Then the original strategies of the 3 players form an ɛ-Nash Equilibrium with ɛ = max(δ_1, δ_2, δ_3).

One possible way to calculate the best response strategy is to fix all 32 variables of the other 2 players and treat the problem as an optimisation over the remaining 16 variables. This would give the best response payoff exactly, but it would also take a long time to compute. Another method is to check all the pure strategies.

Since there is a large number of variables in 3-Player Kuhn Poker, the pure strategies cover a large part of all the possible payoffs, and the pure strategy that increases the expected payoff the most can be regarded as an approximate best response strategy. The following table shows each player's expected payoff and his expected payoff when he plays the best response strategy while the others fix their strategies:

[Table 18: Expected Payoff and Best Response Payoff — expected payoff, best response payoff and their difference for each player; the numeric entries are omitted here.]

As shown, the largest difference between expected payoff and best response payoff is 0.003. So we can say that these strategies lead to an ɛ-Nash Equilibrium with ɛ = 0.003, which is less than the upper limit stated in 1.1.
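A sketch of the pure-strategy best-response check is given below. It assumes a function expected_payoff(profile, player) that evaluates a full strategy profile analytically (for example via Table 15), and it enumerates the 2^16 pure settings of one player's 16 bet/call parameters from Table 14. This is an illustration of the method, not the project's code.

```python
from itertools import product

def approx_best_response_gain(player, profile, expected_payoff):
    """Return delta_i: how much `player` could gain by switching to his best
    pure strategy while the other two players keep the strategies in
    `profile` (a mapping player index -> 16 bet/call probabilities)."""
    base = expected_payoff(profile, player)
    best = base
    for bits in product([0.0, 1.0], repeat=16):   # all 2^16 pure strategies
        candidate = dict(profile)
        candidate[player] = bits
        best = max(best, expected_payoff(candidate, player))
    return best - base

# epsilon for the whole profile is then the largest of the three gains:
# epsilon = max(approx_best_response_gain(i, profile, expected_payoff) for i in range(3))
```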

7 Conclusion

7.1 Achievement

The main achievement of this project is that an ɛ-Nash Equilibrium of 3-Player Kuhn Poker with ɛ = 0.003 has been found. This is close to the best previous attempt on 3-Player Kuhn Poker, by Risk and Szafron, who obtained an ɛ-Nash Equilibrium with a smaller ɛ. [9] Besides that, the optimal strategies of Tic-Tac-Toe, the Gunmen Dilemma, 2-Player Kuhn Poker and other games were also found by applying Temporal Difference Learning and the Policy Hill-Climbing algorithm to them.

7.2 Future Works

In the Policy Hill-Climbing algorithm there are many factors that might influence the result, including the discount factor in Table 3 and the exploration factor of the ɛ-greedy strategy. The discount factor used to obtain the result in 6.1 is 1, meaning that all payoffs count fully, and a fixed exploration factor was used. Some work could be done to find the relationship between these factors and the result. Besides that, work could also be done on Counterfactual Regret Minimization, the algorithm used by the University of Alberta [4], the winner of 3-Player Kuhn Poker at the Annual Computer Poker Competition in 2014. [1]

References

[1] Annual Computer Poker Competition. http://www.computerpokercompetition.org/.

[2] Three way duel problem (Gunmen Dilemma). three-way-duel/.

[3] Michael Bowling and Manuela Veloso. Rational and convergent learning in stochastic games. In International Joint Conference on Artificial Intelligence, volume 17. Lawrence Erlbaum Associates Ltd, 2001.

[4] Richard Gibson. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD thesis, University of Alberta, 2014.

[5] Bret Hoehn, Finnegan Southey, Robert C. Holte, and Valeriy Bulitko. Effective short-term opponent exploitation in simplified poker.

[6] D. Marc Kilgour and Steven J. Brams. The truel. Mathematics Magazine, 1997.

[7] Harold W. Kuhn. A simplified two-person poker. Contributions to the Theory of Games, 1:97–103, 1950.

[8] E. Rasmusen. Games and Information: An Introduction to Game Theory. Blackwell.

[9] Nick Abou Risk and Duane Szafron. Using counterfactual regret minimization to create competitive multiplayer poker agents. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 2010.

[10] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, 1998.

[11] Duane Szafron, Richard Gibson, and Nathan Sturtevant. A parameterized family of equilibrium profiles for three-player Kuhn poker. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2013.

[12] Raúl Toral and Pau Amengual. Distribution of winners in truel games. arXiv preprint cond-mat.


More information

Creating a New Angry Birds Competition Track

Creating a New Angry Birds Competition Track Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference Creating a New Angry Birds Competition Track Rohan Verma, Xiaoyu Ge, Jochen Renz Research School

More information

Probabilistic State Translation in Extensive Games with Large Action Sets

Probabilistic State Translation in Extensive Games with Large Action Sets Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09) Probabilistic State Translation in Extensive Games with Large Action Sets David Schnizlein Michael Bowling

More information

BLUFF WITH AI. CS297 Report. Presented to. Dr. Chris Pollett. Department of Computer Science. San Jose State University. In Partial Fulfillment

BLUFF WITH AI. CS297 Report. Presented to. Dr. Chris Pollett. Department of Computer Science. San Jose State University. In Partial Fulfillment BLUFF WITH AI CS297 Report Presented to Dr. Chris Pollett Department of Computer Science San Jose State University In Partial Fulfillment Of the Requirements for the Class CS 297 By Tina Philip May 2017

More information

Exploitability and Game Theory Optimal Play in Poker

Exploitability and Game Theory Optimal Play in Poker Boletín de Matemáticas 0(0) 1 11 (2018) 1 Exploitability and Game Theory Optimal Play in Poker Jen (Jingyu) Li 1,a Abstract. When first learning to play poker, players are told to avoid betting outside

More information

CS 188 Fall Introduction to Artificial Intelligence Midterm 1

CS 188 Fall Introduction to Artificial Intelligence Midterm 1 CS 188 Fall 2018 Introduction to Artificial Intelligence Midterm 1 You have 120 minutes. The time will be projected at the front of the room. You may not leave during the last 10 minutes of the exam. Do

More information

Learning to play Dominoes

Learning to play Dominoes Learning to play Dominoes Ivan de Jesus P. Pinto 1, Mateus R. Pereira 1, Luciano Reis Coutinho 1 1 Departamento de Informática Universidade Federal do Maranhão São Luís,MA Brazil navi1921@gmail.com, mateus.rp.slz@gmail.com,

More information

Multiagent Systems: Intro to Game Theory. CS 486/686: Introduction to Artificial Intelligence

Multiagent Systems: Intro to Game Theory. CS 486/686: Introduction to Artificial Intelligence Multiagent Systems: Intro to Game Theory CS 486/686: Introduction to Artificial Intelligence 1 1 Introduction So far almost everything we have looked at has been in a single-agent setting Today - Multiagent

More information

Game Playing. Philipp Koehn. 29 September 2015

Game Playing. Philipp Koehn. 29 September 2015 Game Playing Philipp Koehn 29 September 2015 Outline 1 Games Perfect play minimax decisions α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information 2 games

More information

BetaPoker: Reinforcement Learning for Heads-Up Limit Poker Albert Tung, Eric Xu, and Jeffrey Zhang

BetaPoker: Reinforcement Learning for Heads-Up Limit Poker Albert Tung, Eric Xu, and Jeffrey Zhang Introduction BetaPoker: Reinforcement Learning for Heads-Up Limit Poker Albert Tung, Eric Xu, and Jeffrey Zhang Texas Hold em Poker is considered the most popular variation of poker that is played widely

More information

CSCI 4150 Introduction to Artificial Intelligence, Fall 2004 Assignment 7 (135 points), out Monday November 22, due Thursday December 9

CSCI 4150 Introduction to Artificial Intelligence, Fall 2004 Assignment 7 (135 points), out Monday November 22, due Thursday December 9 CSCI 4150 Introduction to Artificial Intelligence, Fall 2004 Assignment 7 (135 points), out Monday November 22, due Thursday December 9 Learning to play blackjack In this assignment, you will implement

More information

ADVERSARIAL SEARCH. Chapter 5

ADVERSARIAL SEARCH. Chapter 5 ADVERSARIAL SEARCH Chapter 5... every game of skill is susceptible of being played by an automaton. from Charles Babbage, The Life of a Philosopher, 1832. Outline Games Perfect play minimax decisions α

More information

CS 1571 Introduction to AI Lecture 12. Adversarial search. CS 1571 Intro to AI. Announcements

CS 1571 Introduction to AI Lecture 12. Adversarial search. CS 1571 Intro to AI. Announcements CS 171 Introduction to AI Lecture 1 Adversarial search Milos Hauskrecht milos@cs.pitt.edu 39 Sennott Square Announcements Homework assignment is out Programming and experiments Simulated annealing + Genetic

More information

Ar#ficial)Intelligence!!

Ar#ficial)Intelligence!! Introduc*on! Ar#ficial)Intelligence!! Roman Barták Department of Theoretical Computer Science and Mathematical Logic So far we assumed a single-agent environment, but what if there are more agents and

More information

ECON 282 Final Practice Problems

ECON 282 Final Practice Problems ECON 282 Final Practice Problems S. Lu Multiple Choice Questions Note: The presence of these practice questions does not imply that there will be any multiple choice questions on the final exam. 1. How

More information

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel Foundations of AI 6. Adversarial Search Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard & Bernhard Nebel Contents Game Theory Board Games Minimax Search Alpha-Beta Search

More information

Games and Adversarial Search

Games and Adversarial Search 1 Games and Adversarial Search BBM 405 Fundamentals of Artificial Intelligence Pinar Duygulu Hacettepe University Slides are mostly adapted from AIMA, MIT Open Courseware and Svetlana Lazebnik (UIUC) Spring

More information

Using Fictitious Play to Find Pseudo-Optimal Solutions for Full-Scale Poker

Using Fictitious Play to Find Pseudo-Optimal Solutions for Full-Scale Poker Using Fictitious Play to Find Pseudo-Optimal Solutions for Full-Scale Poker William Dudziak Department of Computer Science, University of Akron Akron, Ohio 44325-4003 Abstract A pseudo-optimal solution

More information

arxiv: v1 [cs.gt] 23 May 2018

arxiv: v1 [cs.gt] 23 May 2018 On self-play computation of equilibrium in poker Mikhail Goykhman Racah Institute of Physics, Hebrew University of Jerusalem, Jerusalem, 91904, Israel E-mail: michael.goykhman@mail.huji.ac.il arxiv:1805.09282v1

More information

Game Tree Search. Generalizing Search Problems. Two-person Zero-Sum Games. Generalizing Search Problems. CSC384: Intro to Artificial Intelligence

Game Tree Search. Generalizing Search Problems. Two-person Zero-Sum Games. Generalizing Search Problems. CSC384: Intro to Artificial Intelligence CSC384: Intro to Artificial Intelligence Game Tree Search Chapter 6.1, 6.2, 6.3, 6.6 cover some of the material we cover here. Section 6.6 has an interesting overview of State-of-the-Art game playing programs.

More information

Opponent Modelling by Expectation-Maximisation and Sequence Prediction in Simplified Poker

Opponent Modelling by Expectation-Maximisation and Sequence Prediction in Simplified Poker IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 1 Opponent Modelling by Expectation-Maximisation and Sequence Prediction in Simplified Poker Richard Mealing and Jonathan L. Shapiro Abstract

More information

Multiagent Systems: Intro to Game Theory. CS 486/686: Introduction to Artificial Intelligence

Multiagent Systems: Intro to Game Theory. CS 486/686: Introduction to Artificial Intelligence Multiagent Systems: Intro to Game Theory CS 486/686: Introduction to Artificial Intelligence 1 Introduction So far almost everything we have looked at has been in a single-agent setting Today - Multiagent

More information

Endgame Solving in Large Imperfect-Information Games

Endgame Solving in Large Imperfect-Information Games Endgame Solving in Large Imperfect-Information Games Sam Ganzfried and Tuomas Sandholm Computer Science Department Carnegie Mellon University {sganzfri, sandholm}@cs.cmu.edu ABSTRACT The leading approach

More information

Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.)

Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.) Introduction to Neuro-Dynamic Programming (Or, how to count cards in blackjack and do other fun things too.) Eric B. Laber February 12, 2008 Eric B. Laber () Introduction to Neuro-Dynamic Programming (Or,

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence selman@cs.cornell.edu Module: Adversarial Search R&N: Chapter 5 1 Outline Adversarial Search Optimal decisions Minimax α-β pruning Case study: Deep Blue

More information

Data Biased Robust Counter Strategies

Data Biased Robust Counter Strategies Data Biased Robust Counter Strategies Michael Johanson johanson@cs.ualberta.ca Department of Computing Science University of Alberta Edmonton, Alberta, Canada Michael Bowling bowling@cs.ualberta.ca Department

More information

CS 771 Artificial Intelligence. Adversarial Search

CS 771 Artificial Intelligence. Adversarial Search CS 771 Artificial Intelligence Adversarial Search Typical assumptions Two agents whose actions alternate Utility values for each agent are the opposite of the other This creates the adversarial situation

More information

CS 2710 Foundations of AI. Lecture 9. Adversarial search. CS 2710 Foundations of AI. Game search

CS 2710 Foundations of AI. Lecture 9. Adversarial search. CS 2710 Foundations of AI. Game search CS 2710 Foundations of AI Lecture 9 Adversarial search Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square CS 2710 Foundations of AI Game search Game-playing programs developed by AI researchers since

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Adversarial Search Prof. Scott Niekum The University of Texas at Austin [These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II CS 5522: Artificial Intelligence II Adversarial Search Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu.]

More information

Advanced Microeconomics: Game Theory

Advanced Microeconomics: Game Theory Advanced Microeconomics: Game Theory P. v. Mouche Wageningen University 2018 Outline 1 Motivation 2 Games in strategic form 3 Games in extensive form What is game theory? Traditional game theory deals

More information

Game Playing: Adversarial Search. Chapter 5

Game Playing: Adversarial Search. Chapter 5 Game Playing: Adversarial Search Chapter 5 Outline Games Perfect play minimax search α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Games vs. Search

More information

Local Search. Hill Climbing. Hill Climbing Diagram. Simulated Annealing. Simulated Annealing. Introduction to Artificial Intelligence

Local Search. Hill Climbing. Hill Climbing Diagram. Simulated Annealing. Simulated Annealing. Introduction to Artificial Intelligence Introduction to Artificial Intelligence V22.0472-001 Fall 2009 Lecture 6: Adversarial Search Local Search Queue-based algorithms keep fallback options (backtracking) Local search: improve what you have

More information

Announcements. Homework 1. Project 1. Due tonight at 11:59pm. Due Friday 2/8 at 4:00pm. Electronic HW1 Written HW1

Announcements. Homework 1. Project 1. Due tonight at 11:59pm. Due Friday 2/8 at 4:00pm. Electronic HW1 Written HW1 Announcements Homework 1 Due tonight at 11:59pm Project 1 Electronic HW1 Written HW1 Due Friday 2/8 at 4:00pm CS 188: Artificial Intelligence Adversarial Search and Game Trees Instructors: Sergey Levine

More information

Best Response to Tight and Loose Opponents in the Borel and von Neumann Poker Models

Best Response to Tight and Loose Opponents in the Borel and von Neumann Poker Models Best Response to Tight and Loose Opponents in the Borel and von Neumann Poker Models Casey Warmbrand May 3, 006 Abstract This paper will present two famous poker models, developed be Borel and von Neumann.

More information

Learning from Hints: AI for Playing Threes

Learning from Hints: AI for Playing Threes Learning from Hints: AI for Playing Threes Hao Sheng (haosheng), Chen Guo (cguo2) December 17, 2016 1 Introduction The highly addictive stochastic puzzle game Threes by Sirvo LLC. is Apple Game of the

More information

ECON 312: Games and Strategy 1. Industrial Organization Games and Strategy

ECON 312: Games and Strategy 1. Industrial Organization Games and Strategy ECON 312: Games and Strategy 1 Industrial Organization Games and Strategy A Game is a stylized model that depicts situation of strategic behavior, where the payoff for one agent depends on its own actions

More information

On Range of Skill. Thomas Dueholm Hansen and Peter Bro Miltersen and Troels Bjerre Sørensen Department of Computer Science University of Aarhus

On Range of Skill. Thomas Dueholm Hansen and Peter Bro Miltersen and Troels Bjerre Sørensen Department of Computer Science University of Aarhus On Range of Skill Thomas Dueholm Hansen and Peter Bro Miltersen and Troels Bjerre Sørensen Department of Computer Science University of Aarhus Abstract At AAAI 07, Zinkevich, Bowling and Burch introduced

More information

Adversarial Search. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 9 Feb 2012

Adversarial Search. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 9 Feb 2012 1 Hal Daumé III (me@hal3.name) Adversarial Search Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 9 Feb 2012 Many slides courtesy of Dan

More information

CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH. Santiago Ontañón

CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH. Santiago Ontañón CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH Santiago Ontañón so367@drexel.edu Recall: Problem Solving Idea: represent the problem we want to solve as: State space Actions Goal check Cost function

More information

Endgame Solving in Large Imperfect-Information Games

Endgame Solving in Large Imperfect-Information Games Endgame Solving in Large Imperfect-Information Games Sam Ganzfried and Tuomas Sandholm Computer Science Department Carnegie Mellon University {sganzfri, sandholm}@cs.cmu.edu Abstract The leading approach

More information

Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente

Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente Valentijn Muijrers 3275183 Valentijn.Muijrers@phil.uu.nl Supervisor: Gerard Vreeswijk 7,5 ECTS

More information

Game theory and AI: a unified approach to poker games

Game theory and AI: a unified approach to poker games Game theory and AI: a unified approach to poker games Thesis for graduation as Master of Artificial Intelligence University of Amsterdam Frans Oliehoek 2 September 2005 Abstract This thesis focuses on

More information

Announcements. CS 188: Artificial Intelligence Fall Local Search. Hill Climbing. Simulated Annealing. Hill Climbing Diagram

Announcements. CS 188: Artificial Intelligence Fall Local Search. Hill Climbing. Simulated Annealing. Hill Climbing Diagram CS 188: Artificial Intelligence Fall 2008 Lecture 6: Adversarial Search 9/16/2008 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 1 Announcements Project

More information

To play the game player has to place a bet on the ANTE bet (initial bet). Optionally player can also place a BONUS bet.

To play the game player has to place a bet on the ANTE bet (initial bet). Optionally player can also place a BONUS bet. ABOUT THE GAME OBJECTIVE OF THE GAME Casino Hold'em, also known as Caribbean Hold em Poker, was created in the year 2000 by Stephen Au- Yeung and is now being played in casinos worldwide. Live Casino Hold'em

More information

Documentation and Discussion

Documentation and Discussion 1 of 9 11/7/2007 1:21 AM ASSIGNMENT 2 SUBJECT CODE: CS 6300 SUBJECT: ARTIFICIAL INTELLIGENCE LEENA KORA EMAIL:leenak@cs.utah.edu Unid: u0527667 TEEKO GAME IMPLEMENTATION Documentation and Discussion 1.

More information

Set 4: Game-Playing. ICS 271 Fall 2017 Kalev Kask

Set 4: Game-Playing. ICS 271 Fall 2017 Kalev Kask Set 4: Game-Playing ICS 271 Fall 2017 Kalev Kask Overview Computer programs that play 2-player games game-playing as search with the complication of an opponent General principles of game-playing and search

More information

Algorithms for Data Structures: Search for Games. Phillip Smith 27/11/13

Algorithms for Data Structures: Search for Games. Phillip Smith 27/11/13 Algorithms for Data Structures: Search for Games Phillip Smith 27/11/13 Search for Games Following this lecture you should be able to: Understand the search process in games How an AI decides on the best

More information

Summary Overview of Topics in Econ 30200b: Decision theory: strong and weak domination by randomized strategies, domination theorem, expected utility

Summary Overview of Topics in Econ 30200b: Decision theory: strong and weak domination by randomized strategies, domination theorem, expected utility Summary Overview of Topics in Econ 30200b: Decision theory: strong and weak domination by randomized strategies, domination theorem, expected utility theorem (consistent decisions under uncertainty should

More information

Programming an Othello AI Michael An (man4), Evan Liang (liange)

Programming an Othello AI Michael An (man4), Evan Liang (liange) Programming an Othello AI Michael An (man4), Evan Liang (liange) 1 Introduction Othello is a two player board game played on an 8 8 grid. Players take turns placing stones with their assigned color (black

More information

Pengju

Pengju Introduction to AI Chapter05 Adversarial Search: Game Playing Pengju Ren@IAIR Outline Types of Games Formulation of games Perfect-Information Games Minimax and Negamax search α-β Pruning Pruning more Imperfect

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Adversarial Search Instructors: David Suter and Qince Li Course Delivered @ Harbin Institute of Technology [Many slides adapted from those created by Dan Klein and Pieter Abbeel

More information

Game Playing State-of-the-Art

Game Playing State-of-the-Art Adversarial Search [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Game Playing State-of-the-Art

More information

Adversarial Search. Soleymani. Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5

Adversarial Search. Soleymani. Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5 Adversarial Search CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2017 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5 Outline Game

More information

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search CSE 473: Artificial Intelligence Fall 2017 Adversarial Search Mini, pruning, Expecti Dieter Fox Based on slides adapted Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Dan Weld, Stuart Russell or Andrew Moore

More information

CS 380: ARTIFICIAL INTELLIGENCE

CS 380: ARTIFICIAL INTELLIGENCE CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH 10/23/2013 Santiago Ontañón santi@cs.drexel.edu https://www.cs.drexel.edu/~santi/teaching/2013/cs380/intro.html Recall: Problem Solving Idea: represent

More information

Statistical Analysis of Nuel Tournaments Department of Statistics University of California, Berkeley

Statistical Analysis of Nuel Tournaments Department of Statistics University of California, Berkeley Statistical Analysis of Nuel Tournaments Department of Statistics University of California, Berkeley MoonSoo Choi Department of Industrial Engineering & Operations Research Under Guidance of Professor.

More information