AN ABSTRACT OF THE THESIS OF


Paul Lewis for the degree of Master of Science in Computer Science presented on June 1, 2010.

Title: Ensemble Monte-Carlo Planning: An Empirical Study

Abstract approved: Alan Fern

Monte-Carlo planning algorithms such as UCT make decisions at each step by intelligently expanding a single search tree given the available time and then selecting the best root action. Recent work has provided evidence that it can be advantageous to instead construct an ensemble of search trees and make a decision according to a weighted vote. However, these prior investigations have only considered the application domains of Go and Solitaire and were limited in the scope of ensemble configurations considered. In this paper, we conduct a large-scale empirical study of ensemble Monte-Carlo planning using the UCT algorithm in a set of five additional diverse and challenging domains. In particular, we evaluate the advantages of a broad set of ensemble configurations in terms of space and time efficiency in both parallel and sequential time models. Our results show that ensembles are an effective way to improve performance given a parallel model, can significantly reduce space requirements, and in some cases may improve performance in a sequential model. Additionally, from our work we produced an open-source planning library.

© Copyright by Paul Lewis, June 1, 2010. All Rights Reserved.

Ensemble Monte-Carlo Planning: An Empirical Study

by Paul Lewis

A THESIS submitted to Oregon State University in partial fulfillment of the requirements for the degree of Master of Science

Presented June 1, 2010
Commencement June 2011

Master of Science thesis of Paul Lewis presented on June 1, 2010.

APPROVED:

Major Professor, representing Computer Science

Director of the School of Electrical Engineering and Computer Science

Dean of the Graduate School

I understand that my thesis will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my thesis to any reader upon request.

Paul Lewis, Author

TABLE OF CONTENTS

1 Introduction
2 Monte Carlo Planning with Sparse UCT
3 Ensemble UCT
  3.1 Motivation for Ensembles
    3.1.1 Parallelization
    3.1.2 Variance Reduction
  3.2 Space and Time Complexity
4 Related Work
5 A Generic Java Planning Library
  5.1 Running Tests
  5.2 Adding New Domains
6 Description and Evaluation of Domains
  6.1 Backgammon
  6.2 Biniax
  6.3 Connect 4
  6.4 Havannah
  6.5 Yahtzee
7 Empirical Results
  7.1 Experimental Setup
  7.2 General Results
  7.3 Specific Observations
    7.3.1 Biniax
    7.3.2 Backgammon
    7.3.3 Connect 4
    7.3.4 Havannah
    7.3.5 Yahtzee
  7.4 Parameter Sensitivity
8 Summary
Bibliography

LIST OF FIGURES

2.1 SparseUCT(s_0, t, c, ss)
2.2 UCT Example Part 1
2.3 UCT Example Part 2
3.1 EnsembleUCT(s_0, t, c, ss, e)
3.2 Various Ensemble Methods
5.1 Test File backgammon random uct
5.2 Output File backgammon random uct results
5.3 Test File 2
5.4 Test File 3
6.1 Comparison of Domains
6.2 Backgammon Board [9]
6.3 Biniax Examples
6.4 Havannah Board
7.1 Yahtzee UCT Constant Table
7.2 Connect 4 Ensemble Timing Table (ms)
7.3 Biniax Sparseness
7.4 Biniax UCT Ensemble Table
7.5 Backgammon Ensemble Table
7.6 Connect 4 Ensemble Table
7.7 Havannah Ensemble Table
7.8 Yahtzee Ensemble Table
7.9 Alternate Yahtzee Ensemble Table
7.10 Connect 4 Ensemble Parameter Sensitivity
7.11 Yahtzee Ensemble Parameter Sensitivity

Chapter 1

Introduction

UCT is a Monte-Carlo planning algorithm [10] that extends recent algorithms for multi-armed bandit problems to sequential decision problems, including general Markov Decision Processes and games. The algorithm has gained recognition as the premier computer algorithm for the game of Go [6]. UCT uses Monte-Carlo methods combined with an evaluation function to construct a search tree and select a best root action. There has also been recent work on the benefits of various parallel UCT methods in the domain of Go [4]. One of these methods, root parallelization, builds multiple trees from a common root state and combines the root evaluations into a weighted vote from each tree. This method is similar to bagging methods for classifiers. It has been shown to have parallel time benefits and, in some cases, sequential time benefits. We evaluate these results with an abstract software structure that can run a general purpose UCT algorithm on any one of five distinct domains: Biniax, Backgammon, Connect 4, Havannah and Yahtzee. With our software we were able to compare the performance of standard UCT methods to that of the ensemble method, root parallelization. We were able to show that across a variety of domains ensemble UCT is usually an effective tool for boosting performance in parallel time, will always conserve memory in sequential time, and in certain circumstances may boost performance in sequential time. We are able to show that as a tree grows, the time to add each new trajectory

increases. This finding strengthens the incentive to use ensembles, as they keep the tree size low while boosting performance. We also found that ensembles work well with suboptimal values of the UCT constant, C, but work best with the value of C that is best for base UCT. The implication is that ensembles can be an easy extension to an already well-tuned UCT implementation.

Chapter 2

Monte Carlo Planning with Sparse UCT

The UCT algorithm uses Monte-Carlo methods to construct a game tree that balances exploration with exploitation. The basic algorithm has only two parameters to adjust: the number of roll-out trajectories and the UCT constant. The UCT constant, C, is a domain dependent parameter that controls exploration; larger values of C increase exploration. The greater the number of trajectories (roll-outs), the more data UCT collects and the stronger it becomes. The algorithm is guaranteed to converge on the optimal solution given enough roll-outs, and a well chosen C value will allow UCT to converge more quickly. The base UCT algorithm is also domain independent in that it can be an effective tool even without any domain heuristic. Given a current state, UCT selects an action by constructing a sparse lookahead tree over the state space with the current state as the root and leaf nodes corresponding to terminal states. If the domain is deterministic then the edges may represent actions from one state node to another. If the domain is stochastic then the tree will contain alternating state and action nodes. Our implementation uses both state and action nodes, as this works in both deterministic and stochastic domains. Each node, n, in the resulting tree stores the number of times it has been visited and a cumulative reward. The value estimate of any node can be computed by dividing the cumulative reward by the number of visits, as shown by the following

formula:

Q_UCT(n) = n.rewards / n.visits

UCT is distinct in the way that it constructs the tree and estimates action values. Unlike standard minimax search and sparse sampling [8], which typically build depth-bounded trees and apply evaluation functions at the leaves, UCT does not impose a depth bound and does not require an evaluation function. Rather, UCT incrementally constructs a tree and updates action values by carrying out a sequence of Monte-Carlo roll-outs of entire decision-making sequences starting from the root and ending at a terminal state. The key idea behind UCT is to intelligently bias the roll-out trajectories toward ones that appear more promising based on previous trajectories, while maintaining sufficient exploration. Each roll-out begins at the root and actions are selected via the following process. If the current state contains actions that have not yet been explored in previous roll-outs, then a random action is chosen from among the unselected actions. Otherwise, if all actions in the current state have been explored previously, then UCT selects the action that maximizes an upper confidence bound given by the following formula, where s is a state and a is a legal action taken from that state:

Q_UCT(s, a) = Q_UCT(a) + C * sqrt(log(s.visits) / a.visits)

The first term of the formula greedily selects the current best action while the second term gives a higher value to actions that have not been explored much.
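To make the selection rule concrete, the following short Java sketch shows one way the upper confidence computation could be implemented. It is illustrative only, not code from the thesis library: the ActionNode class and its rewards/visits fields are hypothetical stand-ins for whatever node representation an implementation uses.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of UCT action selection (hypothetical node class).
final class UctSelection {
    static final class ActionNode {
        double rewards;   // cumulative reward backed up through this action
        int visits;       // number of times this action has been tried
    }

    private static final Random RNG = new Random();

    // Value estimate of an action node: average reward per visit.
    static double value(ActionNode a) {
        return a.rewards / a.visits;
    }

    // Unsampled actions are chosen uniformly at random first; otherwise pick the
    // action maximizing value(a) + C * sqrt(log(parentVisits) / a.visits).
    static ActionNode select(List<ActionNode> actions, int parentVisits, double c) {
        List<ActionNode> unsampled = new ArrayList<>();
        for (ActionNode a : actions) {
            if (a.visits == 0) unsampled.add(a);
        }
        if (!unsampled.isEmpty()) {
            return unsampled.get(RNG.nextInt(unsampled.size()));
        }
        ActionNode best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (ActionNode a : actions) {
            double score = value(a) + c * Math.sqrt(Math.log(parentVisits) / a.visits);
            if (score > bestScore) {
                bestScore = score;
                best = a;
            }
        }
        return best;
    }
}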

The other decision that needs to be made while traversing the tree is at the action nodes. If the domain is deterministic then each action node trivially leads to a single state node. However, in stochastic domains a given action may lead to one of several state nodes. In this case base UCT will call the domain simulator to return the next state, s', from the current state and the chosen action. If the current action node already contains s' then the algorithm will use that node; otherwise a new node is created. In this way duplicate states from a given action are not generated. Some domains are highly stochastic: if the number of possible states per action is large compared to the number of roll-out trajectories, then each time an action node is visited the simulator will generate a new s' and no state from that action node will be visited more than once. This is where we introduce the concept of sparse sampling, which is effective in these domains. All sparse sampling does is put a limit on the number of times an action node calls the simulator to generate a new s'. Once the limit is reached the action node will randomly select among the already present state nodes, weighted by how many times they were originally visited. Thus if all the nodes were visited once except for one node that was visited twice, the node that was visited twice will have a probability of being selected equal to 2/sparseSampleSize while all the other nodes will have a probability of 1/sparseSampleSize. The last point to make about the algorithm is that once it adds a new leaf node (leaf nodes are always state nodes) it will play a random complete game and return the result. The resultant reward is passed back up the tree and added to each node visited. If a leaf node in the tree is a terminal state then no random game is played and the reward from that terminal state is passed back up the tree.
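As a rough illustration of the sparse sampling step just described, the sketch below falls back to the simulator while the action node is under its sample limit and otherwise reuses an existing child state, chosen in proportion to its visit count. The node classes and the Supplier-based simulator call are assumptions for the sketch, not the library's actual interface, and the duplicate-state bookkeeping is omitted for brevity.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Illustrative sketch of sparse sampling at an action node (hypothetical classes).
final class SparseSampling {
    static final class StateNode {
        int visits;
    }

    static final class ActionNode {
        int visits;
        final List<StateNode> children = new ArrayList<>();  // assumed non-empty once visits >= ss
    }

    private static final Random RNG = new Random();

    // Below the limit, ask the simulator for a successor (a real implementation
    // would also check whether that state already exists as a child); at or above
    // the limit, reuse a child with probability proportional to its visit count.
    static StateNode nextState(ActionNode a, int sparseSampleSize, Supplier<StateNode> simulator) {
        if (a.visits < sparseSampleSize) {
            return simulator.get();
        }
        int totalVisits = 0;
        for (StateNode child : a.children) totalVisits += child.visits;
        int pick = RNG.nextInt(totalVisits);
        for (StateNode child : a.children) {
            pick -= child.visits;
            if (pick < 0) return child;
        }
        return a.children.get(a.children.size() - 1);  // not reached when counts are consistent
    }
}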

The sparse UCT algorithm is given by algorithm 2.1. Note that sparse UCT is the same as normal UCT when the sparse sample size is set to infinity. We present an example to show how sparse UCT builds a tree. Let's assume that our domain is stochastic with 2 competing agents and that there is a reward of 1 for winning and -1 for losing. In these examples (figure 2.2) blue circles are state nodes, red circles are action nodes, and the value inside each node shows the cumulative reward on the left of the colon and the number of visits on the right. The tree starts with a single state node. From here UCT plays a random game. The result of the game can be one of three values: [0, 0] (draw), [1, -1] (player 1 wins) or [-1, 1] (player 2 wins). Part 1 of the example shows player 1 winning and part 2 shows how the root node is updated appropriately. Part 3 shows how the next roll-out chooses a random action from the pool of unvisited actions, the simulator selects a state from that action, and a random game is then played from the new leaf state. Part 4 shows the updated nodes that were visited by the last trajectory. Note that since this is a two player game where the turns alternate, the leaf state updates its reward from the first value while the parent action and state update their reward from the second value, which represents the other player's reward. Assuming that there are still unvisited actions, part 5 from figure 2.3 shows a new action and state added to the tree and another random game being played. The visited nodes in part 6 are similarly updated. If we assume that the root state has no other unvisited action then it will choose between the current two for its next roll-out. Since they were both visited once and the one on the left

Figure 2.1: SparseUCT(s_0, t, c, ss)

Input: s_0 initial state; t number of trajectories; c UCT constant; ss sparse sample size
Output: values for each action in s_0

1: for i = 1 to t do
2:   s <- s_0
3:   while s is not a terminal state do
4:     if all actions of s have been sampled then
5:       a <- argmax_a Q_UCT(s, a)
6:     else
7:       a <- random unsampled action from s
8:     end if
9:     if a.visits >= ss then
10:      s <- select weighted random child of a
11:    else
12:      s <- transition(s, a)
13:      if a does not have child s then
14:        create new state node s for a
15:        s <- simulateRandomGameToEnd(s)
16:      end if
17:    end if
18:  end while
19:  increment visits and add terminal state reward to all visited nodes
20: end for
21: return array of a.rewards / a.visits for each action in s_0

has a higher cumulative reward, it will be selected for part 7. Two more nodes are added and part 8 shows the resulting updated tree. If another simulation were run in this scenario, the right action node would be selected from the root, since both nodes have the same cumulative reward but the right node has been visited fewer times.

Figure 2.2: UCT Example Part 1

Figure 2.3: UCT Example Part 2
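The alternating-perspective update used in this example can be sketched in a few lines. This is a hypothetical illustration: it assumes each node records which player's reward it accumulates, which is one of several ways an implementation might handle the alternation.

import java.util.List;

// Illustrative sketch of backing up a two-player reward vector, e.g. [1, -1].
// Each node on the visited path adds the component for the player whose
// perspective it tracks, so a leaf state and its parent use different components.
final class RewardBackup {
    static final class Node {
        double rewards;   // cumulative reward from this node's perspective
        int visits;
        int playerIndex;  // 0 or 1
    }

    static void propagate(List<Node> visitedPath, double[] rewardVector) {
        for (Node node : visitedPath) {
            node.visits += 1;
            node.rewards += rewardVector[node.playerIndex];
        }
    }
}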

Chapter 3

Ensemble UCT

Ensemble UCT methods generate multiple UCT trees from a common root state. There are several ways to use the multiple trees to vote on the root actions. We used the root parallelization method (algorithm 3.1): after all the trees are generated, the visits and rewards of each root action in each tree are added together and then the average for each action is computed. This is the same as taking a weighted average of the per-tree value estimates, as shown by the following formula:

(r_1 + r_2 + ... + r_n) / (v_1 + v_2 + ... + v_n)
  = (v_1 / (v_1 + ... + v_n)) * (r_1 / v_1) + ... + (v_n / (v_1 + ... + v_n)) * (r_n / v_n)

When we started working with ensembles, one of our initial directions was to evaluate various ensemble techniques. All of the methods that we looked into used the root actions of several trees; the evaluation criterion was the only thing that changed. In this way we were also able to compare evaluations in different circumstances and see how they differed. Our candidates were root parallelization, root ensemble, plurality vote, instant runoff vote and Borda count vote. Root ensemble took the unweighted average of all the value estimates rather than the weighted average used by root parallelization:

root ensemble = r_1/v_1 + r_2/v_2 + ... + r_n/v_n
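A small sketch of the two combination rules follows, assuming each tree reports a cumulative reward r[i] and visit count v[i] for the root action in question; the array-based form here is an illustration rather than the library's interface. Dividing the root ensemble score by the number of trees, as below, does not change which action it selects.

// Sketch comparing the two combination rules for one root action across n trees.
// r[i] and v[i] are that action's cumulative reward and visit count in tree i.
final class RootCombination {

    // Root parallelization: pool rewards and visits across trees, then divide once,
    // so trees that ran more simulations of this action carry more weight.
    static double rootParallelization(double[] r, int[] v) {
        double sumR = 0.0;
        int sumV = 0;
        for (int i = 0; i < r.length; i++) {
            sumR += r[i];
            sumV += v[i];
        }
        return sumR / sumV;
    }

    // Root ensemble: average each tree's own value estimate with equal weight,
    // regardless of how many visits back that estimate.
    static double rootEnsemble(double[] r, int[] v) {
        double sum = 0.0;
        for (int i = 0; i < r.length; i++) {
            sum += r[i] / v[i];
        }
        return sum / r.length;
    }
}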

Figure 3.1: EnsembleUCT(s_0, t, c, ss, e)

Input: s_0 initial state; t number of trajectories; c UCT constant; ss sparse sample size; e number of ensembles
Output: action

1: for i = 1 to e do
2:   {r and v are arrays of rewards and visits for each root action node}
3:   (r, v) <- SparseUCT(s_0, t, c, ss)
4:   for j = 1 to number of children of the root do
5:     sumr_j += r_j
6:     sumv_j += v_j
7:   end for
8: end for
9: return action with highest average reward sumr_j / sumv_j, where sumv_j > 0

With plurality vote each tree selects the single best action and the action that has the most votes is selected; ties are decided randomly. Instant runoff and Borda count are more complex voting systems: instant runoff has rounds of removing the worst action and giving its votes to another action, while Borda count assigns points based on rank and selects the action with the most overall points. Figure 3.2 shows some early tests we ran on the Connect 4 domain with base UCT at 4096 trajectories against the various ensemble methods, where the number of ensemble trees was fixed at 8. In this case root parallelization performs the best, plurality vote is close behind, and root ensemble performs poorly. From other tests we performed in

various domains, we got a clear picture that the root ensemble method was weaker than root parallelization and that, even though some of the other voting methods showed some promise, root parallelization was overall the strongest.

Figure 3.2: Various Ensemble Methods (results for Root Parallelization, Root Ensemble, Plurality Vote, Instant Runoff Vote and Borda Count Vote across a range of trajectories)

3.1 Motivation for Ensembles

3.1.1 Parallelization

From the ensemble algorithm 3.1 one can see that each iteration of the outermost loop, which generates each UCT tree, can be run in parallel. Once all the trees have been generated and the rewards and visits of each root action node have been summed, a short evaluation of the action with the best average reward can be made. The importance of running the algorithm in parallel is that any kind of time speedup with ensembles indicates that it may be useful in a cluster environment when all the processors need to be utilized.
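To illustrate how the outer loop parallelizes, the sketch below builds each tree on its own thread and then pools the root statistics before choosing an action. The TreeBuilder interface and TreeResult record are assumptions standing in for a SparseUCT run; the thesis library itself ran single-threaded.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of root parallelization: each ensemble tree is built independently,
// then root rewards and visits are pooled before the best action is chosen.
final class ParallelEnsembleUct {

    // Hypothetical per-tree result: rewards[j] and visits[j] for root action j.
    record TreeResult(double[] rewards, int[] visits) {}

    // Stand-in for one SparseUCT(s0, t, c, ss) run.
    interface TreeBuilder {
        TreeResult build();
    }

    static int selectAction(TreeBuilder builder, int ensembles, int numActions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(ensembles);
        try {
            List<Future<TreeResult>> futures = new ArrayList<>();
            for (int i = 0; i < ensembles; i++) {
                futures.add(pool.submit((Callable<TreeResult>) builder::build));
            }
            double[] sumR = new double[numActions];
            int[] sumV = new int[numActions];
            for (Future<TreeResult> future : futures) {
                TreeResult result = future.get();
                for (int j = 0; j < numActions; j++) {
                    sumR[j] += result.rewards()[j];
                    sumV[j] += result.visits()[j];
                }
            }
            int best = -1;
            double bestValue = Double.NEGATIVE_INFINITY;
            for (int j = 0; j < numActions; j++) {
                if (sumV[j] > 0 && sumR[j] / sumV[j] > bestValue) {
                    bestValue = sumR[j] / sumV[j];
                    best = j;
                }
            }
            return best;  // index of the root action with the highest pooled average reward
        } finally {
            pool.shutdown();
        }
    }
}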

3.1.2 Variance Reduction

The idea of ensemble planning is related to the notion of bagging in machine learning, where multiple independent classifiers are learned based on bootstrap training sets. Bagging has been shown to significantly improve performance for learning algorithms that have high variance [2]. As UCT builds a tree, the most promising actions get explored the most. Sometimes the optimal action gets unlucky early on and takes a while to catch up, because many more trajectories are being used to explore suboptimal actions. If the chance of selecting the optimal action at a given state is greater than that of all other actions, then ensembles over multiple trees reduce the variance in selecting that action. UCT converges on the optimal action, but this may take a large number of trajectories. However, if a fixed number of trajectories has a good chance of selecting the optimal action, then averaging over multiple trees will increase the chance of selecting the optimal action. So there may be cases where more improvement can be gained in the same amount of time from using ensembles than from increasing the number of trajectories. This motivates using ensembles to improve sequential time UCT algorithms. Any of the ensemble methods we tried would have the effect of reducing the variance. However, root parallelization weights the votes, with more weight given to actions that ran more simulations. This is important so that outliers with few trajectories don't pull the results away from the optimal action.

3.2 Space and Time Complexity

Ensembles are an alternative to creating larger trees: instead, multiple smaller trees are built. The smaller trees take less memory, which is important for UCT. The amount of memory increases linearly with the amount of time taken to run the base algorithm, because a new node is added to the tree with each trajectory. Also, bigger trees take longer to traverse before a random game can be played. This increases the time of each new trajectory, so that a tree of double the size takes more than double the time to run. This means that ensembles provide an extra savings in time.

Chapter 4

Related Work

Our work builds on a few papers that started to explore parallelization methods for the UCT algorithm. A wide variety of methods have been tried, including complex message passing [3]. Some related methods were applied to the domain of Go [5]. More recently the root parallelization method was introduced alongside a few other methods [4]. This work was part of an evaluation of parallel UCT methods. The root parallelization method that we are investigating was shown to be effective not only in parallel time but also in sequential time. The units of measure used were Games-Per-Second (GPS) and strength speedup. GPS measures the increase in the number of complete UCT roll-outs performed per second. For root parallelization this means that 1 thread has a GPS of 1 while 4 threads, each building a single tree, have a GPS of 4. The strength speedup is a direct measure of how much stronger the algorithm performed compared to an equivalent single threaded algorithm, so a strength speedup of 2 means that the algorithm has the same strength as a single threaded program that takes 2 times as long to run. The results for a 2 threaded program were a GPS of 2 and a strength speedup of 3; a 4 threaded program had a GPS of 4 with a strength speedup of 6.5. These results showed it is possible for ensembles to increase the strength of the algorithm by more than the time speedup, and thus they may be useful for sequential time speedup.

The root parallelization method is similar to bagging predictors used in bootstrap learning [2]. Bagging methods run a classifier multiple times and average over the results. In cases where the classifier was highly unstable (there was a lot of variance) bagging worked best. In cases where the classifiers were stable or very accurate, bagging was fairly useless, although it did not hurt the results. One remark from this paper was the ease of parallelizing the bagging method: no communication between CPUs is needed because each classifier can be run independently and then averaged over in the end. Prior work in the context of solitaire introduced a family of algorithms that contained UCT, HOP-UCT, and Ensemble UCT [1]. The focus there was on achieving the best results in the domain of Klondike solitaire. Some advantages were noted for the ensemble approach, but the evaluation was not exhaustive in that it focused on optimizing a single domain.

Chapter 5

A Generic Java Planning Library

Our planning library was built using a Java code base that allows new agents and domains to be easily added and to interact with the agents and domains already present. The code base is open source under the GPL 2.0 license and can be found at Beaversource. There are currently six domains and four agents included in the library. The agents are Random, Human, UCT and Expectimax; the domains are Backgammon, Biniax, Connect 4, EWN, Havannah and Yahtzee. The original purpose of this library was to allow a generic UCT algorithm to play on a myriad of domains. The library ended up growing into a general purpose library for any agent that can function in turn-based, discrete, fully observable, static domains. The library does not currently support simultaneous or partially observable domains. The base class structure consists of the following classes: Action, State, Simulator, Agent and Arbiter. The Action and State classes abstractly represent an action and a state in a given domain, respectively. These classes typically just hold information about the action and state with some accessor methods, and it is typical for them to be immutable. The Simulator class is tied to a specific state and action type and computes the legal actions and rewards at a given state and the state transition given an action. The simulator is used by the agents to explore the domain and by the arbiter to regulate a game. The Agent class's main method is selectAction, which takes a state and a simulator as parameters and returns an action.
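As a rough picture of this class structure, the sketch below shows how the abstract pieces might fit together. The method names selectAction, takeAction, getLegalActions and getRewards follow the text of this chapter; the generic signatures and everything else are illustrative guesses rather than the library's actual API.

import java.util.List;

// Illustrative skeletons of the base classes described above.
abstract class State {}

abstract class Action {}

// A simulator is parameterized by the concrete state and action types of a domain.
abstract class Simulator<S extends State, A extends Action> {
    abstract S getState();
    abstract List<A> getLegalActions();
    abstract double[] getRewards();       // one entry per agent
    abstract void takeAction(A action);   // replaces the current state
}

// An agent chooses an action for the current state, using its own simulator copy.
interface Agent {
    <S extends State, A extends Action> A selectAction(S state, Simulator<S, A> simulator);
}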

The Arbiter class calls the selectAction method on each Agent when it is time to make a move. The Arbiter class governs the legal play of a game: it takes in the agents and simulators needed to play a game, and it records some statistics such as the time taken to move. The Arbiter class can run many games in the same domain against the same set of agents and collect statistics on average reward. In order to keep things fair it rotates the move order of the agents each game.

5.1 Running Tests

The UCTProject class is used to run tests. This class takes zero, one or two arguments. If it is run with no arguments then it runs in interactive mode. In this mode the user selects a simulator and agents. This is the only mode in which the human agent can be used. If the human agent is run, the user inputs actions and gets feedback about the current state and legal actions. If there are no human agents then a single game is simulated and the history and results are shown at the end. If a single argument is passed then it is the file path of a test file. An example test file is given in figure 5.1. All whitespace and comments (any line starting with a #) are ignored. The first line read in (that isn't a comment or whitespace) indicates the number of complete games to be simulated. The second line indicates the simulator that will be passed to the arbiter. The third line indicates the simulator that each agent will use to make decisions. The reason this distinction is made is that an agent may not have access to the complete real world simulator but only a close approximation.

Figure 5.1: Test File backgammon random uct

#Number of Simulated Games
2000
#World
Backgammon
#Simulated World
Backgammon
#Agents
Random
UCT 1024 1 -1 1 ROOT_PARALLELIZATION

The next two lines are used to specify the agents to run in the domain. Each line is given to an agent and the parameters for that agent are separated by whitespace. Note that if a domain such as Backgammon requires two agents, an exception is thrown if not enough agents are included in the test file, and if more than two agents are specified only the first two agents are used. The test file in figure 5.1 will play 2000 games in the domain of Backgammon between a Random agent and a UCT agent with the number of trajectories set to 1024, C = 1, the sparse sample size set to infinity, and the number of ensembles set to 1. The output of the test from figure 5.1, backgammon random uct, will be appended to the file backgammon random uct results, shown in figure 5.2. All output is appended to a file whose name is the input file name with results attached to the end. If that file does not already exist it will be created. Figure 5.2 shows what

Figure 5.2: Output File backgammon random uct results

# 2000 BackgammonSimulator - BackgammonSimulator
, 0.109, 0.014, 0.023, Random
0.994, 0.109, , , UCT, 1024, 1, -1, 1, ROOT_PARALLELIZATION

the output of figure 5.1 might look like. The first line is just the first 3 lines of the test file put onto one line. Each line that follows is given to an agent in the tests. The first value is the average reward and the second is the standard deviation of the average reward. The next two values are the average time per move and the standard deviation of the time per move, respectively. Appended at the end are the agent name and the parameters it used. If a second argument is passed to UCTProject then that second argument is used as the name of the output file instead of the default name. Another useful feature of the test files is shown in figure 5.3. This time the domain is Yahtzee, and since it is a single agent domain only one agent is given. However, the number of trajectories that UCT takes in as a parameter is given by an array of values, [128,256,512,1024], instead of a single value. This runs 4 separate tests and appends the results of each test to the same output file, with each test using a different value for the number of trajectories. Multiple parameters can be specified in this manner, as shown in figure 5.4. In this case all possible combinations of the parameters are run. Thus there will be output for the following pairs of parameters in the order given: (128,2), (256,2), (512,2), (1024,2), (128,4), (256,4), (512,4), (1024,4). Instead of writing [2,4,6,8,10] you may input [2:4:10]. The first and last values are the first and last values to test, respectively. The step value is the difference between the first and last values divided by the middle value; for [2:4:10] this is (10 - 2)/4 = 2, and the resulting values are [2,4,6,8,10].
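The range shorthand can be expanded with a few lines of code. The sketch below only illustrates the rule just described; the parser in the actual library may look quite different.

import java.util.ArrayList;
import java.util.List;

// Expands the [first:divisor:last] shorthand used in test files, e.g. "[2:4:10]"
// becomes [2, 4, 6, 8, 10], where step = (last - first) / divisor.
final class RangeExpansion {
    static List<Integer> expand(String spec) {
        String[] parts = spec.replace("[", "").replace("]", "").split(":");
        int first = Integer.parseInt(parts[0].trim());
        int divisor = Integer.parseInt(parts[1].trim());
        int last = Integer.parseInt(parts[2].trim());
        int step = (last - first) / divisor;
        List<Integer> values = new ArrayList<>();
        if (step <= 0) {            // degenerate range: just return the first value
            values.add(first);
            return values;
        }
        for (int v = first; v <= last; v += step) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(expand("[2:4:10]"));  // prints [2, 4, 6, 8, 10]
    }
}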

Figure 5.3: Test File 2

#Number of Simulated Games
2000
#World
Yahtzee
#Simulated World
Yahtzee
#Agents
UCT [128,256,512,1024] ROOT_PARALLELIZATION

5.2 Adding New Domains

Any new domain that is added to this library will need to meet a few requirements. All domains consist of at least three parts: a simulator, a state and an action class. There must be one top level state and action class that inherit from the State and Action classes respectively. If the domain has multiple types of states or actions, these can in turn inherit from the top level action and state class. For instance, in the domain of Yahtzee there is a single state class, YahtzeeState, that inherits from State and an action class, YahtzeeAction, that inherits from the Action class.

Figure 5.4: Test File 3

#Number of Simulated Games
2000
#World
Yahtzee
#Simulated World
Yahtzee
#Agents
UCT [128,256,512,1024] 64 -1 [2,4] ROOT_PARALLELIZATION

Since in Yahtzee there are two different actions that can be performed, namely selecting dice to re-roll or selecting a category to score, Yahtzee has two other action classes, YahtzeeRollAction and YahtzeeSelectAction, that inherit from YahtzeeAction. YahtzeeAction is never used as an object. The only reason to have it is so that the Yahtzee simulator, YahtzeeSimulator, can use generics to specify that it only uses actions of type YahtzeeAction and states of type YahtzeeState. This protects the simulator from being misused by passing in a state or action that is not a Yahtzee state or action. Another thing to keep in mind is that this library was designed to use immutable state and action objects (objects that hold data that can be accessed by other objects but cannot be modified). Instead of changing a state, a new state object is created to replace the old one. This has advantages and disadvantages. If a mutable action or state object is created then some of the basic simulator methods

also need to be overridden, since they assume that the objects are immutable. The simulator class itself is a mutable object. It contains the methods for taking an action and computing the rewards and legal actions at a given state. A simulator has a method takeAction that replaces its current state with a new state object based on the passed-in action. It is important for a simulator to check that the passed-in action is indeed a legal action from that state, or there could be problems. Also, the simulator keeps a record of the legal actions and the rewards array for the current state. These should be updated whenever the state changes. The getLegalActions and getRewards methods don't need to be overridden as they just return these objects. This means that it is desirable to have some kind of method that computes the legal actions and rewards, to be called whenever the state changes. For all of the current simulators, computeLegalActions and computeRewards are used as private methods, although the names don't matter.
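Putting the requirements of this section together, a new domain might look roughly like the following skeleton, which reuses the State/Action/Simulator shapes sketched earlier in this chapter. It is a hypothetical illustration built around a toy take-away game, not code from the library; a real domain would implement its own rules inside takeAction and the recompute helper.

import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton for adding a new domain, following the conventions above:
// immutable state and action classes plus a mutable simulator.
final class NimState extends State {             // toy example domain
    final int stonesLeft;
    final int playerToMove;                      // 0 or 1
    NimState(int stonesLeft, int playerToMove) {
        this.stonesLeft = stonesLeft;
        this.playerToMove = playerToMove;
    }
}

final class NimAction extends Action {
    final int stonesTaken;                       // 1, 2 or 3
    NimAction(int stonesTaken) { this.stonesTaken = stonesTaken; }
}

final class NimSimulator extends Simulator<NimState, NimAction> {
    private NimState state = new NimState(21, 0);
    private List<NimAction> legalActions;
    private double[] rewards;

    NimSimulator() { recompute(); }

    @Override NimState getState() { return state; }
    @Override List<NimAction> getLegalActions() { return legalActions; }
    @Override double[] getRewards() { return rewards; }

    // Replaces the current (immutable) state and refreshes cached actions/rewards.
    @Override void takeAction(NimAction action) {
        state = new NimState(state.stonesLeft - action.stonesTaken, 1 - state.playerToMove);
        recompute();
    }

    private void recompute() {
        legalActions = new ArrayList<>();
        for (int take = 1; take <= 3 && take <= state.stonesLeft; take++) {
            legalActions.add(new NimAction(take));
        }
        // The player who cannot move (no stones left) loses in this toy rule set.
        rewards = (state.stonesLeft == 0)
                ? (state.playerToMove == 0 ? new double[]{-1, 1} : new double[]{1, -1})
                : new double[]{0, 0};
    }
}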

Chapter 6

Description and Evaluation of Domains

This chapter describes the domains that were used to generate data for the paper. We included five domains, all of which are discrete, fully observable, static and either deterministic or stochastic. Figure 6.1 is a quick comparison of the following features of the domains: number of agents, value of the UCT constant, and upper bounds on actions per state (APS) and states per action (SPA). A reasonable value of C was chosen for each domain by running base UCT at a fixed number of trajectories and converging on a local optimum.

Figure 6.1: Comparison of Domains (number of agents, UCT constant C, and upper bounds on actions per state (APS) and states per action (SPA) for Backgammon, Biniax, Connect 4, Havannah and Yahtzee)

6.1 Backgammon

Backgammon is an ancient gambling game that is still popular today. A neural network approach, TD-Gammon [11], has achieved mastery level play. Backgammon is a turn based two player domain, played on a board consisting

of twenty-four narrow triangles called points. The triangles alternate in color and are grouped into four quadrants of six triangles each. The quadrants are referred to as a player's home board and outer board and the opponent's home board and outer board. The home and outer boards are separated from each other by a ridge down the center of the board called the bar. The points are numbered for either player starting in that player's home board. The outermost point is the twenty-four point, which is also the opponent's one point. Each player has fifteen checkers of his own color. The initial arrangement of checkers is: two on each player's twenty-four point, five on each player's thirteen point, three on each player's eight point, and five on each player's six point. Figure 6.2 shows the initial board setup.

Figure 6.2: Backgammon Board [9]

The starting player is determined randomly and players take turns rolling a pair of six sided dice to determine the possible moves. The value on each die corresponds to the number of points a single piece may be moved forward on the board. Players move pieces away from the home board. If doubles are rolled then the moves on the dice are made twice. A legal move is one in which a piece lands on a point where there are 1 or fewer enemy pieces. If there is one enemy piece at that location it is captured. A captured piece is placed on the bar and must be moved onto that player's home board before any other pieces that player controls may be moved. If all legal moves of a captured piece are blocked by enemy pieces then that player can make no moves for that turn. Additionally, a player must make as many moves as possible during their turn, so if there is an option to make only 1 move or to make 2 moves, the player must choose the combination of 2 moves. The object of the game is to move all the checkers to the opponent's home board and then bear them off. The first player to bear off all checkers wins the game. In order to bear off checkers, a player must have all remaining pieces in the opponent's home board. Backgammon has special scoring rules when gambling, which we do not use for our domain. A win is a reward of 1 and a loss is a reward of -1. The Backgammon domain is moderately stochastic, with 15 possible states that any action can lead to, so sparse sampling isn't useful. The other interesting feature of this domain is that the number of possible actions from any given state can vary wildly. Some states may have no legal actions while others may have as many as a thousand. This means that for a fixed number of UCT trajectories the current

action may end up having either a well explored or a sparsely explored set of root actions.

6.2 Biniax

Biniax is a newer arcade style game that can be found online, free to play. No previous research that we know of has been done in this domain. Biniax is a highly stochastic single agent domain. The agent controls a single element on a 5 by 7 board. An action consists of moving the single element to an adjacent non-diagonal location (North, South, East or West). Some locations on the board are empty while others contain element pairs. A move can be made into an empty space or into a space with an element pair where one element of the pair is the same as the player's element. When the piece moves into a location with an element pair, it changes its element value to that of the other element in the pair. Every 2 moves, all element pairs and the agent's element are moved down one location. Element pairs move off the bottom of the board but the player's element does not. The top row fills in with 4 random element pairs and one random empty location. As the game progresses the number of elements possible in the element pairs increases. The game starts with 4 possible elements. The goal of the agent is to survive for as many turns as possible (1 reward per action). The game ends when the agent can make no more legal moves; this occurs only when the agent is on the bottom row and there are either no spaces the element can move into or any of the moves will cause the element to get pushed off the board by a falling element pair. Some examples are given in figure 6.3.

Figure 6.3: Biniax Examples. Before-and-after board diagrams showing: an element moving North and taking an element pair; an element moving East and getting pushed down by an element pair; an element moving East and taking an element pair while moving down; and situations where the element has no legal moves.

Biniax is an ideal domain in which to apply sparse sampling methods because it is highly stochastic and base UCT has trouble building a deep tree. Every other action ends up being a stochastic action with the following number of equally likely next states, where nElements is the number of elements currently being generated:

Number of Next States = 5 * (nElements! / ((nElements - 2)! * 2!))^4

For example, with 4 possible elements there are 6 possible element pairs, giving 5 * 6^4 = 6480 equally likely next states.

6.3 Connect 4

Connect 4 has been the subject of past research because it was a challenging enough domain to warrant study but had a significantly smaller state space than games such as chess or Go. It is now a solved domain where agents with access to the move database can make optimal decisions quickly. Connect 4 is a connection game played on a grid of height 6 and width 7. It is a 2 agent deterministic domain where the agents alternate placing pieces on the board. The objective of the game is to create a connection of 4 pieces in a row either vertically, horizontally or diagonally. When a piece is placed on the board, a column is chosen and the piece moves down the column until it sits above a non-empty location. In this way there are a maximum of 7 actions per state, where placing a piece in a column is a legal action if that column has at least one empty location. The game ends as a draw if the game board contains no empty locations and there is no four in a row. In our research this domain was successful both because of its simplicity and because of the amount of previous work done. Fast open source implementations of the sim-

ulator for this domain could be found online and allowed us to generate trees with extremely high numbers of trajectories compared to the other domains. This domain was also challenging enough that UCT didn't converge on the optimal moves too quickly.

6.4 Havannah

Havannah is a connection game invented by game designer Christian Freeling. The game is based on the classic connection game Hex. No existing computer agent is capable of beating the best human players on a full sized board (side length of 10). Christian Freeling has put out a prize for anyone who can create an agent that can beat him in 1 of 10 games. Havannah is a connection game where pieces are placed alternately by two competing players. Once a piece is placed it remains on the board for the remainder of the game. There are three possible win conditions: bridge, fork or ring. A bridge is a connection between 2 of the six corners, a fork is a connection of 3 of the six sides of the board (where a side does not include a corner), and a ring surrounds either enemy or empty locations. Figure 6.4 depicts a Havannah board of side length 8 with examples of the three possible win conditions highlighted in blue. For our tests we used a board with a side length of 5 to speed up the collection of results. Unlike Hex, Havannah is not a determined game, but it is rare for it to end in a draw. It is deterministic and has a maximum number of possible moves equal to

the number of empty spaces on the board, so all search trees have a bounded depth. For our board size we had a maximum of 61 actions per state.

Figure 6.4: Havannah Board

6.5 Yahtzee

Yahtzee is a single agent stochastic domain that has been around in its present form for decades. An optimal algorithm has been found whose average score and standard deviation are reported in [7], and expert human players have been able to average around 250 points over the course of many games. The object of the game is to score the most points by rolling 5 dice to make certain combinations. The game consists of 13 rounds. In each round the dice may be rolled up to 3 times. All dice are rolled in the first roll of a round. In the second and third rolls the player may choose which dice to roll and which to keep

the same. After the third roll the agent must select an appropriate category to score the dice. There are 13 categories to choose from, as described below. After a category has been scored it cannot be chosen again. There are two sections that may be scored: the upper section and the lower section. The upper section consists of 6 scoring categories: Ones, Twos, Threes, Fours, Fives and Sixes. These categories are scored by summing the matching die faces. For example, a roll containing two fives and one one would receive 10 points if placed in the Fives category, or 1 point if placed in the Ones category. If the total of the upper section scores is 63 or more, a bonus of 35 points is added. The lower section categories are either scored a set amount or zero if the category requirements are not satisfied. The three of a kind and four of a kind categories need 3 and 4 of the same die face respectively, and both score the sum of all the dice. A straight is a sequence of consecutive die faces, where a small straight is 4 consecutive faces and a large straight is 5 consecutive faces. Small straights score 30 points and large straights 40 points. A full house is 3 of a kind together with 2 of a kind and scores 25 points. A Yahtzee is 5 of a kind and scores 50 points. The Chance category is scored by the sum of the dice. Each Yahtzee rolled after the Yahtzee category has already scored 50 points yields a 100 point bonus. Such a roll must be put into another category, as follows: if the corresponding upper section category is not filled then the Yahtzee must be scored there. For example, if five fours are rolled, the Fours category must be scored if it is not filled. If the corresponding upper section category is filled you may then put the score anywhere on the upper

section or lower section and score it appropriately. Yahtzee is a highly stochastic domain, but the actions taken by an agent can control the stochasticity: the number of possible next states of an action can range from 1 to 252. We found that sparse sampling did not help much in this domain, since good moves tended to limit the number of possible states.

Chapter 7

Empirical Results

The main results that we were interested in concern the effect of ensembles (root parallelization) applied to base UCT in various domains, specifically the benefits of ensembles in parallel time, sequential time and memory usage. Some secondary results we were interested in included the usefulness of sparseness in stochastic domains, the effect of ensembles with a suboptimal UCT constant, and the increased time per trajectory as a tree increases in size. We present the ensemble results along with some other interesting results.

7.1 Experimental Setup

Our timing results were collected on one machine with dual core Intel Xeon processors and 24 gigabytes of memory. Our Java library ran only a single thread, so each test used one core for processing and a second core for garbage collection (Java's way of cleaning up dynamically allocated memory). For each domain we started by generating a reasonable UCT curve. This meant first finding a UCT constant for base UCT and then, if the domain was stochastic, finding a sparseness value; in some cases this value was infinite (the same as base UCT without sparseness). An example of how we determined a reasonable UCT constant for our original Yahtzee domain is given by figure 7.1. We increased the UCT constant

across a range of trajectories until there was no further improvement to base UCT.

Figure 7.1: Yahtzee UCT Constant Table (performance of base UCT for combinations of trajectory counts and UCT constant values)

For the deterministic domains we set the sparse sample size to 1 rather than infinity. The two values are logically equivalent for domains with only 1 state per action, but a sparse sample size of 1 runs faster because the simulator only needs to be called the first time the state is visited from that action rather than each time. It is important to point out that this is not true for stochastic domains. For example, the domain of Backgammon always has 15 distinct possible states from any one action, yet a sparse sample size of 15 is not logically equivalent to a sparse sample size of infinity. This is because, during the course of filling up the 15 child states of the action, there may be a state that is chosen more than once, leaving out another state. Once the number of visits to the action node is 15, the current set of states is fixed (no other new states can be added) and the distribution remains unchanged. For domains with 2 agents, the opposing agent was a UCT agent fixed at some number of trajectories. From this point ensemble table results could be collected for 2, 4, 8 and 16 ensembles. A point to make about our timing measurements is that we assume that a single trajectory added to a tree takes constant time. This isn't completely true because

as the tree becomes larger the time taken per trajectory increases. For some domains we found this time to be more of a factor than for others, but for the most part it was not huge. It does mean, however, that the ensembles are usually performing slightly better than indicated by the results. Figure 7.2 gives an example of these timing results from the Connect 4 domain. The values in the table represent the average time it takes for the algorithm to select an action, in milliseconds, and the values to the right of ± indicate the 99% confidence interval for each value.

Figure 7.2: Connect 4 Ensemble Timing Table (average time in ms to select an action for various numbers of total trajectories and ensembles, with 99% confidence intervals)

Another thing to note before analyzing the results of the ensembles is that the only domain that used sparseness in these tests was Biniax. We found that for domains with low stochasticity, such as Backgammon and Yahtzee, sparse sampling did not improve performance in a significant way. However, for a highly stochastic domain such as Biniax, where the tree was unable to grow deep, sparseness gave a significant improvement. These results are shown in figure 7.3. The far left column shows the infinite-sparseness values of base UCT. The sparseness values starting from 1 improve upon infinite sparseness up to 8 before declining again. We ended up setting the sparseness of Biniax to 8 for our tests.

Figure 7.3: Biniax Sparseness (average reward for various numbers of trajectories and sparseness values)

7.2 General Results

Figures 7.4 to 7.9 show a comparison of base UCT and various numbers of ensemble UCT trees for each domain. There is also a sixth, alternate Yahtzee domain. This was our original Yahtzee domain, but there was a flaw in how the scoring was done, so that it was slightly easier than the normal Yahtzee domain. This isn't a problem except that we wanted to compare our results to the optimal Yahtzee values that have already been found. The domains with more than one agent (Backgammon, Connect 4 and Havannah) all have a base UCT agent that they compete against. All the base UCT agents use the UCT constant values from figure 6.1. The number of trajectories is set differently for each domain: Backgammon uses 256, Connect 4 uses 4096 and Havannah uses 128. These values allowed us to control the strength of the opponent we were testing against. If the value was too large then the tests would take too long, because the base agent would take a while and the variable agent would need to be larger as well. On the other hand, if the value was too low

then the base UCT version of the variable agent would quickly win 100% of the games and ensembles could not improve from there. Since the Connect 4 simulator was faster and had smaller states than Backgammon and Havannah, it was also able to generate much larger trees with less memory and in less time. For each ensemble table the columns indicate the number of trajectories UCT uses to build each tree, while the rows indicate the number of ensemble trees used to determine the chosen action. We constructed the tables in such a way that it would be easy to compare the results of increasing the number of ensembles to those of increasing the number of trajectories. The number of ensembles and the number of trajectories both double with each increase. In this way the differences in sequential time between various ensemble values using the same number of overall trajectories can be easily compared. For instance, in any of the tables, starting from a base UCT result and moving diagonally up and to the right gives 2 ensemble trees each with half the number of trajectories; continuing along this diagonal, all of the results use the same number of total trajectories. Simply moving down a row shows the parallel time benefits of using ensembles. The values in each table are the average reward acquired over a series of games (usually 1000 to 4000) and the values to the right of ± indicate the 99% confidence interval for each value. The tables show values increasing from lower numbers of ensembles to higher numbers of ensembles given a fixed number of trajectories. This indicates that ensembles generally improve performance when using parallel time. There are cases where ensembles don't improve performance, but there were no statistically significant cases where performance suffered due to ensembles. This was the case for all


More information

CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions

CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions Slides by Svetlana Lazebnik, 9/2016 Modified by Mark Hasegawa Johnson, 9/2017 Types of game environments Perfect

More information

Experiments on Alternatives to Minimax

Experiments on Alternatives to Minimax Experiments on Alternatives to Minimax Dana Nau University of Maryland Paul Purdom Indiana University April 23, 1993 Chun-Hung Tzeng Ball State University Abstract In the field of Artificial Intelligence,

More information

Computer Science and Software Engineering University of Wisconsin - Platteville. 4. Game Play. CS 3030 Lecture Notes Yan Shi UW-Platteville

Computer Science and Software Engineering University of Wisconsin - Platteville. 4. Game Play. CS 3030 Lecture Notes Yan Shi UW-Platteville Computer Science and Software Engineering University of Wisconsin - Platteville 4. Game Play CS 3030 Lecture Notes Yan Shi UW-Platteville Read: Textbook Chapter 6 What kind of games? 2-player games Zero-sum

More information

Outline. Game Playing. Game Problems. Game Problems. Types of games Playing a perfect game. Playing an imperfect game

Outline. Game Playing. Game Problems. Game Problems. Types of games Playing a perfect game. Playing an imperfect game Outline Game Playing ECE457 Applied Artificial Intelligence Fall 2007 Lecture #5 Types of games Playing a perfect game Minimax search Alpha-beta pruning Playing an imperfect game Real-time Imperfect information

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence selman@cs.cornell.edu Module: Adversarial Search R&N: Chapter 5 1 Outline Adversarial Search Optimal decisions Minimax α-β pruning Case study: Deep Blue

More information

Games CSE 473. Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie!

Games CSE 473. Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie! Games CSE 473 Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie! Games in AI In AI, games usually refers to deteristic, turntaking, two-player, zero-sum games of perfect information Deteristic:

More information

5.4 Imperfect, Real-Time Decisions

5.4 Imperfect, Real-Time Decisions 5.4 Imperfect, Real-Time Decisions Searching through the whole (pruned) game tree is too inefficient for any realistic game Moves must be made in a reasonable amount of time One has to cut off the generation

More information

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Review of Nature paper: Mastering the game of Go with Deep Neural Networks & Tree Search Tapani Raiko Thanks to Antti Tarvainen for some slides

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 7: Minimax and Alpha-Beta Search 2/9/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein 1 Announcements W1 out and due Monday 4:59pm P2

More information

CS221 Project Final Report Gomoku Game Agent

CS221 Project Final Report Gomoku Game Agent CS221 Project Final Report Gomoku Game Agent Qiao Tan qtan@stanford.edu Xiaoti Hu xiaotihu@stanford.edu 1 Introduction Gomoku, also know as five-in-a-row, is a strategy board game which is traditionally

More information

Using Artificial intelligent to solve the game of 2048

Using Artificial intelligent to solve the game of 2048 Using Artificial intelligent to solve the game of 2048 Ho Shing Hin (20343288) WONG, Ngo Yin (20355097) Lam Ka Wing (20280151) Abstract The report presents the solver of the game 2048 base on artificial

More information

CS 771 Artificial Intelligence. Adversarial Search

CS 771 Artificial Intelligence. Adversarial Search CS 771 Artificial Intelligence Adversarial Search Typical assumptions Two agents whose actions alternate Utility values for each agent are the opposite of the other This creates the adversarial situation

More information

CS188 Spring 2014 Section 3: Games

CS188 Spring 2014 Section 3: Games CS188 Spring 2014 Section 3: Games 1 Nearly Zero Sum Games The standard Minimax algorithm calculates worst-case values in a zero-sum two player game, i.e. a game in which for all terminal states s, the

More information

DELUXE 3 IN 1 GAME SET

DELUXE 3 IN 1 GAME SET Chess, Checkers and Backgammon August 2012 UPC Code 7-19265-51276-9 HOW TO PLAY CHESS Chess Includes: 16 Dark Chess Pieces 16 Light Chess Pieces Board Start Up Chess is a game played by two players. One

More information

Learning from Hints: AI for Playing Threes

Learning from Hints: AI for Playing Threes Learning from Hints: AI for Playing Threes Hao Sheng (haosheng), Chen Guo (cguo2) December 17, 2016 1 Introduction The highly addictive stochastic puzzle game Threes by Sirvo LLC. is Apple Game of the

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 Introduction So far we have only been concerned with a single agent Today, we introduce an adversary! 2 Outline Games Minimax search

More information

Monte Carlo Tree Search. Simon M. Lucas

Monte Carlo Tree Search. Simon M. Lucas Monte Carlo Tree Search Simon M. Lucas Outline MCTS: The Excitement! A tutorial: how it works Important heuristics: RAVE / AMAF Applications to video games and real-time control The Excitement Game playing

More information

Unit-III Chap-II Adversarial Search. Created by: Ashish Shah 1

Unit-III Chap-II Adversarial Search. Created by: Ashish Shah 1 Unit-III Chap-II Adversarial Search Created by: Ashish Shah 1 Alpha beta Pruning In case of standard ALPHA BETA PRUNING minimax tree, it returns the same move as minimax would, but prunes away branches

More information

More on games (Ch )

More on games (Ch ) More on games (Ch. 5.4-5.6) Announcements Midterm next Tuesday: covers weeks 1-4 (Chapters 1-4) Take the full class period Open book/notes (can use ebook) ^^ No programing/code, internet searches or friends

More information

5.4 Imperfect, Real-Time Decisions

5.4 Imperfect, Real-Time Decisions 116 5.4 Imperfect, Real-Time Decisions Searching through the whole (pruned) game tree is too inefficient for any realistic game Moves must be made in a reasonable amount of time One has to cut off the

More information

Artificial Intelligence Search III

Artificial Intelligence Search III Artificial Intelligence Search III Lecture 5 Content: Search III Quick Review on Lecture 4 Why Study Games? Game Playing as Search Special Characteristics of Game Playing Search Ingredients of 2-Person

More information

Developing Frogger Player Intelligence Using NEAT and a Score Driven Fitness Function

Developing Frogger Player Intelligence Using NEAT and a Score Driven Fitness Function Developing Frogger Player Intelligence Using NEAT and a Score Driven Fitness Function Davis Ancona and Jake Weiner Abstract In this report, we examine the plausibility of implementing a NEAT-based solution

More information

An Artificially Intelligent Ludo Player

An Artificially Intelligent Ludo Player An Artificially Intelligent Ludo Player Andres Calderon Jaramillo and Deepak Aravindakshan Colorado State University {andrescj, deepakar}@cs.colostate.edu Abstract This project replicates results reported

More information

Adversarial Search and Game Playing. Russell and Norvig: Chapter 5

Adversarial Search and Game Playing. Russell and Norvig: Chapter 5 Adversarial Search and Game Playing Russell and Norvig: Chapter 5 Typical case 2-person game Players alternate moves Zero-sum: one player s loss is the other s gain Perfect information: both players have

More information

Adversarial Search and Game- Playing C H A P T E R 6 C M P T : S P R I N G H A S S A N K H O S R A V I

Adversarial Search and Game- Playing C H A P T E R 6 C M P T : S P R I N G H A S S A N K H O S R A V I Adversarial Search and Game- Playing C H A P T E R 6 C M P T 3 1 0 : S P R I N G 2 0 1 1 H A S S A N K H O S R A V I Adversarial Search Examine the problems that arise when we try to plan ahead in a world

More information

Automated Suicide: An Antichess Engine

Automated Suicide: An Antichess Engine Automated Suicide: An Antichess Engine Jim Andress and Prasanna Ramakrishnan 1 Introduction Antichess (also known as Suicide Chess or Loser s Chess) is a popular variant of chess where the objective of

More information

Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters

Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters Achieving Desirable Gameplay Objectives by Niched Evolution of Game Parameters Scott Watson, Andrew Vardy, Wolfgang Banzhaf Department of Computer Science Memorial University of Newfoundland St John s.

More information

Game-playing: DeepBlue and AlphaGo

Game-playing: DeepBlue and AlphaGo Game-playing: DeepBlue and AlphaGo Brief history of gameplaying frontiers 1990s: Othello world champions refuse to play computers 1994: Chinook defeats Checkers world champion 1997: DeepBlue defeats world

More information

EXPLORING TIC-TAC-TOE VARIANTS

EXPLORING TIC-TAC-TOE VARIANTS EXPLORING TIC-TAC-TOE VARIANTS By Alec Levine A SENIOR RESEARCH PAPER PRESENTED TO THE DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE OF STETSON UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR

More information

CSE 573: Artificial Intelligence Autumn 2010

CSE 573: Artificial Intelligence Autumn 2010 CSE 573: Artificial Intelligence Autumn 2010 Lecture 4: Adversarial Search 10/12/2009 Luke Zettlemoyer Based on slides from Dan Klein Many slides over the course adapted from either Stuart Russell or Andrew

More information

Backgammon Basics And How To Play

Backgammon Basics And How To Play Backgammon Basics And How To Play Backgammon is a game for two players, played on a board consisting of twenty-four narrow triangles called points. The triangles alternate in color and are grouped into

More information

CS 221 Othello Project Professor Koller 1. Perversi

CS 221 Othello Project Professor Koller 1. Perversi CS 221 Othello Project Professor Koller 1 Perversi 1 Abstract Philip Wang Louis Eisenberg Kabir Vadera pxwang@stanford.edu tarheel@stanford.edu kvadera@stanford.edu In this programming project we designed

More information

Adversary Search. Ref: Chapter 5

Adversary Search. Ref: Chapter 5 Adversary Search Ref: Chapter 5 1 Games & A.I. Easy to measure success Easy to represent states Small number of operators Comparison against humans is possible. Many games can be modeled very easily, although

More information

Decision Making in Multiplayer Environments Application in Backgammon Variants

Decision Making in Multiplayer Environments Application in Backgammon Variants Decision Making in Multiplayer Environments Application in Backgammon Variants PhD Thesis by Nikolaos Papahristou AI researcher Department of Applied Informatics Thessaloniki, Greece Contributions Expert

More information

Generalized Game Trees

Generalized Game Trees Generalized Game Trees Richard E. Korf Computer Science Department University of California, Los Angeles Los Angeles, Ca. 90024 Abstract We consider two generalizations of the standard two-player game

More information

AN ABSTRACT OF THE THESIS OF

AN ABSTRACT OF THE THESIS OF AN ABSTRACT OF THE THESIS OF Jason Aaron Greco for the degree of Honors Baccalaureate of Science in Computer Science presented on August 19, 2010. Title: Automatically Generating Solutions for Sokoban

More information

Application of UCT Search to the Connection Games of Hex, Y, *Star, and Renkula!

Application of UCT Search to the Connection Games of Hex, Y, *Star, and Renkula! Application of UCT Search to the Connection Games of Hex, Y, *Star, and Renkula! Tapani Raiko and Jaakko Peltonen Helsinki University of Technology, Adaptive Informatics Research Centre, P.O. Box 5400,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory AI Challenge One 140 Challenge 1 grades 120 100 80 60 AI Challenge One Transform to graph Explore the

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 AccessAbility Services Volunteer Notetaker Required Interested? Complete an online application using your WATIAM: https://york.accessiblelearning.com/uwaterloo/

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

ARTIFICIAL INTELLIGENCE (CS 370D)

ARTIFICIAL INTELLIGENCE (CS 370D) Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-5) ADVERSARIAL SEARCH ADVERSARIAL SEARCH Optimal decisions Min algorithm α-β pruning Imperfect,

More information

SEARCHING is both a method of solving problems and

SEARCHING is both a method of solving problems and 100 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 3, NO. 2, JUNE 2011 Two-Stage Monte Carlo Tree Search for Connect6 Shi-Jim Yen, Member, IEEE, and Jung-Kuei Yang Abstract Recently,

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Adversarial Search Instructor: Stuart Russell University of California, Berkeley Game Playing State-of-the-Art Checkers: 1950: First computer player. 1959: Samuel s self-taught

More information

DeepStack: Expert-Level AI in Heads-Up No-Limit Poker. Surya Prakash Chembrolu

DeepStack: Expert-Level AI in Heads-Up No-Limit Poker. Surya Prakash Chembrolu DeepStack: Expert-Level AI in Heads-Up No-Limit Poker Surya Prakash Chembrolu AI and Games AlphaGo Go Watson Jeopardy! DeepBlue -Chess Chinook -Checkers TD-Gammon -Backgammon Perfect Information Games

More information

Lecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1

Lecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Lecture 14 Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Outline Chapter 5 - Adversarial Search Alpha-Beta Pruning Imperfect Real-Time Decisions Stochastic Games Friday,

More information

Adversarial Search. Soleymani. Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5

Adversarial Search. Soleymani. Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5 Adversarial Search CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2017 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 5 Outline Game

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Frank Hutter and Bernhard Nebel Albert-Ludwigs-Universität

More information

Adversarial Search Aka Games

Adversarial Search Aka Games Adversarial Search Aka Games Chapter 5 Some material adopted from notes by Charles R. Dyer, U of Wisconsin-Madison Overview Game playing State of the art and resources Framework Game trees Minimax Alpha-beta

More information

Reinforcement Learning in Games Autonomous Learning Systems Seminar

Reinforcement Learning in Games Autonomous Learning Systems Seminar Reinforcement Learning in Games Autonomous Learning Systems Seminar Matthias Zöllner Intelligent Autonomous Systems TU-Darmstadt zoellner@rbg.informatik.tu-darmstadt.de Betreuer: Gerhard Neumann Abstract

More information

AN ABSTRACT OF THE THESIS OF

AN ABSTRACT OF THE THESIS OF AN ABSTRACT OF THE THESIS OF Radha-Krishna Balla for the degree of Master of Science in Computer Science presented on February 19, 2009. Title: UCT for Tactical Assault Battles in Real-Time Strategy Games.

More information

CS 331: Artificial Intelligence Adversarial Search II. Outline

CS 331: Artificial Intelligence Adversarial Search II. Outline CS 331: Artificial Intelligence Adversarial Search II 1 Outline 1. Evaluation Functions 2. State-of-the-art game playing programs 3. 2 player zero-sum finite stochastic games of perfect information 2 1

More information

MONTE-CARLO TWIXT. Janik Steinhauer. Master Thesis 10-08

MONTE-CARLO TWIXT. Janik Steinhauer. Master Thesis 10-08 MONTE-CARLO TWIXT Janik Steinhauer Master Thesis 10-08 Thesis submitted in partial fulfilment of the requirements for the degree of Master of Science of Artificial Intelligence at the Faculty of Humanities

More information

Beeches Holiday Lets Games Manual

Beeches Holiday Lets Games Manual Beeches Holiday Lets Games Manual www.beechesholidaylets.co.uk Page 1 Contents Shut the box... 3 Yahtzee Instructions... 5 Overview... 5 Game Play... 5 Upper Section... 5 Lower Section... 5 Combinations...

More information

Game Tree Search. CSC384: Introduction to Artificial Intelligence. Generalizing Search Problem. General Games. What makes something a game?

Game Tree Search. CSC384: Introduction to Artificial Intelligence. Generalizing Search Problem. General Games. What makes something a game? CSC384: Introduction to Artificial Intelligence Generalizing Search Problem Game Tree Search Chapter 5.1, 5.2, 5.3, 5.6 cover some of the material we cover here. Section 5.6 has an interesting overview

More information

Theory and Practice of Artificial Intelligence

Theory and Practice of Artificial Intelligence Theory and Practice of Artificial Intelligence Games Daniel Polani School of Computer Science University of Hertfordshire March 9, 2017 All rights reserved. Permission is granted to copy and distribute

More information

Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente

Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente Training a Back-Propagation Network with Temporal Difference Learning and a database for the board game Pente Valentijn Muijrers 3275183 Valentijn.Muijrers@phil.uu.nl Supervisor: Gerard Vreeswijk 7,5 ECTS

More information

An Empirical Evaluation of Policy Rollout for Clue

An Empirical Evaluation of Policy Rollout for Clue An Empirical Evaluation of Policy Rollout for Clue Eric Marshall Oregon State University M.S. Final Project marshaer@oregonstate.edu Adviser: Professor Alan Fern Abstract We model the popular board game

More information

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel

Foundations of AI. 6. Adversarial Search. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard & Bernhard Nebel Foundations of AI 6. Adversarial Search Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard & Bernhard Nebel Contents Game Theory Board Games Minimax Search Alpha-Beta Search

More information

CMSC 671 Project Report- Google AI Challenge: Planet Wars

CMSC 671 Project Report- Google AI Challenge: Planet Wars 1. Introduction Purpose The purpose of the project is to apply relevant AI techniques learned during the course with a view to develop an intelligent game playing bot for the game of Planet Wars. Planet

More information

CS61B, Fall 2014 Project #2: Jumping Cubes(version 3) P. N. Hilfinger

CS61B, Fall 2014 Project #2: Jumping Cubes(version 3) P. N. Hilfinger CSB, Fall 0 Project #: Jumping Cubes(version ) P. N. Hilfinger Due: Tuesday, 8 November 0 Background The KJumpingCube game is a simple two-person board game. It is a pure strategy game, involving no element

More information

Game-playing AIs: Games and Adversarial Search I AIMA

Game-playing AIs: Games and Adversarial Search I AIMA Game-playing AIs: Games and Adversarial Search I AIMA 5.1-5.2 Games: Outline of Unit Part I: Games as Search Motivation Game-playing AI successes Game Trees Evaluation Functions Part II: Adversarial Search

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Bernhard Nebel Albert-Ludwigs-Universität

More information

46.1 Introduction. Foundations of Artificial Intelligence Introduction MCTS in AlphaGo Neural Networks. 46.

46.1 Introduction. Foundations of Artificial Intelligence Introduction MCTS in AlphaGo Neural Networks. 46. Foundations of Artificial Intelligence May 30, 2016 46. AlphaGo and Outlook Foundations of Artificial Intelligence 46. AlphaGo and Outlook Thomas Keller Universität Basel May 30, 2016 46.1 Introduction

More information

A Quoridor-playing Agent

A Quoridor-playing Agent A Quoridor-playing Agent P.J.C. Mertens June 21, 2006 Abstract This paper deals with the construction of a Quoridor-playing software agent. Because Quoridor is a rather new game, research about the game

More information

The game of Reversi was invented around 1880 by two. Englishmen, Lewis Waterman and John W. Mollett. It later became

The game of Reversi was invented around 1880 by two. Englishmen, Lewis Waterman and John W. Mollett. It later became Reversi Meng Tran tranm@seas.upenn.edu Faculty Advisor: Dr. Barry Silverman Abstract: The game of Reversi was invented around 1880 by two Englishmen, Lewis Waterman and John W. Mollett. It later became

More information